OpenEuphoria: Forum: OCR Part II

OCR Part II

Posted by Norm Goundry <bonk1000 at HOTMAIL.COM> May 17, 1999
Ah Yes!  Not only do things now get a bit more attention, attention gets
more bits.  For it is bits and their interpretation that is the crux of the
matter here.

A constant stream of 32x32-bit squares passes along the assembly line.
The first worker (the one who only has to act according to the simplest
directive of 'Reject the Impossible') has a fairly straightforward job: all
of the squares that are entirely White or entirely Black are pulled from
the line.  This worker has also been told to remove any units that have
fewer than 3 black squares out of the total of 1024 available.  Again, this
is a simple and direct operation.  Such units are few and far between,
because the units have been carefully chosen in the first place, so the
worker has little to do as they come down the line.  In another factory
this may not be the same state of affairs, but in this one it is.
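
The first worker's rule is simple enough to sketch in a few lines.  This is
only an illustration in Python, not code from any real OCR program: the name
reject_impossible, the min_black parameter, and the choice of 1 for Black
and 0 for White are all my own assumptions.

```python
SIZE = 32  # side length of one unit, as described in the post

def reject_impossible(unit, min_black=3):
    """Return True if the first worker should pull this unit off the line.

    `unit` is assumed to be SIZE rows of SIZE ints (1 = black, 0 = white).
    """
    black = sum(sum(row) for row in unit)
    total = SIZE * SIZE  # 1024 squares in all
    # All-black units are impossible; all-white units (black == 0) are
    # already caught by the "fewer than min_black" test.
    return black == total or black < min_black
```

A completely blank unit has zero black squares, so the 'fewer than 3' test
covers the all-White case without a separate check.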

Further down the line are the Main Dividers: Probable and, next to it,
Don't Know.  Probable does the actual checking for Likeness.  This is done
on the main assembly line, but at the same position is a branch curve which
loops away from the main line and then rejoins it a bit later.  On this
portion operates the Don't Know worker.

Of all the workers in the factory, Probable is not only the most highly
trained (in its past), but also the most updated (in the present) technical
worker.  Here is some of the ground-rule training it has absorbed about
content.

How many states can it determine (observe)?  It has to know that the
following can be observed across three adjacent squares: Black-Next-to-Black
(BBB; one state), White-Next-to-Black (WBB, BBW, WBW; three states) and the
reverse of each (WWW; and BWW, WWB, BWB).  Besides these, there are further
things it has to recognize: the positions of the aforementioned states in
relationship to North-and-South (Top to Bottom) and East-and-West (Left
side and Right side).  From these basic Recognition Components all of the
other higher, more complex operations derive.
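
To make the eight states concrete, here is a rough Python sketch that
collects every three-square state seen along the rows (East-and-West) and
down the columns (North-and-South) of a unit.  The function names and the
1 = Black, 0 = White encoding are assumptions of mine for illustration only.

```python
def triples(line):
    """Yield each group of three adjacent squares in `line` as text like 'BWB'."""
    for i in range(len(line) - 2):
        yield "".join("B" if c else "W" for c in line[i:i + 3])

def observed_states(unit):
    """Collect the three-square states seen in each direction of `unit`."""
    east_west = set()
    for row in unit:                 # left to right along every row
        east_west.update(triples(row))
    north_south = set()
    for col in zip(*unit):           # top to bottom down every column
        north_south.update(triples(col))
    return {"east_west": east_west, "north_south": north_south}
```

On a toy 3x3 unit with rows B W B, B B B and W W W, the East-and-West set
comes out as BWB, BBB and WWW, while North-and-South gives BBW and WBW.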

And Probable has been given another set of fundamental directives, all of
which gravitate around the simple rule that 'It is most desirable to
achieve Success in the simplest possible way and the shortest possible
time'.  In other words, it takes direct action first.

This is called Testing.  For example, it first checks to see if the unit in
front of it contains any BWB.  In the case of the example it is looking down
at (note that it is actually the Capital letter 'L'), it would notice that
this does not contain any BWB or its opposite, WBW, but it would see that
it DOES contain W...B...W.  This immediately cuts out all 'looped'
characters, such as 'O' and 'P', and 'a' and 'b'.  Furthermore, since the
unit does NOT contain any BWB, Probable now checks whether its converse,
WBW, applies.  As a matter of fact, in this particular example, it does
not.  So this also cuts out 'J', 'W', 'N' and so on.  What it leaves is a
much smaller block of unknowns to choose from, which is good.  It can be
said that Probable has now cut out much of the Noise, which means that
Uncertainty is diminished.
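
One way to picture the difference between BWB and W...B...W is shown
below.  I am assuming here that 'W...B...W' means a run of White, then a
run of Black, then a run of White, each of any length; the post does not
spell this out, so treat the helper names and that reading as hypothetical.

```python
from itertools import groupby

def run_pattern(line):
    """Collapse a line to its colour runs, e.g. [0, 0, 1, 1, 1, 0] -> 'WBW'."""
    return "".join("B" if key else "W" for key, _ in groupby(line))

def has_adjacent(line, pattern):
    """True if the exact three-square `pattern` (e.g. 'BWB') occurs in `line`."""
    text = "".join("B" if c else "W" for c in line)
    return pattern in text

def has_spaced(line, pattern):
    """True if `pattern` occurs as consecutive runs of any length (W...B...W)."""
    return pattern in run_pattern(line)
```

For a row through the stem of the 'L', such as [0, 0, 1, 1, 1, 0, 0, 0],
has_adjacent finds neither 'BWB' nor 'WBW', while has_spaced finds 'WBW'
but not 'BWB'.  A row through an 'O', such as [0, 1, 1, 0, 0, 1, 1, 0],
does contain the spaced 'BWB', which is what marks it as looped.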

Now, Probable has several actions it can take next, because there is still
more than one possible item to choose from.  It checks its Priority List
and 'decides' to check North-and-South, East-and-West next.  North-and-South
tells it that, much as was True in the initial appraisal, there are NOT any
BWB or WBW; there IS W...B...W, but NO B...W...B.  This cuts out 'i', but
leaves 'l' and 'I' (note that we are supposed to be using the plain,
sans-serif characters here), that is, Capital 'i' or Small 'L'.

So the worker is left with a High Certainty, but also a residual amount of
Uncertainty.  Rather than apply a whole pile of intensive rules and methods
(such as Vector Weighing or Neural Netting, etc.), it just Compares the
remaining possible choices against the unit at hand.  By actually counting
the Success/Failure tests of L's W...B...W content (how MANY 'B's are
adjacent to each other per line of 32 lines Top-to-Bottom) it finds out
what it needs to know.  Such a direct method gives an immediate answer: the
unit in front of it is indeed an 'L', and nothing else!  So it sends it
down the main line to the Data Department at its end, where it is properly
placed in the Build String.
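
The direct comparison just described might look something like this.  It is
a sketch only: it counts the longest run of adjacent Black squares on each
line and picks the candidate whose counts differ least.  The names
black_profile and best_match, the templates dictionary, and the
sum-of-differences score are all assumptions of mine, not a fixed method.

```python
def black_profile(unit):
    """For each row, the length of the longest run of adjacent black squares."""
    profile = []
    for row in unit:
        best = current = 0
        for cell in row:
            current = current + 1 if cell else 0
            best = max(best, current)
        profile.append(best)
    return profile

def best_match(unit, templates):
    """Return the label whose row-by-row profile differs least from the unit's."""
    target = black_profile(unit)

    def distance(label):
        other = black_profile(templates[label])
        return sum(abs(a - b) for a, b in zip(target, other))

    return min(templates, key=distance)
```

With toy 4x4 templates, a plain vertical stroke has the profile
[1, 1, 1, 1] and an 'L' has [1, 1, 1, 4], so a unit whose bottom line holds
a long Black run lands on 'L' and not on the stroke.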

Of course, this is a vastly oversimplified situation; in the real world
actual conditions are much more complex than this.  And as that is the
case, I will cover these further considerations in Part III, wherein Don't
Know finally gets to do its job...