OCR Part II
- Posted by Norm Goundry <bonk1000 at HOTMAIL.COM> May 17, 1999
Ah yes! Not only do things now get a bit more attention, attention gets more bits. For it is bits, and their interpretation, that are the crux of the matter here. Passing along the assembly line is a constant stream of 32x32-bit squares. The first worker (the one who only has to act on the simplest directive, 'Reject the Impossible') has a fairly straightforward job: every square that is entirely White or entirely Black is pulled from the line. This worker has also been told to remove any unit that has fewer than 3 black bits out of the 1024 available. Again, this is a simple and direct operation, and because such units are few and far between (the units were carefully chosen in the first place), the worker has little to do as they come down the line. In another factory this may not be the case, but in this one it is.

Further down the line are the Main Dividers: Probable and, next to it, Don't Know. Probable does the actual checking for Likeness, and it works on the main assembly line. At the same position, a branch curve loops away from the main line and rejoins it a bit later; the Don't Know worker operates on this portion. Of all the workers in the factory, Probable is not only the most highly trained (in its past) but also the most up-to-date (in the present) technical worker.

Here are some of the ground rules it has been trained to understand about content. How many states can it determine (observe)? It has to know that the following can be observed: Black-next-to-Black (BBB; one state), White-next-to-Black (WBB, BBW, WBW; three states), and their reverses (WWW; and BWW, WWB, BWB). Besides these, it has to recognize the positions of the aforementioned states relative to North-and-South (top to bottom) and East-and-West (left side to right side). All of the other, higher and more complex operations derive from these basic Recognition Components.
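To make the first worker's job and the basic Recognition Components a little more concrete, here is a minimal Python sketch. It assumes (this is my assumption, not something the post specifies) that a unit is stored as 32 lists of 32 ints, with 1 for black and 0 for white; the names `reject_impossible`, `three_cell_states` and `columns` are hypothetical. It applies the 'Reject the Impossible' rules, tallies the eight three-cell states along any one line, and obtains the North-and-South lines by reading the unit column by column.

```python
from collections import Counter
from typing import List

Unit = List[List[int]]  # hypothetical layout: 32 rows of 32 bits, 1 = black, 0 = white
SIZE = 32


def reject_impossible(unit: Unit, min_black: int = 3) -> bool:
    """Return True if the first worker should pull this unit from the line."""
    black = sum(sum(row) for row in unit)
    total = SIZE * SIZE  # 1024 bits in a 32x32 square
    if black == 0 or black == total:  # entirely white or entirely black
        return True
    return black < min_black  # too few black bits to be a character


def three_cell_states(line: List[int]) -> Counter:
    """Count each 3-cell state (BBB, WBB, ..., WWW) along one line of bits."""
    counts: Counter = Counter()
    for i in range(len(line) - 2):
        counts["".join("B" if bit else "W" for bit in line[i:i + 3])] += 1
    return counts


def columns(unit: Unit) -> Unit:
    """North-and-South lines are the columns; East-and-West lines are the rows."""
    return [list(col) for col in zip(*unit)]
```

For example, `three_cell_states([0, 1, 1, 1, 0])` sees one WBB, one BBB and one BBW. Running the same tally over `unit` and over `columns(unit)` gives the East-and-West and North-and-South views respectively.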
Probable has also been given another set of fundamental directives, all of which gravitate around the simple rule: 'It is most desirable to achieve Success in the simplest possible way and in the shortest possible time.' In other words, it takes direct action first. This is called Testing. For example, it first checks whether the unit in front of it contains any BWB. In the example it is looking at (which is actually the capital letter 'L'), it would notice that the unit indeed does not contain any BWB or its opposite, WBW, but it would see that it DOES contain W...B...W. This immediately cuts out all 'looped' characters, such as 'O' and 'P', and 'a' and 'b'. Furthermore, Probable now checks: since the unit does NOT contain any BWB, does its converse, WBW, apply? As a matter of fact, in this particular example, it does not. So this also cuts out 'J', 'W', 'N' and so on. What remains is a much smaller block of unknowns to choose from, which is good. It can be said that Probable has now cut out much of the Noise, which means that Uncertainty is diminished.

Now, Probable has several actions it can take next, because there is still more than one possible item to choose from. It checks its Priority List and 'decides' to check North-and-South and East-and-West next. North-and-South tells it that, much as in the initial appraisal, there are NOT any BWB or WBW; there IS W...B...W, but NO B...W...B. This cuts out 'i', but leaves 'l' and 'I' (note that we are supposed to be using plain, sans-serif characters here), that is, capital 'i' or small 'L'. So the worker is left with High Certainty, but also a residual amount of Uncertainty. Rather than apply a whole pile of intensive rules and methods (such as Vector Weighing or Neural Netting), it just Compares the remaining possible choices against the unit at hand.
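Probable's tests can be sketched the same way. This is one possible reading of the notation, not the author's code: BWB and WBW are taken to mean exactly three adjacent cells, while W...B...W and B...W...B allow runs of any length in between. The helper names and the 5-bit-thick capital 'L' built by the hypothetical `make_L` are assumptions made for illustration.

```python
from typing import Dict, List

Unit = List[List[int]]  # hypothetical layout: 1 = black, 0 = white


def has_adjacent(line: List[int], pattern: str) -> bool:
    """True if an exact 3-cell pattern such as 'BWB' appears in the line."""
    bits = [1 if c == "B" else 0 for c in pattern]
    return any(line[i:i + 3] == bits for i in range(len(line) - 2))


def has_spread(line: List[int], outer: int, inner: int) -> bool:
    """True if outer...inner...outer appears, with runs of any length."""
    state = 0  # 0: need outer, 1: need inner, 2: need closing outer
    for bit in line:
        if state == 0 and bit == outer:
            state = 1
        elif state == 1 and bit == inner:
            state = 2
        elif state == 2 and bit == outer:
            return True
    return False


def profile(lines: Unit) -> Dict[str, bool]:
    """Report which tests fire anywhere in a set of rows or columns."""
    return {
        "BWB": any(has_adjacent(ln, "BWB") for ln in lines),
        "WBW": any(has_adjacent(ln, "WBW") for ln in lines),
        "W...B...W": any(has_spread(ln, 0, 1) for ln in lines),
        "B...W...B": any(has_spread(ln, 1, 0) for ln in lines),
    }


def make_L(size: int = 32, stroke: int = 5) -> Unit:
    """Hypothetical plain capital 'L': a stem plus a foot, `stroke` bits thick."""
    unit = [[0] * size for _ in range(size)]
    for r in range(2, 30):
        # Rows in the foot run from column 4 to 27; other rows hold only the stem.
        for c in range(4, 28 if r >= 30 - stroke else 4 + stroke):
            unit[r][c] = 1
    return unit


unit_L = make_L()
east_west = unit_L                                 # rows, left to right
north_south = [list(col) for col in zip(*unit_L)]  # columns, top to bottom
print(profile(east_west))
print(profile(north_south))
# both print {'BWB': False, 'WBW': False, 'W...B...W': True, 'B...W...B': False}
```

For this synthetic 'L', both directions report no BWB, no WBW, W...B...W present and no B...W...B, which is the pattern of results described in the walk-through. A sans-serif 'i', by contrast, would show B...W...B in a North-and-South column (dot, gap, stem).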
By actually counting the Success/Failure tests of the L's W...B...W content (how MANY 'B's are adjacent to each other in each of the 32 lines, top to bottom), it finds out what it needs to know. Such a direct method gives an immediate answer: the unit in front of it is indeed an 'L', and nothing else! So Probable sends it down the main line to the Data Department at the end, where it is placed in its proper position in the Build String. Of course, this is a vastly oversimplified situation; in the real world, actual conditions are much more complex than this. As that is the case, I will cover these further considerations in Part III, wherein Don't Know finally gets to do its job...
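The final Compare step can be sketched as counting the longest run of adjacent black bits in each of the 32 rows, top to bottom, and picking the stored template whose counts differ least from the unit's. The templates, the glyph generator `make_glyph`, and the sum-of-absolute-differences score are all hypothetical choices; the post does not say how the counts are compared. A plain vertical bar stands in for both 'l' and 'I', since the sans-serif forms look alike.

```python
from typing import Dict, List

Unit = List[List[int]]  # hypothetical layout: 32x32, 1 = black, 0 = white


def black_per_row(unit: Unit) -> List[int]:
    """Longest run of adjacent black bits in each row, top to bottom."""
    counts = []
    for row in unit:
        longest = run = 0
        for bit in row:
            run = run + 1 if bit else 0
            longest = max(longest, run)
        counts.append(longest)
    return counts


def make_glyph(foot: int, size: int = 32, stroke: int = 5) -> Unit:
    """Hypothetical plain glyph: a vertical stem plus an optional foot."""
    unit = [[0] * size for _ in range(size)]
    for r in range(2, 30):
        width = max(stroke, foot) if r >= 30 - stroke else stroke
        for c in range(4, 4 + width):
            unit[r][c] = 1
    return unit


# Hypothetical reference templates held by Probable.
TEMPLATES: Dict[str, List[int]] = {
    "L": black_per_row(make_glyph(foot=24)),
    "l": black_per_row(make_glyph(foot=0)),  # bar: stands in for 'l' and 'I'
}


def compare(unit: Unit) -> str:
    """Pick the template whose per-row counts differ least from the unit's."""
    counts = black_per_row(unit)

    def distance(name: str) -> int:
        return sum(abs(a - b) for a, b in zip(counts, TEMPLATES[name]))

    return min(TEMPLATES, key=distance)


print(compare(make_glyph(foot=20)))  # prints "L": a shorter foot is still nearest to 'L'
```

Counting per row keeps the comparison cheap: only 32 small integers are compared per template, in the spirit of the post's 'simplest possible way' directive, rather than the full 1024 bits.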