OCR Part I...
- Posted by Norm Goundry <bonk1000 at HOTMAIL.COM> May 16, 1999
According to Information Theory (very loosely defined herein), the point at which Objects ('Things') begin to lose their distinctness-- their individual definition-- is the threshold beneath which these and all other Things cannot be identified. What lies beneath that threshold can also be called Noise. To the ancient Greeks* this lower portion was known as Limbo; to the early Christians** it was called Purgatory. We will refer to it as Noise.

Are we still talking about OCR here? Yes, we most certainly are. The point is that Information Theory covers many large areas of what we use computers for: Encryption and its counterpart, Decryption; Modulation and Demodulation [what a modem does]; Signal-To-Noise Ratio [what makes telephones and satellite communications, Video and CD players work]. All of these are totally dependent upon two things-- Signal-To-Noise and Identity. I will haphazardly define Identity here as not just whether something is 'TRUE' (in the sense that AND, OR, NOT, NOR and other refinements of Boolean Logic are True or False), which is what qualifies Things as such, but also how this rigid, binary set of choices transforms into our own analog world of shades of Truth-- what we call Discernment. And Discernment will get us into the territory of OCR.

[Note: those of you out there who still have the mistaken idea that the study of Philosophy and Language is useless-- throw that outdated concept out and rearrange your thinking processes. Much as in the old dictum, 'Software Runs Hardware', and not the other way round as we usually believe, so too 'Thinking Runs Engineering'; one cannot create Sense without Thought. As for those of you who already disagree with this conceptualization of mine and are shaking your heads: the fact of the matter is that you have to use something more than just Cold Logic to do so.]

What does all of this have to do with Optical Character Recognition (OCR)? Plenty, but not everything. OCR is a child of its parent: Pattern Recognition.
Much as we want to persist in thinking about our computers as having mental capabilities... they don't. Well, you reply, animals can recognize patterns, can't they? Yes, that is true-- but, like us, only when it interests them. Only when it is important to them. Computers, on the other hand, don't 'care' about anything. So there is still a gigantic gap between us and them. But computers can count (actually, they don't 'count'; they 'compute', which is a different thing), and this is very important.

How does this great canyon of digital darkness get bridged? Indeed, at a completely logical level there are no such things as 'Patterns', and so there can also be no such thing as 'Recognition'-- because any one pattern is as good (in its importance) as any and all of the other possible patterns. Hopeless? Never hopeless unless helpless. To illustrate the answer to all of this, I beg your indulgence in my using a parable, or story, to continue the explanation.

There is a group of helpers called Recognition. They stand along a long assembly line of strings of data. The first of these tireless workers, much in the spirit of Sherlock Holmes, picks over all of the items and, as it has been ORDERED (we cannot talk about 'Trained' yet), rejects those that are 'Impossible' and culls them from the line. Good! Further down the line stand several other workers, who have been Ordered to divide this flowing stream of what are now 'Possible' items into two branches... 'Probable' and 'Don't Know'. This is a very active task, and that is why there are several workers standing at the bifurcation ('split in two') point, applying their specialized talents of Sorting to this part of the job. And to make certain that all of these workers are doing their jobs correctly, standing right behind them are Quality Control Officers, who check that the first worker's rejects truly are such, and nothing more, and also check the second group of workers' actions to make certain that they are correct.
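For those who prefer code to parables, the assembly line above might be sketched roughly as follows. This is only an illustration, not an actual OCR method: the Candidate type, the numeric 'score', the two thresholds and all of the function names are my own assumptions, invented to show the three stations (cull the Impossible, split the Possible into Probable and Don't Know, then Quality Control).

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
NOISE_FLOOR = 0.2      # below this, a candidate is treated as 'Impossible'
PROBABLE_CUTOFF = 0.7  # at or above this, a candidate is 'Probable'


@dataclass(frozen=True)
class Candidate:
    glyph: str    # the character some matcher proposes (assumed input)
    score: float  # assumed match score in the range 0.0 to 1.0


def cull_impossible(items):
    """First worker: separate the 'Possible' items from the rejects."""
    possible = [c for c in items if c.score >= NOISE_FLOOR]
    rejects = [c for c in items if c.score < NOISE_FLOOR]
    return possible, rejects


def partition(possible):
    """Second group of workers: split 'Possible' into 'Probable' and 'Don't Know'."""
    probable = [c for c in possible if c.score >= PROBABLE_CUTOFF]
    dont_know = [c for c in possible if c.score < PROBABLE_CUTOFF]
    return probable, dont_know


def quality_control(items, rejects, probable, dont_know):
    """Quality Control: re-check every decision and confirm no item was lost."""
    assert all(c.score < NOISE_FLOOR for c in rejects)
    assert all(c.score >= PROBABLE_CUTOFF for c in probable)
    assert all(NOISE_FLOOR <= c.score < PROBABLE_CUTOFF for c in dont_know)
    assert len(rejects) + len(probable) + len(dont_know) == len(items)


def recognition_line(items):
    """Run one stream of candidates down the whole assembly line."""
    possible, rejects = cull_impossible(items)
    probable, dont_know = partition(possible)
    quality_control(items, rejects, probable, dont_know)
    return rejects, probable, dont_know


stream = [
    Candidate("A", 0.95),
    Candidate("B", 0.10),
    Candidate("8", 0.50),
    Candidate("O", 0.72),
]
rejects, probable, dont_know = recognition_line(stream)
print([c.glyph for c in rejects])    # ['B']
print([c.glyph for c in probable])   # ['A', 'O']
print([c.glyph for c in dont_know])  # ['8']
```

Note that the workers here are simply Ordered by two fixed numbers; nothing has been 'Trained'. Where those numbers ought to come from is exactly the part the sketch leaves out.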
It is this 'Partitioning' of the Probables and Don't Knows that needs to prick up our attention here (and also that of all of you who have so far been able to put up with my rambling-- to paraphrase Chaucer, this is The Code-Mangler's Tale; thank you!), and so onto Part II, wherein Things Get a Bit More Attention.

Norm Goundry

*(perhaps read the great Greek Classic, 'The Nature of Things')
**(as defined in the New Testament)