OCR Part I...


According to Information Theory (very loosely defined herein), there is a
point where Objects ('Things') begin to lose their distinctness-- their
individual definition-- and this is the threshold beneath which these and
all other Things cannot be identified.  This region can also be called
Noise.  To the ancient Greeks* this lower portion was known as Limbo; to
the early Christians**, this was called Purgatory.  We will refer to it as
Noise.

Are we still talking about OCR here?  Yes, we most certainly are.  The
point is that Information Theory covers many large areas of what we use
computers for: Encryption, and its counterpart Decryption; Modulation --
Demodulation [what a modem does]; Signal-To-Noise Ratio [what makes
telephones and satellite communications, Video and CD players work].  All
of these things are totally dependent upon two things-- Signal-To-Noise and
Identity.

I will haphazardly define Identity here as not just whatever qualifies
Things as 'TRUE' (in the sense that AND, OR, NOT, NOR and other refinements
of Boolean Logic yield True or False), but also how this rigid, binary set
of choices transforms into our own analog world of shades of Truth-- what
we call Discernment.  And Discernment will get us into the territory of
OCR.

[Note: those of you out there who still have the mistaken idea that the
study of Philosophy and Language is useless-- throw that outdated concept
out and rearrange your thinking processes.  Much as in the old dictum,
'Software Runs Hardware', and not the other way round as we usually
believe, 'Thinking Runs Engineering'; one cannot create Sense without
Thought.  As for those of you who disagree with this conceptualization of
mine and are already shaking your heads, the fact of the matter is that you
have to use something more than just Cold Logic to do so.]

What does all of this have to do with Optical Character Recognition (OCR)?
Plenty, but not everything.

OCR is a child of its parent: Pattern Recognition.  Much as we want to
persist in thinking about our computers as having mental capabilities...
they don't.  Well, you reply, animals can recognize patterns, can't they?
Yes, that is true-- but, like us, only when it interests them.  Only
when it is important to them.  Computers, on the other hand, don't 'care'
about anything.  So there is still a gigantic gap between us and them.  But
computers can count (actually, they don't 'count'; they 'compute', which is
a different thing), and this is very important.

How does this great canyon of digital darkness get bridged?  Indeed, at a
completely logical level there are no such things as 'Patterns', and so
also there can be no such thing as 'Recognition'-- because one pattern is
as good (in its importance) as any and all of the other possible ones.

Hopeless?  Never hopeless unless helpless.  To illustrate the answer to all
of this I beg your indulgence in my using a parable or story to continue
the explanation:

There is a group of helpers called Recognition.  They stand on a long
assembly line of strings of data.  The first of these tireless workers,
much in the spirit of Sherlock Holmes, picks over all of the items and, as
it has been ORDERED (we cannot talk about 'Trained' yet), rejects some as
being 'Impossible' and culls them from the line.  Good!  Further down the
line, stand several other workers, who have been Ordered to divide this
flowing stream of what are now 'Possible' items into two branches...
'Probable' and 'Don't Know'.  This is a very active task, and that is why
there are the several workers standing there at the bifurcation ('split in
two') point, and applying their specialized talents of Sorting to this part
of the job.  And to make certain that all of these workers are doing their
jobs correctly, standing right behind them are Quality Control Officers who
check that the first worker's rejects truly are such, and nothing more, and
also check the second group of workers' actions to make certain that they
are correct.
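
For those who would rather see the assembly line than hear about it, here
is a minimal sketch in Python.  Nothing in it comes from a real OCR
package: the Candidate class, the match score between 0 and 1, and both
threshold values are assumptions of mine, made up only to illustrate the
three stations (cull the Impossible, split the rest into Probable and
Don't Know, then let Quality Control check the bins).

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
IMPOSSIBLE_BELOW = 0.2       # scores under this are culled as 'Impossible'
PROBABLE_AT_OR_ABOVE = 0.8   # scores at or over this count as 'Probable'


@dataclass
class Candidate:
    label: str    # the character this item is proposed to be
    score: float  # assumed match score in the range 0..1


def cull_impossible(items):
    """First worker: separate the 'Impossible' items from the 'Possible'."""
    possible, rejected = [], []
    for item in items:
        if item.score < IMPOSSIBLE_BELOW:
            rejected.append(item)
        else:
            possible.append(item)
    return possible, rejected


def partition(possible):
    """Second group of workers: split 'Possible' into two branches."""
    probable, dont_know = [], []
    for item in possible:
        if item.score >= PROBABLE_AT_OR_ABOVE:
            probable.append(item)
        else:
            dont_know.append(item)
    return probable, dont_know


def quality_control(rejected, probable, dont_know):
    """Officers: confirm that every item sits in the bin it belongs in."""
    return (
        all(c.score < IMPOSSIBLE_BELOW for c in rejected)
        and all(c.score >= PROBABLE_AT_OR_ABOVE for c in probable)
        and all(IMPOSSIBLE_BELOW <= c.score < PROBABLE_AT_OR_ABOVE
                for c in dont_know)
    )


def run_line(items):
    """Run the whole line and return the three bins."""
    possible, rejected = cull_impossible(items)
    probable, dont_know = partition(possible)
    if not quality_control(rejected, probable, dont_know):
        raise ValueError("a worker put an item in the wrong bin")
    return rejected, probable, dont_know


if __name__ == "__main__":
    line = [Candidate("A", 0.95), Candidate("B", 0.5), Candidate("#", 0.05)]
    rejected, probable, dont_know = run_line(line)
    print("Impossible:", [c.label for c in rejected])
    print("Probable:  ", [c.label for c in probable])
    print("Don't Know:", [c.label for c in dont_know])
```

Note that the workers here are only following Orders (fixed thresholds);
nothing has been 'Trained', and where the scores come from is left open.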

It is this 'Partitioning' of the Probables and Don't Knows that needs to
prick up our attention here (and also that of all of you who have so far
been able to put up with my rambling-- to paraphrase Chaucer, this is The
Code-Mangler's Tale; thank you!), and so on to Part II, wherein Things Get
a Bit More Attention.

Norm Goundry


*(perhaps read the great Greek Classic, 'The Nature of Things')
**(as defined in the New Testament)
