Re: OCR Part I...


Although this is a very interesting subject, to which I will perhaps respond at some
other time, I would like to respond immediately to the OCR (& compression) part.

I will start with the basic concept of compression.
Compression means favouring one type of data at the cost of another.
What cost?
With lossless compression, the cost for the other type of data is that it becomes larger.
With lossy compression, the cost for the other type of data is a decrease in
quality/significance.
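A minimal sketch of that trade-off, using run-length encoding for the lossless case and quantization for the lossy case (all function names here are my own, chosen just for illustration):

```python
def rle_encode(data):
    """Lossless run-length encoding: runs of repeated values shrink,
    but data without runs grows -- every value now costs a pair."""
    out = []
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        out.append((data[i], run))
        i += run
    return out

def quantize(samples, step):
    """Lossy: snap each sample to the nearest multiple of `step`,
    trading precision ('quality') for fewer distinct values."""
    return [round(s / step) * step for s in samples]

runs = [7, 7, 7, 7, 2, 2]          # favoured data: compresses well losslessly
noisy = [0.12, 0.13, 0.11, 0.52]   # lossy: collapses to two distinct values

print(rle_encode(runs))            # [(7, 4), (2, 2)]
print(quantize(noisy, 0.5))        # [0.0, 0.0, 0.0, 0.5]
```

Feeding `rle_encode` data with no runs at all would make it larger than the input, which is exactly the "cost" mentioned above.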

So, why does compression work?
Because most files are in some way related to our human world, which is full of
pattern-based, math-based and other kinds of relationships.
Certain things really are more common than others.

My point is that, for example, our human voice can differ greatly in certain
aspects, but is ultimately based upon relationships of tone, volume and order.
This relationship is due to nature, due to the way our voice works. The best
definition of noise would be: that part of the data we couldn't care less about.
The process would be to separate the noise from the data we do want. How do we
do this? In the very same way lossy compression works. In theory the two
algorithms would be identical; they would both discard an amount of data. (The
OCR would discard even more, though.)

However, discarding such noise is more than just 'rounding' the values.
Unfortunately we work with absolute values, while patterns are by definition
relative. Now, here's a good figurative explanation of what kind of 'noise'
needs to be 'floored' / 'discarded'..

Say a person was driving a car from point X to point Y.
We don't care from where to where, but about what angles, which velocities, etc.
Think vectors.

I suggest looking at how tone and volume change as a vector (we'll call this
vectorA).
After that I would round (floor) the vector values to minimize the number of
vectors that represent the 'route' our voice took from point X to Y.
Now we have a new, less 'noisy' image of our voice data.
This time, instead of splitting up the voice data into tone and volume, use the
whole data in a vector based upon time & data.
Again, floor the vector values.
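The delta-and-floor idea above could be sketched roughly like this (a hedged sketch under my own assumptions; the names and the choice of a simple 1-D signal are mine, not from the post): store the signal as successive differences ("vectors"), snap each difference to a coarse step, then integrate back to get a less noisy approximation of the route from X to Y.

```python
def to_deltas(samples):
    """Relative representation: each value as a step from the previous one."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def floor_deltas(deltas, step):
    """Discard 'noise' by snapping each step to a multiple of `step`."""
    return [round(d / step) * step for d in deltas]

def rebuild(deltas):
    """Integrate the (floored) steps back into an absolute signal."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

# A made-up toy 'voice' signal; small wiggles vanish, the big moves survive.
voice = [0.0, 0.9, 2.1, 2.2, 2.1, 3.0]
coarse = rebuild(floor_deltas(to_deltas(voice), 1.0))
print(coarse)  # [0.0, 1.0, 2.0, 2.0, 2.0, 3.0]
```

Note how the 2.2/2.1 wiggle is flattened away while the overall climb is kept, which is the sense in which the floored vectors are a 'less noisy image' of the original.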

I never tried it, but I figure the above should work. Of course, not all
'flooring' should be done this early in the interpretation process.
Some of it, I guess, should be done later, to make the sentence come out right.
But that is more normal wildcard stuff. If you can get as far as recognizing
different mouth movements (something the above should), you're pretty far.

And yes, I know some highly trained scientists and professionals are dealing
with these issues, and that it isn't as simple as I make it look.
But I am not able to know what I don't know, so any approximation of when I do
and don't know would be pure guesswork and therefore useless anyway. (Hmm, I
don't want to defend ignorance though.. hmm, difficult ..)

Ralf
