Re: OCR Part I...
- Posted by Ralf Nieuwenhuijsen <nieuwen at XS4ALL.NL> May 16, 1999
Although this is a very interesting subject as a whole, on which I will perhaps respond some other time, I would like to respond immediately to the OCR (& compression) part.

I will start with the basic concept of compression. Compression means favoring one type of data at the cost of another. What cost? With lossless compression, the cost is that the other type of data becomes larger. With lossy compression, the cost is a decrease in the other data's quality/significance. So why does compression work? Because most files are in some way related to our human world, full of pattern-based, math-based, and other kinds of relationships. Certain things really are more common than others. My point is that the human voice, for example, can differ greatly in certain aspects, but is ultimately based on relationships of tone, volume, and order. This relationship is due to nature, due to the way our voice works.

The best definition of noise would be: the part of the data we couldn't care less about. The process would be to separate the noise from the data we do want. How do we do this? In the very same way lossy compression works. In theory the two algorithms would be identical; they would both discard an amount of data (the OCR would discard even more, though). However, discarding such noise is more than just 'rounding' the values. Unfortunately we work with absolute values, while patterns are by definition relative.

Now, here's a good figurative explanation of what kind of 'noise' needs to be 'floored' / 'discarded'. Say a person was driving a car from point X to point Y. We don't care about the absolute positions, but about the angles, the velocities, etc. Think vectors. I suggest looking at how tone and volume change as a vector (we'll call this vectorA). After that, I would round (floor) the vector values to minimize the number of vectors that represent the 'route' our voice took from point X to Y. Now we have a new, less 'noisy' image of our voice data.
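The flooring idea above can be sketched in a few lines. This is only a minimal illustration, not anyone's actual implementation: the function names, the example signal, and the grid size are all my own hypothetical choices. The point it demonstrates is the one in the text: switch from absolute values to relative steps (vectors), snap each step onto a coarse grid to discard the noise, and then merge identical consecutive steps so fewer vectors describe the 'route'.

```python
import numpy as np

def quantize_deltas(signal, grid=0.5):
    """Turn an absolute-valued signal into relative steps ('vectors'),
    then 'floor' each step onto a coarse grid to discard noise."""
    deltas = np.diff(signal)                    # relative steps, not absolutes
    return np.round(deltas / grid) * grid       # snap each step to the grid

def merge_runs(steps):
    """Merge consecutive identical steps into [step, count] pairs,
    shrinking the number of vectors that describe the 'route'."""
    runs = []
    for s in steps:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs

# A noisy ramp: the underlying pattern is a steady +0.5 step per sample.
signal = np.array([0.0, 0.52, 0.98, 1.51, 2.03, 2.49])
steps = quantize_deltas(signal, grid=0.5)
print(steps)                     # every noisy step snaps to 0.5
print(merge_runs(list(steps)))   # a single [0.5, 5] run instead of five vectors
```

Five slightly different absolute jumps collapse into one repeated relative vector, which is exactly the kind of representation a lossy coder (or a recognizer) can exploit.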
This time, instead of splitting the voice data into tone and volume, use the whole data as a vector based on time & data. Again, floor the vector values. I never tried it, but I figured the above should work too. Of course, not all 'flooring' should be done so early in the interpretation process. Some of it, I guess, should be done later, to get the sentence right, but that is more normal wildcard stuff. If you can get as far as recognizing different mouth movements (which the above should), you're pretty far.

And yes, I know that highly trained scientists and professionals are dealing with these issues, and that it isn't as simple as I make it look. But I am not able to know what I don't know, so any estimate of where my knowledge ends would be pure guesswork and therefore useless anyway. (Hmm, I don't want to defend ignorance though.. hmm, difficult.. )

Ralf