Re: Pattern Recognition
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 408 views
On 16 Jan 2001, at 23:10, Al Getz wrote:

> Pattern detection is an interesting subject.
> It's very hard to ask the question:
>
> "is there a function that will return 1 if my sequence contains a pattern
> and return 0 if it doesn't?"
>
> because of at least the following reasons:
>
> 1. When you say "pattern" you're not saying much at all about the data
>    except that there be some sort of order present, and order is a
>    very complicated criterion. For example:
>
>    A="tip"  --(reference string)
>    B="atip" --clearly contains 'tip'
>    C="pit"  --is this really 'tip' backwards?
>    D="pti"  --is this really 'tip' with the p moved to the front of
>             --the word 'tip'?
>
>    Some forms of "pattern recognition" will easily determine that B
>    contains 'tip'.
>    If you think that D (or C) is too far encrypted to be of any use, think
>    again, because there are well known mathematical forms that will easily
>    discover that C and D are simply phase shifted versions of A.

I was going to look at that too, one thing at a time ..... *bad* dyslexia
seems to not be a problem, small transpositions are tho. Trying to solve
problems that aren't there will really drag the system, but i agree, i do
need to look at this.

> 2. Normally when you try to recognize a pattern, you have other
>    functions which attempt to compare other constraints to the data
>    and determine if there is any validity to the outcome, even if
>    these functions are "built in" when the code is written and appear
>    transparent. For the examples A, B, C and D above, if you were looking
>    for only English language words then you could first compare entries
>    to an expandable dictionary, and that would eliminate D, unless you
>    included abbreviations, which might then include D.

Doing that tomorrow, i hope, or next week when it's colder out.
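
For anyone following along, here is a rough sketch of the A/B/C/D checks from point 1 plus the dictionary filter from point 2. It is in Python rather than Euphoria, and it uses plain string tests (substring, reversal, rotation), not the mathematical forms Al mentions. The names `relate`, `filter_known` and `toy_dictionary` are made up for illustration, and the two-word dictionary is only a stand-in for a real word list.

```python
def relate(reference, candidate):
    """Return labels describing how candidate relates to reference."""
    labels = []
    # B-style match: the reference appears inside the candidate.
    if reference in candidate:
        labels.append("contains")
    # C-style match: the candidate is the reference spelled backwards.
    if candidate == reference[::-1]:
        labels.append("reversed")
    # D-style match: the candidate is a cyclic shift of the reference.
    # Any rotation of a string is a substring of the string doubled.
    if (len(candidate) == len(reference)
            and candidate != reference
            and candidate in reference + reference):
        labels.append("rotated")
    return labels


def filter_known(words, dictionary):
    """Keep only the words found in the dictionary (case-insensitive)."""
    return [w for w in words if w.lower() in dictionary]


A = "tip"  # reference string
for name, s in (("B", "atip"), ("C", "pit"), ("D", "pti")):
    print(name, s, relate(A, s))

# Hypothetical stand-in for an English word list; assumes no abbreviations.
toy_dictionary = {"tip", "pit"}
print(filter_known(["atip", "pit", "pti"], toy_dictionary))
```

With this toy dictionary the filter drops D ("pti"), as Al describes, and it also drops B ("atip"), so a dictionary check alone would lose the substring case unless it runs after the pattern tests.
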
> Because of at least these two reasons, it would be a good idea to define
> your problem a little more distinctly, or even specify the problem you are
> trying to solve completely, if you intend to create a good algorithm.

Natural language processing. All those tokens get really messed up: Tiggr
has 150,000 unknowns in several megs of IRC text, and 154,000 known words in
the dictionary. The unknowns are things like "kta" and "Tigr". That's a lot
of missed understandings.

Kat