1. Pattern Recognition

Pattern detection is an interesting subject.
It's very hard to answer the question:

 "is there a function that will return 1 if my sequence contains a pattern
  and return 0 if it doesn't?"

because of at least the following reasons:

1. When you say "pattern" you're not saying much at all about the data
   except that some sort of order is present, and order is a
   very complicated criterion.  For example:

   A="tip"  --(reference string)
   B="atip" --clearly contains 'tip'
   C="pit"  --is this really 'tip' backwards?
   D="pti"  --is this really 'tip' with the p moved to the front of
              the word 'tip'?

   Some forms of "pattern recognition" will easily determine that B
   contains 'tip'.
   If you think that D (or C) is too scrambled to be of any use, think
   again, because there are well-known mathematical techniques that will
   easily discover that D is simply a phase-shifted (cyclically rotated)
   version of A, and that C is A reversed.

2. Normally when you try to recognize a pattern, you have other
   functions that check additional constraints against the data
   and determine whether the outcome is valid, even if
   these functions are "built in" when the code is written and appear
   transparent.  For the examples A, B, C and D above, if you were looking
   only for English words, you could first compare entries
   to an expandable dictionary, and that would eliminate D, unless you
   included abbreviations, which might then admit D.

Because of at least these two reasons, it would be a good idea to define
your problem a little more distinctly, or even specify the problem you are
trying to solve completely, if you intend to create a good algorithm.

Good luck with it,
Al

2. Re: Pattern Recognition

On 16 Jan 2001, at 23:10, Al Getz wrote:

> Pattern detection is an interesting subject.
> It's very hard to answer the question:
>
>  "is there a function that will return 1 if my sequence contains a pattern
>   and return 0 if it doesn't?"
>
> because of at least the following reasons:
>
> 1. When you say "pattern" you're not saying much at all about the data
>    except that some sort of order is present, and order is a
>    very complicated criterion.  For example:
>
>    A="tip"  --(reference string)
>    B="atip" --clearly contains 'tip'
>    C="pit"  --is this really 'tip' backwards?
>    D="pti"  --is this really 'tip' with the p moved to the front of
>               the word 'tip'?
>
>    Some forms of "pattern recognition" will easily determine that B
>    contains 'tip'.
>    If you think that D (or C) is too scrambled to be of any use, think
>    again, because there are well-known mathematical techniques that will
>    easily discover that D is simply a phase-shifted (cyclically rotated)
>    version of A, and that C is A reversed.

I was going to look at that too, one thing at a time. *Bad* dyslexia
doesn't seem to be a problem, but small transpositions are. Trying to solve
problems that aren't there will really drag the system down, but I agree, I do
need to look at this.
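
Small transpositions like "kta" for "kat" are exactly what a plain edit distance scores badly: it counts one swap as two substitutions. A common fix (a swapped-in technique, not necessarily what Tiggr uses) is the "optimal string alignment" variant of Damerau-Levenshtein distance, which charges 1 for swapping two adjacent letters. A minimal Python sketch; the function name and the allow_swaps flag are hypothetical:

```python
def edit_distance(a: str, b: str, allow_swaps: bool = True) -> int:
    """Edit distance counting insertions, deletions and substitutions.

    With allow_swaps=True, swapping two adjacent characters also costs 1
    (the optimal string alignment variant of Damerau-Levenshtein);
    with allow_swaps=False it is plain Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (allow_swaps and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("kat", "kta"))                     # 1: one adjacent swap
print(edit_distance("kat", "kta", allow_swaps=False))  # 2: two substitutions
print(edit_distance("Tigr", "Tiggr"))                  # 1: one missing letter
```

The swap rule only looks at the two most recent characters, which keeps the table-filling loop the same O(len(a) * len(b)) cost as plain Levenshtein.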


> 2. Normally when you try to recognize a pattern, you have other
>    functions that check additional constraints against the data
>    and determine whether the outcome is valid, even if
>    these functions are "built in" when the code is written and appear
>    transparent.  For the examples A, B, C and D above, if you were looking
>    only for English words, you could first compare entries
>    to an expandable dictionary, and that would eliminate D, unless you
>    included abbreviations, which might then admit D.

Doing that tomorrow, I hope, or next week when it's colder out.

> Because of at least these two reasons, it would be a good idea to define
> your problem a little more distinctly, or even specify the problem you are
> trying to solve completely, if you intend to create a good algorithm.

Natural language processing. All those tokens get really messed up: Tiggr
has 150,000 unknowns in several megabytes of IRC text, against 154,000 known
words in the dictionary. Unknowns like "kta" and "Tigr". That's a lot of
missed understandings.
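
One standard way to turn some of those unknowns back into known words, in the style of Peter Norvig's well-known spelling corrector (again an assumed technique, not Tiggr's actual code), is to generate every string one edit away from the unknown token and keep the ones that are in the dictionary. The five-word dictionary below is a hypothetical stand-in:

```python
import string

ALPHABET = string.ascii_lowercase
# Hypothetical stand-in for Tiggr's real dictionary.
DICTIONARY = {"kat", "tiggr", "tip", "the", "bot"}


def edits1(word: str) -> set:
    """All strings one edit away from word: one deletion, one adjacent
    transposition, one substitution or one insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def suggest(token: str, dictionary: set) -> set:
    """Known words the token could be a one-edit misspelling of."""
    word = token.lower()
    if word in dictionary:
        return {word}
    return edits1(word) & dictionary


for token in ["kta", "Tigr", "bot"]:
    print(token, sorted(suggest(token, DICTIONARY)))
# kta ['kat']
# Tigr ['tiggr']
# bot ['bot']
```

For an n-letter token this generates at most 54n+25 candidate strings with a 26-letter alphabet, so each check stays cheap even against a large dictionary; a real system would still need a way to rank several matches, for example by word frequency.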

Kat
