Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 18, 2001
On 17 Jan 2001, at 22:15, David Cuny wrote:

> Kat wrote:
>
> > Why is [changing MaxGap] a bad idea?
> > What i was trying to do was to hit all
> > possible resync points.
>
> It's not, actually. For example, David Cope uses a similar pattern matcher in
> his EMI program, only it works with musical pitches instead of letters. In his
> application, the parameters are self tuning: he feeds it music by a composer,
> and it automatically adjusts parameters of the pattern matcher until the music
> it regenerates matches the same statistics as the source material.
>
> Something else you might want to consider - I recall reading that pattern
> matching for speech recognition wasn't all that great, until someone decided
> to use Markov chains to 'guess' what the next word might be. The utterance
> would be first compared to that set, and if there were no good candidates, a
> brute force match would be done. Perhaps something similar might work with
> Tiggr?

I considered that many years ago, but the research i have run across since has shown me that prediction works only in the knowledge/theme domains the script already knows about. The trick is either to know all the domains under discussion (which is how most of the Turing tests held in Australia are done: the script's authors get to choose the domain for that script's interaction with the judges), or not to use much prediction.

Since i use variables for nearly everything, turning the prediction on or off on the fly isn't a problem, and some word pre/postdiction is already coded into the database, as well as syntactic pre/post"diction". Even the domain is encoded for each word on a per-use basis, altho a *lot* of data is still not entered.

What i have been concentrating on mostly, since i discovered hard drives no longer cost $1000/megabyte, is collecting the data as sets within a domain, with text concerning relevant words in that domain, for Tiggr or Gertie to use as pre/postdictors in that domain, as well as associative words.
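
For anyone following along, here is a rough Python sketch of the "predict first, brute force if nothing fits" idea David describes, with prediction as a switch that can be flipped on the fly. Everything in it is made up for illustration: the function names, the toy corpus, the crude `similarity` score and the `0.6` threshold are all assumptions, and none of it is Tiggr's or Gertie's actual code.

```python
from collections import defaultdict


def build_bigrams(sentences):
    """Count which word follows which (a first-order Markov chain over words)."""
    followers = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return followers


def similarity(a, b):
    """Crude stand-in score: fraction of positions where the strings agree."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def best_match(heard, candidates):
    """Return (score, word) for the candidate closest to what was heard."""
    scored = ((similarity(heard, c), c) for c in candidates)
    return max(scored, default=(0.0, None))


def recognise(heard, prev_word, followers, vocabulary,
              use_prediction=True, threshold=0.6):
    """Try the predicted next words first; fall back to the whole vocabulary."""
    if use_prediction and prev_word in followers:
        score, word = best_match(heard, followers[prev_word])
        if score >= threshold:
            return word, "predicted"
    # No good candidate among the predictions (or prediction is off):
    # compare against every known word instead.
    score, word = best_match(heard, vocabulary)
    return word, "brute force"


# Hypothetical toy data, standing in for text collected within one domain.
corpus = [
    "the kidney filters blood",
    "the kidney doctor is in",
    "the cat sat down",
]
followers = build_bigrams(corpus)
vocabulary = sorted({w for s in corpus for w in s.lower().split()})

print(recognise("kidnee", "the", followers, vocabulary))
print(recognise("bloud", "the", followers, vocabulary))
print(recognise("kidnee", "the", followers, vocabulary, use_prediction=False))
```

The first call is settled by the short list of words seen after "the"; the second finds nothing close enough there and falls through to the full vocabulary. Passing `use_prediction=False` skips the Markov step entirely, which is roughly what switching prediction off outside a known domain would look like.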
If she didn't have "nephrology" listed in the dictionary as medical, a broad search would find it in the knowledgebase, grouped with other medical terms, and she could update the dictionary herself. At least i *hope* she has these eureka moments, 'cause i *sure* don't want to finish that job for her!

I don't know why some things are so difficult for me to understand. It took me years to learn checkers, but i won my first chess game the same day i first saw a chess board. I'm weird like that. So i am really glad to have this list here to help me out with difficult coding.

Thanks again David. Graeme, i simply haven't gotten to your code yet, but i bet i use David's 2nd and 3rd postings with yours in parallel threads.

Kat