OpenEuphoria: Forum: Re: data analysis

Re: data analysis

Posted by Kat <gertie at PELL.NET> Jan 18, 2001

On 17 Jan 2001, at 22:15, David Cuny wrote:

> Kat wrote:
>
> > Why is [changing MaxGap] a bad idea?
> > What i was trying to do was to hit all
> > possible resync points.
>
> It's not, actually. For example, David Cope uses a similar pattern matcher in
> his EMI program, only it works with musical pitches instead of letters. In his
> application, the parameters are self tuning: he feeds it music by a composer,
> and it automatically adjusts parameters of the pattern matcher until the music
> it regenerates matches the same statistics as the source material.
>
> Something else you might want to consider - I recall reading that pattern
> matching for speech recognition wasn't all that great, until someone decided
> to use Markov chains to 'guess' what the next word might be. The utterance
> would be first compared to that set, and if there were no good candidates, a
> brute force match would be done. Perhaps something similar might work with
> Tiggr?
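
A minimal sketch of the "guess first, brute force second" idea described above, in Python. It is not code from Tiggr or from any speech recognizer: the bigram model, the `difflib` similarity score, the `threshold` value, and the `use_prediction` switch are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_bigram_model(words):
    """Count which words follow each word (a first-order Markov chain)."""
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def similarity(a, b):
    """Crude stand-in for an acoustic/pattern match score, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def recognize(heard, prev_word, model, vocabulary,
              threshold=0.8, use_prediction=True):
    """Return (best_word, method) for a noisy input `heard`.

    First compare `heard` only against words the chain has seen after
    `prev_word`. If none of them scores at least `threshold`, fall back
    to comparing against the whole vocabulary.
    """
    if use_prediction and prev_word in model:
        candidates = model[prev_word]
        best = max(candidates, key=lambda w: similarity(heard, w))
        if similarity(heard, best) >= threshold:
            return best, "predicted"
    # No good candidate in the predicted set: brute-force match.
    best = max(vocabulary, key=lambda w: similarity(heard, w))
    return best, "brute force"


corpus = "the cat sat on the mat".split()
model = build_bigram_model(corpus)
vocabulary = sorted(set(corpus))
print(recognize("cot", "the", model, vocabulary, threshold=0.6))
print(recognize("sit", "the", model, vocabulary, threshold=0.6))
```

Passing `use_prediction=False` skips the predicted set entirely, which is one way to switch prediction off on the fly when the topic drifts outside the domains the model was built from.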

I considered that many years ago, but the research i have run across since
has shown me that prediction works only in knowledge/theme domains the
script has knowledge of. The trick is to either know all the domains under
discussion (which is how most of the Turing tests held in Australia are
done: the script's authors get to choose the domain for that script's
interaction with the judges), or don't use much prediction. Since i use
variables for nearly everything, turning the prediction on/off on the fly isn't
a problem, and some word pre/postdiction is already coded into the
database, as well as syntactic pre/post"diction". Even the domain is
encoded for each word on a per-use basis, although a *lot* of data is still not
entered. What i have been concentrating on mostly, since i discovered
hard drives no longer cost $1000/megabyte, is to collect the data as sets
within a domain, with text concerning relevant words in that domain, for
Tiggr or Gertie to use as pre/postdictors in that domain, as well as
associative words. If she didn't have "nephrology" listed in the dictionary as
medical, a broad search would find it in the knowledgebase in the grouping
with other medical terms, and she could update the dictionary herself. At
least i *hope* she has these eureka moments, 'cause i *sure* don't want to
finish that job for her!
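
The "nephrology" case can be sketched roughly like this, in Python. The data layout is an assumption made for illustration, not the actual Tiggr/Gertie database: `dictionary` maps a word to its domain, and `knowledgebase` maps a domain to the set of terms collected for it. The function name `infer_domain` is hypothetical.

```python
def infer_domain(word, dictionary, knowledgebase):
    """Return the domain for `word`, or None if it cannot be found.

    If the dictionary has no entry for the word, do a broad search of
    the knowledgebase groupings and, on a hit, record the domain in the
    dictionary so the next lookup is direct.
    """
    if word in dictionary:
        return dictionary[word]
    for domain, terms in knowledgebase.items():
        if word in terms:
            dictionary[word] = domain  # the self-update step
            return domain
    return None


# Hypothetical sample data
dictionary = {"aspirin": "medical"}
knowledgebase = {
    "medical": {"aspirin", "dialysis", "nephrology"},
    "music": {"chord", "pitch"},
}

print(infer_domain("nephrology", dictionary, knowledgebase))
print(dictionary)
```

A word that sits in more than one grouping would need a tie-break rule, for example the domain encoded for the surrounding words; this sketch simply returns the first grouping that contains it.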

I don't know why some things are so difficult for me to understand. It took
me years to learn checkers, but i won my first chess game the same day i
first saw the chess board. I'm weird like that. So i am really glad to have
this list here to help me out in difficult coding. Thanks again David.
Graeme, i simply haven't gotten to your code yet, but i bet i use David's
2nd and 3rd postings with yours in parallel threads.

Kat
