Re: Robert...EDS - questions/comments


On 13 Feb 2001, at 8:25, SR.Williamson wrote:

> Darn it, Kat, you went and made my brain work (such as it is) this 
> morning.

Glad to be of service. :)

> Many many years ago, when I was taking some Operations Research classes, 
> we talked about search methods, especially search that may or may not 
> have an end point. I don't remember much unfortunately.
> 
> You probably know more about this than I do, and if not, *somebody* here 
> does, since I'm sure it's in some 200 level undergrad course somewhere. 
> Anyway, one thing we talked about is narrowing the scope of the search 
> with assumptions. For example, if the word starts with a consonant as 
> spelled, assume it really starts with a consonant. Secondly, assume that 
> beginning consonants are phonetic clones or adjacent on the keyboard. 
> Right there you've slashed your search quite a bit. 

Well, not really, because you are making an assumption with a degree of 
correctness, possibly a correctness factor of zero. If you have run down 
the phonetic-clones list and found a possible match, you could still be in 
error, and should check keyboard layouts, different languages, personal 
typoing habits, regional spelling habits, etc. And suppose the typoed word 
isn't in the list you have at all. So you start a new search with a new 
assumption, repeatedly, until you get a match you can accept, which still 
may not be correct, and you will never know, because you didn't run the 
other algorithms and lists.
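
Just to make that pruning concrete, here's a minimal sketch of the 
first-letter substitution idea (Python for brevity; the clone/neighbour 
tables and the tiny dictionary are made-up stand-ins, not real data):

PHONETIC_CLONES = {"c": "ks", "k": "cq", "f": "pv", "s": "cz"}
KEYBOARD_NEIGHBOURS = {"t": "ryfg", "s": "awedxz", "c": "xdfv"}
DICTIONARY = {"the", "then", "they", "thee", "she", "he", "cat", "kat"}

def first_letter_candidates(word):
    # Replace the first letter with each phonetic clone or adjacent
    # key, then keep only candidates found in the dictionary.
    if not word:
        return set()
    first, rest = word[0], word[1:]
    replacements = (PHONETIC_CLONES.get(first, "") +
                    KEYBOARD_NEIGHBOURS.get(first, ""))
    return {c + rest for c in replacements} & DICTIONARY

print(first_letter_candidates("cat"))   # -> {'kat'} with these stub tables

And if the real word was "cot", this finds nothing at all: the 
zero-correctness case I mean.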

> Another option might be to use heuristics. Replace a letter at random 
> from a set of rules like those above, compare the result to a dictionary 
> of real words, and if it fits, go on. If not, try another letter. This can be 
> optimized by deciding the letter to be replaced based on a table of 
> probabilities of letter consecutiveness and word length. For example, 
> the most common English word is "the". A combination that is 3 letters 
> and has "he" at the end is highly likely to be a "t", and there are a 
> limited number of other possibilities like "then", "they", "thee", 
> "she", "he", etc.

Are you speaking of using a suffix tree? The problem, as I see it, is that 
eventually you will have every letter of the alphabet followed/preceded by 
every other letter/sequence, so everything matches to some extent, 
somewhere, even if it's wrong. Finding a word already in the tree is easy, 
but then so is a binary search in a list. I expect you'll start with the 
unknown, search a sequence tree for every subsequence contained in the 
unknown, and end up with a multitude of results in the form Graeme's 
diff() code outputs? Ack! (No offence to Graeme, it's just a case of 
different solutions for different problems.)
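
For what it's worth, a toy sketch of the sequence-tree blow-up I mean 
(again Python; a real suffix tree is more compact, this just indexes 
every suffix of a handful of words):

from collections import defaultdict

WORDS = ["the", "then", "they", "thee", "she", "he"]

suffix_index = defaultdict(set)
for word in WORDS:
    for i in range(len(word)):
        suffix_index[word[i:]].add(word)

def words_containing(fragment):
    # Every word with a suffix starting with the fragment, i.e. every
    # word containing the fragment anywhere.
    hits = set()
    for suffix, words in suffix_index.items():
        if suffix.startswith(fragment):
            hits |= words
    return hits

print(words_containing("he"))   # -> all six words: "he" matches everywhere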
 
> Of course, I'm only adding to the processor overhead, but it seems to me 
> that there is some breakeven point where the smaller dictionary size 
> makes up for the fact that the processor has to choose dictionaries and 
> do multiple guesses. Like I said, I'm sure you know more about this than 
> I do though.

There are several methods for reducing the search scope: knowledge 
(medical, physics) domain limits, language (English, Spanish) limits, 
playing dumb (if the word isn't known, ignore it, or ask about it, or 
change the topic), etc. But I do not wish to impose limits, especially 
limits on understanding the words, even typoed words.
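
Any of those limits is easy enough to sketch, which is part of why I 
resist them; e.g. cascading lookups through narrower dictionaries first 
(the word lists here are hypothetical):

MEDICAL = {"femur", "aorta"}
PHYSICS = {"quark", "boson"}
GENERAL = {"the", "cat", "house"}

def scoped_lookup(word, domains=(MEDICAL, PHYSICS, GENERAL)):
    # Try the narrow domain lists first, then the general one;
    # return None on a total miss ("play dumb").
    for domain in domains:
        if word in domain:
            return domain
    return None

print(scoped_lookup("quark") is PHYSICS)   # True
print(scoped_lookup("zzyzx"))              # None -> ignore/ask/change topic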
 
Kat
