Re: Robert...EDS - questons/comments
- Posted by Euman <euman at bellsouth.net> Feb 13, 2001
- 401 views
Check this out..... http://www.mathtools.net/MATLAB/ Good info on statistical mapping / problem solving that may/may not help in some sortof way..... Has a wealth of information on just about anything that could relate to programming...... Euman ----- Original Message ----- From: "Kat" <gertie at PELL.NET> To: "EUforum" <EUforum at topica.com> Sent: Tuesday, February 13, 2001 13:04 Subject: Re: Robert...EDS - questons/comments | On 13 Feb 2001, at 8:25, SR.Williamson wrote: | | > Darn it, Kat, you went and made my brain work (such as it is) this | > morning. | | Glad to be of service. | | > Many many years ago, when I was taking some Operations Research classes, | > we talked about search methods, especially search that may or may not | > have an end point. I don't remember much unfortunately. | > | > You probably know more about this than I do, and if not, *somebody* here | > does, since I'm sure it's in some 200 level undergrad course somewhere. | > Anyway, one thing we talked about is narrowing the scope of the search | > with assumptions. For example, if the word starts with a consonant as | > spelled, assume it really starts with a consonant. Secondly, assume that | > beginning consonants are phonetic clones or adjacent on the keyboard. | > Right there you've slashed your search quite a bit. | | Well, not really, because you are making an assumption with a degree of correctness, | possibly a correctness factor of zero. If you have run down the phonetic clones list, | and found a possible match, you could still be in error, and should check keybd | layouts, different languages, personal typoing habits, regional spelling habits, etc.. | And the word which is typoed isn't in the list you have. So you start a new search with | a new assumption, repeatedly, untill you get a match you can accept, which still may | not be correct, and you will never know, cause you didn't run the other algorithms and | lists. | | > Another option might be to use heuristics. Replace letter at random from | > a set of rules like those above, compare it to a dictionary of real | > words, and if it fits, go on. If not, try another letter. This can be | > optimized by deciding the letter to be replaced based on a table of | > probabilities of letter consecutiveness and word length. For example, | > the most common English word is "the". A combination that is 3 letters | > and has "he" at the end is highly likely to be a "t", and there are a | > limited number of other possibilities like "then", "they", "thee", | > "she", "he", etc. | | Are you speaking of using the suffix tree? Problem is, as i see it, is that eventually you | will have every letter of the alphabet followed/preceeded by every other | letter/sequence, so everything matches to some extent, somewehere, even if it's | wrong. Finding a word already in the tree is easy, but then so is a binary search in a | list. I expect you'll start with the unknown, and search a sequence tree for every | subsequence contained int he unknown, and end up with a multitude of results in the | form Graeme's diff() code outputs? Ack! (No offence to Graeme, it's just a case of | different solutions for different problems.) | | > Of course, I'm only adding to the processor overhead, but it seems to me | > that there is some breakeven point where the smaller dictionary size | > makes up for the fact that the processor has to choose dictionaries and | > do multiple guesses. Like I said, I'm sure you know more about this than | > I do though. | | There are several methods for reducing the search scope, such as knowledge | (medical, physics) domain limits, language (english, spanish) limits, playing dumb (if | the word isn't known, ignore it, or ask about it, or change the topic), etc., but i do not | wish to impose limits, especially limits of understanding the words, even typoed words. | | Kat | | |