Re: Spell checker
- Posted by Mike Burrell <mikpos at SOFTHOME.NET> Mar 16, 1998
Robert B Pilkington wrote:
> I'm trying to write a spell checker, but there is one problem, and I
> don't know if anybody can answer it, but I figured I'd ask anyway:
>
> How do I find the words in the dictionary that are close enough to the
> misspelled word to be put into the "Replace" list?
>
> (And it would be nice if it were fast, and if it checked for words that
> might be two words but the spacebar wasn't pressed hard enough, and if it
> loaded faster than the current version I have.....)

well there are quite a number of ways you could do this, and the way i'm going to tell you is just the way i would do it (i've only thought about this for about 5 seconds, so this is the first thing that popped into my head).

okay, first of all you need a function which returns the list of possible replacements for that word (i'm assuming you want to return a list of a number of choices, not just /THE/ best choice). let's call it replace_word:

    function replace_word(string word, sequence word_list, integer num_recursions, integer max_recursions)

btw a good idea is to run 'word' and 'word_list' through lower() (from wildcard.e) first, so that case doesn't get in the way of the matching. you'll also have to define type string if you're using my example literally.

when you call this routine, make sure you pass a value of 0 to num_recursions and a value >= 1 to max_recursions (the higher the value of max_recursions, the slower it is, but the more words it returns). word_list is a list of any number of strings (a string being a sequence of any number of bytes) which contains all the words in the dictionary.

the first thing to do is check to see if the word is just spelled wrong (i.e. it's not just a space that's missing). forgive me if i use some functions wrong in the following example, as i don't have the euphoria documentation in front of me right now.
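the set-up described above (lower-casing everything, a 'string' type, num_recursions starting at 0 and max_recursions >= 1) could be sketched roughly like this. it's in python rather than euphoria purely so it runs without any pre-processor, and the names is_string, normalise and suggest are made up for the illustration; replace_word is passed in as a parameter, so nothing here assumes how it is written.

```python
def is_string(value):
    """Stand-in for a user-defined 'string' type: text made of single-byte characters."""
    return isinstance(value, str) and all(ord(ch) < 256 for ch in value)


def normalise(word, word_list):
    """Lower-case the word and every dictionary entry before any comparison."""
    if not is_string(word) or not all(is_string(w) for w in word_list):
        raise TypeError("word and every word_list entry must be strings")
    return word.lower(), [w.lower() for w in word_list]


def suggest(replace_word, word, word_list, max_recursions=1):
    """Hypothetical wrapper: always starts num_recursions at 0, as the post asks."""
    if max_recursions < 1:
        raise ValueError("max_recursions must be >= 1")
    word, word_list = normalise(word, word_list)
    return replace_word(word, word_list, 0, max_recursions)


word, word_list = normalise("Teh", ["The", "Ten", "tea"])
print(word, word_list)  # teh ['the', 'ten', 'tea']
```

wrapping the call like this means the caller can't forget the 0, and the dictionary only needs lower-casing once if you keep the normalised list around.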
i'm also using euphoria pre-processor, so if you don't use pre-processor, you'll just have to guess what this code means :)

    include wildcard.e  -- wildcard_match()

    function replace_word(string word, sequence word_list,
                          integer num_recursions, integer max_recursions)
        integer good
        sequence ret
        string w1, w2

        ret = {}

        -- check for a single wrong letter: put "*" at each position in turn
        for c = 1 to length(word) do
            with each dic_word in word_list do
                if c = 1 then
                    good = wildcard_match("*" & word[2..end], dic_word)
                elsif c = length(word) then
                    good = wildcard_match(word[1..c - 1] & "*", dic_word)
                else
                    good = wildcard_match(word[1..c - 1] & "*" & word[c + 1..end], dic_word)
                end if
                -- skip words already found through an earlier position
                if good and not find(dic_word, ret) then
                    ret = append(ret, dic_word)
                end if
            end with
        end for

        -- check for missing spaces: split before each letter from the 2nd on
        if num_recursions < max_recursions then
            for c = 2 to length(word) do
                w1 = word[1..c - 1]
                w2 = word[c..end]   -- keep letter c; only the space is missing
                -- num_recursions + 1, so that max_recursions really limits the depth
                ret =& replace_word(w1, word_list, num_recursions + 1, max_recursions)
                ret =& replace_word(w2, word_list, num_recursions + 1, max_recursions)
            end for
        end if

        return ret
    end function

if you're the slightest bit foggy on anything mentioned, don't hesitate to re-post with your questions. keep in mind i haven't tried this (or thought too much about it heh), so it's bound not to work exactly as it is. i suppose i should have documented it a bit, but i think it's pretty self-explanatory.

the only problem is that not only does it break the word up into different words to see if the user was just missing a space, it even spell checks the words that it breaks it up into!

if you're really serious about this (and really stuck), tell me and i'll poke around a bit in euphoria and see if i can't get a (semi-)working model.

--
. m i k e b u r r e l l .
. h t t p : / / m i k p o s . h o m e . m l . o r g .
. m i k p o s @ s o f t h o m e . n e t .
. ftp://ftp.scene.org /pub/music/artists/mikpos/ .
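for anyone who wants to try the idea without the pre-processor, here is a rough python sketch of the two checks in replace_word. treat it as one possible reading and not as a translation: near_matches and split_matches are made-up names, fnmatchcase from python's fnmatch module stands in for wildcard_match, and the split check is a variant that only accepts a split when both halves are already dictionary words (so it doesn't spell check the pieces, which is the problem mentioned above). it assumes the words contain only letters, since fnmatch would treat '*', '?' and '[' in the word as pattern characters.

```python
from fnmatch import fnmatchcase


def near_matches(word, word_list):
    """Dictionary words that match `word` with one position replaced by '*'."""
    found = []
    for c in range(len(word)):
        # '*' stands for any run of characters, so this also catches an
        # extra or a missing letter at position c, not only a wrong one.
        pattern = word[:c] + "*" + word[c + 1:]
        for dic_word in word_list:
            if fnmatchcase(dic_word, pattern) and dic_word not in found:
                found.append(dic_word)
    return found


def split_matches(word, word_list):
    """Two-word readings of `word`, assuming only a space is missing."""
    known = set(word_list)
    pairs = []
    for c in range(1, len(word)):
        left, right = word[:c], word[c:]
        # Variant (an assumption): both halves must be exact dictionary words.
        if left in known and right in known:
            pairs.append(left + " " + right)
    return pairs


words = ["cat", "cot", "coat", "in", "to", "into", "dog"]
print(near_matches("cet", words))   # ['cat', 'cot', 'coat']
print(split_matches("into", words))  # ['in to']
```

both functions walk the whole dictionary for every position in the word, so they will be slow on a big word list; that matches the "higher max_recursions is slower" trade-off and isn't something this sketch tries to solve.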