Re: Spell checker optimization
- Posted by Ralf Nieuwenhuijsen <nieuwen at XS4ALL.NL> Mar 25, 1998
- 870 views
I ment, a sequence with 27 entries, which all have 27 entries, etc. For example: The word apple can be looked up: ['a']['p']['p']['l']['e'] Only remember to subtract 'a' and add 1 to use a more efficient tree. The 27th element, is the end of word element, if it contains a true value than the word ends there, if it contains a false value than it means there is no word yet. Because euphoria's sequence can handle this in memory, but not in file, you need indexes. The file would start with a record of 27 fields. All fields contain an index in the file (maybe a relative index works nicer) Then it goes there, reads the next record of fields. If it has no more characters to look up, it checks the 27th field and see if it's a false or not. If not, the word is correct, else not. This would mean that you only need to read from disk as many times as the word's length. Althrough you could have the first couple of levels in memory (in one sequence). As many levels as it takes to make the list of corrected words less than a 1000 or 250 or something. Then you can read the 1000's words up in once (one hd light if you use Jaques Deschenes DOS i/o routines, so you can have it put in an preallocated place in one stroke) And then you can binairy search the list of words off course. THe binairy tree does however take up more memory than a normal list, but it makes lookups much faster. You can also offcourse use hashing for every two or three characters and put that in a simerlair tree... Ralf