Re: Spell checker optimization

new topic     » goto parent     » topic index » view thread      » older message » newer message

I ment, a sequence with 27 entries, which all have 27 entries, etc.
For example:
The word apple
can be looked up: ['a']['p']['p']['l']['e']

Only remember to subtract 'a' and add 1 to use a more efficient tree.
The 27th element, is the end of word element, if it contains a true value
than the word ends there, if it contains a false value than it means there
is no word yet.

Because euphoria's sequence can handle this in memory, but not in file, you
need indexes.
The file would start with a record of 27 fields. All fields contain an index
in the file (maybe a relative index works nicer)
Then it goes there, reads the next record of fields.
If it has no more characters to look up, it checks the 27th field and see if
it's a false or not. If not, the word is correct, else not.
This would mean that you only need to read from disk as many times as the
word's length.
Althrough you could have the first couple of levels in memory (in one
sequence).
As many levels as it takes to make the list of corrected words less than a
1000 or 250 or something.
Then you can read the 1000's words up in once (one hd light if you use
Jaques Deschenes DOS i/o routines, so you can have it put in an preallocated
place in one stroke)
And then you can binairy search the list of words off course.

THe binairy tree does however take up more memory than a normal list, but it
makes lookups much faster.
You can also offcourse use hashing for every two or three characters and put
that in a simerlair tree...

Ralf

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu