1. Binary searching in spell checker

Well, I got a chance to try the binary searching in my spell checker, and
although it is nice to not have to wait for the dictionary to load,
waiting for about half as long for the replacement list to generate for a
misspelled word defeats the purpose... :/

Let's see, what were those calculations I made earlier.... Something like
500 checks to generate a replacement list.. With about an average of
about 15 checks per word to see if it exists. (Most of them won't exist,
only about 8 will be found, and the word will definitely be there in 18
checks, or it doesn't exist.) This would be about 7500 checks... Ouch.
That's more than 1/10th of the dictionary. 9 misspelled words and the
whole dictionary could be loaded. (BTW, this is all guesswork, but
reasonable enough.)

I guess I'll keep trying. I'm trying to completely understand the
checking stuff but the weird, short variable names aren't helping.. :/

Is there any way to get really FAST reads from a WORDS.TXT style
dictionary? (Faster than gets() with SMARTDRV or other caching software
enabled.)

Any other ideas that could work? Hybrid ideas? (I tried several things to
try to get the dictionary filesize smaller, but decoding it all took
longer than reading the larger dictionary...) Or should I just give up
and stick to making Vector better so it can have a 'final' release?
(Well, nevermind that last one.. I'm sure I'll get a couple responses
telling me not to worry about the spell checker anymore.... :)

_____________________________________________________________________
You don't need to buy Internet access to use free Internet e-mail.
Get completely free e-mail from Juno at http://www.juno.com
Or call Juno at (800) 654-JUNO [654-5866]

new topic     » topic index » view message » categorize

2. Re: Binary searching in spell checker

>Is there any way to get really FAST reads from a WORDS.TXT style
>dictionary? (Faster than gets() with SMARTDRV or other caching software
>enabled.)

    Use Jaques Deschenes int21.e or something.
    There he uses an DOS interrupt call to load a file into a memory block.
    The overhead of converting everything from bytes to 4-byte-machine
integers will be lost, and it will be more memory efficient.

    You could specify the precize amount of data you want to load, not more,
not less, at this point Euphoria caches for ya, even if you only want to
lookup 1 character. Then it writes everything it has cached back out.
    And you may want to use some advanced form of hashing.
    Like having buckets in buckets in buckets.. a set of buckets for every 4
characters, so a word of length 12, can be calculated in once, althrough you
need to lookup the 'location' of the next bucket 3 times.
    Hashing with buckets is a good compromize between a normal up-t'ill-down
search and an over-exsessive binary search.

Ralf Nieuwenhuijsen
nieuwen at xs4all.nl

new topic     » goto parent     » topic index » view message » categorize

3. Re: Binary searching in spell checker

>    Use Jaques Deschenes int21.e or something.
>    There he uses an DOS interrupt call to load a file into a memory block.


WARNING: I'm not sure about this, but be careful when using DOS call under
Windows 95 OSR2. If you have FAT32 you may get weird results. Under Win32
it's safer to use WinAPI.

Regards,
    Daniel Berstein.

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu