Re: Spell checker optimization

new topic     » goto parent     » topic index » view thread      » older message » newer message

Ralf Nieuwenhuijsen wrote:
>
> I ment, a sequence with 27 entries, which all have 27 entries, etc.
> For example:
> The word apple
> can be looked up: ['a']['p']['p']['l']['e']
>
<Snip>

I understand what you're saying now. Remember though that what I
was basically proposing was that Mr. Pilkington store the soundex
codes as the key for his database. These are represented as a
letter followed by a 3 digit value, not as the word itself. By
comparing the soundex code generated by the word being checked
against the already existing keys in the database, you are abled
to obtain a list of alternate choices for any word not spelled
correctly. Using a search algorithm based on the words
themselves would re-incur the redundancy that the soundex
algorithm had eliminated.

Choosing the algorithm with which to access it quickly is still
in question though. Although Mr. Phillips pointed out that
it would be very useful to be abled to add words to the
dictionary, I still am not sold on the binary tree
idea. I think the volatility of the database would be
too low to warrant the more complex implementation. The user
would have to add 500 words to affect the 50000 word database
by 1 percent... something which is more likely to occur over
a great number of sessions than in one sitting. In a situation
with volatility this low, i'd opt to go with the faster (and
simpler) binary search algorithm, and use an alternate
(even if slower than binary trees) method when adding items to
the database since the operation is "relatively" infrequent.

Also, he could very well choose to stick with the hashing
algorithm if it suits his needs. Junko Miura has a spell checker
listed in Euphoria's archive page that uses some variation of
a hashing algorithm. It seems to run at a decent speed on my
computer, although I only used it on a short document I had
handy. I barely glanced at the code so I don't know the
specifics on what it's doing, but it's probably worth
investigating if he hasn't done so already.

Christopher D. Hickman

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu