Re: Spell checker optimization
- Posted by MAP <ddhinc at ALA.NET> Mar 25, 1998
- 748 views
Ralf Nieuwenhuijsen wrote: > > I ment, a sequence with 27 entries, which all have 27 entries, etc. > For example: > The word apple > can be looked up: ['a']['p']['p']['l']['e'] > <Snip> I understand what you're saying now. Remember though that what I was basically proposing was that Mr. Pilkington store the soundex codes as the key for his database. These are represented as a letter followed by a 3 digit value, not as the word itself. By comparing the soundex code generated by the word being checked against the already existing keys in the database, you are abled to obtain a list of alternate choices for any word not spelled correctly. Using a search algorithm based on the words themselves would re-incur the redundancy that the soundex algorithm had eliminated. Choosing the algorithm with which to access it quickly is still in question though. Although Mr. Phillips pointed out that it would be very useful to be abled to add words to the dictionary, I still am not sold on the binary tree idea. I think the volatility of the database would be too low to warrant the more complex implementation. The user would have to add 500 words to affect the 50000 word database by 1 percent... something which is more likely to occur over a great number of sessions than in one sitting. In a situation with volatility this low, i'd opt to go with the faster (and simpler) binary search algorithm, and use an alternate (even if slower than binary trees) method when adding items to the database since the operation is "relatively" infrequent. Also, he could very well choose to stick with the hashing algorithm if it suits his needs. Junko Miura has a spell checker listed in Euphoria's archive page that uses some variation of a hashing algorithm. It seems to run at a decent speed on my computer, although I only used it on a short document I had handy. I barely glanced at the code so I don't know the specifics on what it's doing, but it's probably worth investigating if he hasn't done so already. Christopher D. Hickman