RE: print.e and Other Questions
- Posted by bensler at mail.com Mar 04, 2002
- 378 views
I don't se how EumsHash() could be useful for the Contest problem. The problem requires you to match the pattern not the word. That hashtable relies on the first letter of the word, and the hash value of the ascii letters. It would be useful for a spellchecker yes, but I don't see how it would work for looking for a pattern. I can see how you could just hash the pattern value, but then what do you do when you need to match a letter within the pattern? Chris Derek Parnell wrote: > 5/03/2002 2:51:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote: > > > > >Where can I get print.e? > > > >From the RDS Contributions page, I guess. It is a replacement to > >Euphoria's print() routine. > > >How do I use Euman's hash table? > > It gives you a fast way on knowing if a given word is in the words.txt > file or not. > > You use it by calculating the 'hash' value for the word you are checking > on, then scan through the > sequence referenced by the first letter of the word and its length, > looking to see if the hashvalue > is there. If so, then the word is in the dictionary, otherwise it is > not. > > Eg. > > theWord = upper(theWord) > hv = EumsHash(theWord) > l = theWord[1] - 'A' + 1 > s = length(theWord) > inDict = 0 > for i = 1 to length(hash_table[l][s]) do > if hash_table[l][s][i] = hv then > -- Word is in dict. > inDict = 1 > exit > end if > end for > > >Or, rather, what structure is it? > > It is a three-level sequence. The first level represents the letters of > the alphabet. It is used to > group the dictionary words by their initial letter. The second level, > that's the level within each > initial letter, represents word length. This sorts all words that start > with the same letter into > word size. The third level is just a list of hash values for the > dictionary words. > > >I'll be looking at it tonight, > > Have fun. > > >but maybe somebody can give me a general idea about hash tables. > > The general idea behind hashing is to calculate a single value for an > item, based on attributes of > the item. Then use this value as a sort of index to speed up searches > for the item. It is often used > in compilers and other word-processing programs that have to keep track > of individual words. > > A common method is to add up the ASCII values of each letter, divide > this by the number of "bins" > you have and use the remainder to select a bin to put the word into. If > you have a huge list of > words, this is one method of effectively reducing the number of words > you have to scan through to > find the one you're after. > > Example: Assume I have 5 bins, numbered 0 to 4. > > word hashvalue bin > "CAT" 24 4 > "KANGAROO" 82 2 > "DOG" 26 1 > "ELEPHANT" 80 0 > "LEOPARD" 71 1 > thus "DOG" and "LEOAPARD" would both go in bin#1, but the others would > only have one word in them. > > The hard part is getting a good enough hashing algorithm that spreads > the indexes evenly over the > bins. > > --------- > Cheers, > Derek Parnell > ICQ# 7647806 > >