1. print.e and Other Questions
- Posted by "C. K. Lester" <cklester at yahoo.com> Mar 04, 2002
- 396 views
Where can I get print.e? How do I use Euman's hash table? Or, rather, what structure is it? I'll be looking at it tonight, but maybe somebody can give me a general idea about hash tables. Thanks! ck
2. Re: print.e and Other Questions
- Posted by Ken Rhodes <ken_rhodes30436 at yahoo.com> Mar 04, 2002
- 387 views
print.e http://www.rapideuphoria.com/print.zip binary print/get http://www.rapideuphoria.com/bget.zip http://www.rapideuphoria.com/gbinary.zip
3. Re: print.e and Other Questions
- Posted by Derek Parnell <ddparnell at bigpond.com> Mar 04, 2002
- 386 views
5/03/2002 2:51:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote: > >Where can I get print.e? > >From the RDS Contributions page, I guess. It is a replacement to Euphoria's >print() routine. >How do I use Euman's hash table? It gives you a fast way on knowing if a given word is in the words.txt file or not. You use it by calculating the 'hash' value for the word you are checking on, then scan through the sequence referenced by the first letter of the word and its length, looking to see if the hashvalue is there. If so, then the word is in the dictionary, otherwise it is not. Eg. theWord = upper(theWord) hv = EumsHash(theWord) l = theWord[1] - 'A' + 1 s = length(theWord) inDict = 0 for i = 1 to length(hash_table[l][s]) do if hash_table[l][s][i] = hv then -- Word is in dict. inDict = 1 exit end if end for >Or, rather, what structure is it? It is a three-level sequence. The first level represents the letters of the alphabet. It is used to group the dictionary words by their initial letter. The second level, that's the level within each initial letter, represents word length. This sorts all words that start with the same letter into word size. The third level is just a list of hash values for the dictionary words. >I'll be looking at it tonight, Have fun. >but maybe somebody can give me a general idea about hash tables. The general idea behind hashing is to calculate a single value for an item, based on attributes of the item. Then use this value as a sort of index to speed up searches for the item. It is often used in compilers and other word-processing programs that have to keep track of individual words. A common method is to add up the ASCII values of each letter, divide this by the number of "bins" you have and use the remainder to select a bin to put the word into. If you have a huge list of words, this is one method of effectively reducing the number of words you have to scan through to find the one you're after. Example: Assume I have 5 bins, numbered 0 to 4. word hashvalue bin "CAT" 24 4 "KANGAROO" 82 2 "DOG" 26 1 "ELEPHANT" 80 0 "LEOPARD" 71 1 thus "DOG" and "LEOAPARD" would both go in bin#1, but the others would only have one word in them. The hard part is getting a good enough hashing algorithm that spreads the indexes evenly over the bins. --------- Cheers, Derek Parnell ICQ# 7647806
4. Re: print.e and Other Questions
- Posted by euman at bellsouth.net Mar 04, 2002
- 391 views
Sorry bout that CK, I only used print.e to look at the output in an external file. so its safe to comment this line out. or you can download the include here: http://www.rapideuphoria.com/print.zip Derek more than answered the Q? about hash routines. BIG D's the man! Euman euman at bellsouth.net Q: Are we monetarily insane? A: YES ----- Original Message ----- From: "Derek Parnell" <ddparnell at bigpond.com> To: "EUforum" <EUforum at topica.com> Sent: Tuesday, March 05, 2002 12:08 AM Subject: Re: print.e and Other Questions > > 5/03/2002 2:51:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote: > > > > >Where can I get print.e? > > > >From the RDS Contributions page, I guess. It is a replacement to Euphoria's > >print() routine. > > >How do I use Euman's hash table? > > It gives you a fast way on knowing if a given word is in the words.txt file or > not. > > You use it by calculating the 'hash' value for the word you are checking on, > then scan through the > sequence referenced by the first letter of the word and its length, looking to > see if the hashvalue > is there. If so, then the word is in the dictionary, otherwise it is not. > > Eg. > > theWord = upper(theWord) > hv = EumsHash(theWord) > l = theWord[1] - 'A' + 1 > s = length(theWord) > inDict = 0 > for i = 1 to length(hash_table[l][s]) do > if hash_table[l][s][i] = hv then > -- Word is in dict. > inDict = 1 > exit > end if > end for > > >Or, rather, what structure is it? > > It is a three-level sequence. The first level represents the letters of the > alphabet. It is used to > group the dictionary words by their initial letter. The second level, that's > the level within each > initial letter, represents word length. This sorts all words that start with > the same letter into > word size. The third level is just a list of hash values for the dictionary > words. > > >I'll be looking at it tonight, > > Have fun. > > >but maybe somebody can give me a general idea about hash tables. > > The general idea behind hashing is to calculate a single value for an item, > based on attributes of > the item. Then use this value as a sort of index to speed up searches for the > item. It is often used > in compilers and other word-processing programs that have to keep track of > individual words. > > A common method is to add up the ASCII values of each letter, divide this by > the number of "bins" > you have and use the remainder to select a bin to put the word into. If you > have a huge list of > words, this is one method of effectively reducing the number of words you have > to scan through to > find the one you're after. > > Example: Assume I have 5 bins, numbered 0 to 4. > > word hashvalue bin > "CAT" 24 4 > "KANGAROO" 82 2 > "DOG" 26 1 > "ELEPHANT" 80 0 > "LEOPARD" 71 1 > thus "DOG" and "LEOAPARD" would both go in bin#1, but the others would only > have one word in them. > > The hard part is getting a good enough hashing algorithm that spreads the > indexes evenly over the > bins. > > --------- > Cheers, > Derek Parnell > ICQ# 7647806 > > > >
5. Re: print.e and Other Questions
- Posted by euman at bellsouth.net Mar 04, 2002
- 376 views
----- Original Message ----- From: "C. K. Lester" <cklester at yahoo.com> To: "EUforum" <EUforum at topica.com> > > C. K. Lester wrote: > > Where can I get print.e? > > Nevermind. I found it. > > > How do I use Euman's hash table? Or, rather, what structure is it? > > I'll be looking at it tonight, but maybe somebody can give me a > > general idea about hash tables. > > Okay, it looks like it's an index to the dictionary... is that right? > > How is it sorted?! CK, I guess you can get away with saying "index to the dictionary" and that be right. The "word.txt" file is already sorted so when I read in the file it steps thru it from top to bottom (in alphabetical order) the next part is that it also "sorts" by the length of the text that its hashed and just appends to existing words of equal length. I just optimized my original routine to be almost as fast as Dereks optimized version he's still beating me by 0.11 sec. I just might be able to make Dereks version faster. I should know by tommorow. If we keep this up Junko might have to retire her spellchecker. ;) just pickin Junko Euman
6. Re: print.e and Other Questions
- Posted by David Cuny <dcuny at LANSET.COM> Mar 05, 2002
- 382 views
Chris wrote: > I don't se how EumsHash() could be useful for the Contest problem. It depends on how you've coded your solution. My word scoring code looks something like this: if find( word, dictionary ) then return <score based on word size> else return <score based on partial match> end if 50,000 is a lot of words to do that initial find on. A hash-based search would speed things up quite a bit. Searching for a partial match would also benefit from a hash table. For example, I encode the word "DOLLAR" under the key "ABCCDE". Replacing the search for the key with a hash would speed things up (although storing the index is a better long-term solution). -- David Cuny