Re: print.e and Other Questions

5/03/2002 2:51:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote:

>
>Where can I get print.e?
>
>From the RDS Contributions page, I guess. It is a replacement for Euphoria's
>print() routine.

>How do I use Euman's hash table? 

It gives you a fast way of knowing whether a given word is in the words.txt
file.

You use it by calculating the 'hash' value for the word you are checking, then
scanning through the sequence selected by the word's first letter and its
length, looking to see whether the hash value is there. If it is, the word is
in the dictionary (assuming no two words with the same first letter and length
share a hash value); otherwise it is not.

   E.g.

      theWord = upper(theWord)
      hv = EumsHash(theWord)
      l = theWord[1] - 'A' + 1     -- first level: initial letter, 1..26
      s = length(theWord)          -- second level: word length
      inDict = 0
      for i = 1 to length(hash_table[l][s]) do
          if hash_table[l][s][i] = hv then
              -- Word is in dict.
              inDict = 1
              exit
          end if
      end for

>Or, rather, what structure is it? 

It is a three-level sequence. The first level represents the letters of the
alphabet and groups the dictionary words by their initial letter. The second
level, within each initial letter, represents word length, so all words that
start with the same letter are grouped by size. The third level is just a
list of hash values for the dictionary words.
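
For anyone who wants to see the shape of that structure outside Euphoria, here
is a rough Python sketch. It is not Euman's code: toy_hash is a made-up
stand-in for EumsHash (whose algorithm isn't described in this post), and
build_table / in_dict are hypothetical helper names. It also assumes words
contain only the letters A-Z and are at most 32 letters long.

```python
# Sketch of a three-level table: first letter -> word length -> hash values.
# toy_hash is a hypothetical stand-in for EumsHash, NOT the real algorithm.

def toy_hash(word):
    """Order-sensitive toy hash over the character codes (illustrative only)."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % 1_000_003
    return h

def usable(word, max_len):
    """True for non-empty A-Z words that fit in the table."""
    return word.isascii() and word.isalpha() and len(word) <= max_len

def build_table(words, max_len=32):
    # Level 1: 26 letters. Level 2: lengths 1..max_len. Level 3: hash values.
    table = [[[] for _ in range(max_len)] for _ in range(26)]
    for w in words:
        w = w.upper()
        if not usable(w, max_len):
            continue  # skip anything the table cannot index
        letter = ord(w[0]) - ord("A")  # 0-based here; the Euphoria code is 1-based
        table[letter][len(w) - 1].append(toy_hash(w))
    return table

def in_dict(table, word):
    w = word.upper()
    if not usable(w, len(table[0])):
        return False
    # Only the short list for this first letter and length is scanned.
    return toy_hash(w) in table[ord(w[0]) - ord("A")][len(w) - 1]

table = build_table(["cat", "dog", "leopard"])
print(in_dict(table, "Cat"))   # True
print(in_dict(table, "cow"))   # False
```

Because only hash values are stored, a lookup can never tell you which word
matched, only that some word with that letter, length and hash was in the list.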

>I'll be looking at it tonight, 

Have fun.

>but maybe somebody can give me a general idea about hash tables.

The general idea behind hashing is to calculate a single value for an item,
based on attributes of the item. Then use this value as a sort of index to
speed up searches for the item. It is often used in compilers and other
word-processing programs that have to keep track of individual words.

A common method is to add up the ASCII values of the letters, divide the total
by the number of "bins" you have, and use the remainder to select a bin to put
the word into. If you have a huge list of words, this is one way of
effectively reducing the number of words you have to scan through to find the
one you're after.
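
As a rough illustration of that bin method, here is a short Python sketch (the
names sum_hash, build_bins and find are invented for this example, and the
word list is arbitrary). It sums the character codes with ord(), takes the
remainder after dividing by the number of bins, and then searches only the one
bin a word could be in.

```python
def sum_hash(word):
    """Add up the character codes of the word."""
    return sum(ord(ch) for ch in word)

def build_bins(words, n_bins):
    """Put each word in the bin chosen by hash value modulo n_bins."""
    bins = [[] for _ in range(n_bins)]
    for w in words:
        bins[sum_hash(w) % n_bins].append(w)
    return bins

def find(bins, word):
    """Scan only the bin the word hashes to, not the whole word list."""
    return word in bins[sum_hash(word) % len(bins)]

bins = build_bins(["APPLE", "BANANA", "CHERRY", "GRAPE", "MANGO", "PEACH"], 4)
print(find(bins, "MANGO"))   # True
print(find(bins, "KIWI"))    # False
```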

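   Example:  Assume I have 5 bins, numbered 0 to 4. To keep the numbers small,
   each letter is counted by its position in the alphabet (A=1 ... Z=26)
   rather than by its full ASCII value.

     word          hashvalue        bin
    "CAT"            24              4
    "KANGAROO"       82              2
    "DOG"            26              1
    "ELEPHANT"       81              1
    "LEOPARD"        71              1

Thus "DOG", "ELEPHANT" and "LEOPARD" would all go in bin #1, while "CAT" and
"KANGAROO" would each be alone in their bins.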

The hard part is getting a hashing algorithm good enough to spread the words
evenly over the bins.
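
One way to see why a plain sum is not always good enough: it ignores letter
order, so anagrams always land in the same bin. The Python sketch below
(function names are made up for illustration, and it sums full character
codes) shows that, and counts how many words fall in each bin so you can judge
how even a spread is.

```python
from collections import Counter

def sum_hash(word):
    """Add up the character codes of the word."""
    return sum(ord(ch) for ch in word)

def bin_counts(words, n_bins):
    """How many words land in each bin (bins with no words are omitted)."""
    return Counter(sum_hash(w) % n_bins for w in words)

# Anagrams have the same letters, so the same sum and the same bin.
print(sum_hash("CAT") == sum_hash("ACT"))   # True

counts = bin_counts(["CAT", "ACT", "TAC", "DOG", "GOD"], 5)
print(max(counts.values()))   # size of the fullest bin
```

A hash that also takes letter position into account would separate the
anagrams, at the cost of a little more arithmetic per word.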

---------
Cheers,
Derek Parnell 
ICQ# 7647806
