Re: RE: Hash Table Lookup

new topic     » goto parent     » topic index » view thread      » older message » newer message

The basic idea behind hash tables is to sort a large set of items into different
"buckets", so that
later on, you can find something quickly by working out which bucket its in. The
trick is to try and
limit the number of items you put into any given bucket. If we have a dictionary
of some 50,000+
words and only want about 100 words per bucket, then we need at least 500
buckets. If you want less
in each bucket, you need more buckets to play with. But how do you know which
bucket to put a word
into? Traditionally, people use the letters in the word to calculate a numeric
index - ie, the
number of the bucket to use. You could simply try to add up the values of the
letters and use the
result as an index. If the total exceeds the number of buckets you have, then
divide the bucket
count in to the total and use the remainder (plus one). 

However, with our English text, this doesn't tend to generate big enough numbers
to spread the words
around evenly. Below is another way that you can use to get big numbers from the
letters in a word.


function WordHash( sequence text)
    integer x
    x = text[1] + #FFFF
    for i = 2 to length(text) do
        x += text[i] * i *  text[i-1]
    end for
    return x
end function

This method is fairly fast and still generates big-ish numbers. The bigger the
number, the more
chance you have of it landing it an empty bucket. Another hint is to use
prime-numbers for the
bucket count.

6/03/2002 4:06:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote:

>
>How degraded is the efficiency if you only make a hash table based on 
>length of the text? For instance,
>
>hash_table = {
>              { "a" , "I" },
>              { "ad" , "ah" , "am" , "an" , "as" , ... }...
>             }
>
>I hope I'm using the right terminology! That would be a hash table, 
>right?
>
>I can't seem to build a length-based table as quickly as the 
>EumsBIGDHash gets built! Why is that (besides the fact that I really 
>don't know what I'm doin' just yet).
>
>
>
>
---------
Cheers,
Derek Parnell 
ICQ# 7647806

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu