Re: RE: Hash Table Lookup
- Posted by Derek Parnell <ddparnell at bigpond.com> Mar 05, 2002
- 374 views
The basic idea behind hash tables is to sort a large set of items into different "buckets", so that later on, you can find something quickly by working out which bucket its in. The trick is to try and limit the number of items you put into any given bucket. If we have a dictionary of some 50,000+ words and only want about 100 words per bucket, then we need at least 500 buckets. If you want less in each bucket, you need more buckets to play with. But how do you know which bucket to put a word into? Traditionally, people use the letters in the word to calculate a numeric index - ie, the number of the bucket to use. You could simply try to add up the values of the letters and use the result as an index. If the total exceeds the number of buckets you have, then divide the bucket count in to the total and use the remainder (plus one). However, with our English text, this doesn't tend to generate big enough numbers to spread the words around evenly. Below is another way that you can use to get big numbers from the letters in a word. function WordHash( sequence text) integer x x = text[1] + #FFFF for i = 2 to length(text) do x += text[i] * i * text[i-1] end for return x end function This method is fairly fast and still generates big-ish numbers. The bigger the number, the more chance you have of it landing it an empty bucket. Another hint is to use prime-numbers for the bucket count. 6/03/2002 4:06:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote: > >How degraded is the efficiency if you only make a hash table based on >length of the text? For instance, > >hash_table = { > { "a" , "I" }, > { "ad" , "ah" , "am" , "an" , "as" , ... }... > } > >I hope I'm using the right terminology! That would be a hash table, >right? > >I can't seem to build a length-based table as quickly as the >EumsBIGDHash gets built! Why is that (besides the fact that I really >don't know what I'm doin' just yet). > > > > --------- Cheers, Derek Parnell ICQ# 7647806