Re: Hash Table Lookup
- Posted by euman at bellsouth.net Mar 05, 2002
- 346 views
----- Original Message ----- From: "C. K. Lester" <cklester at yahoo.com> To: "EUforum" <EUforum at topica.com> > How degraded is the efficiency if you only make a hash table based on > length of the text? For instance, > > hash_table = { > { "a" , "I" }, > { "ad" , "ah" , "am" , "an" , "as" , ... }... > } > > I hope I'm using the right terminology! That would be a hash table, > right? > > I can't seem to build a length-based table as quickly as the > EumsBIGDHash gets built! Why is that (besides the fact that I really > don't know what I'm doin' just yet). degredation YES! If you read what Robert says about find( ) being efficient for only 50 or so elements and apply that to your question then most definatly the answer would be yes. "Euphoria's find() is the fastest way to search for a value in a sequence up to about 50 elements. Beyond that, you might consider a hash table (demo\hash.ex) or a binary tree (demo\tree.ex)." Because we dont want a hash table for a hash table, I have divided the data into smaller buckets. (fewer elements per bucket) the one I posted today has been extended even further and thats why I stuck the code in there that shows the hash effectiveness. fewer elements per bucket = faster lookup. 1) EumsBIGDHash sounds about stingy.... 2) EumsHash( ) is pretty fast but certainly not the fastest hash algorythm out there coded in Euphoria beating the routine will require machine code to achieve similar output. EumsHash( ) beats this routine -> in two areas, uniqueness of the hash output and in speed. function rHash(sequence name) integer len len = length(name) return and_bits(name[1] * 8 + name[len] * 2 + len, 127) + 1 end function --R.Craig Like Derek mentioned before, finding a hash that evenly distributes the data is key and EumsHash( ) certainly makes an effort at this but its still slightly uneven distribution. If you consider two things 1) how fast can I load the data into the table and 2) how fast can I retrieve that data then EumsHash( ) isnt too bad. I would like to see what Robert submits to beat the winner seeing as its his language I would be surprised that anyone could beat his routine. Maybe we might get to see this if the winner puts his/her winning's up for it. > Euman and Derek, thanks to you also for your willingness to teach! Hey Im in this thing to learn myself but thanks, Im sortof old school in that I think society only benefits when information is exchanged. Euman