Re: Hash Table Lookup

new topic     » goto parent     » topic index » view thread      » older message » newer message

----- Original Message ----- 
From: "C. K. Lester" <cklester at yahoo.com>
To: "EUforum" <EUforum at topica.com>

> How degraded is the efficiency if you only make a hash table based on 
> length of the text? For instance,
> 
> hash_table = {
>               { "a" , "I" },
>               { "ad" , "ah" , "am" , "an" , "as" , ... }...
>              }
> 
> I hope I'm using the right terminology! That would be a hash table, 
> right?
> 
> I can't seem to build a length-based table as quickly as the 
> EumsBIGDHash gets built! Why is that (besides the fact that I really 
> don't know what I'm doin' just yet).


degredation YES! 

If you read what Robert says about find( ) being efficient for only 50 or so
elements
and apply that to your question then most definatly the answer would be yes.

"Euphoria's find() is the fastest way to search for a value in a sequence up to
about 50 elements.
Beyond that, you might consider a hash table (demo\hash.ex) or a binary tree
(demo\tree.ex)."

Because we dont want a hash table for a hash table, I have divided the data into
smaller buckets.
(fewer elements per bucket) the one I posted today has been extended even
further and thats
why I stuck the code in there that shows the hash effectiveness. 

fewer elements per bucket = faster lookup.

1) EumsBIGDHash sounds about stingy.... blink
2) EumsHash( ) is pretty fast but certainly not the fastest hash algorythm out
there coded in Euphoria
    beating the routine will require machine code to achieve similar output.

EumsHash( ) beats this routine -> in two areas, uniqueness of the hash output
and in speed.

function rHash(sequence name)
    integer len
    len = length(name)
    return and_bits(name[1] * 8 + name[len] * 2 + len, 127) + 1
end function
--R.Craig

Like Derek mentioned before, finding a hash that evenly distributes the data is
key and EumsHash( )
certainly makes an effort at this but its still slightly uneven distribution. If
you consider two things
1) how fast can I load the data into the table and 2) how fast can I retrieve
that data then EumsHash( )
isnt too bad. I would like to see what Robert submits to beat the winner seeing
as its his language I would
be surprised that anyone could beat his routine. Maybe we might get to see this
if the winner puts his/her
winning's up for it.

> Euman and Derek, thanks to you also for your willingness to teach!

Hey Im in this thing to learn myself but thanks, Im sortof old school in that I
think society only benefits
when information is exchanged. 

Euman

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu