Re: Hash Function

new topic     » goto parent     » topic index » view thread      » older message » newer message

On Thu, 11 Nov 2004 09:11:30 -0800, cklester <guest at rapideuphoria.com> wrote:
> posted by: cklester <cklester at yahoo.com>
>
> Anybody have a real fast hash function? This would be for storing words
> like from a dictionary.

Or maybe to index words to see how many are unique?

I suggest you experiment.
What I did to develop my hash function was take a big file (like wrnpc11.txt)
and run it my algorithm through it. After the program printed out the
results, it would iterate through the hash table, counting the density
at each point.
(Every 100 buckets, it would display how many entries there were, then
reset the counter)
I'd end up with a lot of numbers underneath the results, which I'd
redirect into a file.
Copy the numbers into Excel, and plot a line graph. The flatter the
line is, the better the hash function (because the distribution is
even). You can see trends, too - if there are much more entries at the
beginning of the table, you need to generate some higher values to
flatten out the line.

I started off with an adaption of a hash algorithm written in C (v1)
Some things didn't work too well in that, because it's not easy to do
bitwise shifts, so I kept 'fiddling' until I got a nice graph that
didn't take too long to compute. It wasn't all that good, so I kept
trying new things. Enhancements to the hash got me to v2.3, then I
applied "the secret weapon(tm)" which I unfortunately can't share with
you until November... and it produced the lovely flat graph you see in
v3.1.

So experiment a little, using always the same input file so you know
that the hashes are of the same thing, and you'll get a nice flat
graph - which is very important for getting an efficient distribution
of hash values.

I doubt that there could be any improvement upon my hash function.
It's *very* fast, and *very* well distributed. Unfortunately, I'm in
14th(?) place because a whopping 60% of the time is spent somewhere
else, and I haven't figured out a good way to get around it. A hash
function isn't everything.
--
MrTrick

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu