Re: Hash Function
- Posted by Patrick Barnes <mrtrick at gmail.com> Nov 11, 2004
- 483 views
On Thu, 11 Nov 2004 09:11:30 -0800, cklester <guest at rapideuphoria.com> wrote: > posted by: cklester <cklester at yahoo.com> > > Anybody have a real fast hash function? This would be for storing words > like from a dictionary. Or maybe to index words to see how many are unique? I suggest you experiment. What I did to develop my hash function was take a big file (like wrnpc11.txt) and run it my algorithm through it. After the program printed out the results, it would iterate through the hash table, counting the density at each point. (Every 100 buckets, it would display how many entries there were, then reset the counter) I'd end up with a lot of numbers underneath the results, which I'd redirect into a file. Copy the numbers into Excel, and plot a line graph. The flatter the line is, the better the hash function (because the distribution is even). You can see trends, too - if there are much more entries at the beginning of the table, you need to generate some higher values to flatten out the line. I started off with an adaption of a hash algorithm written in C (v1) Some things didn't work too well in that, because it's not easy to do bitwise shifts, so I kept 'fiddling' until I got a nice graph that didn't take too long to compute. It wasn't all that good, so I kept trying new things. Enhancements to the hash got me to v2.3, then I applied "the secret weapon(tm)" which I unfortunately can't share with you until November... and it produced the lovely flat graph you see in v3.1. So experiment a little, using always the same input file so you know that the hashes are of the same thing, and you'll get a nice flat graph - which is very important for getting an efficient distribution of hash values. I doubt that there could be any improvement upon my hash function. It's *very* fast, and *very* well distributed. Unfortunately, I'm in 14th(?) place because a whopping 60% of the time is spent somewhere else, and I haven't figured out a good way to get around it. A hash function isn't everything. -- MrTrick