Re: OT: hashing strings
- Posted by mattlewis (admin) Aug 05, 2009
- 1406 views
useless said...
prickle said...
useless said...
What's your opinion of the fastest hash of 10million unique strings, each being 50char to 400char long, hashing down to a 32bit (or smaller) positive integer, preferably with unique hashes (or one result per bucket)? Preferably using no math.
The probability of a collision given any 32-bit key with a perfectly evenly distributed hash function on random data becomes 99.9972% at 300,000 unique strings. It's pretty much certain after that. Math Rocks!
Cheers,
Nick
Why won't 300,000 unique strings map into a 4,294,967,296 space with no duplicates?!
They can map uniquely. That doesn't mean that it's easy to find a hash function to do it. See, for example:
- http://cmph.sourceforge.net/
- http://burtleburtle.net/bob/hash/perfect.html
- http://en.wikipedia.org/wiki/Perfect_hash_function
Matt