Re: OT: hashing strings
- Posted by prickle Aug 06, 2009
- 1205 views
It seems that for 10 million pure hashes, to get (probably) no collisions you would need to use 46-bit hash values.
I'm not sure that's true.
You're right, because we're ignoring a key assumption, which is that the hash function is uniformly distributed over the hash space, which is not a given at all.
Matt
Yes, real-life performance would be somewhat worse. Even with a very large bitfield for the key I expect it is difficult to guarantee that any given random hash function solves uniquely over a set of data anyway.
Despite all this, at 32 bits the average number of keys per hash would remain very low. Also, any lookup table it used would be quite large. It's kind of hard to say anything more without knowing how it will be used.
Nick