OpenEuphoria: Forum: Re: OT: hashing strings

Re: OT: hashing strings

new topic » goto parent » topic index » view thread » older message » newer message

Posted by useless Aug 05, 2009
1230 views

prickle said...

useless said...

Why won't 300,000 unique strings map into a 4,294,967,296 space with no duplicates?!

Kat

I'm not saying they cant, just the probability is that they wont if one's hash function is of a random nature.

See the Birthday paradox. Say we have a 4,294,967,296 day year. We then consider a string having a birthday (hash value) on any given day. It bails out very quickly despite the large number of keys. 32 bits actually holds out quite well on large datasets if one chooses an evenly distributed hash function. The CRC32 algorithm might be a good choice. Rolling sums can also work well. Nevertheless, you will not be able to avoid having many references in each bucket. It should still be pretty quick

Cheers,

Nick.

Ok, i don't understand. Seems to me there's a 0.00000000023283064365386962890625 chance of the first birthday falling on any one day of the 4,294,967,296 day year. And since there's a inverse chance of a birthday falling on another day, the chance of the 2nd birthday falling on the same day as the first is roughly 4,294,967,296 squared. No?

Ok, maybe according to the Birthday page, the chances are based on pairs, so you're saying there's 1:450 chance of even the first two birthdays being the same, in a sample of 10 million on the 4,294,967,296 day year?

This is not good.

Kat

new topic » goto parent » topic index » view thread » older message » newer message

OpenEuphoria

Re: OT: hashing strings

Search

Include:

Quick Links

User menu

Misc Menu