Re: OT: hashing strings

new topic     » goto parent     » topic index » view thread      » older message » newer message
useless said...

Ok, i don't understand. Seems to me there's a 0.00000000023283064365386962890625 chance of the first birthday falling on any one day of the 4,294,967,296 day year. And since there's a inverse chance of a birthday falling on another day, the chance of the 2nd birthday falling on the same day as the first is roughly 4,294,967,296 squared. No?

Ok, maybe according to the Birthday page, the chances are based on pairs, so you're saying there's 1:450 chance of even the first two birthdays being the same, in a sample of 10 million on the 4,294,967,296 day year?

This is not good.

Pairs is probably the wrong way to think about it. Let's imagine that we're going through each of the 300,000 strings. We can calculate the probability of a collision with that many items by considering the number of empty slots. While the number of slots is a lot, we're also talking about a lot of items to be hashed.

Remember, you didn't ask for a good hash function, but a perfect hash function. Here is some (eu 4.0) code that calculates the probabilities. It reports at each 10,000 increment. I hard coded in your values:

include std/console.e 
 
-- atom space = prompt_number("How many slots?", { 1, power( 2, 51 )}) 
-- atom items = prompt_number("How many items?", { 1, power( 2, 51 )}) 
atom space = power( 2, 32 ) 
atom items = 300_000 
 
atom probability = 1.0 
atom empty = space 
for i = 1 to items - 1 do 
	empty -= 1 
	probability *= empty / space 
	if not remainder( i, 10_000 ) then 
		printf(1, "The probability of no collisions for %d items is %0.16f\n", i & probability ) 
	end if 
end for 

The results are as follows:

$ eui birthday.ex 
The probability of no collisions for 10000 items is 0.9884248086525800 
The probability of no collisions for 20000 items is 0.9544991471483176 
The probability of no collisions for 30000 items is 0.9005248436101611 
The probability of no collisions for 40000 items is 0.8300496590491797 
The probability of no collisions for 50000 items is 0.7474818501475397 
The probability of no collisions for 60000 items is 0.6576357350322105 
The probability of no collisions for 70000 items is 0.5652730868057533 
The probability of no collisions for 80000 items is 0.4747001207863105 
The probability of no collisions for 90000 items is 0.3894650411345447 
The probability of no collisions for 100000 items is 0.3121804493495650 
The probability of no collisions for 110000 items is 0.2444730439793339 
The probability of no collisions for 120000 items is 0.1870442204532125 
The probability of no collisions for 130000 items is 0.1398123764617983 
The probability of no collisions for 140000 items is 0.1021021599584672 
The probability of no collisions for 150000 items is 0.0728470896214529 
The probability of no collisions for 160000 items is 0.0507782140487439 
The probability of no collisions for 170000 items is 0.0345804432150361 
The probability of no collisions for 180000 items is 0.0230076139686639 
The probability of no collisions for 190000 items is 0.0149554842056866 
The probability of no collisions for 200000 items is 0.0094976731827056 
The probability of no collisions for 210000 items is 0.0058928011581831 
The probability of no collisions for 220000 items is 0.0035720222185544 
The probability of no collisions for 230000 items is 0.0021154087285301 
The probability of no collisions for 240000 items is 0.0012239457946743 
The probability of no collisions for 250000 items is 0.0006918593356940 
The probability of no collisions for 260000 items is 0.0003820860057772 
The probability of no collisions for 270000 items is 0.0002061541771879 
The probability of no collisions for 280000 items is 0.0001086702846204 
The probability of no collisions for 290000 items is 0.0000559650768343 
Matt

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu