Re: OT: hashing strings

new topic     » goto parent     » topic index » view thread      » older message » newer message
mattlewis said...
useless said...

Ok, i don't understand. Seems to me there's a 0.00000000023283064365386962890625 chance of the first birthday falling on any one day of the 4,294,967,296 day year. And since there's a inverse chance of a birthday falling on another day, the chance of the 2nd birthday falling on the same day as the first is roughly 4,294,967,296 squared. No?

Ok, maybe according to the Birthday page, the chances are based on pairs, so you're saying there's 1:450 chance of even the first two birthdays being the same, in a sample of 10 million on the 4,294,967,296 day year?

This is not good.

Pairs is probably the wrong way to think about it.

It seems that for 10 million pure hashes, to get (probably) no collisions you would need to use 46-bit hash values. So here's an update to that function to generate 64-bit values. These are presented as pairs of 32-bit values such that each hash value is a two element sequence.

include machine.e 
atom mem 
mem = allocate(8) 
 
function hashstr(sequence text) 
	sequence chs 
	sequence val 
	integer j 
	integer k 
	 
	chs = repeat(0,8) 
	text -= ' ' 
	 
	j = 0 
	for i = 1 to length(text) do 
		j += 1 
		if j > length(chs) then 
			j = 1 
		else 
			chs[j] = remainder(length(text), 256) 
		end if 
		chs[j] += text[i] * i 
	end for 
	poke(mem, chs) 
 
        -- Grab the 64-bits of data from RAM	 
	val = peek4u({mem,2}) 
	 
	return val 
end function 
new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu