Re: OT: hashing strings
- Posted by DerekParnell (admin) Aug 05, 2009
- 1227 views
Ok, i don't understand. Seems to me there's a 0.00000000023283064365386962890625 chance of the first birthday falling on any one day of the 4,294,967,296 day year. And since there's a inverse chance of a birthday falling on another day, the chance of the 2nd birthday falling on the same day as the first is roughly 4,294,967,296 squared. No?
Ok, maybe according to the Birthday page, the chances are based on pairs, so you're saying there's 1:450 chance of even the first two birthdays being the same, in a sample of 10 million on the 4,294,967,296 day year?
This is not good.
Pairs is probably the wrong way to think about it.
It seems that for 10 million pure hashes, to get (probably) no collisions you would need to use 46-bit hash values. So here's an update to that function to generate 64-bit values. These are presented as pairs of 32-bit values such that each hash value is a two element sequence.
include machine.e atom mem mem = allocate(8) function hashstr(sequence text) sequence chs sequence val integer j integer k chs = repeat(0,8) text -= ' ' j = 0 for i = 1 to length(text) do j += 1 if j > length(chs) then j = 1 else chs[j] = remainder(length(text), 256) end if chs[j] += text[i] * i end for poke(mem, chs) -- Grab the 64-bits of data from RAM val = peek4u({mem,2}) return val end function