1. Hash function anomaly

I have Eu 4.0.0 so I don´t know if this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as (I) expected (usually when the arguments are strings), eg:

include std\hash.e 
 
? hash({"a", ""}, FLETCHER32) -- 1627480323 
? hash({"", "a"}, FLETCHER32) -- 1627480323 
 
? hash({"a", "b", "c"}, FLETCHER32) -- 637740548 
? hash({"c", "a", "b"}, FLETCHER32) -- 637740548 
 
?9/0 

Surely these arguments should hash to a different value. Wikipedia says that "The Fletcher checksum is an algorithm for computing a position-dependent checksum". This is clearly not the case in this implementation. Worse, Hsieh and Adler are similarly affected.

For peace of mind, I have to be confident that *any* change to the data will be detectable (within the probability limit of the algorithm). Only the cyclic variant correctly handled these cases so I'll have to use that to hash the data. Still, the dispersion quality depends on the chosen parameter value. Does someone have an idea as to what might be a good value?

Spock

new topic     » topic index » view message » categorize

2. Re: Hash function anomaly

There was one hash bug ticket fixed between 4.0.0 and 4.0.1, ticket:593 - but this looks like it's unrelated to the problem you're reporting.

new topic     » goto parent     » topic index » view message » categorize

3. Re: Hash function anomaly

jimcbrown said...

There was one hash bug ticket fixed between 4.0.0 and 4.0.1, ticket:593 - but this looks like it's unrelated to the problem you're reporting.

I've confirmed the results with 4.0.3 and 4.1 development code.

Matt

new topic     » goto parent     » topic index » view message » categorize

4. Re: Hash function anomaly

I'm looking at an old source tree (4.0b3) and in the calc_hash function, for the sequence path, the prev hash is added to the tf hash:

			tf.ieee_uint.a += prev.ieee_uint.b; 
			tf.ieee_uint.b += prev.ieee_uint.a; 
This makes the same hash result regardless of the order of the elements (since addition is associative.)
Edit: I may be incorrect. The hash is already chained, and the addition only affects the Nth and Nth-from-last elements. I can't explain why it doesn't work.
Instead the hash calculation should be chained (untested!):
			for(tfi = 0; tfi < 8; tfi++) 
			{ 
				if (prev.ieee_char[tfi] == 0) 
					prev.ieee_char[tfi] = (unsigned char)(tfi * 171 + 1); 
				lHashValue = rol(lHashValue, 3) ^ (prev.ieee_char[tfi] + (tfi + 1) << 8); 
			} 
			for(tfi = 0; tfi < 8; tfi++) 
			{ 
				if (tf.ieee_char[tfi] == 0) 
					tf.ieee_char[tfi] = (unsigned char)(tfi * 171 + 1); 
				lHashValue = rol(lHashValue, 3) ^ (tf.ieee_char[tfi] + (tfi + 1) << 8); 
			} 
 

new topic     » goto parent     » topic index » view message » categorize

5. Re: Hash function anomaly

Spock said...

I have Eu 4.0.0 so I don´t know if this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as (I) expected (usually when the arguments are strings), eg:

I'll look into this. In the meantime, try 0 instead of FLETCHER32 as the algorigthm.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Hash function anomaly

DerekParnell said...
Spock said...

I have Eu 4.0.0 so I don´t know if this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as (I) expected (usually when the arguments are strings), eg:

I'll look into this. In the meantime, try the HSIEH32 algorithm instead.

Different numbers, same problem. Passing 0 (zero) gives different values.

It only seems to be when you have subsequences. Slightly modified with same results:

? hash({{"a"}, "b", "c"}, FLETCHER32) -- 637740549  
? hash({{"c"}, "a", "b"}, FLETCHER32) -- 637740549 

Matt

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu