1. Hash function anomaly
- Posted by Spock Sep 29, 2011
- 1415 views
I have Eu 4.0.0 so I don´t know if this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as (I) expected (usually when the arguments are strings), eg:
include std\hash.e ? hash({"a", ""}, FLETCHER32) -- 1627480323 ? hash({"", "a"}, FLETCHER32) -- 1627480323 ? hash({"a", "b", "c"}, FLETCHER32) -- 637740548 ? hash({"c", "a", "b"}, FLETCHER32) -- 637740548 ?9/0
Surely these arguments should hash to a different value. Wikipedia says that "The Fletcher checksum is an algorithm for computing a position-dependent checksum". This is clearly not the case in this implementation. Worse, Hsieh and Adler are similarly affected.
For peace of mind, I have to be confident that *any* change to the data will be detectable (within the probability limit of the algorithm). Only the cyclic variant correctly handled these cases so I'll have to use that to hash the data. Still, the dispersion quality depends on the chosen parameter value. Does someone have an idea as to what might be a good value?
Spock
2. Re: Hash function anomaly
- Posted by jimcbrown (admin) Sep 29, 2011
- 1348 views
There was one hash bug ticket fixed between 4.0.0 and 4.0.1, ticket:593 - but this looks like it's unrelated to the problem you're reporting.
3. Re: Hash function anomaly
- Posted by mattlewis (admin) Sep 29, 2011
- 1240 views
There was one hash bug ticket fixed between 4.0.0 and 4.0.1, ticket:593 - but this looks like it's unrelated to the problem you're reporting.
I've confirmed the results with 4.0.3 and 4.1 development code.
Matt
4. Re: Hash function anomaly
- Posted by PeteE Sep 29, 2011
- 1239 views
I'm looking at an old source tree (4.0b3) and in the calc_hash function, for the sequence path, the prev hash is added to the tf hash:
tf.ieee_uint.a += prev.ieee_uint.b; tf.ieee_uint.b += prev.ieee_uint.a;This makes the same hash result regardless of the order of the elements (since addition is associative.)
Edit: I may be incorrect. The hash is already chained, and the addition only affects the Nth and Nth-from-last elements. I can't explain why it doesn't work.
Instead the hash calculation should be chained (untested!):
for(tfi = 0; tfi < 8; tfi++) { if (prev.ieee_char[tfi] == 0) prev.ieee_char[tfi] = (unsigned char)(tfi * 171 + 1); lHashValue = rol(lHashValue, 3) ^ (prev.ieee_char[tfi] + (tfi + 1) << 8); } for(tfi = 0; tfi < 8; tfi++) { if (tf.ieee_char[tfi] == 0) tf.ieee_char[tfi] = (unsigned char)(tfi * 171 + 1); lHashValue = rol(lHashValue, 3) ^ (tf.ieee_char[tfi] + (tfi + 1) << 8); }
5. Re: Hash function anomaly
- Posted by DerekParnell (admin) Sep 29, 2011
- 1248 views
I have Eu 4.0.0 so I don´t know if this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as (I) expected (usually when the arguments are strings), eg:
I'll look into this. In the meantime, try 0 instead of FLETCHER32 as the algorigthm.
6. Re: Hash function anomaly
- Posted by mattlewis (admin) Sep 29, 2011
- 1260 views
I have Eu 4.0.0 so I don´t know if this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as (I) expected (usually when the arguments are strings), eg:
I'll look into this. In the meantime, try the HSIEH32 algorithm instead.
Different numbers, same problem. Passing 0 (zero) gives different values.
It only seems to be when you have subsequences. Slightly modified with same results:
? hash({{"a"}, "b", "c"}, FLETCHER32) -- 637740549 ? hash({{"c"}, "a", "b"}, FLETCHER32) -- 637740549
Matt