Hash function anomaly
- Posted by Spock Sep 29, 2011
I have Eu 4.0.0, so I don't know whether this issue has been fixed in a later update. I was adding a checksum field to a custom database (stored as a flat CSV file) to ensure data integrity when I noticed that the hash() function doesn't work as I expected, usually when the arguments are strings, e.g.:
include std\hash.e
? hash({"a", ""}, FLETCHER32) -- 1627480323
? hash({"", "a"}, FLETCHER32) -- 1627480323
? hash({"a", "b", "c"}, FLETCHER32) -- 637740548
? hash({"c", "a", "b"}, FLETCHER32) -- 637740548
?9/0
Surely these arguments should hash to different values. Wikipedia says that "The Fletcher checksum is an algorithm for computing a position-dependent checksum". That is clearly not the case in this implementation: reordering the strings leaves the result unchanged. Worse, Hsieh and Adler are similarly affected.
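To show what I mean by position-dependent, here is a minimal hand-rolled Fletcher-16 over a flat string. This is only a sketch for comparison: the name fletcher16 is made up, it is the 16-bit variant rather than FLETCHER32, and it is not the code that hash() actually runs.

```euphoria
-- Hand-rolled Fletcher-16 over a flat string of byte values.
-- fletcher16 is a made-up name for illustration; it is NOT what hash() runs.
function fletcher16(sequence bytes)
    integer s1 = 0  -- running sum of the bytes
    integer s2 = 0  -- running sum of s1, which makes the result order-sensitive
    for i = 1 to length(bytes) do
        s1 = remainder(s1 + bytes[i], 255)
        s2 = remainder(s2 + s1, 255)
    end for
    return s2 * 256 + s1
end function

? fletcher16("abc") -- 19495
? fletcher16("cab") -- 20263
```

Both strings contain the same bytes, so the plain sum s1 is identical (39), but the second sum s2 differs (76 versus 79) because each byte is weighted by its position. That is the behaviour I expected from the built-in.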
For peace of mind, I have to be confident that *any* change to the data will be detectable (within the probability limit of the algorithm). Only the cyclic variant handled these cases correctly, so I'll have to use that to hash the data. Still, its dispersion quality depends on the chosen parameter value. Does anyone have an idea what a good value might be?
Spock