Euphoria
Ticket #259:
filesys:checksum() fails
Reported by
mattlewis
Oct 29, 2010
When comparing files that are identical (or even the same file with multiple calls) and a size > 0, the checksums don't match. When size = 1, checksum correctly identifies the files as the same.
Failing tests were added to t_filesys.e in svn:3664.
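For illustration only, a test of the kind described might look like the sketch below. It is not the code committed in svn:3664; the temporary file name and the test label are made up for this example, and it assumes the Euphoria 4 std/filesys.e checksum() taking a file name and a size.

```euphoria
include std/filesys.e
include std/unittest.e

-- Hypothetical scratch file, created so the example is self-contained.
integer fh = open("checksum_demo.tmp", "wb")
puts(fh, "some bytes to checksum")
close(fh)

-- Checksum the same file twice with the same size (here 4).
sequence first = checksum("checksum_demo.tmp", 4)
sequence second = checksum("checksum_demo.tmp", 4)

-- Identical input should give identical checksums.
test_equal("checksum() is stable across calls", first, second)

delete_file("checksum_demo.tmp")
test_report()
```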
Details
1. Comment by jimcbrown
Oct 30, 2010
This appears to be a bug in hash(), demonstrated by this example program:
atom jx = 2514801793
sequence data = "itial_direc"
jx = hash(jx, data) ? jx
jx = hash(jx, data) ? jx
jx = hash(jx, data) ? jx
jx = hash(jx, data) ? jx
2. Comment by jimcbrown
Oct 30, 2010
I changed checksum() not to use hash() in svn:3691. This works around the bug, but hash() itself still needs to be fixed. Leaving this bug open until hash() is fixed (and then we can revert filesys.e ...)
3. Comment by DerekParnell
Oct 30, 2010
That example is not a bug because each time you call hash(), you are supplying it with a different hashing key, so the result is different for each call.
What were you expecting?
4. Comment by jimcbrown
Oct 30, 2010
I was NOT expecting to see this:
first run $ eui example.e 1763789629 1404177190 1379719974 1398610726
second run $ eui example.e 1763789629 1182927653 1192870693 1189561125
third run $ eui example.e 1763789629 3759278887 3760266023 3774716711
fourth run $ eui example.e 1763789629 1074924326 1101183782 1104395046
Why does the result of hash() change each time on the second call? The input is identical.
5. Comment by DerekParnell
Oct 30, 2010
When I run this I get ...
c:\temp>eui hasher
1763789629
2620525367
2619396919
2626900791
c:\temp>eui hasher
1763789629
2620525367
2619396919
2626900791
c:\temp>eui hasher
1763789629
2620525367
2619396919
2626900791
c:\temp>eui hasher
1763789629
2620525367
2619396919
2626900791
But there is a problem with hash(). Still looking into it.
6. Comment by jimcbrown
Oct 30, 2010
Ok, let me fix up the formatting...
I was NOT expecting to see this:
-- first run
$ eui example.e
1763789629
1404177190
1379719974
1398610726
--second run
$ eui example.e
1763789629
1182927653
1192870693
1189561125
--third run
$ eui example.e
1763789629
3759278887
3760266023
3774716711
--fourth run
$ eui example.e
1763789629
1074924326
1101183782
1104395046
Why does the result of hash() change each time on the second call? The input is identical.
7. Comment by DerekParnell
Oct 30, 2010
Because there's a bug in hash().
But you see that when I ran your example, the output was consistent. However, other tests I've done just now do show different outputs for the same input and as you say, that is not right.
8. Comment by DerekParnell
Oct 30, 2010
The bug is actually in the parser or backend. The problem happens when assigning the output from a function to one of the parameters to that function call.
e.g. jx = func(jx)
In some circumstances, the returned value is not assigned correctly, or something like that.
The fix for this ticket will be a workaround in filesys.e:checksum() to avoid this type of construct, but another ticket has to be created for the underlying issue.
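A minimal sketch of the kind of workaround described, assuming it amounts to storing the result in a separate variable before copying it back. next_value() and tmp are hypothetical names used only for this example; this is not the actual change made to filesys.e.

```euphoria
-- Hypothetical stand-in for any function whose result is fed back
-- into one of its own arguments on the next call.
function next_value(atom prev, sequence data)
    return remainder(prev * 31 + length(data), 4294967296)
end function

atom jx = 2514801793
sequence data = "itial_direc"
atom tmp

-- Construct reported as problematic:
--     jx = next_value(jx, data)
-- Workaround: assign to a different variable first, then copy.
tmp = next_value(jx, data)
jx = tmp
? jx
```

The extra variable means the assignment target is never also a parameter of the same call, which is the construct the comment above identifies as the trigger.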
9. Comment by jimcbrown
Oct 30, 2010
Confirmed. New filesys.e passes all tests.