1. Nested Sequence Hash Bug
- Posted by bryanso Jan 04, 2015
- 1220 views
The following keys all have minor differences. But they all hash to the same value with HSIEH30 which is extremely unusual. Is this a hash bug or my misunderstanding? Thanks.
include std/hash.e sequence key1 = {{{},3},{{{{1,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{ {4,1,4},{2,0,5},1},{{6,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} sequence key2 = {{{},3},{{{{1,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{ {6,1,4},{2,0,5},1},{{4,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} sequence key3 = {{{},3},{{{{4,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{ {6,1,4},{2,0,5},1},{{1,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} sequence key4 = {{{},3},{{{{1,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{{6,1,4},{2,0,5},1},{ {5,1,4},{2,0,5},1},{{4,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} sequence key5 = {{{},3},{{{{1,1,4},{2,0,5},1},{{4,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{ {2,1,4},{2,0,5},1},{{6,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} printf(1, "Why are these all 202008777?\n") ? hash(key1, HSIEH30) ? hash(key2, HSIEH30) ? hash(key3, HSIEH30) ? hash(key4, HSIEH30) ? hash(key5, HSIEH30) printf(1, "0 is a better algorithm:\n") ? hash(key1, 0) ? hash(key2, 0) ? hash(key3, 0) ? hash(key4, 0) ? hash(key5, 0)
Output
Why are these all 202008777? 202008777 202008777 202008777 202008777 202008777 0 is a better algorithm: 4288307168 3098639727 939899006 1488901913 1623105407
2. Re: Nested Sequence Hash Bug
- Posted by DerekParnell (admin) Jan 04, 2015
- 1216 views
The following keys all have minor differences. But they all hash to the same value with HSIEH30 which is extremely unusual. Is this a hash bug or my misunderstanding? Thanks.
I also discovered this bug over the weekend, while working on my own application.
The bug shows itself when using either HSIEH30 or HSIEH32 mode and when the object being hashed is a sequence that contains an even number of sub-sequences.
I'm still trying to isolate the bad C code that's causing this weird effect. Hopefully I'll have a fix in the next day or so.
3. Re: Nested Sequence Hash Bug
- Posted by bryanso Jan 04, 2015
- 1212 views
Thanks a lot. I am using map to keep these "states". Unfortunately map uses HSIEH30 by default and that's how I discovered this problem (the symptom is the hash bucket size keeps tripling very, very quickly making map:put slow at first and eventually erroring out with out of memory); amazingly at about the exact same time as you, Derek, what a coincidence!
I'll temporarily modify map.e to use another algorithm to get past this.
Also ADLER30 behaves the same.
Thanks
4. Re: Nested Sequence Hash Bug
- Posted by Spock Jan 05, 2015
- 1168 views
Thanks a lot. I am using map to keep these "states". Unfortunately map uses HSIEH30 by default and that's how I discovered this problem (the symptom is the hash bucket size keeps tripling very, very quickly making map:put slow at first and eventually erroring out with out of memory); amazingly at about the exact same time as you, Derek, what a coincidence!
I'll temporarily modify map.e to use another algorithm to get past this.
Also ADLER30 behaves the same.
Thanks
It's not the first time the builtin hash function had this kind of problem.
A bit disappointing that fundamentals like these in Euphoria have remained broken for so long..
Spock
5. Re: Nested Sequence Hash Bug
- Posted by jimcbrown (admin) Jan 05, 2015
- 1145 views
Thanks a lot. I am using map to keep these "states". Unfortunately map uses HSIEH30 by default and that's how I discovered this problem (the symptom is the hash bucket size keeps tripling very, very quickly making map:put slow at first and eventually erroring out with out of memory); amazingly at about the exact same time as you, Derek, what a coincidence!
I'll temporarily modify map.e to use another algorithm to get past this.
Also ADLER30 behaves the same.
Thanks
It's not the first time the builtin hash function had this kind of problem.
A bit disappointing that fundamentals like these in Euphoria have remained broken for so long..
Spock
This is why it's important to file tickets.... looking at the SCM log, it appears that the first issue (from 2011) was never fixed. http://scm.openeuphoria.org/hg/euphoria/log/ab99e8bb362e/source/be_runtime.c?revcount=120
6. Re: Nested Sequence Hash Bug
- Posted by DerekParnell (admin) Jan 06, 2015
- 1075 views
The bug shows itself when using either HSIEH30 or HSIEH32 mode and when the object being hashed is a sequence that contains an even number of sub-sequences.
I haven't been able to reproduce the issues reported by others here. I'm using v4.1 so maybe they have already been fixed up.
Anyway, the bug I've discovered is really weird. It only happens when processing a sequence that has an even number of sub-sequences, and all the sub-sequences are identical. An odd number doesn't trigger it. An atom instead of a sub-sequence doesn't trigger it. If any of the sub-sequences is different, the bug isn't triggered. I'm starting to think that it might be a deeper issue with the way sequences are being stored in RAM or something. Anyhow, this is just a progress update.
7. Re: Nested Sequence Hash Bug
- Posted by mattlewis (admin) Jan 06, 2015
- 1118 views
The bug shows itself when using either HSIEH30 or HSIEH32 mode and when the object being hashed is a sequence that contains an even number of sub-sequences.
I haven't been able to reproduce the issues reported by others here. I'm using v4.1 so maybe they have already been fixed up.
I think we cleaned up some stuff when making the code work the same on 32 and 64 bit architectures.
Matt