1. Nested Sequence Hash Bug

The following keys all have minor differences. But they all hash to the same value with HSIEH30 which is extremely unusual. Is this a hash bug or my misunderstanding? Thanks.

include std/hash.e 
 
sequence key1 = {{{},3},{{{{1,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{ 
{4,1,4},{2,0,5},1},{{6,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} 
 
sequence key2 = {{{},3},{{{{1,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{ 
{6,1,4},{2,0,5},1},{{4,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} 
 
sequence key3 = {{{},3},{{{{4,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{ 
{6,1,4},{2,0,5},1},{{1,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} 
 
sequence key4 = {{{},3},{{{{1,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{{2,1,4},{2,0,5},1},{{6,1,4},{2,0,5},1},{ 
{5,1,4},{2,0,5},1},{{4,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} 
 
sequence key5 = {{{},3},{{{{1,1,4},{2,0,5},1},{{4,1,4},{2,0,5},1},{{5,1,4},{2,0,5},1},{{3,1,4},{2,0,5},1},{ 
{2,1,4},{2,0,5},1},{{6,1,4},{2,0,5},1}},3},{{},3},{},{{},3},{{},3},6} 
 
printf(1, "Why are these all 202008777?\n") 
 
? hash(key1, HSIEH30) 
? hash(key2, HSIEH30) 
? hash(key3, HSIEH30) 
? hash(key4, HSIEH30) 
? hash(key5, HSIEH30) 
 
printf(1, "0 is a better algorithm:\n") 
 
? hash(key1, 0) 
? hash(key2, 0) 
? hash(key3, 0) 
? hash(key4, 0) 
? hash(key5, 0) 

Output

Why are these all 202008777? 
202008777 
202008777 
202008777 
202008777 
202008777 
0 is a better algorithm: 
4288307168 
3098639727 
939899006 
1488901913 
1623105407 

new topic     » topic index » view message » categorize

2. Re: Nested Sequence Hash Bug

bryanso said...

The following keys all have minor differences. But they all hash to the same value with HSIEH30 which is extremely unusual. Is this a hash bug or my misunderstanding? Thanks.

I also discovered this bug over the weekend, while working on my own application.

The bug shows itself when using either HSIEH30 or HSIEH32 mode and when the object being hashed is a sequence that contains an even number of sub-sequences.

I'm still trying to isolate the bad C code that's causing this weird effect. Hopefully I'll have a fix in the next day or so.

new topic     » goto parent     » topic index » view message » categorize

3. Re: Nested Sequence Hash Bug

Thanks a lot. I am using map to keep these "states". Unfortunately map uses HSIEH30 by default and that's how I discovered this problem (the symptom is the hash bucket size keeps tripling very, very quickly making map:put slow at first and eventually erroring out with out of memory); amazingly at about the exact same time as you, Derek, what a coincidence!

I'll temporarily modify map.e to use another algorithm to get past this.

Also ADLER30 behaves the same.

Thanks

new topic     » goto parent     » topic index » view message » categorize

4. Re: Nested Sequence Hash Bug

bryanso said...

Thanks a lot. I am using map to keep these "states". Unfortunately map uses HSIEH30 by default and that's how I discovered this problem (the symptom is the hash bucket size keeps tripling very, very quickly making map:put slow at first and eventually erroring out with out of memory); amazingly at about the exact same time as you, Derek, what a coincidence!

I'll temporarily modify map.e to use another algorithm to get past this.

Also ADLER30 behaves the same.

Thanks

It's not the first time the builtin hash function had this kind of problem.

A bit disappointing that fundamentals like these in Euphoria have remained broken for so long..

Spock

new topic     » goto parent     » topic index » view message » categorize

5. Re: Nested Sequence Hash Bug

Spock said...
bryanso said...

Thanks a lot. I am using map to keep these "states". Unfortunately map uses HSIEH30 by default and that's how I discovered this problem (the symptom is the hash bucket size keeps tripling very, very quickly making map:put slow at first and eventually erroring out with out of memory); amazingly at about the exact same time as you, Derek, what a coincidence!

I'll temporarily modify map.e to use another algorithm to get past this.

Also ADLER30 behaves the same.

Thanks

It's not the first time the builtin hash function had this kind of problem.

A bit disappointing that fundamentals like these in Euphoria have remained broken for so long..

Spock

This is why it's important to file tickets.... looking at the SCM log, it appears that the first issue (from 2011) was never fixed. http://scm.openeuphoria.org/hg/euphoria/log/ab99e8bb362e/source/be_runtime.c?revcount=120

new topic     » goto parent     » topic index » view message » categorize

6. Re: Nested Sequence Hash Bug

DerekParnell said...

The bug shows itself when using either HSIEH30 or HSIEH32 mode and when the object being hashed is a sequence that contains an even number of sub-sequences.

I haven't been able to reproduce the issues reported by others here. I'm using v4.1 so maybe they have already been fixed up.

Anyway, the bug I've discovered is really weird. It only happens when processing a sequence that has an even number of sub-sequences, and all the sub-sequences are identical. An odd number doesn't trigger it. An atom instead of a sub-sequence doesn't trigger it. If any of the sub-sequences is different, the bug isn't triggered. I'm starting to think that it might be a deeper issue with the way sequences are being stored in RAM or something. Anyhow, this is just a progress update.

new topic     » goto parent     » topic index » view message » categorize

7. Re: Nested Sequence Hash Bug

DerekParnell said...
DerekParnell said...

The bug shows itself when using either HSIEH30 or HSIEH32 mode and when the object being hashed is a sequence that contains an even number of sub-sequences.

I haven't been able to reproduce the issues reported by others here. I'm using v4.1 so maybe they have already been fixed up.

I think we cleaned up some stuff when making the code work the same on 32 and 64 bit architectures.

Matt

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu