1. Hash Function

Anybody have a real fast hash function? This would be for storing words
like from a dictionary.

Thanks!

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » topic index » view message » categorize

2. Re: Hash Function

cklester wrote:
> 
> Anybody have a real fast hash function? This would be for storing words
> like from a dictionary.
> 
> Thanks!
> 
> -=ck
> "Programming in a state of EUPHORIA."
> <a
> href="http://www.cklester.com/euphoria/">http://www.cklester.com/euphoria/</a>
> 

Check http://burtleburtle.net/bob/index.html That's where I found some great
information
on hashing, with some good hash functions.

My contest entry is using a variation of the pearson hash found here so I can
avoid
multiplication and shifting.  Since my time is still near the bottom of the list
I don't think that will help you much.

Also check here http://www.partow.net/programming/hashfunctions/
I haven't tested these yet but both pages have some very good information.

Google is your friend!

------------------------------------
Too many freaks, not enough circuses.
j.

new topic     » goto parent     » topic index » view message » categorize

3. Re: Hash Function

Jason Gade wrote:

> Check <a
> href="http://burtleburtle.net/bob/index.html">http://burtleburtle.net/bob/index.html</a>
> That's where I found some great information
> on hashing, with some good hash functions.
> Also check here <a
> href="http://www.partow.net/programming/hashfunctions/">http://www.partow.net/programming/hashfunctions/</a>
> I haven't tested these yet but both pages have some very good information.

Unfortunately, I don't know C. :/

> Google is your friend!

Not all the time. ;)

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » goto parent     » topic index » view message » categorize

4. Re: Hash Function

Jason Gade wrote:
> 
> cklester wrote:
> > 
> > Anybody have a real fast hash function? This would be for storing words
> > like from a dictionary.
> 
> My contest entry is using a variation of the pearson hash found here so I can
> avoid
> multiplication and shifting.  Since my time is still near the bottom of the
> list
> I don't think that will help you much.

Contest? What contest? :D

Unfortunately, I think the contest is just going to be who can write the
fastest hash algorithm... something with which I have limited experience,
and fortunately wasn't a factor in the first official contest. :)

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » goto parent     » topic index » view message » categorize

5. Re: Hash Function

cklester wrote:
> 
> Jason Gade wrote:
> 
> > Check <a
> > href="http://burtleburtle.net/bob/index.html">http://burtleburtle.net/bob/index.html</a>
> That's where I found some great information</font></i>
> > on hashing, with some good hash functions.
> > Also check here <a
> > href="http://www.partow.net/programming/hashfunctions/">http://www.partow.net/programming/hashfunctions/</a>
> > I haven't tested these yet but both pages have some very good information.
> 
> Unfortunately, I don't know C. :/
> 
> > Google is your friend!
> 
> Not all the time. ;)
> 
> -=ck
> "Programming in a state of EUPHORIA."
> <a
> href="http://www.cklester.com/euphoria/">http://www.cklester.com/euphoria/</a>
> 

Sorry to say most examples I have found were written in either C or Pascal.
I think the example for the Pearson Hash is written in either pseuod-C or
pseudo-BASIC
so it wasn't too hard for me to translate.

I found lots of examples in LISP as well, which is a beast that I do not yet
understand.


------------------------------------
Too many freaks, not enough circuses.
j.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Hash Function

cklester wrote:
> 
> Jason Gade wrote:
> > 
> > cklester wrote:
> > > 
> > > Anybody have a real fast hash function? This would be for storing words
> > > like from a dictionary.
> > 
> > My contest entry is using a variation of the pearson hash found here so I
> > can avoid
> > multiplication and shifting.  Since my time is still near the bottom of the
> > list
> > I don't think that will help you much.
> 
> Contest? What contest? :D
> 
> Unfortunately, I think the contest is just going to be who can write the
> fastest hash algorithm... something with which I have limited experience,
> and fortunately wasn't a factor in the first official contest. :)
> 

I'll share mine with you when the contest is over. It's fast and near-
optimal.

However, I don't think that the contest is only about hashes - mine uses
14% of the total runtime.

Regards, Alexander Toresson

Shhh! Be vewy quiet! I'm hunting wuntime ewwows!

new topic     » goto parent     » topic index » view message » categorize

7. Re: Hash Function

Alexander Toresson wrote:
> cklester wrote:
> > Unfortunately, I think the contest is just going to be who can write the
> > fastest hash algorithm... something with which I have limited experience,
> > and fortunately wasn't a factor in the first official contest. :)
> > 
> 
> I'll share mine with you when the contest is over. It's fast and near-
> optimal.
> 
> However, I don't think that the contest is only about hashes - mine uses
> 14% of the total runtime.
> 
> Regards, Alexander Toresson
> 
> Shhh! Be vewy quiet! I'm hunting wuntime ewwows!
> 

Yes, I will be very interested to see some of the winning results.  I can't
profile my code
because I'm not registered, so I don't know what my bottlenecks are.

Right now, though, I'm working on correctness more than anything.

After that, my biggest challenge has been figuring out how to (quickly) make a
list of words
sorted by frequency.


------------------------------------
Too many freaks, not enough circuses.
j.

new topic     » goto parent     » topic index » view message » categorize

8. Re: Hash Function

Jason Gade wrote:
[snip]
> After that, my biggest challenge has been figuring out how to (quickly) make a
> list
> of words
> sorted by frequency.
> 

Here comes a tip which may be very helpful:

The solution is quite simple - don't sort more than necessary.
My sorting uses almost no cpu time.
And while writing this I got an idea on how to make it faster :)

Regards, Alexander Toresson

Shhh! Be vewy quiet! I'm hunting wuntime ewwows!

new topic     » goto parent     » topic index » view message » categorize

9. Re: Hash Function

Alexander Toresson wrote:
> Here comes a tip which may be very helpful:
> 
> The solution is quite simple - don't sort more than necessary.
> My sorting uses almost no cpu time.
> And while writing this I got an idea on how to make it faster :)
> 
> Regards, Alexander Toresson
> 
> Shhh! Be vewy quiet! I'm hunting wuntime ewwows!
> 

Yeah, my first and second submission did a sort on the entire data set.  I've
been working with a method of sorting only what I need, but while there has been
a performance gain it hasn't been all that I had hoped for.
I'm currently in the process of re-writing almost from scratch.


j.

new topic     » goto parent     » topic index » view message » categorize

10. Re: Hash Function

Alexander Toresson wrote:
> 
> cklester wrote:
> > 
> > Unfortunately, I think the contest is just going to be who can write the
> > fastest hash algorithm... something with which I have limited experience,
> > and fortunately wasn't a factor in the first official contest. :)
> 
> I'll share mine with you when the contest is over. It's fast and near-
> optimal.
> 
> However, I don't think that the contest is only about hashes - mine uses
> 14% of the total runtime.

If it's not about the hash, then sharing your hash won't affect the
outcome. In fact, it will level the playing field for those of us who
don't have any training/education regarding hash algorithms. Then the
contest is a matter of the word management algorithm.

As it is, I'm guessing Pete is on top because he has the fastest hash.
I could be wrong!!! I don't know for sure. That's why I'm asking. It
could well be that he has an awesome word management algorithm instead
that would blow everybody away no matter what hash was used.

Maybe Derek can shed some light on this.

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » goto parent     » topic index » view message » categorize

11. Re: Hash Function

Jason Gade wrote:
> 
> Yeah, my first and second submission did a sort on the entire data set.  I've
> been
> working with a method of sorting only what I need, but while there has been a
> performance
> gain it hasn't been all that I had hoped for.
> I'm currently in the process of re-writing almost from scratch.

I grabbed some hashing code from the archives and got a good speed
increase. I'm going to try and incorporate some hash code I found
on the mailing list also... and try using getc() and see what happens.

<whispered aside>
Hey, Lomax, can I borrow your hash algorithm? :)
</whispered aside>

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » goto parent     » topic index » view message » categorize

12. Re: Hash Function

> Alexander Toresson wrote:

> > However, I don't think that the contest is only about hashes - mine uses
> > 14% of the total runtime.

if your hash is fast then it's logical it doesnt take large percentage of total
runtime. if hash is fast then its speed is not important, if it's slow then its
speed is important.

new topic     » goto parent     » topic index » view message » categorize

13. Re: Hash Function

in case you dind't check yet: search eu archives for "hash". jiri's version is
very simple. version of derek is a little more complicated. don't know which is
faster and better.

new topic     » goto parent     » topic index » view message » categorize

14. Re: Hash Function

On Thu, 11 Nov 2004 10:35:27 -0800, cklester <guest at RapidEuphoria.com>
wrote:

><whispered aside>
>Hey, Lomax, can I borrow your hash algorithm? :)
></whispered aside>
>
Sure:

Pete
PS It's written in invisible ink. Hold over a candle to reveal blink)

new topic     » goto parent     » topic index » view message » categorize

15. Re: Hash Function

Pete Lomax wrote:
> 
> PS It's written in invisible ink. Hold over a candle to reveal blink)

I didn't see anything, but my office chair caught on fire. :P~~

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » goto parent     » topic index » view message » categorize

16. Re: Hash Function

On Thu, 11 Nov 2004 09:11:30 -0800, cklester <guest at rapideuphoria.com> wrote:
> posted by: cklester <cklester at yahoo.com>
>
> Anybody have a real fast hash function? This would be for storing words
> like from a dictionary.

Or maybe to index words to see how many are unique?

I suggest you experiment.
What I did to develop my hash function was take a big file (like wrnpc11.txt)
and run it my algorithm through it. After the program printed out the
results, it would iterate through the hash table, counting the density
at each point.
(Every 100 buckets, it would display how many entries there were, then
reset the counter)
I'd end up with a lot of numbers underneath the results, which I'd
redirect into a file.
Copy the numbers into Excel, and plot a line graph. The flatter the
line is, the better the hash function (because the distribution is
even). You can see trends, too - if there are much more entries at the
beginning of the table, you need to generate some higher values to
flatten out the line.

I started off with an adaption of a hash algorithm written in C (v1)
Some things didn't work too well in that, because it's not easy to do
bitwise shifts, so I kept 'fiddling' until I got a nice graph that
didn't take too long to compute. It wasn't all that good, so I kept
trying new things. Enhancements to the hash got me to v2.3, then I
applied "the secret weapon(tm)" which I unfortunately can't share with
you until November... and it produced the lovely flat graph you see in
v3.1.

So experiment a little, using always the same input file so you know
that the hashes are of the same thing, and you'll get a nice flat
graph - which is very important for getting an efficient distribution
of hash values.

I doubt that there could be any improvement upon my hash function.
It's *very* fast, and *very* well distributed. Unfortunately, I'm in
14th(?) place because a whopping 60% of the time is spent somewhere
else, and I haven't figured out a good way to get around it. A hash
function isn't everything.
--
MrTrick

new topic     » goto parent     » topic index » view message » categorize

17. Re: Hash Function

Patrick Barnes wrote:
[snip] 
> 
> I doubt that there could be any improvement upon my hash function.
> It's *very* fast, and *very* well distributed. 

Using the War&Peace file and my hashing algo, I got an average distribution
of 1.053133 tokens per bucket with a maximum of 4.

Using the spelling checker dictionary, I get avg of 1.114220 with a max
of 5.

> Unfortunately, I'm in
> 14th(?) place because a whopping 60% of the time is spent somewhere
> else, and I haven't figured out a good way to get around it. A hash
> function isn't everything.

That's right. The hashing algo isn't everything. Mine takes less than
10% of the program's time. About 25% is taken up with token recognition.

I still have some areas to tweak. 
 
-- 
Derek Parnell
Melbourne, Australia

new topic     » goto parent     » topic index » view message » categorize

18. Re: Hash Function

Derek Parnell wrote:
> 
> Using the War&Peace file and my hashing algo, I got an average distribution
> of 1.053133 tokens per bucket with a maximum of 4.

I don't know what my average is, but I get a maximum of 40. That BYTES! I
mean BITES!

> That's right. The hashing algo isn't everything. Mine takes less than
> 10% of the program's time. About 25% is taken up with token recognition.

My biggest time waster is the hashing stuff. :/

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu