1. Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 11, 2004
- 531 views
Anybody have a real fast hash function? This would be for storing words like from a dictionary. Thanks! -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/
2. Re: Hash Function
- Posted by Jason Gade <jaygade at yahoo.com> Nov 11, 2004
- 492 views
cklester wrote: > > Anybody have a real fast hash function? This would be for storing words > like from a dictionary. > > Thanks! > > -=ck > "Programming in a state of EUPHORIA." > <a > href="http://www.cklester.com/euphoria/">http://www.cklester.com/euphoria/</a> > Check http://burtleburtle.net/bob/index.html That's where I found some great information on hashing, with some good hash functions. My contest entry is using a variation of the pearson hash found here so I can avoid multiplication and shifting. Since my time is still near the bottom of the list I don't think that will help you much. Also check here http://www.partow.net/programming/hashfunctions/ I haven't tested these yet but both pages have some very good information. Google is your friend! ------------------------------------ Too many freaks, not enough circuses. j.
3. Re: Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 11, 2004
- 467 views
Jason Gade wrote: > Check <a > href="http://burtleburtle.net/bob/index.html">http://burtleburtle.net/bob/index.html</a> > That's where I found some great information > on hashing, with some good hash functions. > Also check here <a > href="http://www.partow.net/programming/hashfunctions/">http://www.partow.net/programming/hashfunctions/</a> > I haven't tested these yet but both pages have some very good information. Unfortunately, I don't know C. :/ > Google is your friend! Not all the time. ;) -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/
4. Re: Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 11, 2004
- 481 views
Jason Gade wrote: > > cklester wrote: > > > > Anybody have a real fast hash function? This would be for storing words > > like from a dictionary. > > My contest entry is using a variation of the pearson hash found here so I can > avoid > multiplication and shifting. Since my time is still near the bottom of the > list > I don't think that will help you much. Contest? What contest? :D Unfortunately, I think the contest is just going to be who can write the fastest hash algorithm... something with which I have limited experience, and fortunately wasn't a factor in the first official contest. :) -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/
5. Re: Hash Function
- Posted by Jason Gade <jaygade at yahoo.com> Nov 11, 2004
- 483 views
cklester wrote: > > Jason Gade wrote: > > > Check <a > > href="http://burtleburtle.net/bob/index.html">http://burtleburtle.net/bob/index.html</a> > That's where I found some great information</font></i> > > on hashing, with some good hash functions. > > Also check here <a > > href="http://www.partow.net/programming/hashfunctions/">http://www.partow.net/programming/hashfunctions/</a> > > I haven't tested these yet but both pages have some very good information. > > Unfortunately, I don't know C. :/ > > > Google is your friend! > > Not all the time. ;) > > -=ck > "Programming in a state of EUPHORIA." > <a > href="http://www.cklester.com/euphoria/">http://www.cklester.com/euphoria/</a> > Sorry to say most examples I have found were written in either C or Pascal. I think the example for the Pearson Hash is written in either pseuod-C or pseudo-BASIC so it wasn't too hard for me to translate. I found lots of examples in LISP as well, which is a beast that I do not yet understand. ------------------------------------ Too many freaks, not enough circuses. j.
6. Re: Hash Function
- Posted by Alexander Toresson <toressonodakra at swipnet.se> Nov 11, 2004
- 500 views
cklester wrote: > > Jason Gade wrote: > > > > cklester wrote: > > > > > > Anybody have a real fast hash function? This would be for storing words > > > like from a dictionary. > > > > My contest entry is using a variation of the pearson hash found here so I > > can avoid > > multiplication and shifting. Since my time is still near the bottom of the > > list > > I don't think that will help you much. > > Contest? What contest? :D > > Unfortunately, I think the contest is just going to be who can write the > fastest hash algorithm... something with which I have limited experience, > and fortunately wasn't a factor in the first official contest. :) > I'll share mine with you when the contest is over. It's fast and near- optimal. However, I don't think that the contest is only about hashes - mine uses 14% of the total runtime. Regards, Alexander Toresson Shhh! Be vewy quiet! I'm hunting wuntime ewwows!
7. Re: Hash Function
- Posted by Jason Gade <jaygade at yahoo.com> Nov 11, 2004
- 490 views
Alexander Toresson wrote: > cklester wrote: > > Unfortunately, I think the contest is just going to be who can write the > > fastest hash algorithm... something with which I have limited experience, > > and fortunately wasn't a factor in the first official contest. :) > > > > I'll share mine with you when the contest is over. It's fast and near- > optimal. > > However, I don't think that the contest is only about hashes - mine uses > 14% of the total runtime. > > Regards, Alexander Toresson > > Shhh! Be vewy quiet! I'm hunting wuntime ewwows! > Yes, I will be very interested to see some of the winning results. I can't profile my code because I'm not registered, so I don't know what my bottlenecks are. Right now, though, I'm working on correctness more than anything. After that, my biggest challenge has been figuring out how to (quickly) make a list of words sorted by frequency. ------------------------------------ Too many freaks, not enough circuses. j.
8. Re: Hash Function
- Posted by Alexander Toresson <toressonodakra at swipnet.se> Nov 11, 2004
- 483 views
Jason Gade wrote: [snip] > After that, my biggest challenge has been figuring out how to (quickly) make a > list > of words > sorted by frequency. > Here comes a tip which may be very helpful: The solution is quite simple - don't sort more than necessary. My sorting uses almost no cpu time. And while writing this I got an idea on how to make it faster :) Regards, Alexander Toresson Shhh! Be vewy quiet! I'm hunting wuntime ewwows!
9. Re: Hash Function
- Posted by Jason Gade <jaygade at yahoo.com> Nov 11, 2004
- 477 views
Alexander Toresson wrote: > Here comes a tip which may be very helpful: > > The solution is quite simple - don't sort more than necessary. > My sorting uses almost no cpu time. > And while writing this I got an idea on how to make it faster :) > > Regards, Alexander Toresson > > Shhh! Be vewy quiet! I'm hunting wuntime ewwows! > Yeah, my first and second submission did a sort on the entire data set. I've been working with a method of sorting only what I need, but while there has been a performance gain it hasn't been all that I had hoped for. I'm currently in the process of re-writing almost from scratch. j.
10. Re: Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 11, 2004
- 473 views
Alexander Toresson wrote: > > cklester wrote: > > > > Unfortunately, I think the contest is just going to be who can write the > > fastest hash algorithm... something with which I have limited experience, > > and fortunately wasn't a factor in the first official contest. :) > > I'll share mine with you when the contest is over. It's fast and near- > optimal. > > However, I don't think that the contest is only about hashes - mine uses > 14% of the total runtime. If it's not about the hash, then sharing your hash won't affect the outcome. In fact, it will level the playing field for those of us who don't have any training/education regarding hash algorithms. Then the contest is a matter of the word management algorithm. As it is, I'm guessing Pete is on top because he has the fastest hash. I could be wrong!!! I don't know for sure. That's why I'm asking. It could well be that he has an awesome word management algorithm instead that would blow everybody away no matter what hash was used. Maybe Derek can shed some light on this. -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/
11. Re: Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 11, 2004
- 472 views
Jason Gade wrote: > > Yeah, my first and second submission did a sort on the entire data set. I've > been > working with a method of sorting only what I need, but while there has been a > performance > gain it hasn't been all that I had hoped for. > I'm currently in the process of re-writing almost from scratch. I grabbed some hashing code from the archives and got a good speed increase. I'm going to try and incorporate some hash code I found on the mailing list also... and try using getc() and see what happens. <whispered aside> Hey, Lomax, can I borrow your hash algorithm? :) </whispered aside> -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/
12. Re: Hash Function
- Posted by Tone Škoda <tskoda at email.si> Nov 11, 2004
- 485 views
- Last edited Nov 12, 2004
> Alexander Toresson wrote: > > However, I don't think that the contest is only about hashes - mine uses > > 14% of the total runtime. if your hash is fast then it's logical it doesnt take large percentage of total runtime. if hash is fast then its speed is not important, if it's slow then its speed is important.
13. Re: Hash Function
- Posted by Tone Škoda <tskoda at email.si> Nov 11, 2004
- 479 views
- Last edited Nov 12, 2004
in case you dind't check yet: search eu archives for "hash". jiri's version is very simple. version of derek is a little more complicated. don't know which is faster and better.
14. Re: Hash Function
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Nov 11, 2004
- 476 views
- Last edited Nov 12, 2004
On Thu, 11 Nov 2004 10:35:27 -0800, cklester <guest at RapidEuphoria.com> wrote: ><whispered aside> >Hey, Lomax, can I borrow your hash algorithm? :) ></whispered aside> > Sure: Pete PS It's written in invisible ink. Hold over a candle to reveal )
15. Re: Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 11, 2004
- 503 views
- Last edited Nov 12, 2004
Pete Lomax wrote: > > PS It's written in invisible ink. Hold over a candle to reveal ) I didn't see anything, but my office chair caught on fire. :P~~ -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/
16. Re: Hash Function
- Posted by Patrick Barnes <mrtrick at gmail.com> Nov 11, 2004
- 491 views
- Last edited Nov 12, 2004
On Thu, 11 Nov 2004 09:11:30 -0800, cklester <guest at rapideuphoria.com> wrote: > posted by: cklester <cklester at yahoo.com> > > Anybody have a real fast hash function? This would be for storing words > like from a dictionary. Or maybe to index words to see how many are unique? I suggest you experiment. What I did to develop my hash function was take a big file (like wrnpc11.txt) and run it my algorithm through it. After the program printed out the results, it would iterate through the hash table, counting the density at each point. (Every 100 buckets, it would display how many entries there were, then reset the counter) I'd end up with a lot of numbers underneath the results, which I'd redirect into a file. Copy the numbers into Excel, and plot a line graph. The flatter the line is, the better the hash function (because the distribution is even). You can see trends, too - if there are much more entries at the beginning of the table, you need to generate some higher values to flatten out the line. I started off with an adaption of a hash algorithm written in C (v1) Some things didn't work too well in that, because it's not easy to do bitwise shifts, so I kept 'fiddling' until I got a nice graph that didn't take too long to compute. It wasn't all that good, so I kept trying new things. Enhancements to the hash got me to v2.3, then I applied "the secret weapon(tm)" which I unfortunately can't share with you until November... and it produced the lovely flat graph you see in v3.1. So experiment a little, using always the same input file so you know that the hashes are of the same thing, and you'll get a nice flat graph - which is very important for getting an efficient distribution of hash values. I doubt that there could be any improvement upon my hash function. It's *very* fast, and *very* well distributed. Unfortunately, I'm in 14th(?) place because a whopping 60% of the time is spent somewhere else, and I haven't figured out a good way to get around it. A hash function isn't everything. -- MrTrick
17. Re: Hash Function
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 12, 2004
- 483 views
Patrick Barnes wrote: [snip] > > I doubt that there could be any improvement upon my hash function. > It's *very* fast, and *very* well distributed. Using the War&Peace file and my hashing algo, I got an average distribution of 1.053133 tokens per bucket with a maximum of 4. Using the spelling checker dictionary, I get avg of 1.114220 with a max of 5. > Unfortunately, I'm in > 14th(?) place because a whopping 60% of the time is spent somewhere > else, and I haven't figured out a good way to get around it. A hash > function isn't everything. That's right. The hashing algo isn't everything. Mine takes less than 10% of the program's time. About 25% is taken up with token recognition. I still have some areas to tweak. -- Derek Parnell Melbourne, Australia
18. Re: Hash Function
- Posted by cklester <cklester at yahoo.com> Nov 15, 2004
- 496 views
Derek Parnell wrote: > > Using the War&Peace file and my hashing algo, I got an average distribution > of 1.053133 tokens per bucket with a maximum of 4. I don't know what my average is, but I get a maximum of 40. That BYTES! I mean BITES! > That's right. The hashing algo isn't everything. Mine takes less than > 10% of the program's time. About 25% is taken up with token recognition. My biggest time waster is the hashing stuff. :/ -=ck "Programming in a state of EUPHORIA." http://www.cklester.com/euphoria/