1. RE: contest #2
- Posted by bensler at mail.com Mar 16, 2002
- 400 views
Derek is still beating me, but here are my results. Chris PIII 600mhz w/64MB --------------------------------------- Load time: 1.37 (pattern and solutions) - (seconds) --------------------------------------- {1,2,3,6,4,7,3,10,4,8,7,'S'} 0.28 EXAMINATIONS {'A','P',1,2,3} 1.43 APHID APHIS APORT APRIL APRON APTLY {'A', 'P', 'P', 1, 2} 0.39 APPLE APPLY {'E',7,8,9,7,9} 0.16 No matched words. {1,2,'C','K',1} 3.9 SACKS SOCKS SUCKS {4,'i','e'} 0.66 DIE FIE HIE LIE PIE TIE VIE "rabbit" 0.11 RABBIT {'t',1,2} 0.44 (40 matched words) {1,2,3,'s'} 2.63 (283 matched words) {1,2,3,4,'e','d'} 5.2 (331 matched words) {1,2,3,4,5,6,7} 38.5 (3095 matches) {1,2,3,4} 22.3 (2252 matches)
2. RE: contest #2
- Posted by "C. K. Lester" <cklester at yahoo.com> Mar 16, 2002
- 388 views
> Derek is still beating me, but here are my results. > > Chris > PIII 600mhz w/64MB Chris, all mine are found in under a second... but it may be the hardware. I'm wondering if I could send you my code and just let you run it on your PC so I can get better relative results...? > {1,2,3,4,5,6,7} 38.5 > (3095 matches) Even doing the search with this pattern 100 times ends up being less than one second. I don't know how much faster my PC should be, having an AMD Athlon XP 1.7Ghz with 512Mb RAM... didn't we measure that already? :) Or maybe Rob would be willing to give us reports on submitted code run on his PC? Or maybe somebody else in neutral territory? :)
3. RE: contest #2
- Posted by "C. K. Lester" <cklester at yahoo.com> Mar 16, 2002
- 376 views
Are we still using this list for benchmarking? {1,1,2,3,4}, {'E', 1, 2, 3, 1, 3}, {1,'E',2,3,4,'S'}, {1,1}, {1}, {1,2,3}, {4,'I','E'}, {1,'E',2, 'E', 3, 'E'}, {1,2,2,3,'S'}, "RABBIT", "R" & {1,1,2}, {'T',1,2}, {1,2,3,'S'}, {1,1,2,2}, {1,2,2,1}, {1,2,3,4,'E','D'}, {1,2,3,4,5,6,7}, {1,2,3,1,4,3,3,2,5,3}, {1,2,3,4,5,4,3,2,1}, {1,2,'X'}, {'M',1,2,3,4,5}, {1,2,3,4,5,6,7,8,9,10}, {'E',1,1,2}, {1,2,2,1,3}, {1,1,2}, {1,2,1}, {'M',1,2,1,'M'}, "MARTIN", {1,2,'X',2,1}, {1,2,3,'B',3,4}, {1,2,'M',2,1}, {'E',1,2,3,1,3}
4. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 406 views
Shroud it and I can test it. The times I posted are for 10000 iterations. IE. it takes 38.5 seconds to iterate match_pattern({1,2,3,4,5,6,7}) 10000 times. I'm working on an optimization for that case. Chris C. K. Lester wrote: > > Derek is still beating me, but here are my results. > > > > Chris > > PIII 600mhz w/64MB > > Chris, all mine are found in under a second... but it may be the > hardware. I'm wondering if I could send you my code and just let you run > > it on your PC so I can get better relative results...? > > > {1,2,3,4,5,6,7} 38.5 > > (3095 matches) > > Even doing the search with this pattern 100 times ends up being less > than one second. I don't know how much faster my PC should be, having an > > AMD Athlon XP 1.7Ghz with 512Mb RAM... didn't we measure that already? > :) > > Or maybe Rob would be willing to give us reports on submitted code run > on his PC? Or maybe somebody else in neutral territory? :) > >
5. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 379 views
Here is my complete test, using Derek's last list. It takes a total of 14.77 seconds to run this on my machine. <TEST> sequence words atom t,t2 t=time() include match_pattern.e t2 = time()-t ? t2 for i = 1 to 1000 do words = match_pattern({1,1,2,3,4}) words = match_pattern({'E', 1, 2, 3, 1, 3}) words = match_pattern({1,'E',2,3,4,'S'}) words = match_pattern({1,1}) words = match_pattern({1,'R',2,'R',3}) words = match_pattern({1}) words = match_pattern({1,2,3}) words = match_pattern({4,'I','E'}) words = match_pattern({1,'E',2, 'E', 3, 'E'}) words = match_pattern({1,2,2,3,'S'}) words = match_pattern("RABBIT") words = match_pattern("R" & {1,1,2}) words = match_pattern({'T',1,2}) words = match_pattern({1,2,3,'S'}) words = match_pattern({1,1,2,2}) words = match_pattern({'B','E','A','R'}) words = match_pattern({1,2,3,4,'E','D'}) words = match_pattern({1,2,3,4,5,6,7}) words = match_pattern({1,2,3,1,4,3,3,2,5,3}) words = match_pattern({1,2,3,4,5,4,3,2,1}) words = match_pattern({1,2,'X'}) words = match_pattern({'M',1,2,3,4,5}) words = match_pattern({1,2,3,4,5,6,7,8,9,10}) words = match_pattern({'E',1,1,2}) words = match_pattern({1,2,2,1,3}) words = match_pattern({1,1,2}) words = match_pattern({1,2,1}) words = match_pattern({'M',1,2,1,'M'}) words = match_pattern("MARTIN") words = match_pattern({1,2,'X',2,1}) words = match_pattern({1,2,3,'B',3,4}) words = match_pattern({1,2,'M',2,1}) words = match_pattern({9,8,7,6,7,8,9}) words = match_pattern({'E',8,7,6,7,8,9}) words = match_pattern({9,'E',7,6,7,8,9}) words = match_pattern({9,8,'E',6,7,8,9}) words = match_pattern({9,8,7,'E',7,8,9}) words = match_pattern({9,8,7,6,'E',8,9}) words = match_pattern({9,8,7,6,7,'E',9}) words = match_pattern({9,8,7,6,7,8,'E'}) end for ? time()-t-t2 ? time()-t ? (time()-t-t2)/1000/40 for i = 1 to length(words) do -- puts(1,words[i]&"\n") -- uncomment this, to list results end for while get_key()=-1 do end while <END TEST> C. K. Lester wrote: > Are we still using this list for benchmarking? > > {1,1,2,3,4}, > {'E', 1, 2, 3, 1, 3}, > {1,'E',2,3,4,'S'}, > {1,1}, > {1}, > {1,2,3}, > {4,'I','E'}, > {1,'E',2, 'E', 3, 'E'}, > {1,2,2,3,'S'}, > "RABBIT", > "R" & {1,1,2}, > {'T',1,2}, > {1,2,3,'S'}, > {1,1,2,2}, > {1,2,2,1}, > {1,2,3,4,'E','D'}, > {1,2,3,4,5,6,7}, > {1,2,3,1,4,3,3,2,5,3}, > {1,2,3,4,5,4,3,2,1}, > {1,2,'X'}, > {'M',1,2,3,4,5}, > {1,2,3,4,5,6,7,8,9,10}, > {'E',1,1,2}, > {1,2,2,1,3}, > {1,1,2}, > {1,2,1}, > {'M',1,2,1,'M'}, > "MARTIN", > {1,2,'X',2,1}, > {1,2,3,'B',3,4}, > {1,2,'M',2,1}, > {'E',1,2,3,1,3} > >
6. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 388 views
My contest#2 routine is pure EU. No memory addressing. I'll give you one hint :) I used Derek/Eumans hash function. If I explain how my data is stored, I'll give away my method :P Chris tone.skoda at siol.net wrote: > If I run that test (Duron700 128MB RAM) I get these results: > > 2.2 - file load time > 23.43 - matching patterns > 25.63 - total > 0.000586 - average > > Are you even storing all your data in sequences, or are you poking it in > memory? > I organize data in two non-nested (1-level nested?) sequences, or better > said two sequences which have sequences which have numbers. > > ----- Original Message ----- > From: <bensler at mail.com> > To: "EUforum" <EUforum at topica.com> > Sent: Sunday, March 17, 2002 4:03 PM > Subject: RE: contest #2 > > > > Here is my complete test, using Derek's last list. > > > > It takes a total of 14.77 seconds to run this on my machine. > > > > <TEST> > > sequence words > > > > atom t,t2 t=time() > > include match_pattern.e > > > > t2 = time()-t > > ? t2 > > > > for i = 1 to 1000 do > > words = match_pattern({1,1,2,3,4}) > > words = match_pattern({'E', 1, 2, 3, 1, 3}) > > words = match_pattern({1,'E',2,3,4,'S'}) > > words = match_pattern({1,1}) > > words = match_pattern({1,'R',2,'R',3}) > > words = match_pattern({1}) > > words = match_pattern({1,2,3}) > > words = match_pattern({4,'I','E'}) > > words = match_pattern({1,'E',2, 'E', 3, 'E'}) > > words = match_pattern({1,2,2,3,'S'}) > > words = match_pattern("RABBIT") > > words = match_pattern("R" & {1,1,2}) > > words = match_pattern({'T',1,2}) > > words = match_pattern({1,2,3,'S'}) > > words = match_pattern({1,1,2,2}) > > words = match_pattern({'B','E','A','R'}) > > words = match_pattern({1,2,3,4,'E','D'}) > > words = match_pattern({1,2,3,4,5,6,7}) > > words = match_pattern({1,2,3,1,4,3,3,2,5,3}) > > words = match_pattern({1,2,3,4,5,4,3,2,1}) > > words = match_pattern({1,2,'X'}) > > words = match_pattern({'M',1,2,3,4,5}) > > words = match_pattern({1,2,3,4,5,6,7,8,9,10}) > > words = match_pattern({'E',1,1,2}) > > words = match_pattern({1,2,2,1,3}) > > words = match_pattern({1,1,2}) > > words = match_pattern({1,2,1}) > > words = match_pattern({'M',1,2,1,'M'}) > > words = match_pattern("MARTIN") > > words = match_pattern({1,2,'X',2,1}) > > words = match_pattern({1,2,3,'B',3,4}) > > words = match_pattern({1,2,'M',2,1}) > > words = match_pattern({9,8,7,6,7,8,9}) > > words = match_pattern({'E',8,7,6,7,8,9}) > > words = match_pattern({9,'E',7,6,7,8,9}) > > words = match_pattern({9,8,'E',6,7,8,9}) > > words = match_pattern({9,8,7,'E',7,8,9}) > > words = match_pattern({9,8,7,6,'E',8,9}) > > words = match_pattern({9,8,7,6,7,'E',9}) > > words = match_pattern({9,8,7,6,7,8,'E'}) > > end for > > > > ? time()-t-t2 > > ? time()-t > > ? (time()-t-t2)/1000/40 > > > > for i = 1 to length(words) do > > -- puts(1,words[i]&"\n") -- uncomment this, to list results > > end for > > while get_key()=-1 do end while > > <END TEST> > > > > C. K. Lester wrote: > > > Are we still using this list for benchmarking? > > > > > > {1,1,2,3,4}, > > > {'E', 1, 2, 3, 1, 3}, > > > {1,'E',2,3,4,'S'}, > > > {1,1}, > > > {1}, > > > {1,2,3}, > > > {4,'I','E'}, > > > {1,'E',2, 'E', 3, 'E'}, > > > {1,2,2,3,'S'}, > > > "RABBIT", > > > "R" & {1,1,2}, <snip>
7. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 374 views
Yes please, I improved the last posted version of EumsHash() by changing the multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% increase calling the routine. However, if I don't fix the lookup for wild filters (filters with no absolutes), I doubt I will be winning. As is, I would be surprised if no one else is close or beating me. I know Derek is beating me. Fortunately, he's not competing :P Chris euman at bellsouth.net wrote: > ----- Original Message ----- > From: <bensler at mail.com> > > > I'll give you one hint :) I used Derek/Eumans hash function. > > Chris > > I'd like to see Derek & Eumans Hash Function take the winners circle.. > > Chris, I have a faster version if you would like to have it. > > Euman > >
8. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 382 views
I shouldn't say that it's faster. It depends on the duration of execution. <5000 total iterations, then 3 is faster. > 5000 total, then 8 is better. 8 takes longer to load, but lookups are a bit faster. Using my test, at 4000 total iterations (100 * 40 tests): 3 gives a total time of 2.69 8 gives a total time of 2.74 Using my test, at 40000 total iterations (1000 * 40 tests): 3 gives a total time of 15.32 8 gives a total time of 14.78 Those are consistent results. So, depending on what kind of testing Rob will be doing, will depend what I use. He has said he will be doing about 1000 iterations, but using how many tests? 1000 isn't enough to be accurate. I'm assuming there will be more than 10000 total calls to the routine. In which case, 8 is better. Chris euman at bellsouth.net wrote: > ----- Original Message ----- > From: <bensler at mail.com> > > > Yes please, > > I'll send it private. > > > I improved the last posted version of EumsHash() by changing the > > multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% > > increase calling the routine. > > That was one of my changes because the multiplier was originally "16" > the reason I set this to "3" was because of integer overflow and when > you > set this to "8" I just cant see how it helped speed you up any. > > I profiled the routine and stuck in "3" so "h" would be less than > #0FFFFFFF > > so this code would not be executed as often. > > if h > #0FFFFFFF then > poke4(m,h) > g = and_bits(peek(m), #F0) > h = xor_bits(h, g) > h = and_bits(h, not_bits(g)) > end if > > Im very surprised this helped you. Did you profile the routine set to > "8" > how often was the code executed (above) more than when its set at "3"? > > > However, if I don't fix the lookup for wild filters (filters with no > > absolutes), I doubt I will be winning. As is, I would be surprised if no > > > > one else is close or beating me. I know Derek is beating me. > > Fortunately, he's not competing :P > > > > Chris > > Euman > >
9. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 389 views
I really didn't do any thoruough testing for this conclusion. My conclusion was based on the fact that 8 doesn't overflow (16 does), and 8 is a binary multiple, where 3 isn't. Because it's a binary multiple, EU optimises it to a bit shift, instead of a true multiply. 3 doesn't get optimised. I originally tried 4, but it seems to be slower than both 3 and 8. 2 doesn't provide enough variance, and I end up with too many collisions. I think 8 disperses slightly better than the others too. Ending up with a more even distribution of the hashed items. Chris euman at bellsouth.net wrote: > ----- Original Message ----- > From: <bensler at mail.com> > > > Yes please, > > I'll send it private. > > > I improved the last posted version of EumsHash() by changing the > > multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% > > increase calling the routine. > > That was one of my changes because the multiplier was originally "16" > the reason I set this to "3" was because of integer overflow and when > you > set this to "8" I just cant see how it helped speed you up any. > > I profiled the routine and stuck in "3" so "h" would be less than > #0FFFFFFF > > so this code would not be executed as often. > > if h > #0FFFFFFF then > poke4(m,h) > g = and_bits(peek(m), #F0) > h = xor_bits(h, g) > h = and_bits(h, not_bits(g)) > end if > > Im very surprised this helped you. Did you profile the routine set to > "8" > how often was the code executed (above) more than when its set at "3"? > > > However, if I don't fix the lookup for wild filters (filters with no > > absolutes), I doubt I will be winning. As is, I would be surprised if no > > > > one else is close or beating me. I know Derek is beating me. > > Fortunately, he's not competing :P > > > > Chris > > Euman > >
10. RE: contest #2
- Posted by "C. K. Lester" <cklester at yahoo.com> Mar 17, 2002
- 390 views
euman at bellsouth.net wrote: > > I'd like to see Derek & Eumans Hash Function take > the winners circle.. Okay, Derek or Euman, how about posting your latest hash algorithm? Thanks! ck
11. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 377 views
It said solutions to the problems should not be posted to the list. Chris Derek Parnell wrote: > Didn't Robert say something about posting actual code is not allowed. > > I don't hash the dictionary words, just their patterns. Also, I don't > use an algorithm that looks > anything like Euman's. The "alphabet" that I work with is thus in the > range 1 - 32. My approach is > to pre-calculate as much as possible at start up, so that inside the > inner loops, I don't do any > unnecessary calcs. Also, I try to avoid 'expensive' compuations like > multiply and divide and nested > sequence references. And append() is almost totally avoided, but when I > do have to, I append more > than I need so that in the next few iterations I can avoid doing another > append. > > I organise the dicionary by pattern. Note that over 95% of the patterns > in the dictionary have less > than 20 words in them, and only a few have more than 2000 words. > > ---------- > Derek. > > > 18/03/2002 10:57:57 AM, "C. K. Lester" <cklester at yahoo.com> wrote: > > > > >euman at bellsouth.net wrote: > >> > >> I'd like to see Derek & Eumans Hash Function take > >> the winners circle.. > > > >Okay, Derek or Euman, how about posting your latest hash algorithm? > > > >Thanks! > >ck > > > > > --------- > Cheers, > Derek Parnell > ICQ# 7647806 > >
12. RE: contest #2
- Posted by "C. K. Lester" <cklester at yahoo.com> Mar 17, 2002
- 381 views
Derek Parnell wrote: > Didn't Robert say something about posting actual > code is not allowed. The code you would be posting, however, could just as easily be posted to the contributions page to be used by everyone... I'm using some of Kat's strtok.e; it's generic code that is readily available. That is how I would classify your code in this instance. Your code is not the algorithm that will solve the problem, but it can be a component that can be used (just like Kat's code). If you don't want to post it here, will you provide it privately? > I don't hash the dictionary words, just their patterns. I actually don't care how it's done... in fact, you could shroud your source and let me use it as an include. I'm not entering contest #2, but I'd like my contest #3 algorithm to be faster, not because speed will determine the winner, but because it will help me more quickly refine my deciphering algorithm. Whether it takes 7 seconds or 70 seconds to return the deciphered text won't matter in contest #3, so it's mainly for development's sake. However, I certainly don't want to break the spirit of this competition, so I'll submit to Rob's ruling(s) in this regard.
13. RE: contest #2
- Posted by "C. K. Lester" <cklester at yahoo.com> Mar 17, 2002
- 389 views
Chris (bensler at mail.com) wrote: > It said solutions to the problems should not be posted to the list. True. Is Derek's code (or yours, or Euman's) a solution to any of the contests? If so, they shouldn't be posted. > Derek Parnell wrote: > > Didn't Robert say something about posting > > actual code is not allowed.
14. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 382 views
Contest#2 only requires the routine to be able to match a pattern against words.txt, why add more words ?:P Chris euman at bellsouth.net wrote: > Hello, > > I made a copy of the words in the anagram dictionary that i posted a > link to. > 3017 words (some being duplicate) that are in the anagram dictionary an > not in junko's dictionary. > > I ran this test with my own spellchecker in 0.90 sec that uses EumsHash( > ) > > anagram.txt file is 141k > w/ words not in junko's dictionary being 24.9k (some duplicate) > > So anyone that claims EumsHash( ) isnt very fast "Think again" > > Junko's checked the file in 1.82 sec more than twice as slow for the > same file. > Junko's still orders the words and eliminates the duplicates so I'll see > about > fixing this. I still feel like I can beat her when finished. > > Euman > euman at bellsouth.net > > ----- Original Message ----- > From: <euman at bellsouth.net> > > > > Robert never said we couldnt use a second dictionary only that we > > must use Junko's dictionary so I'll add this (scrabble) anagram > > dictionary > > that may help some of you. The things one can find on the I-Net > > is amazing! > > > > http://www.orchy.com/dictionary/anagrams.htm > > > > Euman > > euman at bellsouth.net > >
15. RE: contest #2
- Posted by bensler at mail.com Mar 17, 2002
- 388 views
I think maybe the difference is because I'm not using the table structure that the hash function is designed for. I use Eumshash, and use remainder(hKey,Prime) to reference into my table. I don't have a perfect hash, but because of the dispersment I have in my table. There are no collisions. I think that for me, using 8 provides a more unique key, creating a more balanced fill of the bins. Because of the better balance, I get better lookups. Chris euman at bellsouth.net wrote: > From: <bensler at mail.com> > > > Contest#2 only requires the routine to be able to match a pattern > > against words.txt, why add more words ?:P > > > > Chris > > I was commenting on the routine in general because it seem'd to be > under the microscope for speed. Im not writing for #2 either. > > EumsHash( ) is versitile and very fast! > > It will be hard to beat in the dictionary hash arena! > Load time is slightly slower than most but lookup > is extremely powerfull and fast because of the key > returned based on strings being so individual. > > Chris hit on something earlier that changing the multiplier > will create values that will be different. The higher the value > the more unique the hash value will be but the slower the > EumsHash( ) algorithm will be. At present the multiplier of 3 > is the fastest method (based on profile and profile_time) > > Euman > euman at bellsouth.net > >