1. contest #2
- Posted by tone.skoda at siol.net Mar 16, 2002
- 371 views
This is a multi-part message in MIME format. ------=_NextPart_000_0007_01C1CC80.002E5080 charset="iso-8859-2" Contest 2, my benchmarks: 10.000 iterations, without dictionary load time (about 3 seconds): (pattern and solutions) - (seconds) --------------------------------------- {1,2,3,6,4,7,3,10,4,8,7,'S'} 1.58 EXAMINATIONS {'A','P',1,2,3} 3.22 APHID APHIS APORT APRIL APRON APTLY {'A', 'P', 'P', 1, 2} 0.35 APPLE APPLY {'E',7,8,9,7,9} 0.96 No matched words. {1,2,'C','K',1} 1.8 SACKS SOCKS SUCKS {4,'i','e'} 3.64 DIE FIE HIE LIE PIE TIE VIE "rabbit" 2.48 RABBIT {'t',1,2} 1.12 (40 matched words) {1,2,3,'s'} 10.75 (283 matched words) {1,2,3,4,'e','d'} 55.39 (331 matched words) For pattenrs that have no letters, like this one: {1,2,3,4,5,6,7} it takes too long to wait and needs to be optimized. Dictionary load time may increase for about one second for this. What are your results? Derek does you code really take only about 1 second to load dictionary and get words for 10.000 random patterns, or was that about something else? ------=_NextPart_000_0007_01C1CC80.002E5080 Content-Type: text/html; charset="iso-8859-2" Content-Transfer-Encoding: 8bit <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <META http-equiv=Content-Type content="text/html; charset=iso-8859-2"> <META content="MSHTML 5.50.4522.1800" name=GENERATOR> <STYLE></STYLE> </HEAD> <BODY bgColor=#ffffff> <DIV><FONT face=Arial size=2>Contest 2, my benchmarks:<BR>10.000 iterations, without dictionary load time (about 3 seconds):</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>(pattern and solutions) - (seconds)<BR>---------------------------------------<BR>{1,2,3,6,4,7,3,10,4,8,7,'S'} 1.58<BR>EXAMINATIONS</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{'A','P',1,2,3} 3.22<BR>APHID<BR>APHIS<BR>APORT<BR>APRIL<BR>APRON<BR>APTLY</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{'A', 'P', 'P', 1, 2} 0.35<BR>APPLE<BR>APPLY</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{'E',7,8,9,7,9} 0.96<BR>No matched words.</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{1,2,'C','K',1} 1.8<BR>SACKS<BR>SOCKS<BR>SUCKS</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{4,'i','e'} 3.64<BR>DIE<BR>FIE<BR>HIE<BR>LIE<BR>PIE<BR>TIE<BR>VIE</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>"rabbit" 2.48<BR>RABBIT</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{'t',1,2} 1.12<BR>(40 matched words)</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{1,2,3,'s'} 10.75<BR>(283 matched words)</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>{1,2,3,4,'e','d'} 55.39<BR>(331 matched words)</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>For pattenrs that have no letters, like this one: {1,2,3,4,5,6,7} it takes too long to wait<BR>and needs to be optimized. Dictionary load time may increase for about one second for this.</FONT></DIV> <DIV> </DIV> <DIV><FONT face=Arial size=2>What are your results? Derek does you code really take only about 1 second to<BR>load dictionary and get words for 10.000 random ------=_NextPart_000_0007_01C1CC80.002E5080--
2. Re: contest #2
- Posted by tone.skoda at siol.net Mar 17, 2002
- 369 views
If I run that test (Duron700 128MB RAM) I get these results: 2.2 - file load time 23.43 - matching patterns 25.63 - total 0.000586 - average Are you even storing all your data in sequences, or are you poking it in memory? I organize data in two non-nested (1-level nested?) sequences, or better said two sequences which have sequences which have numbers. ----- Original Message ----- From: <bensler at mail.com> To: "EUforum" <EUforum at topica.com> Sent: Sunday, March 17, 2002 4:03 PM Subject: RE: contest #2 > > Here is my complete test, using Derek's last list. > > It takes a total of 14.77 seconds to run this on my machine. > > <TEST> > sequence words > > atom t,t2 t=time() > include match_pattern.e > > t2 = time()-t > ? t2 > > for i = 1 to 1000 do > words = match_pattern({1,1,2,3,4}) > words = match_pattern({'E', 1, 2, 3, 1, 3}) > words = match_pattern({1,'E',2,3,4,'S'}) > words = match_pattern({1,1}) > words = match_pattern({1,'R',2,'R',3}) > words = match_pattern({1}) > words = match_pattern({1,2,3}) > words = match_pattern({4,'I','E'}) > words = match_pattern({1,'E',2, 'E', 3, 'E'}) > words = match_pattern({1,2,2,3,'S'}) > words = match_pattern("RABBIT") > words = match_pattern("R" & {1,1,2}) > words = match_pattern({'T',1,2}) > words = match_pattern({1,2,3,'S'}) > words = match_pattern({1,1,2,2}) > words = match_pattern({'B','E','A','R'}) > words = match_pattern({1,2,3,4,'E','D'}) > words = match_pattern({1,2,3,4,5,6,7}) > words = match_pattern({1,2,3,1,4,3,3,2,5,3}) > words = match_pattern({1,2,3,4,5,4,3,2,1}) > words = match_pattern({1,2,'X'}) > words = match_pattern({'M',1,2,3,4,5}) > words = match_pattern({1,2,3,4,5,6,7,8,9,10}) > words = match_pattern({'E',1,1,2}) > words = match_pattern({1,2,2,1,3}) > words = match_pattern({1,1,2}) > words = match_pattern({1,2,1}) > words = match_pattern({'M',1,2,1,'M'}) > words = match_pattern("MARTIN") > words = match_pattern({1,2,'X',2,1}) > words = match_pattern({1,2,3,'B',3,4}) > words = match_pattern({1,2,'M',2,1}) > words = match_pattern({9,8,7,6,7,8,9}) > words = match_pattern({'E',8,7,6,7,8,9}) > words = match_pattern({9,'E',7,6,7,8,9}) > words = match_pattern({9,8,'E',6,7,8,9}) > words = match_pattern({9,8,7,'E',7,8,9}) > words = match_pattern({9,8,7,6,'E',8,9}) > words = match_pattern({9,8,7,6,7,'E',9}) > words = match_pattern({9,8,7,6,7,8,'E'}) > end for > > ? time()-t-t2 > ? time()-t > ? (time()-t-t2)/1000/40 > > for i = 1 to length(words) do > -- puts(1,words[i]&"\n") -- uncomment this, to list results > end for > while get_key()=-1 do end while > <END TEST> > > C. K. Lester wrote: > > Are we still using this list for benchmarking? > > > > {1,1,2,3,4}, > > {'E', 1, 2, 3, 1, 3}, > > {1,'E',2,3,4,'S'}, > > {1,1}, > > {1}, > > {1,2,3}, > > {4,'I','E'}, > > {1,'E',2, 'E', 3, 'E'}, > > {1,2,2,3,'S'}, > > "RABBIT", > > "R" & {1,1,2}, > > {'T',1,2}, > > {1,2,3,'S'}, > > {1,1,2,2}, > > {1,2,2,1}, > > {1,2,3,4,'E','D'}, > > {1,2,3,4,5,6,7}, > > {1,2,3,1,4,3,3,2,5,3}, > > {1,2,3,4,5,4,3,2,1}, > > {1,2,'X'}, > > {'M',1,2,3,4,5}, > > {1,2,3,4,5,6,7,8,9,10}, > > {'E',1,1,2}, > > {1,2,2,1,3}, > > {1,1,2}, > > {1,2,1}, > > {'M',1,2,1,'M'}, > > "MARTIN", > > {1,2,'X',2,1}, > > {1,2,3,'B',3,4}, > > {1,2,'M',2,1}, <snip> > > >
3. Re: contest #2
- Posted by euman at bellsouth.net Mar 17, 2002
- 353 views
----- Original Message ----- From: <bensler at mail.com> > I'll give you one hint :) I used Derek/Eumans hash function. > Chris I'd like to see Derek & Eumans Hash Function take the winners circle.. Chris, I have a faster version if you would like to have it. Euman
4. Re: contest #2
- Posted by petelomax at blueyonder.co.uk Mar 17, 2002
- 341 views
I'm a bit disappointed with my results (see below), but I'll make two observations just to cheer myself up: 1) The worst cases (*'d) are where I knew they would be. Depending on how Rob generates the random test data, they will either be quite rare or extremely common. If he picks random words from the dictionary, and makes a patterns from them, I might be OK. If he chooses to do: pattern=repeat(0,rand(10)) for i = 1 to length(pattern) pattern[i]=rand(26)+(rand(2)-1)*64 end for I'm spannered, (& I wouldn't consider that a fair test). 2) I doubt Rob's test set will be < 50% with fixed letters and > 50% without, but if it is, maybe I'll not do quite so bad. (for a seven letter word, the above will generate 127 words with fixed letters for every one with none, I think) Maybe a bit late to realise this, but I guess I subconsciously assumed there would be a bias towards patterns with no fixed letters, given I realised when using such a routine for problem #3, no fixed letters would ever be passed, unless you use the routine again rather than filter/test the list it already gave you. Any comments Rob? Does the 90% rule for problem 3 apply to the patterns for problem 2 as well, or not? Oh well, we'll see. Down to luck again I see, C.K. Pete Here are my (disappointing) results: constant testset= { --# pattern matches tone chris pete {1,2,3,6,4,7,3,10,4,8,7,'S'}, 1, 1.58, 0.28, 0.57, {'A','P',1,2,3}, 6, 3.22, 1.43, 51.45, * {'A', 'P', 'P', 1, 2}, 2, 0.35, 0.39, 2.3, {'E',7,8,9,7,9}, 0, 0.96, 0.16, 0.54, {1,2,'C','K',1}, 3, 1.8, 3.9, 3.12, {4,'I','E'}, 7, 3.64, 0.66, 10.95, * "RABBIT", 1, 2.48, 0.11, 8.52, {'T',1,2}, 40, 1.12, 0.44, 10.49, * {1,2,3,'S'}, 283, 10.75, 2.63, 35.76, * {1,2,3,4,'E','D'}, 331, 55.39, 5.2, 66.66, * } -- totals 81.29, 15.2, 190.36!!!
5. Re: contest #2
- Posted by euman at bellsouth.net Mar 17, 2002
- 352 views
----- Original Message ----- From: <bensler at mail.com> > Yes please, I'll send it private. > I improved the last posted version of EumsHash() by changing the > multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% > increase calling the routine. That was one of my changes because the multiplier was originally "16" the reason I set this to "3" was because of integer overflow and when you set this to "8" I just cant see how it helped speed you up any. I profiled the routine and stuck in "3" so "h" would be less than #0FFFFFFF so this code would not be executed as often. if h > #0FFFFFFF then poke4(m,h) g = and_bits(peek(m), #F0) h = xor_bits(h, g) h = and_bits(h, not_bits(g)) end if Im very surprised this helped you. Did you profile the routine set to "8" how often was the code executed (above) more than when its set at "3"? > However, if I don't fix the lookup for wild filters (filters with no > absolutes), I doubt I will be winning. As is, I would be surprised if no > one else is close or beating me. I know Derek is beating me. > Fortunately, he's not competing :P > > Chris Euman
6. Re: contest #2
- Posted by Derek Parnell <ddparnell at bigpond.com> Mar 17, 2002
- 341 views
Didn't Robert say something about posting actual code is not allowed. I don't hash the dictionary words, just their patterns. Also, I don't use an algorithm that looks anything like Euman's. The "alphabet" that I work with is thus in the range 1 - 32. My approach is to pre-calculate as much as possible at start up, so that inside the inner loops, I don't do any unnecessary calcs. Also, I try to avoid 'expensive' compuations like multiply and divide and nested sequence references. And append() is almost totally avoided, but when I do have to, I append more than I need so that in the next few iterations I can avoid doing another append. I organise the dicionary by pattern. Note that over 95% of the patterns in the dictionary have less than 20 words in them, and only a few have more than 2000 words. ---------- Derek. 18/03/2002 10:57:57 AM, "C. K. Lester" <cklester at yahoo.com> wrote: > > >euman at bellsouth.net wrote: >> >> I'd like to see Derek & Eumans Hash Function take >> the winners circle.. > >Okay, Derek or Euman, how about posting your latest hash algorithm? > >Thanks! >ck > > > > --------- Cheers, Derek Parnell ICQ# 7647806
7. Re: contest #2
- Posted by tone.skoda at siol.net Mar 17, 2002
- 362 views
I'm using only my own code and I didn't do any research on the internet and don't have any experience with hash tables so my results are pretty good. :) I also don't use any and_bits () and poke () functions like EumsHash does, and I use gets (), so some optimizing can be possible here. I improved it a little, now it's faster than yours for that test, I get 11.27 seconds! :) The question is, will Rob run it on 10.000 patterns or on 1000. Cause if it's gonna be on 1000 then my file load time is too slow. 1.98 - file load time, improved 9.29 - matching patterns 11.27 - total 0.00023225 - average match_words = match_pattern ({1,2,3,6,4,7,3,10,4,8,7,'S'}) -- 0.28 sec for 10.000 match_words = match_pattern ({'A','P',1,2,3}) -- 0.34 sec for 10.000 match_words = match_pattern ({'A', 'P', 'P', 1, 2}) -- 0.18 sec for 10.000 match_words = match_pattern ({'E',7,8,9,7,9}) -- 0.22 sec for 10.000 match_words = match_pattern ({1,2,'C','K',1}) -- 0.22 sec for 10.000 match_words = match_pattern ({4,'i','e'}) -- 0.65 sec for 10.000 match_words = match_pattern ("rabbit") -- 0.39 sec for 10.000 match_words = match_pattern ({'t',1,2}) -- 0.29 sec for 10.000 match_words = match_pattern ({1,2,3,'s'}) -- 1.46 sec for 10.000 match_words = match_pattern ({1,2,3,4,'e','d'}) -- 11.6 sec for 10.000 !!! match_words = match_pattern ({1,2,3,4,5,6,7}) -- 35.17 sec for 10.000 !!! match_words = match_pattern ({1,2,3,4}) -- 25.16 sec for 10.000 !!! match_words = match_pattern ({1,'V',1,'R',1,2,2}) -- 0.15 sec for 10.000 ----- Original Message ----- From: <bensler at mail.com> To: "EUforum" <EUforum at topica.com> Sent: Sunday, March 17, 2002 7:39 PM Subject: RE: contest #2 > > My contest#2 routine is pure EU. No memory addressing. > > I'll give you one hint :) I used Derek/Eumans hash function. > > If I explain how my data is stored, I'll give away my method :P > > Chris > > > tone.skoda at siol.net wrote: > > If I run that test (Duron700 128MB RAM) I get these results: > > > > 2.2 - file load time > > 23.43 - matching patterns > > 25.63 - total > > 0.000586 - average > > > > Are you even storing all your data in sequences, or are you poking it in > > memory? > > I organize data in two non-nested (1-level nested?) sequences, or better > > said two sequences which have sequences which have numbers. > > > > ----- Original Message ----- > > From: <bensler at mail.com> > > To: "EUforum" <EUforum at topica.com> > > Sent: Sunday, March 17, 2002 4:03 PM > > Subject: RE: contest #2 > > > > > > > Here is my complete test, using Derek's last list. > > > > > > It takes a total of 14.77 seconds to run this on my machine. > > > > > > <TEST> > > > sequence words > > > > > > atom t,t2 t=time() > > > include match_pattern.e > > > > > > t2 = time()-t > > > ? t2 > > > > > > for i = 1 to 1000 do > > > words = match_pattern({1,1,2,3,4}) > > > words = match_pattern({'E', 1, 2, 3, 1, 3}) > > > words = match_pattern({1,'E',2,3,4,'S'}) > > > words = match_pattern({1,1}) > > > words = match_pattern({1,'R',2,'R',3}) > > > words = match_pattern({1}) > > > words = match_pattern({1,2,3}) > > > words = match_pattern({4,'I','E'}) > > > words = match_pattern({1,'E',2, 'E', 3, 'E'}) > > > words = match_pattern({1,2,2,3,'S'}) > > > words = match_pattern("RABBIT") > > > words = match_pattern("R" & {1,1,2}) > > > words = match_pattern({'T',1,2}) > > > words = match_pattern({1,2,3,'S'}) > > > words = match_pattern({1,1,2,2}) > > > words = match_pattern({'B','E','A','R'}) > > > words = match_pattern({1,2,3,4,'E','D'}) > > > words = match_pattern({1,2,3,4,5,6,7}) > > > words = match_pattern({1,2,3,1,4,3,3,2,5,3}) > > > words = match_pattern({1,2,3,4,5,4,3,2,1}) > > > words = match_pattern({1,2,'X'}) > > > words = match_pattern({'M',1,2,3,4,5}) > > > words = match_pattern({1,2,3,4,5,6,7,8,9,10}) > > > words = match_pattern({'E',1,1,2}) > > > words = match_pattern({1,2,2,1,3}) > > > words = match_pattern({1,1,2}) > > > words = match_pattern({1,2,1}) > > > words = match_pattern({'M',1,2,1,'M'}) > > > words = match_pattern("MARTIN") > > > words = match_pattern({1,2,'X',2,1}) > > > words = match_pattern({1,2,3,'B',3,4}) > > > words = match_pattern({1,2,'M',2,1}) > > > words = match_pattern({9,8,7,6,7,8,9}) > > > words = match_pattern({'E',8,7,6,7,8,9}) > > > words = match_pattern({9,'E',7,6,7,8,9}) > > > words = match_pattern({9,8,'E',6,7,8,9}) > > > words = match_pattern({9,8,7,'E',7,8,9}) > > > words = match_pattern({9,8,7,6,'E',8,9}) > > > words = match_pattern({9,8,7,6,7,'E',9}) > > > words = match_pattern({9,8,7,6,7,8,'E'}) > > > end for > > > > > > ? time()-t-t2 > > > ? time()-t > > > ? (time()-t-t2)/1000/40 > > > > > > for i = 1 to length(words) do > > > -- puts(1,words[i]&"\n") -- uncomment this, to list results > > > end for > > > while get_key()=-1 do end while > > > <END TEST> > > > > > > C. K. Lester wrote: > > > > Are we still using this list for benchmarking? > > > > > > > > {1,1,2,3,4}, > > > > {'E', 1, 2, 3, 1, 3}, <snip> > > >
8. Re: contest #2
- Posted by euman at bellsouth.net Mar 17, 2002
- 350 views
Robert never said we couldnt use a second dictionary only that we must use Junko's dictionary so I'll add this (scrabble) anagram dictionary that may help some of you. The things one can find on the I-Net is amazing! http://www.orchy.com/dictionary/anagrams.htm Euman euman at bellsouth.net Q: Are we monetarily insane? A: YES ----- Original Message ----- From: "C. K. Lester" <cklester at yahoo.com> To: "EUforum" <EUforum at topica.com> Sent: Sunday, March 17, 2002 6:57 PM Subject: RE: contest #2 > > > euman at bellsouth.net wrote: > > > > I'd like to see Derek & Eumans Hash Function take > > the winners circle.. > > Okay, Derek or Euman, how about posting your latest hash algorithm? > > Thanks! > ck > > > >
9. Re: contest #2
- Posted by euman at bellsouth.net Mar 17, 2002
- 376 views
Hello, I made a copy of the words in the anagram dictionary that i posted a link to. 3017 words (some being duplicate) that are in the anagram dictionary an not in junko's dictionary. I ran this test with my own spellchecker in 0.90 sec that uses EumsHash( ) anagram.txt file is 141k w/ words not in junko's dictionary being 24.9k (some duplicate) So anyone that claims EumsHash( ) isnt very fast "Think again" Junko's checked the file in 1.82 sec more than twice as slow for the same file. Junko's still orders the words and eliminates the duplicates so I'll see about fixing this. I still feel like I can beat her when finished. Euman euman at bellsouth.net ----- Original Message ----- From: <euman at bellsouth.net> > > Robert never said we couldnt use a second dictionary only that we > must use Junko's dictionary so I'll add this (scrabble) anagram dictionary > that may help some of you. The things one can find on the I-Net > is amazing! > > http://www.orchy.com/dictionary/anagrams.htm > > Euman > euman at bellsouth.net
10. Re: contest #2
- Posted by euman at bellsouth.net Mar 17, 2002
- 349 views
From: <bensler at mail.com> > Contest#2 only requires the routine to be able to match a pattern > against words.txt, why add more words ?:P > > Chris I was commenting on the routine in general because it seem'd to be under the microscope for speed. Im not writing for #2 either. EumsHash( ) is versitile and very fast! It will be hard to beat in the dictionary hash arena! Load time is slightly slower than most but lookup is extremely powerfull and fast because of the key returned based on strings being so individual. Chris hit on something earlier that changing the multiplier will create values that will be different. The higher the value the more unique the hash value will be but the slower the EumsHash( ) algorithm will be. At present the multiplier of 3 is the fastest method (based on profile and profile_time) Euman euman at bellsouth.net
11. Re: contest #2
- Posted by Kat <gertie at PELL.NET> Mar 17, 2002
- 343 views
On 17 Mar 2002, at 22:03, euman at bellsouth.net wrote: > > Hello, > > I made a copy of the words in the anagram dictionary that i posted a link to. > 3017 words (some being duplicate) that are in the anagram dictionary an not in > junko's dictionary. I mined the Mirriam Webster site over a yr ago, got 127,949 words (with definitions), i could send the words to anyone, i think, without any copyright violations. That's 20,000 words more than in Wordnet 1.7. Kat