1. contest #2

Contest 2, my benchmarks:
10,000 iterations, excluding dictionary load time (about 3 seconds):

(pattern and solutions) - (seconds)
---------------------------------------
{1,2,3,6,4,7,3,10,4,8,7,'S'} 1.58
EXAMINATIONS

{'A','P',1,2,3}   3.22
APHID
APHIS
APORT
APRIL
APRON
APTLY

{'A', 'P', 'P', 1, 2}  0.35
APPLE
APPLY

{'E',7,8,9,7,9}   0.96
No matched words.

{1,2,'C','K',1}   1.8
SACKS
SOCKS
SUCKS

{4,'i','e'}   3.64
DIE
FIE
HIE
LIE
PIE
TIE
VIE

"rabbit"   2.48
RABBIT

{'t',1,2}   1.12
(40 matched words)

{1,2,3,'s'}   10.75
(283 matched words)

{1,2,3,4,'e','d'}  55.39
(331 matched words)

For patterns that have no letters, like {1,2,3,4,5,6,7}, it takes too long
and needs to be optimized. Dictionary load time may increase by about one
second because of this.

What are your results? Derek, does your code really take only about 1 second
to load the dictionary and get words for 10,000 random patterns, or was that
about something else?
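From the examples above, the matching rule appears to be: a letter in the pattern must appear literally at that position, and equal numbers must stand for the same letter. Here is a minimal Python sketch of that semantics (not anyone's contest code; the rule that distinct numbers must map to distinct letters is an assumption on my part):

```python
def matches(pattern, word):
    """Check one word against a pattern of fixed letters (str) and numbers (int)."""
    if len(pattern) != len(word):
        return False
    binding = {}  # number -> the letter it was first bound to
    for slot, ch in zip(pattern, word.upper()):
        if isinstance(slot, str):           # fixed letter: must match exactly
            if slot.upper() != ch:
                return False
        elif slot in binding:               # placeholder seen before
            if binding[slot] != ch:
                return False
        elif ch in binding.values():        # assumed: distinct numbers, distinct letters
            return False
        else:
            binding[slot] = ch
    return True

words = ["SACKS", "SOCKS", "SUCKS", "STACK"]
print([w for w in words if matches([1, 2, 'C', 'K', 1], w)])
# -> ['SACKS', 'SOCKS', 'SUCKS']
```

A real contest entry would of course precompute structures over the dictionary rather than test word by word like this.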



2. Re: contest #2

If I run that test (Duron700 128MB RAM) I get these results:

2.2 - file load time
23.43 - matching patterns
25.63 - total
0.000586 - average

Are you storing all your data in sequences, or are you poking it into memory?
I organize my data in two 1-level-nested sequences; that is, two sequences
whose elements are sequences of numbers.

----- Original Message -----
From: <bensler at mail.com>
To: "EUforum" <EUforum at topica.com>
Sent: Sunday, March 17, 2002 4:03 PM
Subject: RE: contest #2


>
> Here is my complete test, using Derek's last list.
>
> It takes a total of 14.77 seconds to run this on my machine.
>
> <TEST>
> sequence words
>
> atom t,t2   t=time()
> include match_pattern.e
>
> t2 = time()-t
> ? t2
>
> for i = 1 to 1000 do
>    words = match_pattern({1,1,2,3,4})
>    words = match_pattern({'E', 1, 2, 3, 1, 3})
>    words = match_pattern({1,'E',2,3,4,'S'})
>    words = match_pattern({1,1})
>    words = match_pattern({1,'R',2,'R',3})
>    words = match_pattern({1})
>    words = match_pattern({1,2,3})
>    words = match_pattern({4,'I','E'})
>    words = match_pattern({1,'E',2, 'E', 3, 'E'})
>    words = match_pattern({1,2,2,3,'S'})
>    words = match_pattern("RABBIT")
>    words = match_pattern("R" & {1,1,2})
>    words = match_pattern({'T',1,2})
>    words = match_pattern({1,2,3,'S'})
>    words = match_pattern({1,1,2,2})
>    words = match_pattern({'B','E','A','R'})
>    words = match_pattern({1,2,3,4,'E','D'})
>    words = match_pattern({1,2,3,4,5,6,7})
>    words = match_pattern({1,2,3,1,4,3,3,2,5,3})
>    words = match_pattern({1,2,3,4,5,4,3,2,1})
>    words = match_pattern({1,2,'X'})
>    words = match_pattern({'M',1,2,3,4,5})
>    words = match_pattern({1,2,3,4,5,6,7,8,9,10})
>    words = match_pattern({'E',1,1,2})
>    words = match_pattern({1,2,2,1,3})
>    words = match_pattern({1,1,2})
>    words = match_pattern({1,2,1})
>    words = match_pattern({'M',1,2,1,'M'})
>    words = match_pattern("MARTIN")
>    words = match_pattern({1,2,'X',2,1})
>    words = match_pattern({1,2,3,'B',3,4})
>    words = match_pattern({1,2,'M',2,1})
>    words = match_pattern({9,8,7,6,7,8,9})
>    words = match_pattern({'E',8,7,6,7,8,9})
>    words = match_pattern({9,'E',7,6,7,8,9})
>    words = match_pattern({9,8,'E',6,7,8,9})
>    words = match_pattern({9,8,7,'E',7,8,9})
>    words = match_pattern({9,8,7,6,'E',8,9})
>    words = match_pattern({9,8,7,6,7,'E',9})
>    words = match_pattern({9,8,7,6,7,8,'E'})
> end for
>
> ? time()-t-t2
> ? time()-t
> ? (time()-t-t2)/1000/40
>
> for i = 1 to length(words) do
> --   puts(1,words[i]&"\n") -- uncomment this, to list results
> end for
> while get_key()=-1 do end while
> <END TEST>
>
> C. K. Lester wrote:
> > Are we still using this list for benchmarking?
> >
> > {1,1,2,3,4},
> > {'E', 1, 2, 3, 1, 3},
> > {1,'E',2,3,4,'S'},
> > {1,1},
> > {1},
> > {1,2,3},
> > {4,'I','E'},
> > {1,'E',2, 'E', 3, 'E'},
> > {1,2,2,3,'S'},
> > "RABBIT",
> > "R" & {1,1,2},
> > {'T',1,2},
> > {1,2,3,'S'},
> > {1,1,2,2},
> > {1,2,2,1},
> > {1,2,3,4,'E','D'},
> > {1,2,3,4,5,6,7},
> > {1,2,3,1,4,3,3,2,5,3},
> > {1,2,3,4,5,4,3,2,1},
> > {1,2,'X'},
> > {'M',1,2,3,4,5},
> > {1,2,3,4,5,6,7,8,9,10},
> > {'E',1,1,2},
> > {1,2,2,1,3},
> > {1,1,2},
> > {1,2,1},
> > {'M',1,2,1,'M'},
> > "MARTIN",
> > {1,2,'X',2,1},
> > {1,2,3,'B',3,4},
> > {1,2,'M',2,1},
<snip>

>
>
>


3. Re: contest #2

----- Original Message ----- 
From: <bensler at mail.com>

> I'll give you one hint :) I used Derek/Eumans hash function.
> Chris

I'd like to see Derek & Euman's hash function take the winner's circle..

Chris, I have a faster version if you would like to have it.

Euman


4. Re: contest #2

I'm a bit disappointed with my results (see below), but I'll make two
observations just to cheer myself up:

1) The worst cases (*'d) are where I knew they would be. Depending on
how Rob generates the random test data, they will either be quite rare
or extremely common. If he picks random words from the dictionary and
makes patterns from them, I might be OK. If he chooses to do:
	pattern = repeat(0, rand(10))
	for i = 1 to length(pattern)
		pattern[i] = rand(26) + (rand(2)-1)*64
	end for
I'm spannered (& I wouldn't consider that a fair test).

2) I doubt Rob's test set will be < 50% with fixed letters and > 50%
without, but if it is, maybe I won't do quite so badly.
(For a seven-letter pattern, the above will generate 127 patterns with
fixed letters for every one with none, I think.)
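Pete's 127-to-1 figure checks out: with each of the seven positions independently a fixed letter or a numeric wildcard with equal probability, there are 2^7 = 128 equally likely letter/number masks, and only one of them contains no letters. A quick Python check:

```python
from itertools import product

# Enumerate all 7-position masks (True = fixed letter, False = numeric wildcard).
masks = list(product([True, False], repeat=7))
with_letters = sum(1 for m in masks if any(m))
without = sum(1 for m in masks if not any(m))
print(with_letters, without)  # 127 masks with at least one letter, 1 with none
```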

Maybe a bit late to realise this, but I guess I subconsciously assumed
there would be a bias towards patterns with no fixed letters. When using
such a routine for problem #3, I realised, patterns with no fixed letters
would never be passed, unless you use the routine again rather than
filter/test the list it already gave you.

Any comments Rob? Does the 90% rule for problem 3 apply to the
patterns for problem 2 as well, or not?

Oh well, we'll see. Down to luck again I see, C.K.

Pete

Here are my (disappointing) results:
constant testset = {
--  pattern                       matches   tone     chris    pete
{1,2,3,6,4,7,3,10,4,8,7,'S'},        1,     1.58,    0.28,    0.57,
{'A','P',1,2,3},                     6,     3.22,    1.43,   51.45,  -- *
{'A','P','P',1,2},                   2,     0.35,    0.39,    2.3,
{'E',7,8,9,7,9},                     0,     0.96,    0.16,    0.54,
{1,2,'C','K',1},                     3,     1.8,     3.9,     3.12,
{4,'I','E'},                         7,     3.64,    0.66,   10.95,  -- *
"RABBIT",                            1,     2.48,    0.11,    8.52,
{'T',1,2},                          40,     1.12,    0.44,   10.49,  -- *
{1,2,3,'S'},                       283,    10.75,    2.63,   35.76,  -- *
{1,2,3,4,'E','D'},                 331,    55.39,    5.2,    66.66,  -- *
}
-- totals:                                 81.29,   15.2,   190.36!!!


5. Re: contest #2

----- Original Message ----- 
From: <bensler at mail.com>

> Yes please,

I'll send it private.

>    I improved the last posted version of EumsHash() by changing the 
> multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% 
> increase calling the routine.

That was one of my changes; the multiplier was originally 16. The reason
I set it to 3 was integer overflow, and when you set it to 8 I just can't
see how it helped speed you up any.

I profiled the routine and settled on 3 so that "h" would stay below
#0FFFFFFF and this code would not be executed as often:

if h > #0FFFFFFF then
    poke4(m, h)
    g = and_bits(peek(m), #F0)
    h = xor_bits(h, g)
    h = and_bits(h, not_bits(g))
end if

I'm very surprised this helped you. Did you profile the routine set to 8?
How often was the code above executed, compared to when it's set to 3?
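For illustration, here is a Python sketch in the spirit of that fold (an ELF-hash-style reduction over a rolling hash, not EumsHash itself), showing Euman's point: the smaller the multiplier, the more slowly h grows, so the over-threshold branch runs less often:

```python
def elf_style_hash(s, mult):
    """Rolling hash with an ELF-hash-style fold when h exceeds #0FFFFFFF."""
    h = 0
    folds = 0  # counts how many times the fold branch ran
    for ch in s:
        h = h * mult + ord(ch)
        if h > 0x0FFFFFFF:
            folds += 1
            g = h & 0xF0000000        # top nibble of the 32-bit value
            h ^= g >> 24              # mix it back into the low bits
            h &= ~g & 0xFFFFFFFF      # then clear it
    return h, folds

for mult in (3, 8, 16):
    h, folds = elf_style_hash("EXAMINATIONS", mult)
    print(mult, hex(h), folds)
```

On a 12-letter word like this, a multiplier of 3 never crosses the threshold at all, while 8 and 16 trigger the fold several times, which matches Euman's profiling argument.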

>    However, if I don't fix the lookup for wild filters (filters with no 
> absolutes), I doubt I will be winning. As is, I would be surprised if no 
> one else is close or beating me. I know Derek is beating me. 
> Fortunately, he's not competing :P
> 
> Chris

Euman


6. Re: contest #2

Didn't Robert say something about posting actual code not being allowed?

I don't hash the dictionary words, just their patterns. Also, I don't use an
algorithm that looks anything like Euman's. The "alphabet" that I work with
is thus in the range 1 - 32. My approach is to pre-calculate as much as
possible at start-up, so that inside the inner loops I don't do any
unnecessary calcs. I also try to avoid 'expensive' computations like
multiply, divide, and nested sequence references. And append() is almost
totally avoided; when I do have to use it, I append more than I need so that
in the next few iterations I can avoid doing another append.

I organise the dictionary by pattern. Note that over 95% of the patterns in
the dictionary have fewer than 20 words in them, and only a few have more
than 2000 words.
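As a sketch of the general idea Derek describes (not his actual code, which he declines to post): compute each word's repetition pattern once at load time and bucket the words under it. An all-number query in canonical numbering then reduces to a single bucket lookup instead of a scan of the whole dictionary:

```python
from collections import defaultdict

def signature(word):
    """Canonical repetition pattern: first distinct letter -> 1, next -> 2, ..."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen) + 1) for ch in word)

def load(words):
    """Bucket the dictionary by pattern, done once at start-up."""
    buckets = defaultdict(list)
    for w in words:
        buckets[signature(w)].append(w)
    return buckets

words = ["RABBIT", "HOBBIT", "LETTER", "BETTER", "APPLE", "APPLY"]
buckets = load(words)
# An all-wildcard query like {1,2,3,3,4,5} (assumed already in canonical
# numbering) is now one dictionary lookup:
print(buckets[(1, 2, 3, 3, 4, 5)])  # -> ['RABBIT', 'HOBBIT']
```

Queries that mix letters and numbers would still have to filter within the candidate buckets, which fits Derek's note that most buckets hold fewer than 20 words.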

----------
Derek.


18/03/2002 10:57:57 AM, "C. K. Lester" <cklester at yahoo.com> wrote:

>
>
>euman at bellsouth.net wrote:
>> 
>> I'd like to see Derek & Eumans Hash Function take
>> the winners circle..
>
>Okay, Derek or Euman, how about posting your latest hash algorithm?
>
>Thanks!
>ck
>
>
>
>
---------
Cheers,
Derek Parnell 
ICQ# 7647806


7. Re: contest #2

I'm using only my own code; I didn't do any research on the internet and
don't have any experience with hash tables, so my results are pretty good. :)
I also don't use any and_bits() and poke() functions like EumsHash does, and
I use gets(), so some optimizing may be possible there.

I improved it a little, and now it's faster than yours for that test: I get
11.27 seconds! :)

The question is, will Rob run it on 10,000 patterns or on 1000? Because if
it's going to be 1000, then my file load time is too slow.

1.98 - file load time, improved
9.29 - matching patterns
11.27 - total
0.00023225 - average

match_words = match_pattern({1,2,3,6,4,7,3,10,4,8,7,'S'}) -- 0.28 sec for 10,000
match_words = match_pattern({'A','P',1,2,3})              -- 0.34 sec for 10,000
match_words = match_pattern({'A','P','P',1,2})            -- 0.18 sec for 10,000
match_words = match_pattern({'E',7,8,9,7,9})              -- 0.22 sec for 10,000
match_words = match_pattern({1,2,'C','K',1})              -- 0.22 sec for 10,000
match_words = match_pattern({4,'i','e'})                  -- 0.65 sec for 10,000
match_words = match_pattern("rabbit")                     -- 0.39 sec for 10,000
match_words = match_pattern({'t',1,2})                    -- 0.29 sec for 10,000
match_words = match_pattern({1,2,3,'s'})                  -- 1.46 sec for 10,000
match_words = match_pattern({1,2,3,4,'e','d'})            -- 11.6 sec for 10,000 !!!
match_words = match_pattern({1,2,3,4,5,6,7})              -- 35.17 sec for 10,000 !!!
match_words = match_pattern({1,2,3,4})                    -- 25.16 sec for 10,000 !!!
match_words = match_pattern({1,'V',1,'R',1,2,2})          -- 0.15 sec for 10,000

----- Original Message -----
From: <bensler at mail.com>
To: "EUforum" <EUforum at topica.com>
Sent: Sunday, March 17, 2002 7:39 PM
Subject: RE: contest #2


>
> My contest#2 routine is pure EU. No memory addressing.
>
> I'll give you one hint :) I used Derek/Eumans hash function.
>
> If I explain how my data is stored, I'll give away my method :P
>
> Chris
>
>
> tone.skoda at siol.net wrote:
> > If I run that test (Duron700 128MB RAM) I get these results:
> >
> > 2.2 - file load time
> > 23.43 - matching patterns
> > 25.63 - total
> > 0.000586 - average
> >
> > Are you even storing all your data in sequences, or are you poking it in
> > memory?
> > I organize data in two non-nested (1-level nested?) sequences, or better
> > said two sequences which have sequences which have numbers.
> >
> > ----- Original Message -----
> > From: <bensler at mail.com>
> > To: "EUforum" <EUforum at topica.com>
> > Sent: Sunday, March 17, 2002 4:03 PM
> > Subject: RE: contest #2
> >
> >
> > > Here is my complete test, using Derek's last list.
> > >
> > > It takes a total of 14.77 seconds to run this on my machine.
> > >
> > > <TEST>
> > > sequence words
> > >
> > > atom t,t2   t=time()
> > > include match_pattern.e
> > >
> > > t2 = time()-t
> > > ? t2
> > >
> > > for i = 1 to 1000 do
> > >    words = match_pattern({1,1,2,3,4})
> > >    words = match_pattern({'E', 1, 2, 3, 1, 3})
> > >    words = match_pattern({1,'E',2,3,4,'S'})
> > >    words = match_pattern({1,1})
> > >    words = match_pattern({1,'R',2,'R',3})
> > >    words = match_pattern({1})
> > >    words = match_pattern({1,2,3})
> > >    words = match_pattern({4,'I','E'})
> > >    words = match_pattern({1,'E',2, 'E', 3, 'E'})
> > >    words = match_pattern({1,2,2,3,'S'})
> > >    words = match_pattern("RABBIT")
> > >    words = match_pattern("R" & {1,1,2})
> > >    words = match_pattern({'T',1,2})
> > >    words = match_pattern({1,2,3,'S'})
> > >    words = match_pattern({1,1,2,2})
> > >    words = match_pattern({'B','E','A','R'})
> > >    words = match_pattern({1,2,3,4,'E','D'})
> > >    words = match_pattern({1,2,3,4,5,6,7})
> > >    words = match_pattern({1,2,3,1,4,3,3,2,5,3})
> > >    words = match_pattern({1,2,3,4,5,4,3,2,1})
> > >    words = match_pattern({1,2,'X'})
> > >    words = match_pattern({'M',1,2,3,4,5})
> > >    words = match_pattern({1,2,3,4,5,6,7,8,9,10})
> > >    words = match_pattern({'E',1,1,2})
> > >    words = match_pattern({1,2,2,1,3})
> > >    words = match_pattern({1,1,2})
> > >    words = match_pattern({1,2,1})
> > >    words = match_pattern({'M',1,2,1,'M'})
> > >    words = match_pattern("MARTIN")
> > >    words = match_pattern({1,2,'X',2,1})
> > >    words = match_pattern({1,2,3,'B',3,4})
> > >    words = match_pattern({1,2,'M',2,1})
> > >    words = match_pattern({9,8,7,6,7,8,9})
> > >    words = match_pattern({'E',8,7,6,7,8,9})
> > >    words = match_pattern({9,'E',7,6,7,8,9})
> > >    words = match_pattern({9,8,'E',6,7,8,9})
> > >    words = match_pattern({9,8,7,'E',7,8,9})
> > >    words = match_pattern({9,8,7,6,'E',8,9})
> > >    words = match_pattern({9,8,7,6,7,'E',9})
> > >    words = match_pattern({9,8,7,6,7,8,'E'})
> > > end for
> > >
> > > ? time()-t-t2
> > > ? time()-t
> > > ? (time()-t-t2)/1000/40
> > >
> > > for i = 1 to length(words) do
> > > --   puts(1,words[i]&"\n") -- uncomment this, to list results
> > > end for
> > > while get_key()=-1 do end while
> > > <END TEST>
> > >
> > > C. K. Lester wrote:
> > > > Are we still using this list for benchmarking?
> > > >
> > > > {1,1,2,3,4},
> > > > {'E', 1, 2, 3, 1, 3},
<snip>

>
>
>


8. Re: contest #2

Robert never said we couldn't use a second dictionary, only that we must
use Junko's dictionary, so I'll add this (Scrabble) anagram dictionary that
may help some of you. The things one can find on the I-Net are amazing!

http://www.orchy.com/dictionary/anagrams.htm

Euman
euman at bellsouth.net

Q: Are we monetarily insane?
A: YES
----- Original Message ----- 
From: "C. K. Lester" <cklester at yahoo.com>
To: "EUforum" <EUforum at topica.com>
Sent: Sunday, March 17, 2002 6:57 PM
Subject: RE: contest #2


> 
> 
> euman at bellsouth.net wrote:
> > 
> > I'd like to see Derek & Eumans Hash Function take
> > the winners circle..
> 
> Okay, Derek or Euman, how about posting your latest hash algorithm?
> 
> Thanks!
> ck
> 
> 
> 
>


9. Re: contest #2

Hello,

I made a copy of the words in the anagram dictionary that I posted a link to:
3017 words (some duplicates) that are in the anagram dictionary and not in
Junko's dictionary.

I ran this test with my own spellchecker, which uses EumsHash(), in 0.90 sec.

The anagram.txt file is 141k, with the words not in Junko's dictionary
being 24.9k (some duplicates).

So anyone who claims EumsHash() isn't very fast: think again.

Junko's checked the file in 1.82 sec, more than twice as slow for the same
file. Junko's still orders the words and eliminates the duplicates, so I'll
see about fixing that. I still feel I can beat her when finished.

Euman
euman at bellsouth.net

----- Original Message ----- 
From: <euman at bellsouth.net>
> 
> Robert never said we couldnt use a second dictionary only that we
> must use Junko's dictionary so I'll add this (scrabble) anagram dictionary
> that may help some of you. The things one can find on the I-Net
> is amazing!
> 
> http://www.orchy.com/dictionary/anagrams.htm
> 
> Euman
> euman at bellsouth.net


10. Re: contest #2

From: <bensler at mail.com>

> Contest#2 only requires the routine to be able to match a pattern 
> against words.txt, why add more words ?:P
> 
> Chris

I was commenting on the routine in general because it seemed to be under
the microscope for speed. I'm not writing for #2 either.

EumsHash() is versatile and very fast!

It will be hard to beat in the dictionary-hash arena! Load time is slightly
slower than most, but lookup is extremely powerful and fast because the key
returned for a string is so individual.

Chris hit on something earlier: changing the multiplier will create
different values. The higher the multiplier, the more unique the hash value,
but the slower the EumsHash() algorithm. At present, a multiplier of 3 is
the fastest (based on profile and profile_time).

Euman
euman at bellsouth.net


11. Re: contest #2

On 17 Mar 2002, at 22:03, euman at bellsouth.net wrote:

> 
> Hello,
> 
> I made a copy of the words in the anagram dictionary that i posted a link to.
> 3017 words (some being duplicate) that are in the anagram dictionary an not in
> junko's dictionary.

I mined the Merriam-Webster site over a year ago and got 127,949 words (with
definitions). I could send the words to anyone, I think, without any
copyright violations. That's 20,000 more words than in WordNet 1.7.

Kat

