1. RE: contest #2

Derek is still beating me, but here are my results.

Chris
PIII 600mhz w/64MB

---------------------------------------
Load time: 1.37

(pattern and solutions) - (seconds)
---------------------------------------
{1,2,3,6,4,7,3,10,4,8,7,'S'} 0.28
EXAMINATIONS

{'A','P',1,2,3}   1.43
APHID
APHIS
APORT
APRIL
APRON
APTLY

{'A', 'P', 'P', 1, 2} 0.39
APPLE
APPLY

{'E',7,8,9,7,9}   0.16
No matched words.

{1,2,'C','K',1}   3.9
SACKS
SOCKS
SUCKS

{4,'i','e'}   0.66
DIE
FIE
HIE
LIE
PIE
TIE
VIE

"rabbit"   0.11
RABBIT

{'t',1,2}   0.44
(40 matched words)

{1,2,3,'s'}   2.63
(283 matched words)

{1,2,3,4,'e','d'} 5.2
(331 matched words)

{1,2,3,4,5,6,7} 38.5
(3095 matches)

{1,2,3,4} 22.3
(2252 matches)

new topic     » topic index » view message » categorize

2. RE: contest #2

> Derek is still beating me, but here are my results.
> 
> Chris
> PIII 600mhz w/64MB

Chris, all mine are found in under a second... but it may be the 
hardware. I'm wondering if I could send you my code and just let you run 
it on your PC so I can get better relative results...?

> {1,2,3,4,5,6,7} 38.5
> (3095 matches)

Even doing the search with this pattern 100 times ends up being less 
than one second. I don't know how much faster my PC should be, having an 
AMD Athlon XP 1.7Ghz with 512Mb RAM... didn't we measure that already? 
:)

Or maybe Rob would be willing to give us reports on submitted code run 
on his PC? Or maybe somebody else in neutral territory? :)

new topic     » goto parent     » topic index » view message » categorize

3. RE: contest #2

Are we still using this list for benchmarking?

{1,1,2,3,4},
{'E', 1, 2, 3, 1, 3},
{1,'E',2,3,4,'S'},
{1,1},
{1},
{1,2,3},
{4,'I','E'},
{1,'E',2, 'E', 3, 'E'},
{1,2,2,3,'S'},
"RABBIT",
"R" & {1,1,2},
{'T',1,2},
{1,2,3,'S'},
{1,1,2,2},
{1,2,2,1},
{1,2,3,4,'E','D'},
{1,2,3,4,5,6,7},
{1,2,3,1,4,3,3,2,5,3},
{1,2,3,4,5,4,3,2,1},
{1,2,'X'},
{'M',1,2,3,4,5},
{1,2,3,4,5,6,7,8,9,10},
{'E',1,1,2},
{1,2,2,1,3},
{1,1,2},
{1,2,1},
{'M',1,2,1,'M'},
"MARTIN",
{1,2,'X',2,1},
{1,2,3,'B',3,4},
{1,2,'M',2,1},
{'E',1,2,3,1,3}

new topic     » goto parent     » topic index » view message » categorize

4. RE: contest #2

Shroud it and I can test it.

The times I posted are for 10000 iterations. IE. it takes 38.5 seconds 
to iterate match_pattern({1,2,3,4,5,6,7}) 10000 times.

I'm working on an optimization for that case.

Chris


C. K. Lester wrote:
> > Derek is still beating me, but here are my results.
> > 
> > Chris
> > PIII 600mhz w/64MB
> 
> Chris, all mine are found in under a second... but it may be the 
> hardware. I'm wondering if I could send you my code and just let you run 
> 
> it on your PC so I can get better relative results...?
> 
> > {1,2,3,4,5,6,7} 38.5
> > (3095 matches)
> 
> Even doing the search with this pattern 100 times ends up being less 
> than one second. I don't know how much faster my PC should be, having an 
> 
> AMD Athlon XP 1.7Ghz with 512Mb RAM... didn't we measure that already? 
> :)
> 
> Or maybe Rob would be willing to give us reports on submitted code run 
> on his PC? Or maybe somebody else in neutral territory? :)
> 
>

new topic     » goto parent     » topic index » view message » categorize

5. RE: contest #2

Here is my complete test, using Derek's last list.

It takes a total of 14.77 seconds to run this on my machine.

<TEST>
sequence words

atom t,t2   t=time()
include match_pattern.e

t2 = time()-t
? t2

for i = 1 to 1000 do
   words = match_pattern({1,1,2,3,4})
   words = match_pattern({'E', 1, 2, 3, 1, 3})
   words = match_pattern({1,'E',2,3,4,'S'})
   words = match_pattern({1,1})
   words = match_pattern({1,'R',2,'R',3})
   words = match_pattern({1})
   words = match_pattern({1,2,3})
   words = match_pattern({4,'I','E'})
   words = match_pattern({1,'E',2, 'E', 3, 'E'})
   words = match_pattern({1,2,2,3,'S'})
   words = match_pattern("RABBIT")
   words = match_pattern("R" & {1,1,2})
   words = match_pattern({'T',1,2})
   words = match_pattern({1,2,3,'S'})
   words = match_pattern({1,1,2,2})
   words = match_pattern({'B','E','A','R'})
   words = match_pattern({1,2,3,4,'E','D'})
   words = match_pattern({1,2,3,4,5,6,7})
   words = match_pattern({1,2,3,1,4,3,3,2,5,3})
   words = match_pattern({1,2,3,4,5,4,3,2,1})
   words = match_pattern({1,2,'X'})
   words = match_pattern({'M',1,2,3,4,5})
   words = match_pattern({1,2,3,4,5,6,7,8,9,10})
   words = match_pattern({'E',1,1,2})
   words = match_pattern({1,2,2,1,3})
   words = match_pattern({1,1,2})
   words = match_pattern({1,2,1})
   words = match_pattern({'M',1,2,1,'M'})
   words = match_pattern("MARTIN")
   words = match_pattern({1,2,'X',2,1})
   words = match_pattern({1,2,3,'B',3,4})
   words = match_pattern({1,2,'M',2,1})
   words = match_pattern({9,8,7,6,7,8,9})
   words = match_pattern({'E',8,7,6,7,8,9})
   words = match_pattern({9,'E',7,6,7,8,9})
   words = match_pattern({9,8,'E',6,7,8,9})
   words = match_pattern({9,8,7,'E',7,8,9})
   words = match_pattern({9,8,7,6,'E',8,9})
   words = match_pattern({9,8,7,6,7,'E',9})
   words = match_pattern({9,8,7,6,7,8,'E'})
end for

? time()-t-t2
? time()-t
? (time()-t-t2)/1000/40

for i = 1 to length(words) do
--   puts(1,words[i]&"\n") -- uncomment this, to list results
end for
while get_key()=-1 do end while
<END TEST>

C. K. Lester wrote:
> Are we still using this list for benchmarking?
> 
> {1,1,2,3,4},
> {'E', 1, 2, 3, 1, 3},
> {1,'E',2,3,4,'S'},
> {1,1},
> {1},
> {1,2,3},
> {4,'I','E'},
> {1,'E',2, 'E', 3, 'E'},
> {1,2,2,3,'S'},
> "RABBIT",
> "R" & {1,1,2},
> {'T',1,2},
> {1,2,3,'S'},
> {1,1,2,2},
> {1,2,2,1},
> {1,2,3,4,'E','D'},
> {1,2,3,4,5,6,7},
> {1,2,3,1,4,3,3,2,5,3},
> {1,2,3,4,5,4,3,2,1},
> {1,2,'X'},
> {'M',1,2,3,4,5},
> {1,2,3,4,5,6,7,8,9,10},
> {'E',1,1,2},
> {1,2,2,1,3},
> {1,1,2},
> {1,2,1},
> {'M',1,2,1,'M'},
> "MARTIN",
> {1,2,'X',2,1},
> {1,2,3,'B',3,4},
> {1,2,'M',2,1},
> {'E',1,2,3,1,3}
> 
>

new topic     » goto parent     » topic index » view message » categorize

6. RE: contest #2

My contest#2 routine is pure EU. No memory addressing.

I'll give you one hint :) I used Derek/Eumans hash function.

If I explain how my data is stored, I'll give away my method :P

Chris


tone.skoda at siol.net wrote:
> If I run that test (Duron700 128MB RAM) I get these results:
> 
> 2.2 - file load time
> 23.43 - matching patterns
> 25.63 - total
> 0.000586 - average
> 
> Are you even storing all your data in sequences, or are you poking it in
> memory?
> I organize data in two non-nested (1-level nested?) sequences, or better
> said two sequences which have sequences which have numbers.
> 
> ----- Original Message -----
> From: <bensler at mail.com>
> To: "EUforum" <EUforum at topica.com>
> Sent: Sunday, March 17, 2002 4:03 PM
> Subject: RE: contest #2
> 
> 
> > Here is my complete test, using Derek's last list.
> >
> > It takes a total of 14.77 seconds to run this on my machine.
> >
> > <TEST>
> > sequence words
> >
> > atom t,t2   t=time()
> > include match_pattern.e
> >
> > t2 = time()-t
> > ? t2
> >
> > for i = 1 to 1000 do
> >    words = match_pattern({1,1,2,3,4})
> >    words = match_pattern({'E', 1, 2, 3, 1, 3})
> >    words = match_pattern({1,'E',2,3,4,'S'})
> >    words = match_pattern({1,1})
> >    words = match_pattern({1,'R',2,'R',3})
> >    words = match_pattern({1})
> >    words = match_pattern({1,2,3})
> >    words = match_pattern({4,'I','E'})
> >    words = match_pattern({1,'E',2, 'E', 3, 'E'})
> >    words = match_pattern({1,2,2,3,'S'})
> >    words = match_pattern("RABBIT")
> >    words = match_pattern("R" & {1,1,2})
> >    words = match_pattern({'T',1,2})
> >    words = match_pattern({1,2,3,'S'})
> >    words = match_pattern({1,1,2,2})
> >    words = match_pattern({'B','E','A','R'})
> >    words = match_pattern({1,2,3,4,'E','D'})
> >    words = match_pattern({1,2,3,4,5,6,7})
> >    words = match_pattern({1,2,3,1,4,3,3,2,5,3})
> >    words = match_pattern({1,2,3,4,5,4,3,2,1})
> >    words = match_pattern({1,2,'X'})
> >    words = match_pattern({'M',1,2,3,4,5})
> >    words = match_pattern({1,2,3,4,5,6,7,8,9,10})
> >    words = match_pattern({'E',1,1,2})
> >    words = match_pattern({1,2,2,1,3})
> >    words = match_pattern({1,1,2})
> >    words = match_pattern({1,2,1})
> >    words = match_pattern({'M',1,2,1,'M'})
> >    words = match_pattern("MARTIN")
> >    words = match_pattern({1,2,'X',2,1})
> >    words = match_pattern({1,2,3,'B',3,4})
> >    words = match_pattern({1,2,'M',2,1})
> >    words = match_pattern({9,8,7,6,7,8,9})
> >    words = match_pattern({'E',8,7,6,7,8,9})
> >    words = match_pattern({9,'E',7,6,7,8,9})
> >    words = match_pattern({9,8,'E',6,7,8,9})
> >    words = match_pattern({9,8,7,'E',7,8,9})
> >    words = match_pattern({9,8,7,6,'E',8,9})
> >    words = match_pattern({9,8,7,6,7,'E',9})
> >    words = match_pattern({9,8,7,6,7,8,'E'})
> > end for
> >
> > ? time()-t-t2
> > ? time()-t
> > ? (time()-t-t2)/1000/40
> >
> > for i = 1 to length(words) do
> > --   puts(1,words[i]&"\n") -- uncomment this, to list results
> > end for
> > while get_key()=-1 do end while
> > <END TEST>
> >
> > C. K. Lester wrote:
> > > Are we still using this list for benchmarking?
> > >
> > > {1,1,2,3,4},
> > > {'E', 1, 2, 3, 1, 3},
> > > {1,'E',2,3,4,'S'},
> > > {1,1},
> > > {1},
> > > {1,2,3},
> > > {4,'I','E'},
> > > {1,'E',2, 'E', 3, 'E'},
> > > {1,2,2,3,'S'},
> > > "RABBIT",
> > > "R" & {1,1,2},
<snip>

new topic     » goto parent     » topic index » view message » categorize

7. RE: contest #2

Yes please,

   I improved the last posted version of EumsHash() by changing the 
multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% 
increase calling the routine.

   However, if I don't fix the lookup for wild filters (filters with no 
absolutes), I doubt I will be winning. As is, I would be surprised if no 
one else is close or beating me. I know Derek is beating me. 
Fortunately, he's not competing :P

Chris


euman at bellsouth.net wrote:
> ----- Original Message ----- 
> From: <bensler at mail.com>
> 
> > I'll give you one hint :) I used Derek/Eumans hash function.
> > Chris
> 
> I'd like to see Derek & Eumans Hash Function take the winners circle..
> 
> Chris, I have a faster version if you would like to have it.
> 
> Euman
> 
>

new topic     » goto parent     » topic index » view message » categorize

8. RE: contest #2

I shouldn't say that it's faster. It depends on the duration of 
execution. <5000 total iterations, then 3 is faster. > 5000 total, then 
8 is better.
8 takes longer to load, but lookups are a bit faster.

Using my test, at 4000 total iterations (100 * 40 tests):
3 gives a total time of 2.69
8 gives a total time of 2.74

Using my test, at 40000 total iterations (1000 * 40 tests):
3 gives a total time of 15.32
8 gives a total time of 14.78

Those are consistent results.

So, depending on what kind of testing Rob will be doing, will depend 
what I use. He has said he will be doing about 1000 iterations, but 
using how many tests? 1000 isn't enough to be accurate.

I'm assuming there will be more than 10000 total calls to the routine. 
In which case, 8 is better.

Chris

euman at bellsouth.net wrote:
> ----- Original Message ----- 
> From: <bensler at mail.com>
> 
> > Yes please,
> 
> I'll send it private.
> 
> >    I improved the last posted version of EumsHash() by changing the 
> > multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% 
> > increase calling the routine.
> 
> That was one of my changes because the multiplier was originally "16"
> the reason I set this to "3" was because of integer overflow and when 
> you
> set this to "8" I just cant see how it helped speed you up any.
> 
> I profiled the routine and stuck in "3" so "h" would be less than 
> #0FFFFFFF
> 
> so this code would not be executed as often.
> 
>  if h > #0FFFFFFF then
>            poke4(m,h)
>            g = and_bits(peek(m), #F0)
>            h = xor_bits(h, g)
>            h = and_bits(h, not_bits(g))
>  end if
> 
> Im very surprised this helped you. Did you profile the routine set to 
> "8"
> how often was the code executed (above) more than when its set at "3"? 
> 
> >    However, if I don't fix the lookup for wild filters (filters with no 
> > absolutes), I doubt I will be winning. As is, I would be surprised if no 
> > 
> > one else is close or beating me. I know Derek is beating me. 
> > Fortunately, he's not competing :P
> > 
> > Chris
> 
> Euman
> 
>

new topic     » goto parent     » topic index » view message » categorize

9. RE: contest #2

I really didn't do any thoruough testing for this conclusion. My 
conclusion was based on the fact that 8 doesn't overflow (16 does), and 
8 is a binary multiple, where 3 isn't.

Because it's a binary multiple, EU optimises it to a bit shift, instead 
of a true multiply. 3 doesn't get optimised.

I originally tried 4, but it seems to be slower than both 3 and 8.
2 doesn't provide enough variance, and I end up with too many 
collisions.
I think 8 disperses slightly better than the others too. Ending up with 
a more even distribution of the hashed items.


Chris


euman at bellsouth.net wrote:
> ----- Original Message ----- 
> From: <bensler at mail.com>
> 
> > Yes please,
> 
> I'll send it private.
> 
> >    I improved the last posted version of EumsHash() by changing the 
> > multiplier from 3 to 8(binary multiple). It gave me about a 5% to 10% 
> > increase calling the routine.
> 
> That was one of my changes because the multiplier was originally "16"
> the reason I set this to "3" was because of integer overflow and when 
> you
> set this to "8" I just cant see how it helped speed you up any.
> 
> I profiled the routine and stuck in "3" so "h" would be less than 
> #0FFFFFFF
> 
> so this code would not be executed as often.
> 
>  if h > #0FFFFFFF then
>            poke4(m,h)
>            g = and_bits(peek(m), #F0)
>            h = xor_bits(h, g)
>            h = and_bits(h, not_bits(g))
>  end if
> 
> Im very surprised this helped you. Did you profile the routine set to 
> "8"
> how often was the code executed (above) more than when its set at "3"? 
> 
> >    However, if I don't fix the lookup for wild filters (filters with no 
> > absolutes), I doubt I will be winning. As is, I would be surprised if no 
> > 
> > one else is close or beating me. I know Derek is beating me. 
> > Fortunately, he's not competing :P
> > 
> > Chris
> 
> Euman
> 
>

new topic     » goto parent     » topic index » view message » categorize

10. RE: contest #2

euman at bellsouth.net wrote:
> 
> I'd like to see Derek & Eumans Hash Function take
> the winners circle..

Okay, Derek or Euman, how about posting your latest hash algorithm?

Thanks!
ck

new topic     » goto parent     » topic index » view message » categorize

11. RE: contest #2

It said solutions to the problems should not be posted to the list.

Chris

Derek Parnell wrote:
> Didn't Robert say something about posting actual code is not allowed.
> 
> I don't hash the dictionary words, just their patterns. Also, I don't 
> use an algorithm that looks 
> anything like Euman's. The "alphabet" that I work with is thus in the 
> range 1 - 32. My approach is 
> to pre-calculate as much as possible at start up, so that inside the 
> inner loops, I don't do any 
> unnecessary calcs. Also, I try to avoid 'expensive' compuations like 
> multiply and divide and nested 
> sequence references. And append() is almost totally avoided, but when I 
> do have to, I append more 
> than I need so that in the next few iterations I can avoid doing another 
> append.
> 
> I organise the dicionary by pattern. Note that over 95% of the patterns 
> in the dictionary have less 
> than 20 words in them, and only a few have more than 2000 words.
> 
> ----------
> Derek.
> 
> 
> 18/03/2002 10:57:57 AM, "C. K. Lester" <cklester at yahoo.com> wrote:
> 
> >
> >euman at bellsouth.net wrote:
> >> 
> >> I'd like to see Derek & Eumans Hash Function take
> >> the winners circle..
> >
> >Okay, Derek or Euman, how about posting your latest hash algorithm?
> >
> >Thanks!
> >ck
> >
> >
> ---------
> Cheers,
> Derek Parnell 
> ICQ# 7647806
> 
>

new topic     » goto parent     » topic index » view message » categorize

12. RE: contest #2

Derek Parnell wrote:
> Didn't Robert say something about posting actual
> code is not allowed.

The code you would be posting, however, could just as easily be posted 
to the contributions page to be used by everyone... I'm using some of 
Kat's strtok.e; it's generic code that is readily available. That is how 
I would classify your code in this instance. Your code is not the 
algorithm that will solve the problem, but it can be a component that 
can be used (just like Kat's code).

If you don't want to post it here, will you provide it privately?

> I don't hash the dictionary words, just their patterns.

I actually don't care how it's done... in fact, you could shroud your 
source and let me use it as an include. I'm not entering contest #2, but 
I'd like my contest #3 algorithm to be faster, not because speed will 
determine the winner, but because it will help me more quickly refine my 
deciphering algorithm. Whether it takes 7 seconds or 70 seconds to 
return the deciphered text won't matter in contest #3, so it's mainly 
for development's sake.

However, I certainly don't want to break the spirit of this competition, 
so I'll submit to Rob's ruling(s) in this regard.

new topic     » goto parent     » topic index » view message » categorize

13. RE: contest #2

Chris (bensler at mail.com) wrote:
> It said solutions to the problems should not be posted to the list.

True. Is Derek's code (or yours, or Euman's) a solution to any of the 
contests? If so, they shouldn't be posted.

> Derek Parnell wrote:
> > Didn't Robert say something about posting
> > actual code is not allowed.

new topic     » goto parent     » topic index » view message » categorize

14. RE: contest #2

Contest#2 only requires the routine to be able to match a pattern 
against words.txt, why add more words ?:P

Chris

euman at bellsouth.net wrote:
> Hello,
> 
> I made a copy of the words in the anagram dictionary that i posted a 
> link to.
> 3017 words (some being duplicate) that are in the anagram dictionary an 
> not in junko's dictionary.
> 
> I ran this test with my own spellchecker in 0.90 sec that uses EumsHash( 
> )
> 
> anagram.txt file is 141k
> w/ words not in junko's dictionary being 24.9k (some duplicate)
> 
> So anyone that claims EumsHash( ) isnt very fast "Think again"
> 
> Junko's checked the file in 1.82 sec more than twice as slow for the 
> same file.
> Junko's still orders the words and eliminates the duplicates so I'll see 
> about
> fixing this. I still feel like I can beat her when finished.
> 
> Euman
> euman at bellsouth.net
> 
> ----- Original Message ----- 
> From: <euman at bellsouth.net>
> > 
> > Robert never said we couldnt use a second dictionary only that we
> > must use Junko's dictionary so I'll add this (scrabble) anagram 
> > dictionary
> > that may help some of you. The things one can find on the I-Net
> > is amazing!
> > 
> > http://www.orchy.com/dictionary/anagrams.htm
> > 
> > Euman
> > euman at bellsouth.net
> 
>

new topic     » goto parent     » topic index » view message » categorize

15. RE: contest #2

I think maybe the difference is because I'm not using the table 
structure that the hash function is designed for.

I use Eumshash, and use remainder(hKey,Prime) to reference into my 
table. I don't have a perfect hash, but because of the dispersment I 
have in my table. There are no collisions.

I think that for me, using 8 provides a more unique key, creating a more 
balanced fill of the bins. Because of the better balance, I get better 
lookups.

Chris

euman at bellsouth.net wrote:
> From: <bensler at mail.com>
> 
> > Contest#2 only requires the routine to be able to match a pattern 
> > against words.txt, why add more words ?:P
> > 
> > Chris
> 
> I was commenting on the routine in general because it seem'd to be
> under the microscope for speed. Im not writing for #2 either.
> 
> EumsHash( ) is versitile and very fast!
> 
> It will be hard to beat in the dictionary hash arena!
> Load time is slightly slower than most but lookup
> is extremely powerfull and fast because of the key
> returned based on strings being so individual.
> 
> Chris hit on something earlier that changing the multiplier
> will create values that will be different. The higher the value
> the more unique the hash value will be but the slower the
> EumsHash( ) algorithm will be. At present the multiplier of 3
> is the fastest method (based on profile and profile_time)
> 
> Euman
> euman at bellsouth.net
> 
>

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu