1. RE: $100 Contest Question

Kat wrote:
> 
> My new question: It's taking me 143 seconds to load a
> dictionary with the following code:

Kat, I'm loading Junko's 50,000-word dictionary in less than a second. 
What dictionary are YOU using?! :D

2. RE: $100 Contest Question

On 3 Mar 2002, at 2:52, C. K. Lester wrote:

> 
> Kat wrote:
> > 
> > My new question: It's taking me 143 seconds to load a
> > dictionary with the following code:
> 
> Kat, I'm loading Junko's 50,000-word dictionary in less than a second. 
> What dictionary are YOU using?! :D

One I threw together and premunged just for this. Unfortunately, it's making a
4.8 MB file out of the dictionaries, even after deleting all the trailing
spaces, CRs, LFs, and duplicates. Using:

  writefile = open(dfilename,"wb")
  print(writefile,dictionary)
  close(writefile)

puts this stuff into the file:

{{{50},{67},{71},{72},{75},{77},{78},{84},{88},{65},{73},{65},{66},{66},{67},{68},{68},
etc., which is just a listing of the letters of the alphabet, which is what I
wanted at that point. But I didn't need it represented that way; I wanted it
done in a way Eu could reload it instantly, like, umm, in
D:\Euphoria\DEMO\MYDATA.EX. As it is saved though, it is taking up 5x as
much space and load time as needed. Any ideas?

Kat
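
One way to avoid print()'s nested-sequence form, sketched minimally here: write
the words out as plain text with puts(), one word per line, and read them back
byte by byte with getc(). This is not the contest code; the file name
"dict.txt" and the three-word stand-in list are assumptions for illustration.

  sequence dictionary, word
  integer fn, c

  dictionary = {"AARDVARK", "ABACUS", "ZIGZAGGING"}  -- stand-in word list

  -- save: plain text, one word per line, instead of print()'s {..},{..} form
  fn = open("dict.txt", "wb")
  for i = 1 to length(dictionary) do
      puts(fn, dictionary[i] & '\n')
  end for
  close(fn)

  -- reload, byte by byte, with getc()
  dictionary = {}
  word = {}
  fn = open("dict.txt", "rb")
  c = getc(fn)
  while c != -1 do                     -- getc() returns -1 at end of file
      if c = '\n' then
          dictionary = append(dictionary, word)
          word = {}
      else
          word = append(word, c)
      end if
      c = getc(fn)
  end while
  close(fn)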

3. RE: $100 Contest Question

Derek, are you using a preformatted version of words.txt, or are you 
formatting it within your program? It sounds like the latter, which I 
don't think is allowed.

Chris


Derek Parnell wrote:
> ----- Original Message -----
> From: "Kat" <gertie at PELL.NET>
> To: "EUforum" <EUforum at topica.com>
> Sent: Sunday, March 03, 2002 2:39 PM
> Subject: RE: $100 Contest Question
> 
> 
> > On 3 Mar 2002, at 2:52, C. K. Lester wrote:
> >
> > >
> > > Kat wrote:
> > > >
> > > > My new question: It's taking me 143 seconds to load a
> > > > dictionary with the following code:
> > >
> > > Kat, I'm loading Junko's 50,000-word dictionary in less than a second.
> > > What dictionary are YOU using?! :D
> >
> > One I threw together and premunged just for this. Unfortunately, it's
> > making a 4.8 MB file out of the dictionaries, even after deleting all the
> > trailing spaces, CRs, LFs, and duplicates. Using:
> >
> >   writefile = open(dfilename,"wb")
> >   print(writefile,dictionary)
> >   close(writefile)
> >
> > puts this stuff into the file:
> >
> > {{{50},{67},{71},{72},{75},{77},{78},{84},{88},{65},{73},{65},{66},{66},{67},{68},{68},
> > etc., which is just a listing of the letters of the alphabet, which is
> > what I wanted at that point. But I didn't need it represented that way; I
> > wanted it done in a way Eu could reload it instantly, like, umm, in
> > D:\Euphoria\DEMO\MYDATA.EX. As it is saved though, it is taking up 5x as
> > much space and load time as needed. Any ideas?
> 
> That is exactly why print() and get() should rarely be used. They are
> extremely inefficient. In my version of reformatting Junko's WORDS.TXT, I
> went from the original 508,190 bytes to 531,828 bytes. In speed differences,
> it takes me about 6 seconds to use WORDS.TXT to build the internal
> dictionary, and about 0.5 seconds to build it using the reformatted
> words.txt.
> 
> For speed, use binary reads and writes, and just use getc() and puts().
> 
> I'm not sure if I'm allowed to give any more details of the algorithms I
> used etc..., but I did strip out unnecessary delimiters between words.
> 
> ----------
> Derek
> 
>

4. RE: $100 Contest Question

New question...

   What platform will be used to test with? To be fair, it would have to 
be tested on all 3.

   What if one entry only works on a specific platform, but is faster 
than all others for that platform?

   What are the valid match characters for Contest #2? A-Z and a-z? What 
about hyphens and apostrophes?

Chris

5. RE: $100 Contest Question

Oops, I meant the former, not the latter :p

Chris

bensler at mail.com wrote:
> Derek, are you using a preformatted version of words.txt, or are you 
> formatting it within your program? It sounds like the latter, which I 
> don't think is allowed.
> 
> Chris
> 
> 
> Derek Parnell wrote:
> > ----- Original Message -----
> > From: "Kat" <gertie at PELL.NET>
> > To: "EUforum" <EUforum at topica.com>
> > Sent: Sunday, March 03, 2002 2:39 PM
> > Subject: RE: $100 Contest Question
> > 
> > 
> > > On 3 Mar 2002, at 2:52, C. K. Lester wrote:
> > >
> > > >
> > > > Kat wrote:
> > > > >
> > > > > My new question: It's taking me 143 seconds to load a
> > > > > dictionary with the following code:
> > > >
> > > > Kat, I'm loading Junko's 50,000-word dictionary in less than a second.
> > > > What dictionary are YOU using?! :D
> > >
> > > One I threw together and premunged just for this. Unfortunately, it's
> > > making a 4.8 MB file out of the dictionaries, even after deleting all
> > > the trailing spaces, CRs, LFs, and duplicates. Using:
> > >
> > >   writefile = open(dfilename,"wb")
> > >   print(writefile,dictionary)
> > >   close(writefile)
> > >
> > > puts this stuff into the file:
> > >
> > > {{{50},{67},{71},{72},{75},{77},{78},{84},{88},{65},{73},{65},{66},{66},{67},{68},{68},
> > > etc., which is just a listing of the letters of the alphabet, which is
> > > what I wanted at that point. But I didn't need it represented that way;
> > > I wanted it done in a way Eu could reload it instantly, like, umm, in
> > > D:\Euphoria\DEMO\MYDATA.EX. As it is saved though, it is taking up 5x as
> > > much space and load time as needed. Any ideas?
> > 
> > That is exactly why print() and get() should rarely be used. They are
> > extremely inefficient. In my version of reformatting Junko's WORDS.TXT, I
> > went from the original 508,190 bytes to 531,828 bytes. In speed
> > differences, it takes me about 6 seconds to use WORDS.TXT to build the
> > internal dictionary, and about 0.5 seconds to build it using the
> > reformatted words.txt.
> > 
> > For speed, use binary reads and writes, and just use getc() and puts().
> > 
> > I'm not sure if I'm allowed to give any more details of the algorithms I
> > used etc..., but I did strip out unnecessary delimiters between words.
> > 
> > ----------
> > Derek
> > 
> >

6. RE: $100 Contest Question

I'm able to load up Junko's Words.txt and format it in 0.11 seconds, 
done when the library loads. This could be sped up quite a bit if I can 
use a preformatted version of words.txt.

PIII 600mhz
Chris


Derek Parnell wrote:
> ----- Original Message -----
> From: <bensler at mail.com>
> To: "EUforum" <EUforum at topica.com>
> Sent: Sunday, March 03, 2002 4:17 PM
> Subject: RE: $100 Contest Question
> 
> 
> > Derek, are you using a preformatted version of words.txt, or are you
> > formatting it within your program? It sounds like the latter, which I
> > don't think is allowed.
> >
> > Chris
> >
> >
> The short answer is both.
> 
> For comp#2, the algorithm I used when the matching routine is called 
> was:
>    If the internal dictionary is not set up,
>        look for a file called 'dict.dat'.
>        If that is present, use its contents to
>             set up the internal dictionary.
>        Otherwise look for 'words.txt'.
>        If that is present, use its contents to
>             set up the internal dictionary, then
>             write out the internal dictionary to
>             'dict.dat' using a special format.
> 
> In either case, there is a small delay the first time the routine is called
> while it initialises the internal dictionary. With the dict.dat file, though,
> this delay is a lot smaller than with words.txt. Once the dictionary is set
> up, finding the matching words is lightning fast.
> 
> When I submitted my program to Robert yesterday, the wording of the
> competition did not say "You must use words.txt contained in Junko's spell
> checker in the Archive." So I guess the rules have changed after my
> submission! Oh well. Of course, in one sense I did use Junko's file - to
> create a reformatted one - and I can use Junko's file if the dict.dat file
> is not present.
> 
> I wrote the program as if it was to be used in the real world, not just some
> artificial competition environment. Thus the routine that uses words.txt is
> not hyper-optimised, as I was only going to use it once to create the
> dict.dat file. That file is the optimised one.
> 
> If Robert rules against this concept, I guess I can submit another 
> version
> of the program.
> -----
> Derek.
> 
>
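
A minimal sketch of the lazy set-up Derek describes above; this is not his
submission, and build_from_words_txt() and the contents of dict.dat are
placeholders for whatever format an entry actually uses.

  sequence internal_dict
  integer dict_ready
  internal_dict = {}
  dict_ready = 0

  function build_from_words_txt()
      -- hypothetical placeholder: parse words.txt into whatever internal
      -- structure the matching routine needs
      return {}
  end function

  procedure ensure_dictionary()
      integer fn
      if dict_ready then
          return
      end if
      fn = open("dict.dat", "rb")
      if fn != -1 then
          -- fast path: read the preformatted dictionary from dict.dat
          -- (reading code omitted; the format is entry-specific)
          close(fn)
      else
          -- slow path: build from Junko's words.txt, then cache the result
          internal_dict = build_from_words_txt()
          fn = open("dict.dat", "wb")
          if fn != -1 then
              -- ... write internal_dict out in a compact format here ...
              close(fn)
          end if
      end if
      dict_ready = 1
  end procedure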

7. RE: $100 Contest Question

These were my assumptions also, but I'd like to know for sure.

Chris


Derek Parnell wrote:
> ----- Original Message -----
> From: <bensler at mail.com>
> To: "EUforum" <EUforum at topica.com>
> Sent: Sunday, March 03, 2002 4:22 PM
> Subject: RE: $100 Contest Question
> 
> 
> > about hyphens, and apostrophes?
> >
> 
> I documented my assumptions in the source code. For this one, it was that
> any character in the pattern text whose ASCII value is higher than SPACE
> is meant to be an exact-match character. Meaning that the integers 0 to 32
> are reserved for inexact matches.
> 
> On a similar point I made the assumption that a pattern of {4,6,9} is
> equivalent to {1,2,3}. In other words, the actual value of the pattern
> characters is not important, only that they represent a unique character 
> in
> the target word(s).
> 
> --------
> Derek.
> 
>
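
One reading of that rule, as a minimal sketch rather than anyone's entry:
pattern values 0..32 are placeholders (equal values must map to the same
letter, distinct values to distinct letters), and values above 32 are literal
characters. Whether a literal letter also blocks a placeholder from reusing
it is left open here.

  function word_matches(sequence word, sequence pattern)
      sequence seen_pat, seen_chr   -- placeholder/letter pairings found so far
      integer p, c, k
      if length(word) != length(pattern) then
          return 0
      end if
      seen_pat = {}
      seen_chr = {}
      for i = 1 to length(pattern) do
          p = pattern[i]
          c = word[i]
          if p > 32 then
              if c != p then            -- literal must match exactly
                  return 0
              end if
          else
              k = find(p, seen_pat)
              if k then
                  if seen_chr[k] != c then   -- same placeholder, different letter
                      return 0
                  end if
              else
                  if find(c, seen_chr) then  -- letter already bound elsewhere
                      return 0
                  end if
                  seen_pat = append(seen_pat, p)
                  seen_chr = append(seen_chr, c)
              end if
          end if
      end for
      return 1
  end function

  -- e.g. word_matches("ZIGZAGGING", {1,2,3,1,4,3,3,2,5,3}) returns 1, and a
  -- pattern of {4,6,9} behaves exactly like {1,2,3}, as Derek assumes.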

8. RE: $100 Contest Question

Robert Craig wrote:

> On problem #2 you must use Junko's dictionary (word list),
> because I will be determining correctness based on
> the words in that particular dictionary. If you want to reformat
> her file into a different file, that's fine, but the time that it
> takes to do that will be included in your total time. 

I've just wasted a weekend of programming!!!!

Robert Craig wrote in message: 
http://www.topica.com/lists/EUforum/read/message.html?mid=1709760866&sort=d&start=10864


For problem #3, you can change the input word list if you want.
For #2 you must use Junko's list as is.

NOTE THE "as is"  !!!!

after I specifically asked the question:
>Is the time to load the database part of the calculated time to
>decipher the sentence? If it is can we re-organise the
>database in a particular order to help the load time.
>ie. if we create some new hash scheme can we re-order the table to
>make the load time faster and therefore reduce total decipher time.

I don't mind the rules being manipulated for cases that haven't 
already been discussed but it's frustrating when the rules change
from something that has already been said!

... a frustrated

Ray Smith
http://rays-web.com

9. RE: $100 Contest Question

Hi Derek,

> All Robert is saying here, is that at some point in the execution of the
> comp#2 program, you must use Junko's file 'as is' to get the list of 
> words
> that will be used to test the program against. You are perfectly free to
> reformat that DATA in your program to help speed things up. I think what
> Robert is getting at is that for comp#2, you cannot use a different file 
> as
> the SOURCE of the word list. Junko's file is the only SOURCE of words
> permitted. If you want to create a new file based on her's, you can. The
> time to do that will be included in the test time though.

OK, I see your point. (my weekend wasn't wasted .. phew!)

Can this be clarified by Rob? It wasn't too clear to me (and probably 
others) when I read the previous thread.


Ray Smith
http://rays-web.com

10. RE: $100 Contest Question

Derek Parnell wrote:

> In comp#1, we are required to *encipher* some English text. As a trial, I
> combined all the Euphoria\DOC files into a single file, then copied that
> file to itself a few times until I had a file that was 4,267,462 bytes long.
> My program takes about 4 seconds to encipher this file.

What type of PC did you run this on, Derek?
My PC at work (PIII 650, Windows 2000 Professional) does this
in 2.36 seconds (for a 4,121,548-byte file).

I haven't tried to optimise it yet (not sure what I could do really)

Note: I won't be submitting my entry to contest 1 (I assume Derek
won't either) 

Ray Smith
http://rays-web.com

11. RE: $100 Contest Question

The 5 minutes is total program run time. Not runtime for each iteration.

Chris

<SNIP>
> You realise, at 5 minutes runtime per iteration, that's 83 hours? If you 
> get 
> 100 such entries, that's 345 DAYS of runtime for testing problem #2 
> programs.
>  
> > Derek Parnell writes:
> > > On a similar point I made the assumption that a pattern of {4,6,9} is
> > > equivalent to {1,2,3}. In other words, the actual value of the pattern
> > > characters is not important, only that they represent a unique character 
> > > in
> > > the target word(s).
> > 
> > Yes, that's correct.
> > 
> > Aku writes:
> > > (Problem #1) How is the time calculated?
> > > How many iteration (loops) will it be tested?
> > 
> > I'm planning to run each program once,
> > with a few megabytes of input text.
> 
> *megabytes*? Of the same few sentences repeated ad nauseam, with different
> keys? Or will you be feeding it an online book text with unique sentences
> and the same key for all sentences?
>  
> Kat
> 
>

12. RE: $100 Contest Question

Derek Parnell wrote:

> Okay, so I took the bait. I've now got it down from 4.33 seconds to 1.7
> seconds for that 4,267,462 byte file.

DAMN!

I can only get 2.8 seconds.
I'll have to wait a whole month to see how you did it!

Ray Smith
http://rays-web.com

P.S. Is it possible your PIII 550 is faster than my PIII 650???
P.P.S. I can hope, can't I?

13. RE: $100 Contest Question

What kind of bench test are you using, Derek? I get nowhere near those 
kinds of results.

What is your average iteration time?

I use..
  match_pattern({1,2,3,1,4,3,3,2,5,3}) -- ZIGZAGGING
..in an iterated loop

I don't think I can even iterate 50x in 1.48 seconds.
And that pattern is on the faster side of the average.

I don't understand how you could possibly get it that fast; only if your 
pattern is {1} is my program comparable to your results.

I've still got some tweaking to do, but I don't see it getting much 
faster.

Try...
-- find all 7 letter words with unique letters
match_pattern({1,2,3,4,5,6,7})

I'd like to know what your iteration time is for that one.

Chris

Derek Parnell wrote:
> ----- Original Message -----
> From: "Kat" <gertie at PELL.NET>
> To: "EUforum" <EUforum at topica.com>
> Sent: Monday, March 04, 2002 7:38 AM
> Subject: Re: $100 Contest Question
> 
> 
> > > if a '-' or '\'' is supplied (or some character greater than ASCII 32),
> > > it should be treated as a literal character to be matched. Values
> > > from 0 to 32 represent "meta" characters, or placeholders for
> > > unspecified characters in the pattern. I'll only give you upper case
> > > literal characters, A, B, C, ...
> >
> > So the input file will be all upper()'d already?
> 
> Here, Robert is only talking about the characters in the pattern text 
> used
> to test comp#2. Not the word list or any file used in any competition. 
> If I
> was you, I'd expect mixed case text for the input files in comp#1 and
> comp#3.
> 
> 
> > Robert Craig wrote:
> > > In problem #2, assume that I will make 1000
> > > calls to your function.
> >
> > You realise, at 5 minutes runtime per iteration, that's 83 hours? If you
> > get 100 such entries, that's 345 DAYS of runtime for testing problem #2
> > programs.
> 
> There is something really wrong if a single iteration takes that long. I'm
> sure Robert will pull the plug on any program that runs longer than 5
> minutes. Currently I'm doing 1280 iterations in 1.48 seconds, and I haven't
> finished optimising it yet.
> 
> > Robert Craig wrote:
> > > I'm planning to run each program once,
> > > with a few megabytes of input text.
> >
> > *megabytes*? Of the same few sentences repeated ad nauseam, with different
> > keys? Or will you be feeding it an online book text with unique sentences
> > and the same key for all sentences?
> 
> In comp#1, we are required to *encipher* some English text. As a trial, I
> combined all the Euphoria\DOC files into a single file, then copied that
> file to itself a few times until I had a file that was 4,267,462 bytes long.
> My program takes about 4 seconds to encipher this file.
> 
> --------
> Derek
> 
>

14. RE: $100 Contest Question

Can someone give me a benchmark for problem #2?

Total run time, number of iterations, and the filter(s) used. I need to know 
if I'm in the ballpark, or if I need to reconsider my implementation.

Chris


euman at bellsouth.net wrote:
> Kat did you try my posted routine yet?
> 
> This builds an alphabetical 'A' to 'Z' sequence
> that allows for text lengths up to 26 letters
> 
> example
> 
> "A," -- length (1)
> "AD,AH,AL,AM,AN,AS,AT,AU,AW,AX,AY," -- length (2)
> "ABC,ABE,ABO,ACE,ACT,ADA,ADD,ADO,ADS,ADZ,AFT,AGE, etc..etc" -- length 
> (3)
> "ABBA,ABBE,ABBY,ABED,ABEL,ABET,ABLE,ABLY,ABUT,ACES,ACHE,ACID, etc...etc" 
> -- length (4)
> 
> down to length (26)
> then starts over with the next beginning letter (B,C,D out to Z)
> 
> 3.5 sec on my 233mhz but instead of the text I presented it will be 
> UNIQUE numerical values 
> that represent the text.
> 
> Later,
> 
> Euman
> euman at bellsouth.net
> 
> Q: Are we monetarily insane?
> A: YES
> ----- Original Message ----- 
> From: "Kat" <gertie at PELL.NET>
> To: "EUforum" <EUforum at topica.com>
> Sent: Sunday, March 03, 2002 8:02 PM
> Subject: Re: $100 Contest Question
> 
> 
> > On 4 Mar 2002, at 10:20, Derek Parnell wrote:
> > 
> > > 
> > > ----- Original Message -----
> > > From: "Ray Smith" <smithr at ix.net.au>
> > > To: "EUforum" <EUforum at topica.com>
> > > Sent: Monday, March 04, 2002 9:47 AM
> > > Subject: RE: $100 Contest Question
> > > 
> > > 
> > > > My PC at work (PIII 650 Windows 2000 Professional) does this
> > > > in 2.36 seconds (for a 4,121,548 bytes sized file)
> > > >
> > > 
> > > Okay, so I took the bait. I've now got it down from 4.33 seconds to 1.7
> > > seconds for that 4,267,462 byte file.
> > 
> > OK, I bow out of the contest. It takes me 434 seconds (over 7 minutes)
> > just to load a dictionary into memory. Building the dictionary from 3
> > files takes 72 minutes. I took the premunge time to sort words by length,
> > and I load only words corresponding in length to words present in the
> > cyphered line. You are getting times 10x to 140x faster than I am. I
> > concede.
> > 
> > Kat
> > 
> >
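
A sketch of the grouping Euman describes above: words bucketed first by
initial letter ('A' to 'Z') and then by length (1 to 26). This is inferred
from his description, not his code, and the exact structure is an assumption.

  sequence buckets
  buckets = repeat(repeat({}, 26), 26)   -- buckets[letter][length] = word list

  procedure file_word(sequence word)
      integer first, len
      first = word[1] - 'A' + 1
      len = length(word)
      if first >= 1 and first <= 26 and len <= 26 then
          buckets[first][len] = append(buckets[first][len], word)
      end if
  end procedure

  file_word("ABBA")   -- lands in buckets[1][4], i.e. 'A' words of length 4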

15. RE: $100 Contest Question

Derek Parnell wrote:
> Okay, so I took the bait. I've now got it down from 4.33 seconds to 1.7
> seconds for that 4,267,462 byte file.

Hmm, I've got mine down between 1.70 and 1.76 seconds!!!!
I wonder how close our algorithms will end up being??

Ray Smith
http://rays-web.com

16. RE: $100 Contest Question

I privately asked Rob if I was allowed to predefine a hashtable in my 
program. The answer is no; it's considered similar to a preformatted 
dictionary file. So, just in case anyone else has the same idea as I did 
:/


Chris

17. RE: $100 Contest Question

On my PIII 600mhz/64MB :

  Euman's hashtable runs at 0.8s

With FULL inclusion of my program in the bench time:

  My results for Derek's benchtest are:
    Load time    :  0.16
    Total time   :  8.79
    Iterated time:  0.006867

  My results for this benchtest(x100 iterations) are:
    Load time    :  0.16
    Total time   : 15.36
    Iterated time:  0.011

My program isn't valid for submission though, because it uses a 
predefined hashtable.

Derek still has me beat by a long shot.

I definitely need to revamp my algorithm :/


Chris


Martin Stachon wrote:
> From: <bensler at mail.com>
> > Can someone give me a benchmark for problem #2?
> > 
> > Total run time, number of iterations, and the filter(s) used. I need to
> > know if I'm in the ballpark, or if I need to reconsider my implementation.
> 
> Using this benchmark :
>         {
>             {1,2,3,4,5,4,3,2,1},
>             {1,2,'X'},
>             {'M',1,2,3,4,5},
>             {1,2,3,4,5,6,7,8,9,10},
>             {'E',1,1,2},
>             {1,2,2,1,3},
>             {1,1,2},
>             {1,2,1},
>             {'M',1,2,1,'M'},
>             "MARTIN",
>             {1,2,'X',2,1},
>             {1,2,3,'B',3,4},
>             {1,2,'M',2,1},
>             {'E',1,2,3,1,3}
>         }
>     for p=1 to length(pats) do
>         words = get_words_by_pattern(pats[p])
>     end for
>     ? time()-t
> On my Winchip @200Mhz, Win98. And you?
> 
>     Martin
> 
>
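
For anyone who wants to run the same comparison, here is a self-contained
version of the kind of harness Martin quotes above. The routine name
get_words_by_pattern() is his; substitute whatever routine your own entry
exposes. Only the timing scaffolding and the iteration count are added, and
the iteration count is an arbitrary choice.

  constant PATS = {
      {1,2,3,4,5,4,3,2,1},
      {1,2,'X'},
      {'M',1,2,3,4,5},
      {1,2,3,4,5,6,7,8,9,10},
      {'E',1,1,2},
      {1,2,2,1,3},
      {1,1,2},
      {1,2,1},
      {'M',1,2,1,'M'},
      "MARTIN",
      {1,2,'X',2,1},
      {1,2,3,'B',3,4},
      {1,2,'M',2,1},
      {'E',1,2,3,1,3}
  }
  constant ITERATIONS = 100

  atom t
  sequence words
  t = time()
  for iter = 1 to ITERATIONS do
      for p = 1 to length(PATS) do
          words = get_words_by_pattern(PATS[p])
      end for
  end for
  printf(1, "%d x %d patterns in %.2f seconds\n",
         {ITERATIONS, length(PATS), time() - t})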

18. RE: $100 Contest Question

rforno at tutopia.com wrote:
> Maybe someone is interested to know that I casually found the
> word BBUFFALOES instead of BUFFALOES in Junko's list.
> I don't know if there are other errors.

No wonder my program kept crashing!!! heheh

How'd you find that, btw? Just reading the list?

19. RE: $100 Contest Question

I got it down to 0.11 with minor tweaks; I think I can get it even 
faster. :)

Chris

euman at bellsouth.net wrote:
> ----- Original Message ----- 
> From: "Derek Parnell" <ddparnell at bigpond.com>
> To: "EUforum" <EUforum at topica.com>
> > 
> > Euman's hashtable runs at 1.16 but I tweaked that a lot and got it to 
> > run at
> > 0.22
> 
> Yes you did, thanks BIG D
> 
> When "I" changed this: 
> h *= 16
> 
> to this:
> h *= 3 -- which doesn't give as unique a value as before, but is still
> very effective.
> 
> The routine was about 250% faster, bringing the time from 3.5 sec on my
> 233MHz down to 1.1 sec. This is the routine I was talking about sharing in
> a month or so.
> 
> But then BIG D convinced me that getc() would shave another 0.25 sec off
> the load time. I was impressed. He also went a few steps further, and now
> EumsHash runs in at 0.60 sec on my 233MHz laptop (@200MHz desk). I can
> imagine figures on a PIII 500 or higher machine being (0.0something) or
> better now.
> 
> Prolly could beat Junko's spellchecker; I haven't coded this, so I'm not
> sure.
> 
> sure is BLAZING FAST!
> 
> Euman
> 
>
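
Euman's routine itself isn't posted in this thread, so the following is only
a generic multiplicative string hash of the kind being discussed, where the
multiplier (16 versus 3 above) and the table size are the tuning knobs. The
names and the table size here are assumptions.

  constant HASH_MULT = 3,
           TABLE_SIZE = 4096           -- assumed size; pick to suit the word count

  function hash_word(sequence word)
      atom h
      h = 0
      for i = 1 to length(word) do
          h = h * HASH_MULT + word[i]
          h = remainder(h, 1073741824) -- keep h inside integer range
      end for
      return remainder(h, TABLE_SIZE) + 1   -- 1-based bucket index
  end function

  sequence hash_table
  hash_table = repeat({}, TABLE_SIZE)  -- hash_table[i] holds the words hashing to i

  procedure insert_word(sequence word)
      integer b
      b = hash_word(word)
      hash_table[b] = append(hash_table[b], word)
  end procedure

  insert_word("BUFFALOES")             -- usage example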

20. Re: RE: $100 Contest Question

5/03/2002 9:47:43 AM, bensler at mail.com wrote:

>
>I got it down to 0.11 with minor tweaks, I think I can get it even 
>faster. :)
>
>Chris

There comes a point of diminishing returns.

But why bother hashing the words in the words.txt file? That's fine for a
spell-checker, but comp#2 and comp#3 aren't about finding specific words in a
dictionary, are they?

21. RE: $100 Contest Question

Grr, I DID test before I said anything, to be sure. But I double 
checked, and the values aren't the same :(
I'll have to see if I can still do it.

Like Derek said though, I don't see the purpose of using that table.
What does it matter what the first letter of each word is?
Only a minute set of circumstances would benefit from it.

Chris

euman at bellsouth.net wrote:
> Chris are you sure the output is the same (hash table values)?
> 
> If not I would say that the table lookup speed might differ.
> 
> Right now, the table lookup is hundreds of times faster than reading in the
> "words.txt" and creating the table. 
> 
> Reading and creating the table is 0.60 sec (233MHz) - FAST.
> Imagine what the lookup will be on, say, a 15,000-word .txt file
> like Junko's spell checker is timed at (2 sec).
> 
> I'm saying the spellchecker could be around 1 sec for the same using
> EumsHash(), but this is just a guesstimate.
> 
> Euman
> euman at bellsouth.net
> 
> Q: Are we monetarily insane?
> A: YES
> ----- Original Message ----- 
> From: <bensler at mail.com>
> To: "EUforum" <EUforum at topica.com>
> Sent: Monday, March 04, 2002 5:47 PM
> Subject: RE: $100 Contest Question
> 
> 
> > I got it down to 0.11 with minor tweaks, I think I can get it even 
> > faster. :)
> > 
> > Chris
> > 
> > euman at bellsouth.net wrote:
> > > ----- Original Message ----- 
> > > From: "Derek Parnell" <ddparnell at bigpond.com>
> > > To: "EUforum" <EUforum at topica.com>
> > > > 
> > > > Euman's hashtable runs at 1.16 but I tweaked that a lot and got it to
> > > > run at 0.22
> > > 
> > > Yes you did, thanks BIG D
> > > 
> > > When "I" changed this: 
> > > h *= 16
> > > 
> > > to this:
> > > h *= 3 -- which doesn't give as unique a value as before, but is still
> > > very effective.
> > > 
> > > The routine was about 250% faster, bringing the time from 3.5 sec on my
> > > 233MHz down to 1.1 sec. This is the routine I was talking about sharing
> > > in a month or so.
> > > 
> > > But then BIG D convinced me that getc() would shave another 0.25 sec off
> > > the load time. I was impressed. He also went a few steps further, and
> > > now EumsHash runs in at 0.60 sec on my 233MHz laptop (@200MHz desk). I
> > > can imagine figures on a PIII 500 or higher machine being (0.0something)
> > > or better now.
> > > 
> > > Prolly could beat Junko's spellchecker; I haven't coded this, so I'm not
> > > sure.
> > > 
> > > sure is BLAZING FAST!
> > > 
> > > Euman
> > > 
> > >

22. RE: $100 Contest Question

As a real-world practicality, I think that lowercase should be included. 
Splitting the ASCII pattern at 32 leaves no input error cases.

BTW Derek:

  I've got your benchmark down to 0.00305 :) And it's tested for 
correctness.

  I'm sure I can knock that down some too.


Chris


Derek Parnell wrote:
> 5/03/2002 7:43:18 AM, rforno at tutopia.com wrote:
> 
> >
> >Derek:
> >It seems to me I don't understand the rules. Being that Junko's 
> >words.txt
> >contains only upper case, what is the point of:
> >Test({1,2,3,4,'e','d'})  ?
> >Thanks.
> >
> >
> Strictly you're correct. If we read the rules it does imply that the 
> pattern is case sensitive. 
> 
> "Assume that the numbers 0 to 32 are used to indicate the pattern, while 
> the numbers above 32 are 
> the ASCII codes of literal characters to be matched (including 
> apostrophe and hyphen)."
> 
> And thus applying lowercase characters in the pattern to WORDS.TXT would 
> be a waste of time.
> 
> Because the other competitions stressed case-insensitivity, I assumed it
> would apply to this comp as well. Thanks for pointing out my error.
> 
> ---------
> Derek.
> 
>

23. RE: $100 Contest Question

I need clarification on this. This is essentially what I asked Rob when 
I asked about predefining a hashtable.

I'm assuming we cannot assume anything about words.txt except that it 
will be sorted and all words are capitalized.

Are there other specifications of the file that we CAN count on, Rob? 
Like the number of words, the number of words beginning with each letter, 
or the number of words of each length? What about maximum word length?

Should the routine handle lowercase parameters?


Chris

petelomax at blueyonder.co.uk wrote:
> On Sun, 3 Mar 2002 23:37:20 -0500, Robert Craig
> <rds at RapidEuphoria.com> wrote:
> 
> >but I don't want programs to 
> >take advantage of anything they did in a previous run.)
> 
> Everyone knows that there are 51798 words in the dictionary.
> I've also worked out that one of my arrays will need 122545 entries.
> 
> I assume it is ok to code these constants into the program.
> 
> In contest #2, you specify a library routine. Therefore I expect you
> to invoke my code with
> 
> include myfile
> ...
> select(pattern)
> 
> in a standard test rig rather than the dos prompt redirection thingy.
> Is that right? If not, do you want a library routine and a loading
> program, or do you want them munged into a single standalone program?
> 
> You also specify "a sequence containing all the words". Should this be
> in alphabetical order? I'm going to assume not unless otherwise told.
> 
> Pete Lomax
> 
>
