RE: Defining long constants - do-able but not exactly elegant


> -----Original Message-----
> From: Kat [mailto:gertie at PELL.NET]
 
> In this case, i premunged the dictionary to separate word-sized files and
> used gets() for each word. I had made one file using printf(), and one get(),
> which took longer and made a *much* bigger file. The getc() version ran the
> slowest.
> 
> Kat

I'm using a pre-munged dictionary file, and it takes about 1.5 seconds to load
on a P-233 with 64MB (~3200 sieves/sec).  I put it into a 3-dimensional
sequence ( words[a][b][c] ), where 'a' is the word length (part of the
pre-munging was based on length).  I know the range of 'b' based on 'a', and
right before each group of words I read a 4-byte integer telling me the range
of 'c' for that [a][b] combination.  I use get_bytes( fn, a ) to read a word.
I just run a doubly nested loop, stopping when I get to the end of the file.

When I increment 'a':
words = append( words, repeat( {}, a ) )

Then, I read in the 4-byte integer (error checking omitted):

range_c = bytes_to_int( get_bytes( fn, 4 ) )  
words[a][b] = repeat( {}, range_c )

And then I read in range_c words:
for c = 1 to range_c do
  words[a][b][c] = get_bytes( fn, a )
end for
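
Put together, the reading steps above might look something like the sketch below.  This is not the original loader: the name load_words(), the open() call, and the end-of-file test (a short read on the 4-byte count) are all assumptions.  It also delays the append() for a new 'a' until a count has actually been read, so a trailing empty entry isn't left behind at end of file.  On Euphoria 4 the same routines live in std/io.e and std/convert.e.

```euphoria
-- Sketch only: load_words() and its EOF handling are assumptions,
-- not the original code.
include get.e      -- get_bytes()
include machine.e  -- bytes_to_int()

function load_words( sequence path )
    sequence words, count_bytes
    integer fn, a
    atom range_c

    words = {}
    fn = open( path, "rb" )   -- binary mode: the file has no line breaks
    if fn = -1 then
        return -1             -- could not open; caller must check for an atom
    end if

    a = 0
    while 1 do
        a = a + 1             -- 'a' is the word length
        for b = 1 to a do
            count_bytes = get_bytes( fn, 4 )
            if length( count_bytes ) < 4 then
                close( fn )   -- short read: treat as end of file
                return words
            end if
            if b = 1 then
                -- first group for this length: make room for all 'b' slots
                words = append( words, repeat( {}, a ) )
            end if
            range_c = bytes_to_int( count_bytes )
            words[a][b] = repeat( {}, range_c )
            for c = 1 to range_c do
                words[a][b][c] = get_bytes( fn, a )
            end for
        end for
    end while
end function
```

Pre-allocating words[a][b] with repeat() before the inner loop means each word is a plain assignment into an existing slot rather than an append().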

Since I don't have line breaks, it reduced the size of words.txt (from
Junko's spell checker) by about 100K.  I seem to recall someone (Derek?)
having come up with an optimization for get_bytes(), but most of the time
seems to be spent doing sequence manipulation.

I think it took about 42 seconds to do the munging (not including write
time).  I'm sure I could improve that by pre-allocating sequence elements,
but this way was easier, and not really time critical.  Using getc() was
definitely faster than gets(), which took about 180 seconds to do.  The
writing process was just a nested loop running through all the subsequences
of words:

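-- fn is a file number already opened for writing
for i = 1 to length( words ) do          -- i = word length
  for j = 1 to i do                      -- words[i] has i subsequences
    -- 4-byte count of the words stored in words[i][j]
    puts( fn, int_to_bytes( length( words[i][j] ) ) )
    for k = 1 to length( words[i][j] ) do
      puts( fn, words[i][j][k] )         -- the word itself, no line break
    end for
  end for
end for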
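
For completeness, here is a hedged sketch of the same write loop wrapped in a routine that opens and closes the file.  The name save_words() and the error message are made up for illustration.  The one detail worth showing is the "wb" mode: in text mode on DOS/Windows a byte with value 10 is written as CR LF, which would corrupt any 4-byte count that happens to contain a 10.

```euphoria
-- Sketch only: save_words() is a hypothetical wrapper around the loop above.
include machine.e  -- int_to_bytes()

procedure save_words( sequence path, sequence words )
    integer fn

    fn = open( path, "wb" )   -- binary mode, so byte 10 is written as-is
    if fn = -1 then
        puts( 2, "cannot open " & path & "\n" )
        return
    end if

    for i = 1 to length( words ) do
        for j = 1 to i do
            -- 4-byte count, low-order byte first
            puts( fn, int_to_bytes( length( words[i][j] ) ) )
            for k = 1 to length( words[i][j] ) do
                puts( fn, words[i][j][k] )
            end for
        end for
    end for

    close( fn )
end procedure
```

int_to_bytes() always returns four bytes, so a reader can fetch the count back with get_bytes( fn, 4 ) and bytes_to_int().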

Matt Lewis
