RE: Defining long constants - do-able but not exactly elegant
- Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Mar 14, 2002
> -----Original Message-----
> From: Kat [mailto:gertie at PELL.NET]

> In this case, i premunged the dictionary to separate word-sized files and
> used gets() for each word. I had made one file using printf(), and one get(),
> which took longer and made a *much* bigger file. The getc() version ran the
> slowest.
>
> Kat

I'm using a pre-munged dictionary file, and taking about 1.5 seconds to load on a P-233, 64MB (~3200 sieves/sec). I put it into a 3-dimensional sequence ( words[a][b][c] ). I know the range of 'b' based on 'a' (i.e., word length--part of the pre-munging was based on length), and read a 4-byte integer telling me the range of 'c' for the given [a][b] combination right before those words. Since every word in words[a] has length 'a', I use get_bytes( fn, a ) to read a word. I just run a doubly nested loop, stopping when I get to the end of the file.

When I increment 'a':

    words = append( words, repeat( {}, a ) )

Then, I read in the 4-byte integer (error checking omitted):

    range_c = bytes_to_int( get_bytes( fn, 4 ) )
    words[a][b] = repeat( {}, range_c )

And then I read in range_c words:

    for c = 1 to range_c do
        words[a][b][c] = get_bytes( fn, a )
    end for

Since I don't have line breaks, it reduced the size of words.txt (from Junko's spell checker) by about 100K. I seem to recall someone (Derek?) having come up with an optimization for get_bytes(), but most of the time seems to be spent doing sequence manipulation.

I think it took about 42 seconds to do the munging (not including write time). I'm sure I could improve that by pre-allocating sequence elements, but this way was easier, and not really time critical. Using getc() was definitely faster than gets(), which took about 180 seconds to do.

The writing process was just a nested loop running through all the subsequences of words:

    for i = 1 to length( words ) do
        for j = 1 to i do
            -- 4-byte count of the words in this [i][j] bucket
            puts( fn, int_to_bytes( length( words[i][j] ) ) )
            for k = 1 to length( words[i][j] ) do
                -- each word is exactly i bytes, so no separator is needed
                puts( fn, words[i][j][k] )
            end for
        end for
    end for

Matt Lewis
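
For anyone who wants to try the format, here is a minimal round-trip sketch that puts the write and read loops above into two routines. The routine names (save_words, load_words), the file name words.dat, and the toy data are made up for illustration. The post does not say how the 'b' bucket is chosen, so the sample just fills the buckets arbitrarily. It assumes the file is opened in binary mode and that get_bytes() comes from get.e and int_to_bytes()/bytes_to_int() from machine.e.

```euphoria
include get.e       -- get_bytes()
include machine.e   -- int_to_bytes(), bytes_to_int()

-- Write words[i][j][k] as: a 4-byte count per [i][j] bucket,
-- followed by that many words of exactly i bytes each.
procedure save_words(sequence path, sequence words)
    integer fn
    fn = open(path, "wb")
    if fn = -1 then
        puts(1, "cannot open file for writing\n")
        abort(1)
    end if
    for i = 1 to length(words) do
        for j = 1 to i do
            puts(fn, int_to_bytes(length(words[i][j])))
            for k = 1 to length(words[i][j]) do
                puts(fn, words[i][j][k])
            end for
        end for
    end for
    close(fn)
end procedure

-- Read the file back, growing the sequence one word length at a time
-- and stopping when no further 4-byte count can be read.
function load_words(sequence path)
    sequence result, header
    integer fn, a, range_c
    fn = open(path, "rb")
    if fn = -1 then
        puts(1, "cannot open file for reading\n")
        abort(1)
    end if
    result = {}
    a = 0
    while 1 do
        header = get_bytes(fn, 4)
        if length(header) < 4 then
            exit    -- end of file
        end if
        a += 1
        result = append(result, repeat({}, a))
        for b = 1 to a do
            if b > 1 then
                header = get_bytes(fn, 4)
            end if
            range_c = bytes_to_int(header)
            result[a][b] = repeat({}, range_c)
            for c = 1 to range_c do
                result[a][b][c] = get_bytes(fn, a)
            end for
        end for
    end while
    close(fn)
    return result
end function

-- Toy data (hypothetical): words[a] has 'a' buckets of words of length 'a'.
sequence sample
sample = {
    { {"a", "i"} },
    { {"an", "at"}, {"be"} }
}

save_words("words.dat", sample)
if equal(load_words("words.dat"), sample) then
    puts(1, "round trip matches\n")
else
    puts(1, "round trip differs\n")
end if
```

Like the snippets in the post, this does no error checking on the data itself: a truncated file would simply yield short words in the last bucket.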