RE: External sorting

new topic     » goto parent     » topic index » view thread      » older message » newer message

What about a bucket sort? Open n files for writing, where n is the number of 
buckets you have, and as you read in your data to be sorted, stick it in the 
appropriate bucket. Then, open an output file, and go through each bucket in 
order, sort the contents of that bucket, and append it to the output file.

As far as what number you choose 'n' to be, just pick it so that the size of 
each bucket when full will not be more than the amount of memory you want to 
use at a time.

IMO this manner of sorting would require a lot less file I/O than the 
merging sort described earlier... what do you think?
=====================================================
.______<-------------------\__
/ _____<--------------------__|===
||_      <-------------------/
\__| Mr Trick





>From: rforno at tutopia.com
>Reply-To: EUforum at topica.com
>To: EUforum <EUforum at topica.com>
>Subject: RE: External sorting
>Date: Wed, 11 Jun 2003 13:58:58 -0300
>
>
>Well, I once wrote an external sort for a machine that had not one 
>(remember
>Cromemco computers, similar to PC's in the 80's, running Cromix, an
>Unix-like OS?). I used a technique very similar to the NWay EU sort you
>mention. There are many books explaining how to do that, of which I can
>recommend Donald Knuth's "The Art of Computer Programming" (regrettably he
>uses not a high-level language but a strange all-purpose Assembly invented
>by him, called "MIX").
>I promise to write such an external sort, not in the near future... Don't
>take me too seriously in this respect, please ;)
>Andy, according to what you usually post, you are by no means dumb. I think
>once you read a good book or paper on external sorting, you'll be able to
>program it without too much effort.
>But let me briefly explain here how would I do that.
>1) Read in the input file while it fits into memory (to be on the safe 
>side,
>use only 1Mb of memory, for example).
>2) Internally sort the chunk that was read.
>3) Write it to a temporary file, saving the starting and ending addresses 
>in
>a vector (sequence).
>4) Repeat 1 - 3 until all of the input file was read.
>5) Read in the chunks one record at a time, using the address vector
>developed earlier to figure where to read and when to stop, and merge them
>using a technique similar to the NWAY sorting. Write the resulting data to
>the final output file.
>
>What can be a bit complex is how to generalize the sort regarding possible
>key positions, multiple keys, data positions, order (ascending /
>descending), comparison options, etc., but Euphoria will help here, having
>only a data type and the possibility of custom-comparison.
>
>Regards.
>----- Original Message -----
>From: Andy Serpa <ac at onehorseshy.com>
>To: EUforum <EUforum at topica.com>
>Sent: Tuesday, June 10, 2003 8:23 PM
>Subject: External sorting
>
>
> > Has anyone written anything to do external sorting?  That is, sorting
> > data in a file where the file is too large to fit entirely in memory?
> > [writing out to a new file in sorted order.]  Ricardo Forno suggests in
> > genfunc.e that his NWay sorts could be apapted to this task, but I'm too
> > dumb to determine how one would do it.
> >
> > Anyone?
> >
> >
> > TOPICA - Start your own email discussion group. FREE!
> >
> >
>
>
>TOPICA - Start your own email discussion group. FREE!
>
>

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu