1. RE: External sorting
- Posted by Andy Serpa <ac at onehorseshy.com> Jun 11, 2003
- 364 views
Robert Elia wrote: > At 11:23 PM 6/10/03 +0000, you wrote: > > > > >Has anyone written anything to do external sorting? That is, sorting > >data in a file where the file is too large to fit entirely in memory? > >[writing out to a new file in sorted order.] Ricardo Forno suggests in > >genfunc.e that his NWay sorts could be apapted to this task, but I'm too > >dumb to determine how one would do it. > > > >Anyone? > > Andy, > > If you aren't wedded to Euphoria for this, I've been using a > 16-bit DOS app for this called RPS, written by the late Robert Pirko, > for > many years. It's rock solid, very fast and extremely versatile. I > chose > it after examining a lot of others. I'm sure it's still around on the > net > but if you can't find it, I'll be happy to send it. > I'm wedded to the idea of doing it programatically. I'll be sorting delimited files with tabular data on-the-fly "by column" so it needs to be flexible.
2. RE: External sorting
- Posted by Andy Serpa <ac at onehorseshy.com> Jun 11, 2003
- 370 views
Derek Parnell wrote: > > > On Tue, 10 Jun 2003 23:23:12 +0000, Andy Serpa <ac at onehorseshy.com> > wrote: > > > > > Has anyone written anything to do external sorting? That is, sorting > > data in a file where the file is too large to fit entirely in memory? > > [writing out to a new file in sorted order.] Ricardo Forno suggests in > > genfunc.e that his NWay sorts could be apapted to this task, but I'm too > > > > dumb to determine how one would do it. > > > > Anyone? > > > > This takes me back to my IBM mainframe days. > > The general idea is to take into RAM as much as you dare, sort that and > output it to a sort work file. You repeat this, creating as many sort > work > files as required until all the original (unsorted) data has been > processed. This means you end up with a set of sorted files. Next, you > take > two (or more) files at a time and merge them, creating either the final > file or additional work files. You repeat merging the work files until > you > have one completely sorted file. > > There are a few optimisation possible, but this is the general idea. > > To diagram this... > > [UNSORTED] --> (ram sorting) --> sortwk1, sortwk2, ...sortwkn > > [sortwk1] + [sortwk2] --> (merge phase) --> sortmg1 > [sortmg1] + [sortwk3] --> (merge phase) --> sortmg2 > . . . [sortmgn] + [sortwkn] --> (merge phase) --> final > I've just been reading up on it. I think I will be able to come up with something...
3. RE: External sorting
- Posted by rforno at tutopia.com Jun 11, 2003
- 351 views
Well, I once wrote an external sort for a machine that had not one (remember Cromemco computers, similar to PC's in the 80's, running Cromix, an Unix-like OS?). I used a technique very similar to the NWay EU sort you mention. There are many books explaining how to do that, of which I can recommend Donald Knuth's "The Art of Computer Programming" (regrettably he uses not a high-level language but a strange all-purpose Assembly invented by him, called "MIX"). I promise to write such an external sort, not in the near future... Don't take me too seriously in this respect, please ;) Andy, according to what you usually post, you are by no means dumb. I think once you read a good book or paper on external sorting, you'll be able to program it without too much effort. But let me briefly explain here how would I do that. 1) Read in the input file while it fits into memory (to be on the safe side, use only 1Mb of memory, for example). 2) Internally sort the chunk that was read. 3) Write it to a temporary file, saving the starting and ending addresses in a vector (sequence). 4) Repeat 1 - 3 until all of the input file was read. 5) Read in the chunks one record at a time, using the address vector developed earlier to figure where to read and when to stop, and merge them using a technique similar to the NWAY sorting. Write the resulting data to the final output file. What can be a bit complex is how to generalize the sort regarding possible key positions, multiple keys, data positions, order (ascending / descending), comparison options, etc., but Euphoria will help here, having only a data type and the possibility of custom-comparison. Regards. ----- Original Message ----- From: Andy Serpa <ac at onehorseshy.com> To: EUforum <EUforum at topica.com> Sent: Tuesday, June 10, 2003 8:23 PM Subject: External sorting > > > Has anyone written anything to do external sorting? That is, sorting > data in a file where the file is too large to fit entirely in memory? > [writing out to a new file in sorted order.] Ricardo Forno suggests in > genfunc.e that his NWay sorts could be apapted to this task, but I'm too > dumb to determine how one would do it. > > Anyone? > > > > TOPICA - Start your own email discussion group. FREE! > >
4. RE: External sorting
- Posted by mistertrik at hotmail.com Jun 12, 2003
- 348 views
What about a bucket sort? Open n files for writing, where n is the number of buckets you have, and as you read in your data to be sorted, stick it in the appropriate bucket. Then, open an output file, and go through each bucket in order, sort the contents of that bucket, and append it to the output file. As far as what number you choose 'n' to be, just pick it so that the size of each bucket when full will not be more than the amount of memory you want to use at a time. IMO this manner of sorting would require a lot less file I/O than the merging sort described earlier... what do you think? ===================================================== .______<-------------------\__ / _____<--------------------__|=== ||_ <-------------------/ \__| Mr Trick >From: rforno at tutopia.com >Reply-To: EUforum at topica.com >To: EUforum <EUforum at topica.com> >Subject: RE: External sorting >Date: Wed, 11 Jun 2003 13:58:58 -0300 > > >Well, I once wrote an external sort for a machine that had not one >(remember >Cromemco computers, similar to PC's in the 80's, running Cromix, an >Unix-like OS?). I used a technique very similar to the NWay EU sort you >mention. There are many books explaining how to do that, of which I can >recommend Donald Knuth's "The Art of Computer Programming" (regrettably he >uses not a high-level language but a strange all-purpose Assembly invented >by him, called "MIX"). >I promise to write such an external sort, not in the near future... Don't >take me too seriously in this respect, please ;) >Andy, according to what you usually post, you are by no means dumb. I think >once you read a good book or paper on external sorting, you'll be able to >program it without too much effort. >But let me briefly explain here how would I do that. >1) Read in the input file while it fits into memory (to be on the safe >side, >use only 1Mb of memory, for example). >2) Internally sort the chunk that was read. >3) Write it to a temporary file, saving the starting and ending addresses >in >a vector (sequence). >4) Repeat 1 - 3 until all of the input file was read. >5) Read in the chunks one record at a time, using the address vector >developed earlier to figure where to read and when to stop, and merge them >using a technique similar to the NWAY sorting. Write the resulting data to >the final output file. > >What can be a bit complex is how to generalize the sort regarding possible >key positions, multiple keys, data positions, order (ascending / >descending), comparison options, etc., but Euphoria will help here, having >only a data type and the possibility of custom-comparison. > >Regards. >----- Original Message ----- >From: Andy Serpa <ac at onehorseshy.com> >To: EUforum <EUforum at topica.com> >Sent: Tuesday, June 10, 2003 8:23 PM >Subject: External sorting > > > > Has anyone written anything to do external sorting? That is, sorting > > data in a file where the file is too large to fit entirely in memory? > > [writing out to a new file in sorted order.] Ricardo Forno suggests in > > genfunc.e that his NWay sorts could be apapted to this task, but I'm too > > dumb to determine how one would do it. > > > > Anyone? > > > > > > TOPICA - Start your own email discussion group. FREE! > > > > > > >TOPICA - Start your own email discussion group. FREE! > >