Re: External sorting
- Posted by Derek Parnell <ddparnell at bigpond.com> Jun 11, 2003
- 373 views
On Tue, 10 Jun 2003 23:23:12 +0000, Andy Serpa <ac at onehorseshy.com> wrote: > > > Has anyone written anything to do external sorting? That is, sorting > data in a file where the file is too large to fit entirely in memory? > [writing out to a new file in sorted order.] Ricardo Forno suggests in > genfunc.e that his NWay sorts could be apapted to this task, but I'm too > dumb to determine how one would do it. > > Anyone? > This takes me back to my IBM mainframe days. The general idea is to take into RAM as much as you dare, sort that and output it to a sort work file. You repeat this, creating as many sort work files as required until all the original (unsorted) data has been processed. This means you end up with a set of sorted files. Next, you take two (or more) files at a time and merge them, creating either the final file or additional work files. You repeat merging the work files until you have one completely sorted file. There are a few optimisation possible, but this is the general idea. To diagram this... [UNSORTED] --> (ram sorting) --> sortwk1, sortwk2, ...sortwkn [sortwk1] + [sortwk2] --> (merge phase) --> sortmg1 [sortmg1] + [sortwk3] --> (merge phase) --> sortmg2 . . . [sortmgn] + [sortwkn] --> (merge phase) --> final -- Derek -- cheers, Derek Parnell