RE: External sorting
- Posted by Andy Serpa <ac at onehorseshy.com> Jun 11, 2003
- 370 views
Derek Parnell wrote: > > > On Tue, 10 Jun 2003 23:23:12 +0000, Andy Serpa <ac at onehorseshy.com> > wrote: > > > > > Has anyone written anything to do external sorting? That is, sorting > > data in a file where the file is too large to fit entirely in memory? > > [writing out to a new file in sorted order.] Ricardo Forno suggests in > > genfunc.e that his NWay sorts could be apapted to this task, but I'm too > > > > dumb to determine how one would do it. > > > > Anyone? > > > > This takes me back to my IBM mainframe days. > > The general idea is to take into RAM as much as you dare, sort that and > output it to a sort work file. You repeat this, creating as many sort > work > files as required until all the original (unsorted) data has been > processed. This means you end up with a set of sorted files. Next, you > take > two (or more) files at a time and merge them, creating either the final > file or additional work files. You repeat merging the work files until > you > have one completely sorted file. > > There are a few optimisation possible, but this is the general idea. > > To diagram this... > > [UNSORTED] --> (ram sorting) --> sortwk1, sortwk2, ...sortwkn > > [sortwk1] + [sortwk2] --> (merge phase) --> sortmg1 > [sortmg1] + [sortwk3] --> (merge phase) --> sortmg2 > . . . [sortmgn] + [sortwkn] --> (merge phase) --> final > I've just been reading up on it. I think I will be able to come up with something...