Re: External sorting

On Thu, 12 Jun 2003 12:23:32 +1000, <mistertrik at hotmail.com> wrote:

>
>
> What about a bucket sort? Open n files for writing, where n is the number 
> of buckets you have, and as you read in your data to be sorted, stick it 
> in the appropriate bucket. Then, open an output file, and go through each 
> bucket in order, sort the contents of that bucket, and append it to the 
> output file.
>
> As for what number to choose for 'n', just pick it so that the size of 
> each bucket, when full, will not be more than the amount of memory you 
> want to use at a time.
>
> IMO this manner of sorting would require a lot less file I/O than the 
> merging sort described earlier... what do you think?
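
In concrete terms, I take the suggestion to be something like the rough
Python sketch below (illustrative only: N_BUCKETS, MAX_KEY and the bucket
file names are my assumptions, and it expects one non-negative integer
key per line):

N_BUCKETS = 16
MAX_KEY = 999_999            # assumed known upper bound on the keys

def bucket_index(key):
    # Map a key to a bucket so that bucket order matches sorted order.
    return key * N_BUCKETS // (MAX_KEY + 1)

def external_bucket_sort(in_path, out_path):
    # Pass 1: distribute each record into its bucket file.
    buckets = [open("bucket%02d.tmp" % i, "w") for i in range(N_BUCKETS)]
    with open(in_path) as f:
        for line in f:
            buckets[bucket_index(int(line))].write(line)
    for b in buckets:
        b.close()
    # Pass 2: sort each bucket in memory and append it to the output.
    with open(out_path, "w") as out:
        for i in range(N_BUCKETS):
            with open("bucket%02d.tmp" % i) as b:
                keys = sorted(int(line) for line in b)
            out.writelines("%d\n" % k for k in keys)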

One of the issues with this method is that the bucket files are of varying 
length, and any one of them might still end up larger than RAM can hold. 
Of course, you could deal with this recursively: for each bucket file that 
is too big, reprocess it into smaller buckets, as sketched below.
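
Sketched out (again hypothetical, in the same style as the code above;
MEM_LIMIT, N_BUCKETS and the sub-bucket file names are assumptions), the
per-bucket step might become:

import os

N_BUCKETS = 16
MEM_LIMIT = 64 * 1024 * 1024   # assumed per-bucket memory budget, bytes

def sort_bucket(path, out, lo, hi):
    # Sort one bucket file holding keys in [lo, hi] and append the
    # result to the already-open output file 'out'.
    if lo == hi:
        # Every key in this bucket is identical, so it is already sorted.
        with open(path) as f:
            for line in f:
                out.write(line)
    elif os.path.getsize(path) <= MEM_LIMIT:
        # Small enough: sort in memory, as in the basic sketch.
        with open(path) as f:
            keys = sorted(int(line) for line in f)
        out.writelines("%d\n" % k for k in keys)
    else:
        # Too big: re-split the key range into sub-buckets and recurse.
        step = (hi - lo) // N_BUCKETS + 1
        subs = [open("%s.%d" % (path, i), "w") for i in range(N_BUCKETS)]
        with open(path) as f:
            for line in f:
                subs[(int(line) - lo) // step].write(line)
        for s in subs:
            s.close()
        for i in range(N_BUCKETS):
            sort_bucket("%s.%d" % (path, i), out,
                        lo + i * step, min(lo + (i + 1) * step - 1, hi))

In the earlier sketch, the pass-2 loop would then call sort_bucket()
instead of sorting each bucket file directly.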

Another issue is working out what 'n' should be. This is normally done by 
knowing the range of values the key can take, or by knowing how many 
records need sorting, and neither is always easy. Consider sorting the 
telephone book entries of a large city. The range of names is huge. You 
might like to split on the first 2/3/4 letters of a name, but even that 
can give very large bucket files.
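
The prefix bucket function itself is trivial to write; the skew is the
problem (hypothetical sketch, same caveats as above):

def name_bucket(surname):
    # Bucket phone-book entries on the first two letters of the surname.
    # Common prefixes ("Sm", "Ma", ...) still give oversized buckets.
    return surname[:2].lower()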

The idea, however, can work for some situations.

-- 

cheers,
Derek Parnell
