Re: Merge Sort Implementation
- Posted by Spock 1 month ago
- 335 views
-- The worst case is data which is already sorted in the opposite way. This creates -- the same number of initial subsets as there are elements to be sorted. -- -- The best case is for data already sorted. This creates only one initial subset.
This is where I would start optimising. A fast scan for linearly ascending or descending elements, could be done as part of the subset marking phase. Descending blocks get immediately reversed. You should have a minimum subset size rather than allow tiny blocks. Another option could be to first run a fixed length insertion_sort() over the entire block.
I have also experimented with merge_sort() implementations. I did post http://www.rapideuphoria.com/msort.zip but the link don't work no more. It was basically standard merge_sort() but used a fast insertion_sort() for lists of 100 or less elements - which is basically what Timsort does. Since standard merge_sort() is a top-down divide and conquer approach, recursing down only to a certain point makes the optimisation seem very natural.
I would recommend supplementing your MergeSort() with the following routine [which you obviously have to adapt]. And somewhere in MergeSort() put the equivalent of :
-- init integer n = length(x) -- fast sort for small sequences if n <= 100 then return insertion_sortXX(x) end if
function insertion_sortXX(sequence x) -- insertion sort on steroids object temp integer insert, hi, mid -- loop along UNsorted list for i = 2 to length(x) do -- get the first element in the unsorted section temp = x[i] insert = 1 -- start of sorted section hi = i-1 -- end of sorted section -- find the insertion point -- binary search? if i > 15 then while insert <= hi do -- starts at top of sorted list and go to the bottom mid = floor((insert + hi) / 2) if compare(temp, x[mid]) < 0 then hi = mid - 1 else insert = mid + 1 end if end while -- linear search? else while hi do if compare( temp, x[hi] ) >= 0 then insert = hi+1 exit end if hi -= 1 end while end if -- shuffle & insert, but only if needed --if insert < i then x[insert+1 .. i] = x[insert .. i-1] x[insert] = temp --end if end for -- exit return x end function
Testing 2,000,000 random strings of length 10, I get the following results on my machine:
msort 8.5 seconds
merge_sort 9 seconds - from allsorts.ex [1]
MergeSort 10.5 seconds
If you can inline a fast insertion sort, your code might be competitive. Otherwise, it appears I'm still the winner for [stable] sorting.
Spock
[1] original 1993 version not stable. Should use <= instead of < in this line: if compare(a[1], b[1]) < 0 then