1. Merge Sort Implementation
- Posted by DerekParnell (admin) 1 month ago
- 385 views
For an exercise, I wrote a version of the Merge Sort algorithm. Any comments?
This is the basic implementation. I've also got one that uses a custom comparator.
--------------------------------------------------- -- MERGESORT implementation - D J Parnell, Nov/2024 -- -- This treats the data to be sorted as a set of presorted subsets. -- -- The first stage is to identify the start of each of the subsets -- contained in the data. A subset starts at the point where an element -- is less than the previous element. Note that the first subset always -- starts at position 1. -- eg. Data: 'estavmmlodf' -- Subsets start at positions 1, 4, 6, 8, 10 -- The subsets are 'est' 'av' 'mm' 'lo' 'df' -- -- The second stage is to merge pairs of adjacent subsets into a new subset. -- In this example we get 'aestv' 'lmmo' 'df' -- (Note: if the last subset has no pair, we assume an empty subset as its pair) -- -- The third stage is to recalculate the new subset starting positions. -- The new starting positions are simply every second starting position from the previous list of starts. -- In this example, the new starting positions are 1, 6, 10 -- -- We then repeat stages 2 & 3 until we finally only have one pair to merge. After -- merging the final pair, the data is now fully sorted. -- -- -- Note that after each merge, we have basically halved the number of subsets to deal with. -- -- The worst case is data which is already sorted in the opposite way. This creates -- the same number of initial subsets as there are elements to be sorted. -- -- The best case is for data already sorted. This creates only one initial subset. -- -- This implementation works faster than the standard library sort for large sequences. -- ---------------------------------------------- global function MergeSort(object pData) ---------------------------------------------- sequence lSubLists integer lSubListEnd -- Special case: The data is an atom. if atom(pData) then return pData end if -- Find the first lot of sub-lists -- -- For speed, I want to avoid appending. lSubLists = repeat(1, length(pData)+1) -- Maximum possible number of sublists plus 1. lSubListEnd = 1 -- Examine each element and whenever the current element is -- lower than the previous element, start a new sublist. for i = 2 to length(pData) do if compare(pData[i], pData[i-1]) < 0 then -- Start a new sublist at this point. lSubListEnd += 1 lSubLists[lSubListEnd] = i end if end for if lSubListEnd < 2 then -- Special case: One sublist, so the data is already sorted. return pData end if while 1 do -- Compare the sub lists integer x,y integer xe,ye object lMerged integer t for z =1 to lSubListEnd - 1 by 2 do x = lSubLists[z] -- Grab start of SubList X y = lSubLists[z+1] -- Grab start of SubList Y xe = y-1 -- Calc end of SublistX if z <= lSubListEnd - 2 then ye = lSubLists[z+2] -1 -- Calc end of SublistY else ye = length(pData) -- Calc end of SublistY end if -- Merge the two sublists -- lMerged = repeat(0, xe-x+ye-y + 2) -- Pre-allocate space for the merged lists. t = 1 -- Start at merge spot 1 while x <= xe do if y <= ye then -- Still got an element in each sublist to compare. -- Move the not-larger element to the next merge spot. if compare(pData[x], pData[y]) <= 0 then -- equality check to make it a stable sort. lMerged[t] = pData[x] -- Y is larger or equal so move X element x += 1 -- Bump to next X element to compare else lMerged[t] = pData[y] -- X is larger so move Y element y += 1 -- Bump to next Y element to compare end if t += 1 -- Bump to next merge spot else -- List Y is ended but List X hasn't, so -- remainder of X is all greater than last Y entry. lMerged[t..$] = pData[x..xe] exit end if end while -- Repeat until we have tested all the X elements. if y <= ye then --(there might be some Y elements left) -- List X is ended but List Y hasn't, so -- remainder of Y is all greater than last X entry. lMerged[t..$] = pData[y..ye] end if -- Replace the X and Y sublists with their merged set. pData[lSubLists[z]..ye] = lMerged end for -- Repeat for the next pair of sublists. if lSubListEnd = 2 then -- Finally, just a pair of sublists were merged, so exit -- All done. end if -- Get next set of sublists now that the data has been partially sorted. -- We can use the fact that the previous sublist pairs are already merged. integer lHEAD, lTAIL lHEAD = 0 lTAIL = 1 while lTAIL <= lSubListEnd do lHEAD += 1 -- Bump to next sublist start slot lSubLists[lHEAD] = lSubLists[lTAIL] lTAIL += 2 -- Bump to next start of a pair of old sublists end while lSubListEnd = lHEAD -- Update new end of the sublists. end while return pData -- All done. end function
2. Re: Merge Sort Implementation
- Posted by Spock 1 month ago
- 336 views
-- The worst case is data which is already sorted in the opposite way. This creates -- the same number of initial subsets as there are elements to be sorted. -- -- The best case is for data already sorted. This creates only one initial subset.
This is where I would start optimising. A fast scan for linearly ascending or descending elements, could be done as part of the subset marking phase. Descending blocks get immediately reversed. You should have a minimum subset size rather than allow tiny blocks. Another option could be to first run a fixed length insertion_sort() over the entire block.
I have also experimented with merge_sort() implementations. I did post http://www.rapideuphoria.com/msort.zip but the link don't work no more. It was basically standard merge_sort() but used a fast insertion_sort() for lists of 100 or less elements - which is basically what Timsort does. Since standard merge_sort() is a top-down divide and conquer approach, recursing down only to a certain point makes the optimisation seem very natural.
I would recommend supplementing your MergeSort() with the following routine [which you obviously have to adapt]. And somewhere in MergeSort() put the equivalent of :
-- init integer n = length(x) -- fast sort for small sequences if n <= 100 then return insertion_sortXX(x) end if
function insertion_sortXX(sequence x) -- insertion sort on steroids object temp integer insert, hi, mid -- loop along UNsorted list for i = 2 to length(x) do -- get the first element in the unsorted section temp = x[i] insert = 1 -- start of sorted section hi = i-1 -- end of sorted section -- find the insertion point -- binary search? if i > 15 then while insert <= hi do -- starts at top of sorted list and go to the bottom mid = floor((insert + hi) / 2) if compare(temp, x[mid]) < 0 then hi = mid - 1 else insert = mid + 1 end if end while -- linear search? else while hi do if compare( temp, x[hi] ) >= 0 then insert = hi+1 exit end if hi -= 1 end while end if -- shuffle & insert, but only if needed --if insert < i then x[insert+1 .. i] = x[insert .. i-1] x[insert] = temp --end if end for -- exit return x end function
Testing 2,000,000 random strings of length 10, I get the following results on my machine:
msort 8.5 seconds
merge_sort 9 seconds - from allsorts.ex [1]
MergeSort 10.5 seconds
If you can inline a fast insertion sort, your code might be competitive. Otherwise, it appears I'm still the winner for [stable] sorting.
Spock
[1] original 1993 version not stable. Should use <= instead of < in this line: if compare(a[1], b[1]) < 0 then
3. Re: Merge Sort Implementation
- Posted by DerekParnell (admin) 1 month ago
- 320 views
Thanks. All good ideas. I'll play around this weekend.
4. Re: Merge Sort Implementation
- Posted by petelomax 1 month ago
- 238 views
I did post http://www.rapideuphoria.com/msort.zip but the link don't work no more.
I just uploaded my local copy (from 2017) to PCAN: http://phix.x10.mx/pmwiki/pmwiki.php?n=Main.StableSortRoutines