1. Merge Sort Implementation

For an exercise, I wrote a version of the Merge Sort algorithm. Any comments?

This is the basic implementation. I've also got one that uses a custom comparator.

--------------------------------------------------- 
-- MERGESORT implementation - D J Parnell, Nov/2024 
-- 
-- This treats the data to be sorted as a set of presorted subsets. 
-- 
-- The first stage is to identify the start of each of the subsets 
-- contained in the data. A subset starts at the point where an element 
-- is less than the previous element. Note that the first subset always 
-- starts at position 1. 
-- eg.  Data: 'estavmmlodf' 
-- Subsets start at positions 1, 4, 6, 8, 10 
-- The subsets are  'est' 'av' 'mm' 'lo' 'df' 
-- 
-- The second stage is to merge pairs of adjacent subsets into a new subset. 
-- In this example we get 'aestv' 'lmmo' 'df' 
-- (Note: if the last subset has no pair, we assume an empty subset as its pair) 
-- 
-- The third stage is to recalculate the new subset starting positions. 
-- The new starting positions are simply every second starting position from the previous list of starts. 
-- In this example, the new starting positions are 1, 6, 10 
-- 
-- We then repeat stages 2 & 3 until we finally only have one pair to merge. After 
-- merging the final pair, the data is now fully sorted. 
-- 
-- 
-- Note that after each merge, we have basically halved the number of subsets to deal with. 
-- 
-- The worst case is data which is already sorted in the opposite way. This creates 
-- the same number of initial subsets as there are elements to be sorted. 
-- 
-- The best case is for data already sorted. This creates only one initial subset. 
-- 
-- This implementation works faster than the standard library sort for large sequences. 
-- 
 
---------------------------------------------- 
global function MergeSort(object pData) 
---------------------------------------------- 
 
	sequence lSubLists 
	integer lSubListEnd 
 
	-- Special case: The data is an atom. 
	if atom(pData) then 
		return pData 
	end if 
 
	-- Find the first lot of sub-lists -- 
	-- For speed, I want to avoid appending. 
	lSubLists = repeat(1, length(pData)+1)  -- Maximum possible number of sublists plus 1. 
	lSubListEnd = 1 
 
	-- Examine each element and whenever the current element is 
	-- lower than the previous element, start a new sublist. 
	for i = 2 to length(pData) do 
		if compare(pData[i], pData[i-1]) < 0 then 
			-- Start a new sublist at this point. 
			lSubListEnd += 1 
			lSubLists[lSubListEnd] = i 
 
		end if 
	end for 
 
	if lSubListEnd < 2 then 
		-- Special case: One sublist, so the data is already sorted. 
		return pData 
	end if 
 
	while 1 do 
 
		-- Compare the sub lists 
		integer x,y 
		integer xe,ye 
		object lMerged 
		integer t 
 
		for z =1 to lSubListEnd - 1 by 2 do 
 
			x = lSubLists[z]  -- Grab start of SubList X 
			y = lSubLists[z+1]  -- Grab start of SubList Y 
			xe = y-1  -- Calc end of SublistX 
 
			if z <= lSubListEnd - 2 then 
				ye = lSubLists[z+2] -1  -- Calc end of SublistY 
			else 
				ye = length(pData) -- Calc end of SublistY 
			end if 
 
			-- Merge the two sublists -- 
			lMerged = repeat(0, xe-x+ye-y + 2)  -- Pre-allocate space for the merged lists. 
 
			t = 1 -- Start at merge spot 1 
			while x <= xe do 
				if y <= ye then 
					-- Still got an element in each sublist to compare. 
					-- Move the not-larger element to the next merge spot. 
					if compare(pData[x], pData[y]) <= 0 then -- equality check to make it a stable sort. 
						lMerged[t] = pData[x]  -- Y is larger or equal so move X element 
						x += 1 -- Bump to next X element to compare 
					else 
						lMerged[t] = pData[y]  -- X is larger so move Y element 
						y += 1 -- Bump to next Y element to compare 
					end if 
					t += 1   -- Bump to next merge spot 
 
				else 
					-- List Y is ended but List X hasn't, so 
					-- remainder of X is all greater than last Y entry. 
  					lMerged[t..$] = pData[x..xe] 
					exit 
				end if 
 
			end while  -- Repeat until we have tested all the X elements. 
 
			if y <= ye then --(there might be some Y elements left) 
				-- List X is ended but List Y hasn't, so 
				-- remainder of Y is all greater than last X entry. 
  				lMerged[t..$] = pData[y..ye] 
			end if 
 
			-- Replace the X and Y sublists with their merged set. 
			pData[lSubLists[z]..ye] = lMerged 
 
		end for -- Repeat for the next pair of sublists. 
 
		if lSubListEnd = 2 then 
			-- Finally, just a pair of sublists were merged, so 
			exit  -- All done. 
		end if 
 
		-- Get next set of sublists now that the data has been partially sorted. 
		-- We can use the fact that the previous sublist pairs are already merged. 
		integer lHEAD, lTAIL 
		lHEAD = 0 
		lTAIL = 1 
		while lTAIL <= lSubListEnd do 
			lHEAD += 1    -- Bump to next sublist start slot 
			lSubLists[lHEAD] = lSubLists[lTAIL] 
			lTAIL += 2    -- Bump to next start of a pair of old sublists 
		end while 
		lSubListEnd = lHEAD  -- Update new end of the sublists. 
 
	end while 
 
	return pData  -- All done. 
 
end function 
 
new topic     » topic index » view message » categorize

2. Re: Merge Sort Implementation

DerekParnell said...
-- The worst case is data which is already sorted in the opposite way. This creates 
-- the same number of initial subsets as there are elements to be sorted. 
-- 
-- The best case is for data already sorted. This creates only one initial subset. 

This is where I would start optimising. A fast scan for linearly ascending or descending elements, could be done as part of the subset marking phase. Descending blocks get immediately reversed. You should have a minimum subset size rather than allow tiny blocks. Another option could be to first run a fixed length insertion_sort() over the entire block.

I have also experimented with merge_sort() implementations. I did post http://www.rapideuphoria.com/msort.zip but the link don't work no more. It was basically standard merge_sort() but used a fast insertion_sort() for lists of 100 or less elements - which is basically what Timsort does. Since standard merge_sort() is a top-down divide and conquer approach, recursing down only to a certain point makes the optimisation seem very natural.

I would recommend supplementing your MergeSort() with the following routine [which you obviously have to adapt]. And somewhere in MergeSort() put the equivalent of :

        -- init 
	integer n = length(x) 
 
	-- fast sort for small sequences 
	if n <= 100 then 
		return insertion_sortXX(x) 
	end if 
 


 
function insertion_sortXX(sequence x) -- insertion sort on steroids 
	object temp 
	integer insert, hi, mid 
 
	-- loop along UNsorted list 
	for i = 2 to length(x) do 
 
		-- get the first element in the unsorted section 
		temp = x[i] 
 
		insert = 1 -- start of sorted section 
		hi = i-1 -- end of sorted section 
 
		-- find the insertion point 
 
		-- binary search? 
		if i > 15 then 
			while insert <= hi do -- starts at top of sorted list and go to the bottom 
				mid = floor((insert + hi) / 2) 
				if compare(temp, x[mid]) < 0 then 
					hi = mid - 1 
				else 
					insert = mid + 1 
				end if 
			end while 
 
		-- linear search? 
		else 
			while hi do 
				if compare( temp, x[hi] ) >= 0 then 
					insert = hi+1 
					exit 
				end if 
				hi -= 1 
			end while 
 
		end if 
 
		-- shuffle & insert, but only if needed 
		--if insert < i then 
			x[insert+1 .. i] = x[insert .. i-1] 
			x[insert] = temp 
		--end if 
 
	end for 
 
	-- exit 
	return x 
 
end function 

Testing 2,000,000 random strings of length 10, I get the following results on my machine:

msort 8.5 seconds
merge_sort 9 seconds - from allsorts.ex [1]
MergeSort 10.5 seconds

If you can inline a fast insertion sort, your code might be competitive. Otherwise, it appears I'm still the winner for [stable] sorting.

Spock

[1] original 1993 version not stable. Should use <= instead of < in this line: if compare(a[1], b[1]) < 0 then

new topic     » goto parent     » topic index » view message » categorize

3. Re: Merge Sort Implementation

Thanks. All good ideas. I'll play around this weekend.

new topic     » goto parent     » topic index » view message » categorize

4. Re: Merge Sort Implementation

Spock said...

I did post http://www.rapideuphoria.com/msort.zip but the link don't work no more.

I just uploaded my local copy (from 2017) to PCAN: http://phix.x10.mx/pmwiki/pmwiki.php?n=Main.StableSortRoutines

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu