Merge Sort Implementation

new topic     » topic index » view thread      » older message » newer message

For an exercise, I wrote a version of the Merge Sort algorithm. Any comments?

This is the basic implementation. I've also got one that uses a custom comparator.

--------------------------------------------------- 
-- MERGESORT implementation - D J Parnell, Nov/2024 
-- 
-- This treats the data to be sorted as a set of presorted subsets. 
-- 
-- The first stage is to identify the start of each of the subsets 
-- contained in the data. A subset starts at the point where an element 
-- is less than the previous element. Note that the first subset always 
-- starts at position 1. 
-- eg.  Data: 'estavmmlodf' 
-- Subsets start at positions 1, 4, 6, 8, 10 
-- The subsets are  'est' 'av' 'mm' 'lo' 'df' 
-- 
-- The second stage is to merge pairs of adjacent subsets into a new subset. 
-- In this example we get 'aestv' 'lmmo' 'df' 
-- (Note: if the last subset has no pair, we assume an empty subset as its pair) 
-- 
-- The third stage is to recalculate the new subset starting positions. 
-- The new starting positions are simply every second starting position from the previous list of starts. 
-- In this example, the new starting positions are 1, 6, 10 
-- 
-- We then repeat stages 2 & 3 until we finally only have one pair to merge. After 
-- merging the final pair, the data is now fully sorted. 
-- 
-- 
-- Note that after each merge, we have basically halved the number of subsets to deal with. 
-- 
-- The worst case is data which is already sorted in the opposite way. This creates 
-- the same number of initial subsets as there are elements to be sorted. 
-- 
-- The best case is for data already sorted. This creates only one initial subset. 
-- 
-- This implementation works faster than the standard library sort for large sequences. 
-- 
 
---------------------------------------------- 
global function MergeSort(object pData) 
---------------------------------------------- 
 
	sequence lSubLists 
	integer lSubListEnd 
 
	-- Special case: The data is an atom. 
	if atom(pData) then 
		return pData 
	end if 
 
	-- Find the first lot of sub-lists -- 
	-- For speed, I want to avoid appending. 
	lSubLists = repeat(1, length(pData)+1)  -- Maximum possible number of sublists plus 1. 
	lSubListEnd = 1 
 
	-- Examine each element and whenever the current element is 
	-- lower than the previous element, start a new sublist. 
	for i = 2 to length(pData) do 
		if compare(pData[i], pData[i-1]) < 0 then 
			-- Start a new sublist at this point. 
			lSubListEnd += 1 
			lSubLists[lSubListEnd] = i 
 
		end if 
	end for 
 
	if lSubListEnd < 2 then 
		-- Special case: One sublist, so the data is already sorted. 
		return pData 
	end if 
 
	while 1 do 
 
		-- Compare the sub lists 
		integer x,y 
		integer xe,ye 
		object lMerged 
		integer t 
 
		for z =1 to lSubListEnd - 1 by 2 do 
 
			x = lSubLists[z]  -- Grab start of SubList X 
			y = lSubLists[z+1]  -- Grab start of SubList Y 
			xe = y-1  -- Calc end of SublistX 
 
			if z <= lSubListEnd - 2 then 
				ye = lSubLists[z+2] -1  -- Calc end of SublistY 
			else 
				ye = length(pData) -- Calc end of SublistY 
			end if 
 
			-- Merge the two sublists -- 
			lMerged = repeat(0, xe-x+ye-y + 2)  -- Pre-allocate space for the merged lists. 
 
			t = 1 -- Start at merge spot 1 
			while x <= xe do 
				if y <= ye then 
					-- Still got an element in each sublist to compare. 
					-- Move the not-larger element to the next merge spot. 
					if compare(pData[x], pData[y]) <= 0 then -- equality check to make it a stable sort. 
						lMerged[t] = pData[x]  -- Y is larger or equal so move X element 
						x += 1 -- Bump to next X element to compare 
					else 
						lMerged[t] = pData[y]  -- X is larger so move Y element 
						y += 1 -- Bump to next Y element to compare 
					end if 
					t += 1   -- Bump to next merge spot 
 
				else 
					-- List Y is ended but List X hasn't, so 
					-- remainder of X is all greater than last Y entry. 
  					lMerged[t..$] = pData[x..xe] 
					exit 
				end if 
 
			end while  -- Repeat until we have tested all the X elements. 
 
			if y <= ye then --(there might be some Y elements left) 
				-- List X is ended but List Y hasn't, so 
				-- remainder of Y is all greater than last X entry. 
  				lMerged[t..$] = pData[y..ye] 
			end if 
 
			-- Replace the X and Y sublists with their merged set. 
			pData[lSubLists[z]..ye] = lMerged 
 
		end for -- Repeat for the next pair of sublists. 
 
		if lSubListEnd = 2 then 
			-- Finally, just a pair of sublists were merged, so 
			exit  -- All done. 
		end if 
 
		-- Get next set of sublists now that the data has been partially sorted. 
		-- We can use the fact that the previous sublist pairs are already merged. 
		integer lHEAD, lTAIL 
		lHEAD = 0 
		lTAIL = 1 
		while lTAIL <= lSubListEnd do 
			lHEAD += 1    -- Bump to next sublist start slot 
			lSubLists[lHEAD] = lSubLists[lTAIL] 
			lTAIL += 2    -- Bump to next start of a pair of old sublists 
		end while 
		lSubListEnd = lHEAD  -- Update new end of the sublists. 
 
	end while 
 
	return pData  -- All done. 
 
end function 
 
new topic     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu