Re: Merge Sort Implementation

new topic     » goto parent     » topic index » view thread      » older message » newer message
DerekParnell said...
-- The worst case is data which is already sorted in the opposite way. This creates 
-- the same number of initial subsets as there are elements to be sorted. 
-- 
-- The best case is for data already sorted. This creates only one initial subset. 

This is where I would start optimising. A fast scan for linearly ascending or descending elements, could be done as part of the subset marking phase. Descending blocks get immediately reversed. You should have a minimum subset size rather than allow tiny blocks. Another option could be to first run a fixed length insertion_sort() over the entire block.

I have also experimented with merge_sort() implementations. I did post http://www.rapideuphoria.com/msort.zip but the link don't work no more. It was basically standard merge_sort() but used a fast insertion_sort() for lists of 100 or less elements - which is basically what Timsort does. Since standard merge_sort() is a top-down divide and conquer approach, recursing down only to a certain point makes the optimisation seem very natural.

I would recommend supplementing your MergeSort() with the following routine [which you obviously have to adapt]. And somewhere in MergeSort() put the equivalent of :

        -- init 
	integer n = length(x) 
 
	-- fast sort for small sequences 
	if n <= 100 then 
		return insertion_sortXX(x) 
	end if 
 


 
function insertion_sortXX(sequence x) -- insertion sort on steroids 
	object temp 
	integer insert, hi, mid 
 
	-- loop along UNsorted list 
	for i = 2 to length(x) do 
 
		-- get the first element in the unsorted section 
		temp = x[i] 
 
		insert = 1 -- start of sorted section 
		hi = i-1 -- end of sorted section 
 
		-- find the insertion point 
 
		-- binary search? 
		if i > 15 then 
			while insert <= hi do -- starts at top of sorted list and go to the bottom 
				mid = floor((insert + hi) / 2) 
				if compare(temp, x[mid]) < 0 then 
					hi = mid - 1 
				else 
					insert = mid + 1 
				end if 
			end while 
 
		-- linear search? 
		else 
			while hi do 
				if compare( temp, x[hi] ) >= 0 then 
					insert = hi+1 
					exit 
				end if 
				hi -= 1 
			end while 
 
		end if 
 
		-- shuffle & insert, but only if needed 
		--if insert < i then 
			x[insert+1 .. i] = x[insert .. i-1] 
			x[insert] = temp 
		--end if 
 
	end for 
 
	-- exit 
	return x 
 
end function 

Testing 2,000,000 random strings of length 10, I get the following results on my machine:

msort 8.5 seconds
merge_sort 9 seconds - from allsorts.ex [1]
MergeSort 10.5 seconds

If you can inline a fast insertion sort, your code might be competitive. Otherwise, it appears I'm still the winner for [stable] sorting.

Spock

[1] original 1993 version not stable. Should use <= instead of < in this line: if compare(a[1], b[1]) < 0 then

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu