1. Error in median()

There seems to be an error in the median() function where the number of numbers of which the median is to be found is 2.

The median of 7,8,9 is 8.

The median of 7,9 is also 8.

The Euphoria function, part of the stats include, returns 7. It returns 9 if you use it to get the median of 9,7.

It is clear if you look at the code, which includes:

	if length(data_set) < 3 then 
		return data_set[1] 
	end if 

It would be better to return the average (or mean) of the dataset if the length is 2.

new topic     » topic index » view message » categorize

2. Re: Error in median()

From ... somehwere on google

To find the median, the data should be arranged in order from least to greatest. If there is an even number of items in the data set, then the median is found by taking the mean (average) of the two middlemost numbers.

new topic     » goto parent     » topic index » view message » categorize

3. Re: Error in median()

Quite so.

I've reverted to my own function, which I wrote way back before the stats package was available. To use this, I changed median() in the code I'm writing to Median(). Simple enough.

--**************************************************************** 
--**********         global function Median       **************** 
--**************************************************************** 
global function Median (object oInput) 
-- Input: A sequence of atoms 
-- Output: The median of those values 
sequence sTemp 
integer	 iIndex 
 
if atom(oInput) then 
	return oInput 
end if 
 
if length(oInput) = 1 then 
	return oInput[1] 
end if 
 
sTemp = sort(oInput) 
	if IsOdd(length(sTemp)) then 
		iIndex = floor(length(sTemp) / 2) + 1 
		return sTemp[iIndex] 
	else 
		iIndex = length(sTemp) / 2 
		return  (sTemp[iIndex] + sTemp[iIndex + 1]) / 2 
	end if 
 
end function 
new topic     » goto parent     » topic index » view message » categorize

4. Re: Error in median()

       if IsOdd(length(sTemp)) then-- IsOdd IsNot found 
new topic     » goto parent     » topic index » view message » categorize

5. Re: Error in median()

CraigWelch said...

Quite so.

I've reverted to my own function, which I wrote way back before the stats package was available. To use this, I changed median() in the code I'm writing to Median(). Simple enough.

....

Hi Craig,

No doubt your use case differs to mine but there will be similarities. Recently I needed to use a weighted median function. This was for my graphics library to do basic image denoising. Maybe I didn't look hard enough but there didn't seem to be any simple but efficient algorithm on the internet. A naive way would be to take the list of each color channel and replicate each element based on its weight and then take the center element after sorting [you could do this as a quick 'n dirty way of weighting your median function]. But using this method, even for small kernels, the processing times soon become prohibitive. The largest kernel size I could go (using standard sorting) was a measly 7x7 pixels. This was sorting 4K elements per pixel. In a multi mega-pixel image that is not a sensible option. Fortunately, I can take advantage of counting sort using a 256 length sequence of accumulators for each of the 3 color channels. A single pass over the N x N kernel, adding the weights to each accumulator, followed by a linear search for the mid entry of each channel accumulator yields a very fast, but synthetic, median RGB value.

For highest quality I also added a function to select the centermost pixel in 3d space. There is no natural way to directly sort RGB pixels. The algorithm I use computes the accumulated distances from each pixel to all other pixels. Lowest accumulated value is deemed to be the median pixel. Unlike the other algorithm a real pixel value is actually returned. I use this code (in Orac) which is reminiscent of Bubble Sort in its n^2 time complexity. Although it's slow the quality seems to be the best. The trick to the weighting is to multiply the other pixel (see p1, p2 below) with the weights associated to a pixel:

global function weighted_median_rgb(seq s, seq wgts) 
 
	-- get the median rgb value from the list of pixels 
	-- this method is not efficient.. 
	-- but unlike *synth() it picks a value rather than compute one .. 
 
	-- the method is to compute the sum of euclidean distance between all points 
	-- lowest overall score wins 
 
	-- init 
	seq d = repeat(0, ~s) -- distances 
 
	-- loop 
	for i = 1 to ~s-1 do 
		int p1 = s.i 
		int w1 = wgts.i 
		for j = i+1 to ~s do 
			int p2 = s.j 
			int w2 = wgts.j 
			atom dist = ColourDiff2(p1, p2) 
			d.i += dist * w2 -- see how they are opposites here.. 
			d.j += dist * w1 
		end for 
	end for 
 
	-- find minimum 
	int n = 1 
	atom min = d.1 
	for i = 2 to ~d do 
		if d.i < d.n then n = i end if 
	end for 
 
	-- exit 
	return s.n 
 
end function 

Spock

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu