1. Error in median()
- Posted by CraigWelch May 05, 2020
- 1157 views
There seems to be an error in the median() function where the number of numbers of which the median is to be found is 2.
The median of 7,8,9 is 8.
The median of 7,9 is also 8.
The Euphoria function, part of the stats include, returns 7. It returns 9 if you use it to get the median of 9,7.
It is clear if you look at the code, which includes:
if length(data_set) < 3 then return data_set[1] end if
It would be better to return the average (or mean) of the dataset if the length is 2.
2. Re: Error in median()
- Posted by ChrisB (moderator) May 05, 2020
- 1157 views
From ... somehwere on google
To find the median, the data should be arranged in order from least to greatest. If there is an even number of items in the data set, then the median is found by taking the mean (average) of the two middlemost numbers.
3. Re: Error in median()
- Posted by CraigWelch May 05, 2020
- 1173 views
Quite so.
I've reverted to my own function, which I wrote way back before the stats package was available. To use this, I changed median() in the code I'm writing to Median(). Simple enough.
--**************************************************************** --********** global function Median **************** --**************************************************************** global function Median (object oInput) -- Input: A sequence of atoms -- Output: The median of those values sequence sTemp integer iIndex if atom(oInput) then return oInput end if if length(oInput) = 1 then return oInput[1] end if sTemp = sort(oInput) if IsOdd(length(sTemp)) then iIndex = floor(length(sTemp) / 2) + 1 return sTemp[iIndex] else iIndex = length(sTemp) / 2 return (sTemp[iIndex] + sTemp[iIndex + 1]) / 2 end if end function
4. Re: Error in median()
- Posted by Senator May 05, 2020
- 1148 views
if IsOdd(length(sTemp)) then-- IsOdd IsNot found
5. Re: Error in median()
- Posted by Spock May 06, 2020
- 1174 views
Quite so.
I've reverted to my own function, which I wrote way back before the stats package was available. To use this, I changed median() in the code I'm writing to Median(). Simple enough.
....
Hi Craig,
No doubt your use case differs to mine but there will be similarities. Recently I needed to use a weighted median function. This was for my graphics library to do basic image denoising. Maybe I didn't look hard enough but there didn't seem to be any simple but efficient algorithm on the internet. A naive way would be to take the list of each color channel and replicate each element based on its weight and then take the center element after sorting [you could do this as a quick 'n dirty way of weighting your median function]. But using this method, even for small kernels, the processing times soon become prohibitive. The largest kernel size I could go (using standard sorting) was a measly 7x7 pixels. This was sorting 4K elements per pixel. In a multi mega-pixel image that is not a sensible option. Fortunately, I can take advantage of counting sort using a 256 length sequence of accumulators for each of the 3 color channels. A single pass over the N x N kernel, adding the weights to each accumulator, followed by a linear search for the mid entry of each channel accumulator yields a very fast, but synthetic, median RGB value.
For highest quality I also added a function to select the centermost pixel in 3d space. There is no natural way to directly sort RGB pixels. The algorithm I use computes the accumulated distances from each pixel to all other pixels. Lowest accumulated value is deemed to be the median pixel. Unlike the other algorithm a real pixel value is actually returned. I use this code (in Orac) which is reminiscent of Bubble Sort in its n^2 time complexity. Although it's slow the quality seems to be the best. The trick to the weighting is to multiply the other pixel (see p1, p2 below) with the weights associated to a pixel:
global function weighted_median_rgb(seq s, seq wgts) -- get the median rgb value from the list of pixels -- this method is not efficient.. -- but unlike *synth() it picks a value rather than compute one .. -- the method is to compute the sum of euclidean distance between all points -- lowest overall score wins -- init seq d = repeat(0, ~s) -- distances -- loop for i = 1 to ~s-1 do int p1 = s.i int w1 = wgts.i for j = i+1 to ~s do int p2 = s.j int w2 = wgts.j atom dist = ColourDiff2(p1, p2) d.i += dist * w2 -- see how they are opposites here.. d.j += dist * w1 end for end for -- find minimum int n = 1 atom min = d.1 for i = 2 to ~d do if d.i < d.n then n = i end if end for -- exit return s.n end function
Spock