1. Better median function?
- Posted by Tor Gausen <tor.gausen at C2I.NET> Jun 08, 2000
- 457 views
Here is my current median function; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ include sort.e global function median(sequence s) integer l l = length(s) if l = 0 then return 0 elsif l = 1 then return s[1] end if s = sort(s) return s[floor(l / 2) + 1] end function ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Is the sorting really necessary? Surely there must be a more efficient way to find the median value of a one-dimensional sequence ?! Thanks, Tor
2. Re: Better median function?
- Posted by Rolf Schroeder <rolf.schroeder at DESY.DE> Jun 08, 2000
- 464 views
Tor, what do you mean with 'median'? There are lots of possibilities to define median. The most commons are: function ma(sequence s) -- arithmetic mean atom sum if length[s] then sum = 0 for i = 1 to length(s) sum += s[i] end for return sum/length(s) else return s end if end function function mg(sequence s) -- geometric mean atom p -- only for s[i]>0!! intrger l l = length(s) if l then p = 1 for i = 2 to l if s[i] > 0 then p *= power(s[i],l) else return {} end if end for return p else return s end if end function function mv(sequence s) -- ~ vector length atom sum s *= s for i = 1 to length(s) sum += s[i] end for return sqrt(sum) end function The ma() is the most common. Have a nice day, Rolf Tor Gausen wrote: > > Here is my current median function; > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > include sort.e > global function median(sequence s) > integer l > l = length(s) > if l = 0 then > return 0 > elsif l = 1 then > return s[1] > end if > s = sort(s) > return s[floor(l / 2) + 1] > end function > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Is the sorting really necessary? Surely there must > be a more efficient way to find the median value > of a one-dimensional sequence ?! > > Thanks, > Tor
3. Re: Better median function?
- Posted by Rolf Schroeder <rolf.schroeder at DESY.DE> Jun 08, 2000
- 455 views
Correction of my last mail: function mg(sequence s) -- geometric mean atom p -- only for s[i]>0!! intrger l l = length(s) if l then p = s[1] -- !!!!!!!! for i = 2 to l if s[i] > 0 then p *= power(s[i],l) else return {} end if end for return p else return s end if end function
4. Re: Better median function?
- Posted by Rolf Schroeder <rolf.schroeder at DESY.DE> Jun 08, 2000
- 440 views
Correction of my last mail, Nr.: 2 (it's to early) function mg(sequence s) -- geometric mean atom p -- only for s[i]>0!! intrger l l = length(s) if l then p = 1 -- !!! for i = 1 to l -- !!! if s[i] > 0 then p *= power(s[i],l) else return {} end if end for return p else return s end if end function
5. Re: Better median function?
- Posted by Tor Gausen <tor.gausen at C2I.NET> Jun 10, 2000
- 445 views
Rolf, > Tor, what do you mean with 'median'? There are lots of > possibilities to define median. Sorry for the short delay. Thanks for the code snippets. With 'median' I mean 'statistical median', the middle value in a distribution, rather than the average. The simple (and probably inefective) way to do this is what I do now; sort the sequence and return the element in the middle. But is it really necessary to sort the entire sequence just to find which value will end up in the middle? That is my question; I'm looking for a faster way to find the median than by sorting. Thanks, Tor
6. Re: Better median function?
- Posted by Lucius Hilley III <lhilley at CDC.NET> Jun 10, 2000
- 440 views
tor.gausen at c2i.net said: > Sorry for the short delay. Thanks for the code snippets. > With 'median' I mean 'statistical median', the middle value > in a distribution, rather than the average. The simple (and > probably inefective) way to do this is what I do now; sort > the sequence and return the element in the middle. > > But is it really necessary to sort the entire sequence just > to find which value will end up in the middle? That is my > question; I'm looking for a faster way to find the median > than by sorting. > > Thanks, > Tor > No, it isn't neccessary to sort the entire sequence. Just half of it. That doesn't mean you can take the first half and sort it and take the last value. What that means is you sort the entire sequence until half of it has been completely sorted. Hmm... The insert sort and bubble sort are the only 2 type of sorts that I know of that can do this. They are also the slowest sorts possible. So, I believe sorting the entire sequence is going to with something close to the quick sort in speed is going to be faster than using either of those much slower sorts. I do know of a hybrid^ sort that would only ensure the top half to be sorted but I don't know and doubt that it would be any faster than sorting the entire sequence. I can explain this hybrid^ sorting method if you are interested. I created it myself for sorting things that are too large to fit into memory. I doubt I am the first to come up with it but never the less. I did come up with it on my own. :) ^hybrid sort is NOT the name of the sorting method. I am ok!!! A friend of mine has been in jail and is prossibly going through a divorce now. Lucius L. Hilley III
7. Re: Better median function?
- Posted by Jiri Babor <J.Babor at GNS.CRI.NZ> Jun 12, 2000
- 463 views
Tor wrote: > Here is my current median function; > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > include sort.e > global function median(sequence s) > integer l > l = length(s) > if l = 0 then > return 0 > elsif l = 1 then > return s[1] > end if > s = sort(s) > return s[floor(l / 2) + 1] > end function > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Is the sorting really necessary? Surely there must > be a more efficient way to find the median value > of a one-dimensional sequence ?! No, the sort is not necessary, and yes, there are *many* more efficient ways to do it. Just one of them is shown below, but I can think of at least one more potentially much faster! The presented solution is rather long and cumbersome, because I have not bothered to streamline or optimize it terribly much, but it is still typically considerably faster than the function with the sort (and I optimized that one!). And it gets better as the sequences get longer! Enjoy. jiri -- median.ex -- jiri babor -- jbabor at paradise.net.nz -- 00-06-12 include sort.e function odd(integer x) return and_bits(x,1) end function function min(sequence s) atom a integer len len = length(s) if not len then puts(1, "min error: empty sequence") abort(1) elsif len = 1 then return s[1] end if a = s[1] for i=2 to len do if s[i]<a then a = s[i] end if end for return a end function function max(sequence s) atom a integer len len = length(s) if not len then puts(1, "max error: empty sequence") abort(1) elsif len = 1 then return s[1] end if a = s[1] for i=2 to len do if s[i]>a then a = s[i] end if end for return a end function function split(sequence s) sequence s1,s2 atom a,si integer n n = 1 s1 = {} s2 = {} a = s[1] for i=2 to length(s) do si = s[i] if si<a then s1 &= si elsif si>a then s2 &= si else n += 1 end if end for return {s1,s2,a,n} end function function median_with_sort(sequence s) integer len,n len = length(s) n = floor((len+1)/2) if not len then return 0 end if if len > 2 then s = sort(s) end if if and_bits(len,1) then -- odd return s[n] end if return (s[n] + s[n+1])/2 end function function median(sequence s) sequence r integer len,l1,l2,n1,n2,halflength len = length(s) halflength = floor(len/2) l1 = 0 l2 = 0 r = split(s) n1 = length(r[1]) n2 = length(r[2]) while 1 do if (n1+l1) > halflength then l2 += n2 + r[4] r = split(r[1]) elsif (n2+l2) > halflength then l1 += n1 + r[4] r = split(r[2]) else exit end if n1 = length(r[1]) n2 = length(r[2]) end while if odd(len) then return r[3] end if if l1+length(r[1]) = halflength then return (r[3] + max(r[1]))/2 elsif l2+length(r[2]) = halflength then return (r[3] + min(r[2]))/2 end if return r[3] end function -- median -- test -------------------------------------------------------------- include machine.e sequence n,N,s atom dt,t,x integer m,M n = {10,11,100,101,1000,1001} N = {100000,100000,10000,10000,1000,1000} tick_rate(200) for k=1 to length(n) do m = n[k] M = N[k] s = repeat(0,m) for i=1 to m do s[i] = rand(m)-1 end for t = time() for i=1 to M do x = median_with_sort(s) end for dt = time()-t ? dt t = time() for i=1 to M do x = median(s) end for dt = time()-t ? dt puts(1,'\n') end for