1. Better median function?
Here is my current median function;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
include sort.e
global function median(sequence s)
integer l
l = length(s)
if l = 0 then
return 0
elsif l = 1 then
return s[1]
end if
s = sort(s)
return s[floor(l / 2) + 1]
end function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Is the sorting really necessary? Surely there must
be a more efficient way to find the median value
of a one-dimensional sequence ?!
Thanks,
Tor
2. Re: Better median function?
Tor, what do you mean with 'median'? There are lots of possibilities to
define median. The most commons are:
function ma(sequence s) -- arithmetic mean
atom sum
if length[s] then
sum = 0
for i = 1 to length(s)
sum += s[i]
end for
return sum/length(s)
else
return s
end if
end function
function mg(sequence s) -- geometric mean
atom p -- only for s[i]>0!!
intrger l
l = length(s)
if l then
p = 1
for i = 2 to l
if s[i] > 0 then
p *= power(s[i],l)
else
return {}
end if
end for
return p
else
return s
end if
end function
function mv(sequence s) -- ~ vector length
atom sum
s *= s
for i = 1 to length(s)
sum += s[i]
end for
return sqrt(sum)
end function
The ma() is the most common. Have a nice day, Rolf
Tor Gausen wrote:
>
> Here is my current median function;
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> include sort.e
> global function median(sequence s)
> integer l
> l = length(s)
> if l = 0 then
> return 0
> elsif l = 1 then
> return s[1]
> end if
> s = sort(s)
> return s[floor(l / 2) + 1]
> end function
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Is the sorting really necessary? Surely there must
> be a more efficient way to find the median value
> of a one-dimensional sequence ?!
>
> Thanks,
> Tor
3. Re: Better median function?
Correction of my last mail:
function mg(sequence s) -- geometric mean
atom p -- only for s[i]>0!!
intrger l
l = length(s)
if l then
p = s[1] -- !!!!!!!!
for i = 2 to l
if s[i] > 0 then
p *= power(s[i],l)
else
return {}
end if
end for
return p
else
return s
end if
end function
4. Re: Better median function?
Correction of my last mail, Nr.: 2 (it's to early)
function mg(sequence s) -- geometric mean
atom p -- only for s[i]>0!!
intrger l
l = length(s)
if l then
p = 1 -- !!!
for i = 1 to l -- !!!
if s[i] > 0 then
p *= power(s[i],l)
else
return {}
end if
end for
return p
else
return s
end if
end function
5. Re: Better median function?
Rolf,
> Tor, what do you mean with 'median'? There are lots of
> possibilities to define median.
Sorry for the short delay. Thanks for the code snippets.
With 'median' I mean 'statistical median', the middle value
in a distribution, rather than the average. The simple (and
probably inefective) way to do this is what I do now; sort
the sequence and return the element in the middle.
But is it really necessary to sort the entire sequence just
to find which value will end up in the middle? That is my
question; I'm looking for a faster way to find the median
than by sorting.
Thanks,
Tor
6. Re: Better median function?
tor.gausen at c2i.net said:
> Sorry for the short delay. Thanks for the code snippets.
> With 'median' I mean 'statistical median', the middle value
> in a distribution, rather than the average. The simple (and
> probably inefective) way to do this is what I do now; sort
> the sequence and return the element in the middle.
>
> But is it really necessary to sort the entire sequence just
> to find which value will end up in the middle? That is my
> question; I'm looking for a faster way to find the median
> than by sorting.
>
> Thanks,
> Tor
>
No, it isn't neccessary to sort the entire sequence. Just half of it.
That doesn't mean you can take the first half and sort it and take the last
value. What that means is you sort the entire sequence until half of it has
been completely sorted. Hmm... The insert sort and bubble sort are the only
2 type of sorts that I know of that can do this. They are also the slowest
sorts possible. So, I believe sorting the entire sequence is going to with
something close to the quick sort in speed is going to be faster than using
either of those much slower sorts. I do know of a hybrid^ sort that would
only ensure the top half to be sorted but I don't know and doubt that it
would be any faster than sorting the entire sequence.
I can explain this hybrid^ sorting method if you are interested. I created
it myself for sorting things that are too large to fit into memory. I doubt
I am the first to come up with it but never the less. I did come up with it
on my own. :)
^hybrid sort is NOT the name of the sorting method.
I am ok!!! A friend of mine has been in jail and is prossibly going through a
divorce now.
Lucius L. Hilley III
7. Re: Better median function?
Tor wrote:
> Here is my current median function;
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> include sort.e
> global function median(sequence s)
> integer l
> l = length(s)
> if l = 0 then
> return 0
> elsif l = 1 then
> return s[1]
> end if
> s = sort(s)
> return s[floor(l / 2) + 1]
> end function
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Is the sorting really necessary? Surely there must
> be a more efficient way to find the median value
> of a one-dimensional sequence ?!
No, the sort is not necessary, and yes, there are *many* more
efficient ways to do it. Just one of them is shown below, but I can
think of at least one more potentially much faster!
The presented solution is rather long and cumbersome, because I have
not bothered to streamline or optimize it terribly much, but it is
still typically considerably faster than the function with the sort
(and I optimized that one!). And it gets better as the sequences get
longer!
Enjoy. jiri
-- median.ex
-- jiri babor
-- jbabor at paradise.net.nz
-- 00-06-12
include sort.e
function odd(integer x)
return and_bits(x,1)
end function
function min(sequence s)
atom a
integer len
len = length(s)
if not len then
puts(1, "min error: empty sequence")
abort(1)
elsif len = 1 then
return s[1]
end if
a = s[1]
for i=2 to len do
if s[i]<a then
a = s[i]
end if
end for
return a
end function
function max(sequence s)
atom a
integer len
len = length(s)
if not len then
puts(1, "max error: empty sequence")
abort(1)
elsif len = 1 then
return s[1]
end if
a = s[1]
for i=2 to len do
if s[i]>a then
a = s[i]
end if
end for
return a
end function
function split(sequence s)
sequence s1,s2
atom a,si
integer n
n = 1
s1 = {}
s2 = {}
a = s[1]
for i=2 to length(s) do
si = s[i]
if si<a then
s1 &= si
elsif si>a then
s2 &= si
else
n += 1
end if
end for
return {s1,s2,a,n}
end function
function median_with_sort(sequence s)
integer len,n
len = length(s)
n = floor((len+1)/2)
if not len then return 0 end if
if len > 2 then s = sort(s) end if
if and_bits(len,1) then -- odd
return s[n]
end if
return (s[n] + s[n+1])/2
end function
function median(sequence s)
sequence r
integer len,l1,l2,n1,n2,halflength
len = length(s)
halflength = floor(len/2)
l1 = 0
l2 = 0
r = split(s)
n1 = length(r[1])
n2 = length(r[2])
while 1 do
if (n1+l1) > halflength then
l2 += n2 + r[4]
r = split(r[1])
elsif (n2+l2) > halflength then
l1 += n1 + r[4]
r = split(r[2])
else
exit
end if
n1 = length(r[1])
n2 = length(r[2])
end while
if odd(len) then return r[3] end if
if l1+length(r[1]) = halflength then
return (r[3] + max(r[1]))/2
elsif l2+length(r[2]) = halflength then
return (r[3] + min(r[2]))/2
end if
return r[3]
end function -- median
-- test --------------------------------------------------------------
include machine.e
sequence n,N,s
atom dt,t,x
integer m,M
n = {10,11,100,101,1000,1001}
N = {100000,100000,10000,10000,1000,1000}
tick_rate(200)
for k=1 to length(n) do
m = n[k]
M = N[k]
s = repeat(0,m)
for i=1 to m do
s[i] = rand(m)-1
end for
t = time()
for i=1 to M do
x = median_with_sort(s)
end for
dt = time()-t
? dt
t = time()
for i=1 to M do
x = median(s)
end for
dt = time()-t
? dt
puts(1,'\n')
end for