binary_find()
Here is the fastest overall routine I have found to search a sorted sequence.
Duplicates are allowed, but, when there are, any index in the contiguous range
can be returned. If the sequence does not have the monootony passed in the 3rd
argument, results are undefined.
It easy to check the monotony of a sequence, but slow.
function monotony(sequence s)
object x,y
integer slope,p
if length(s)<2 then
return -2 -- trivially constant sequence
end if
p=3
slope=compare(s[2],s[1])
while not slope and p<=length(s) do
slope=compare(s[p],x)
p+=1
end while
if not slope then
return -2 -- constant sequence
end if
x=s[p-1]
for i=p to length(s) do
y=s[i]
if compare(x,y)=slope then
return 0 -- not monotonic
end if
x=y
end for
return slope -- -1 for decreasing, 1 for increasing
end function
global function binary_find(object x,sequence s,integer slope)
-- finds x in a supposedly sorted sequence s.
-- if x is not in s, return -n, where x is between s[n-1] and s[n]
-- else, if x=s[n], n is returned
integer lo_bound, hi_bound,new_bound,c
hi_bound=length(s)
if hi_bound>=2 then
lo_bound=1
while hi_bound>lo_bound do -- >= always holds
new_bound=floor((hi_bound+lo_bound)/2)
c=compare(x,s[new_bound])
if c=0 then
return new_bound
elsif c=slope then
lo_bound = new_bound+1 -- but x may lie between s[new_bound] and
s[new_bound+1] -> too many useless iterations
else
hi_bound = new_bound-1 -- but x may lie between s[new_bound-1]
and s[new_bound] -> too many useless iterations
end if
end while
-- x is in the vicinity of s[hi_bound]
-- if one of the exceptional cases occurred, then the loop wastefully
iterated
-- and we need to test like when length(s)=1
elsif not hi_bound then
return 0
end if
-- case length(s)=1 is the same as general endgame
c=compare(x,s[hi_bound])
if c=slope then
return -hi_bound-1
elsif c=0 then
return hi_bound
else
return -hi_bound
end if
end function
Outlier detection appears to be too expensive, unless you expect x to be
routinely not between s[1] and s[$].
I had tried to add a 4th argument to control whether the routine would return
any duplicate index, the lowest one or the highest one. The penalty on execution
time is 15%, so I didn't add it. If it is needed, I think a separate function
should handle that more general case.
On my 3.2GHz P4, the average execution time stands around 0.6 microsecond for s
of length 10,000.
This function is probably best located in sort.e.
CChris
|
Not Categorized, Please Help
|
|