binary_find()

new topic     » topic index » view thread      » older message » newer message

Here is the fastest overall routine I have found to search a sorted sequence.
Duplicates are allowed, but, when there are, any index in the contiguous range
can be returned. If the sequence does not have the monootony passed in the 3rd
argument, results are undefined.

It easy to check the monotony of a sequence, but slow.

function monotony(sequence s)
object x,y
integer slope,p

if length(s)<2 then
    return -2 -- trivially constant sequence
end if
p=3
slope=compare(s[2],s[1])
while not slope and p<=length(s) do
    slope=compare(s[p],x)
    p+=1
end while
if not slope then
    return -2 -- constant sequence
end if
x=s[p-1]
for i=p to length(s) do
    y=s[i]
    if compare(x,y)=slope then
        return 0 -- not monotonic
    end if
    x=y
end for
return slope -- -1 for decreasing, 1 for increasing
end function

global function binary_find(object x,sequence s,integer slope)
    -- finds x in a supposedly sorted sequence s.
    -- if x is not in s, return -n, where x is between s[n-1] and s[n]
    -- else, if x=s[n], n is returned
    integer lo_bound, hi_bound,new_bound,c

    hi_bound=length(s)
    if hi_bound>=2 then
        lo_bound=1
        while hi_bound>lo_bound do  -- >= always holds
            new_bound=floor((hi_bound+lo_bound)/2)
            c=compare(x,s[new_bound])
            if c=0 then
                return new_bound
            elsif c=slope then
lo_bound = new_bound+1 -- but x may lie between s[new_bound] and
                s[new_bound+1] -> too many useless iterations
            else
hi_bound = new_bound-1 -- but x may lie between s[new_bound-1]
                and s[new_bound] -> too many useless iterations
            end if
        end while
        -- x is in the vicinity of s[hi_bound]
-- if one of the exceptional cases occurred, then the loop wastefully
        iterated
        -- and we need to test like when length(s)=1
    elsif not hi_bound then
        return 0
    end if
    -- case length(s)=1 is the same as general endgame
    c=compare(x,s[hi_bound])
    if c=slope then
        return -hi_bound-1
    elsif c=0 then
        return hi_bound
    else
        return -hi_bound
    end if
end function

Outlier detection appears to be too expensive, unless you expect x to be
routinely not between s[1] and s[$].
I had tried to add a 4th argument to control whether the routine would return
any duplicate index, the lowest one or the highest one. The penalty on execution
time is 15%, so I didn't add it. If it is needed, I think a separate function
should handle that more general case.

On my 3.2GHz P4, the average execution time stands around 0.6 microsecond for s
of length 10,000.

This function is probably best located in sort.e.

CChris

new topic     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu