binary_find()
- Posted by CChris <christian.cuvier at agricultu?e.g?uv.fr> Aug 05, 2007
- 513 views
Here is the fastest overall routine I have found to search a sorted sequence. Duplicates are allowed, but, when there are, any index in the contiguous range can be returned. If the sequence does not have the monootony passed in the 3rd argument, results are undefined. It easy to check the monotony of a sequence, but slow.
function monotony(sequence s) object x,y integer slope,p if length(s)<2 then return -2 -- trivially constant sequence end if p=3 slope=compare(s[2],s[1]) while not slope and p<=length(s) do slope=compare(s[p],x) p+=1 end while if not slope then return -2 -- constant sequence end if x=s[p-1] for i=p to length(s) do y=s[i] if compare(x,y)=slope then return 0 -- not monotonic end if x=y end for return slope -- -1 for decreasing, 1 for increasing end function global function binary_find(object x,sequence s,integer slope) -- finds x in a supposedly sorted sequence s. -- if x is not in s, return -n, where x is between s[n-1] and s[n] -- else, if x=s[n], n is returned integer lo_bound, hi_bound,new_bound,c hi_bound=length(s) if hi_bound>=2 then lo_bound=1 while hi_bound>lo_bound do -- >= always holds new_bound=floor((hi_bound+lo_bound)/2) c=compare(x,s[new_bound]) if c=0 then return new_bound elsif c=slope then lo_bound = new_bound+1 -- but x may lie between s[new_bound] and s[new_bound+1] -> too many useless iterations else hi_bound = new_bound-1 -- but x may lie between s[new_bound-1] and s[new_bound] -> too many useless iterations end if end while -- x is in the vicinity of s[hi_bound] -- if one of the exceptional cases occurred, then the loop wastefully iterated -- and we need to test like when length(s)=1 elsif not hi_bound then return 0 end if -- case length(s)=1 is the same as general endgame c=compare(x,s[hi_bound]) if c=slope then return -hi_bound-1 elsif c=0 then return hi_bound else return -hi_bound end if end function
Outlier detection appears to be too expensive, unless you expect x to be routinely not between s[1] and s[$]. I had tried to add a 4th argument to control whether the routine would return any duplicate index, the lowest one or the highest one. The penalty on execution time is 15%, so I didn't add it. If it is needed, I think a separate function should handle that more general case. On my 3.2GHz P4, the average execution time stands around 0.6 microsecond for s of length 10,000. This function is probably best located in sort.e. CChris