Re: binary search with reference to append

new topic     » goto parent     » topic index » view thread      » older message » newer message

Ok .. ok .. ok ... I get the hint.

Given that we shouldn't change the API, here might be a cleaner version of binary search.

ifdef DEBUG_MODE then 
include std/error.e 
with trace 
end ifdef 
 
--** 
-- finds a "needle" in an ordered "haystack". Start and end point can be given for the search. 
-- This assumes that the sequence to search through is already in strict ascending order. 
-- 
-- Parameters: 
--      # ##needle## : an object to look for 
--      # ##haystack## : a sequence to search in 
--      # ##start_point## : an integer, the index at which to start searching. Defaults to 1. 
--      # ##end_point## : an integer, the end point of the search. Defaults to 0, ie search to end. 
-- 
-- Returns: 
-- An **integer**, either: 
-- # zero ##i##, which means that the slice specified by the ##start_point## 
-- and ##end_point## does not exist. 
-- # a positive integer ##i##, which means ##haystack[i]## equals ##needle##. 
-- # a negative integer, ##-i##, which means that the ##needle## is not in the  
-- part of the ##haystack## that was searched. The absolute value returned is  
-- an indication of where the item might have been in the search area, had it existed. 
-- If the value is the same as the starting point, then the item would have existed 
-- somewhere before the first item. If the value is after the last item then 
-- it would have existed somewhere after the last item searched. Otherwise 
-- the value is the index of where you could insert the ##needle## into the ##haystack##. 
-- 
-- Comments: 
-- * If ##end_point## is less than zero, it represents an offset from the end 
-- of ##haystack##. For example -1 is one back from the last item. 
-- * If the starting point is less than 1, it is assumed to be 1. 
-- * If the ending point is after the last item in the ##haystack##, it is assumed to be the end of the ##haystack##. 
-- * The way this function returns is very similar to what [[:db_find_key]] 
--   does. They use variants of the same algorithm. The latter is all the 
--   more efficient as ##haystack## is long. 
-- * ##haystack## is assumed to be in ascending order. Results are undefined 
--   if it is not.  
-- * If duplicate copies of ##needle## exist in the range searched on 
--   ##haystack##, the lowest index for the ##needle## is always returned. 
-- 
-- See Also: 
-- [[:find]], [[:db_find_key]] 
 
public function binary_search(object needle, sequence haystack, integer start_point = 1, integer end_point = 0) 
    integer lo, hi, mid, c 
    integer sp, ep 
 
     
    sp = start_point 
    if end_point <= 0 then 
        -- End point specified is relative to the end of the sequence. 
        ep = length(haystack) + end_point 
    else 
        ep = end_point 
    end if 
     
    if ep > length(haystack) then 
        ep = length(haystack) 
    end if 
    if sp < 1 then 
        sp = 1 
    end if 
    if sp > ep then 
        if sp = ep + 1 then 
            return -sp  -- Obviously the needle is never in an empty sequence/slice. 
        end if 
        return 0  -- Slice starts after the end. 
    end if 
     
    lo = sp 
    hi = ep 
    while lo <= hi do 
        mid = floor((lo + hi) / 2) 
        c = eu:compare(needle, haystack[mid]) 
        if c < 0 then 
            -- Needle is before current item so adjust the highest point. 
            hi = mid - 1 
        elsif c > 0 then 
            -- Needle is after current item so adjust the lowest point. 
            lo = mid + 1 
        else 
            -- Found it! Now check if its the first occurance. 
            for i = mid-1 to 1 by -1 do 
                if eu:equal(needle, haystack[i]) then 
                    mid  = i 
                end if 
            end for 
            ifdef DEBUG_MODE then 
               if mid != find(needle, haystack) then crash("ASSERT: find() failed") end if 
            end ifdef 
            return mid 
        end if 
    end while 
     
    -- return the position it would have, if inserted now 
    if c > 0 then 
        mid += 1 
    end if 
    ifdef DEBUG_MODE then 
 
        if mid > ep then 
            if mid > ep + 1 then 
                crash("ASSERT: Final index beyond haystack") 
            end if 
           if eu:compare(needle, haystack[ep]) <= 0 then crash("ASSERT: needle not found but lower than last item") end if 
        elsif mid < sp then 
            crash("ASSERT: Final index before haystack") 
        else 
           if eu:compare(needle, haystack[mid]) >= 0 then crash("ASSERT: needle not found but higher than insertion point.") end if 
           if mid > sp and eu:compare(needle, haystack[mid-1]) <= 0 then crash("ASSERT: needle not found but lower than insertion point.") end if 
             
        end if 
         
    end ifdef 
     
    return -mid 
end function 
 
ifdef DEBUG_MODE then 
include std/unittest.e 
constant haystack = "012345678ABCDEFGHIJKLMMMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 
test_equal("binary_search missing #0", -1, binary_search(' ',haystack)) 
test_equal("binary_search missing #1", -10, binary_search('9',haystack)) 
test_equal("binary_search missing #2", -1, binary_search(0,haystack)) 
test_equal("binary_search missing #3", -64, binary_search("rubbish",haystack)) 
test_equal("binary_search missing #4", -64, binary_search(9999,haystack)) 
 
test_equal("binary_search missing subset #1", -5, binary_search('0',haystack, 5, 17)) 
test_equal("binary_search missing subset #2", -18, binary_search('p',haystack, 5, 17)) 
test_equal("binary_search missing subset #3", -17, binary_search('A',haystack, 17)) 
test_equal("binary_search missing subset #4", -8, binary_search('A',haystack, 1, 7)) 
test_equal("binary_search missing subset #5", -62, binary_search('z',haystack, 5, -2)) 
test_equal("binary_search missing subset #6 empty slice", -10, binary_search('z',haystack, 10, 9)) 
 
test_equal("binary_search found first", 1, binary_search('0',haystack)) 
test_equal("binary_search found last", 63, binary_search('z',haystack)) 
test_equal("binary_search found middle#1", 32, binary_search('U',haystack)) 
test_equal("binary_search found middle#2", 33, binary_search('V',haystack)) 
test_equal("binary_search found middle#3", 22, binary_search('M',haystack)) 
test_equal("binary_search found auto sp", 32, binary_search('U',haystack, -100)) 
test_equal("binary_search found auto ep", 32, binary_search('U',haystack, 1, 100)) 
 
test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, 17)) 
test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, -2)) 
 
test_equal("binary_search bad slice #1", 0, binary_search('A',haystack, 17, 1)) 
 
 
test_equal("binary_search empty input", -1, binary_search('A',{})) 
 
test_equal("binary_search strings", 5, binary_search("cat",{"apple", "bat", "car", "cast", "cat", "category", "dog"})) 
 
test_report() 
end ifdef 
new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu