Re: binary search with reference to append
- Posted by DerekParnell (admin) Jun 15, 2014
- 1904 views
Ok .. ok .. ok ... I get the hint.
Given that we shouldn't change the API, here might be a cleaner version of binary search.
ifdef DEBUG_MODE then include std/error.e with trace end ifdef --** -- finds a "needle" in an ordered "haystack". Start and end point can be given for the search. -- This assumes that the sequence to search through is already in strict ascending order. -- -- Parameters: -- # ##needle## : an object to look for -- # ##haystack## : a sequence to search in -- # ##start_point## : an integer, the index at which to start searching. Defaults to 1. -- # ##end_point## : an integer, the end point of the search. Defaults to 0, ie search to end. -- -- Returns: -- An **integer**, either: -- # zero ##i##, which means that the slice specified by the ##start_point## -- and ##end_point## does not exist. -- # a positive integer ##i##, which means ##haystack[i]## equals ##needle##. -- # a negative integer, ##-i##, which means that the ##needle## is not in the -- part of the ##haystack## that was searched. The absolute value returned is -- an indication of where the item might have been in the search area, had it existed. -- If the value is the same as the starting point, then the item would have existed -- somewhere before the first item. If the value is after the last item then -- it would have existed somewhere after the last item searched. Otherwise -- the value is the index of where you could insert the ##needle## into the ##haystack##. -- -- Comments: -- * If ##end_point## is less than zero, it represents an offset from the end -- of ##haystack##. For example -1 is one back from the last item. -- * If the starting point is less than 1, it is assumed to be 1. -- * If the ending point is after the last item in the ##haystack##, it is assumed to be the end of the ##haystack##. -- * The way this function returns is very similar to what [[:db_find_key]] -- does. They use variants of the same algorithm. The latter is all the -- more efficient as ##haystack## is long. -- * ##haystack## is assumed to be in ascending order. Results are undefined -- if it is not. -- * If duplicate copies of ##needle## exist in the range searched on -- ##haystack##, the lowest index for the ##needle## is always returned. -- -- See Also: -- [[:find]], [[:db_find_key]] public function binary_search(object needle, sequence haystack, integer start_point = 1, integer end_point = 0) integer lo, hi, mid, c integer sp, ep sp = start_point if end_point <= 0 then -- End point specified is relative to the end of the sequence. ep = length(haystack) + end_point else ep = end_point end if if ep > length(haystack) then ep = length(haystack) end if if sp < 1 then sp = 1 end if if sp > ep then if sp = ep + 1 then return -sp -- Obviously the needle is never in an empty sequence/slice. end if return 0 -- Slice starts after the end. end if lo = sp hi = ep while lo <= hi do mid = floor((lo + hi) / 2) c = eu:compare(needle, haystack[mid]) if c < 0 then -- Needle is before current item so adjust the highest point. hi = mid - 1 elsif c > 0 then -- Needle is after current item so adjust the lowest point. lo = mid + 1 else -- Found it! Now check if its the first occurance. for i = mid-1 to 1 by -1 do if eu:equal(needle, haystack[i]) then mid = i end if end for ifdef DEBUG_MODE then if mid != find(needle, haystack) then crash("ASSERT: find() failed") end if end ifdef return mid end if end while -- return the position it would have, if inserted now if c > 0 then mid += 1 end if ifdef DEBUG_MODE then if mid > ep then if mid > ep + 1 then crash("ASSERT: Final index beyond haystack") end if if eu:compare(needle, haystack[ep]) <= 0 then crash("ASSERT: needle not found but lower than last item") end if elsif mid < sp then crash("ASSERT: Final index before haystack") else if eu:compare(needle, haystack[mid]) >= 0 then crash("ASSERT: needle not found but higher than insertion point.") end if if mid > sp and eu:compare(needle, haystack[mid-1]) <= 0 then crash("ASSERT: needle not found but lower than insertion point.") end if end if end ifdef return -mid end function ifdef DEBUG_MODE then include std/unittest.e constant haystack = "012345678ABCDEFGHIJKLMMMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" test_equal("binary_search missing #0", -1, binary_search(' ',haystack)) test_equal("binary_search missing #1", -10, binary_search('9',haystack)) test_equal("binary_search missing #2", -1, binary_search(0,haystack)) test_equal("binary_search missing #3", -64, binary_search("rubbish",haystack)) test_equal("binary_search missing #4", -64, binary_search(9999,haystack)) test_equal("binary_search missing subset #1", -5, binary_search('0',haystack, 5, 17)) test_equal("binary_search missing subset #2", -18, binary_search('p',haystack, 5, 17)) test_equal("binary_search missing subset #3", -17, binary_search('A',haystack, 17)) test_equal("binary_search missing subset #4", -8, binary_search('A',haystack, 1, 7)) test_equal("binary_search missing subset #5", -62, binary_search('z',haystack, 5, -2)) test_equal("binary_search missing subset #6 empty slice", -10, binary_search('z',haystack, 10, 9)) test_equal("binary_search found first", 1, binary_search('0',haystack)) test_equal("binary_search found last", 63, binary_search('z',haystack)) test_equal("binary_search found middle#1", 32, binary_search('U',haystack)) test_equal("binary_search found middle#2", 33, binary_search('V',haystack)) test_equal("binary_search found middle#3", 22, binary_search('M',haystack)) test_equal("binary_search found auto sp", 32, binary_search('U',haystack, -100)) test_equal("binary_search found auto ep", 32, binary_search('U',haystack, 1, 100)) test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, 17)) test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, -2)) test_equal("binary_search bad slice #1", 0, binary_search('A',haystack, 17, 1)) test_equal("binary_search empty input", -1, binary_search('A',{})) test_equal("binary_search strings", 5, binary_search("cat",{"apple", "bat", "car", "cast", "cat", "category", "dog"})) test_report() end ifdef