Re: binary search with reference to append
- Posted by gimlet Jun 14, 2014
- 1810 views
You have misunderstood.
No, I haven't.
I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.
The issues are:
The logic of the input checks is wrong.
Agreed.
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.
I said: if you haven't searched.
1 It is not possible to search an empty sequence.
2 Therefore you can't return a position except one arbitrarily decided.
3 Therefore you should return NOT FOUND/NO POSITION.
setting mid to sp is arbitrary,
Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.
-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.
Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.
Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION
A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice. If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.
You are allowing impossible requirements to tie you in knots.
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
GIGO?
No. Design by contract
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.
No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.
Sounds like GIGO.
Design by contract
Therefore
1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.
See the jaygade point above.
2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.
You might have a point here. I have to think about this some more.
Perhaps in this case, a NOTFOUND/INVALID/NOTARESULT/NOTAPOSITION return value would make sense. Like
public constant NOTAPOSITION = ""
I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.
That is last part is silly. You can't seriously mean that.
You are saying bsearch is a mind reader.
3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.
Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.
I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.
In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.
In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.
Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.
lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.
Have your way if you like. My premises stand.
Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?
Yes: there are.
I also concur.
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.
I find it odd that you say this but then have the callee check bounds in your later examples in the same post.
The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.
2 Binary search does not require a sorted array.
If you consider something like:
A[lo] < x < A[hi] -- lo < hi m = mid(lo,hi) -- We have one of A[m] = x A[lo] < x < A[m] A[m] < x < A[hi]
and so on
That's only true if it's sorted in a particular direction. Here's a counterexample:
-- lo = 1 -- hi = 3 -- x = 2 -- A = {3, 1.5, 1} A[lo] < x < A[hi] -- lo < hi -- but this is not true! -- A[lo] = A[1] = 3 -- A[hi] = A[3] = 1 -- 3 < 2 -- false! -- 2 < 1 -- false! m = mid(lo,hi) -- m now equals 2 -- We have one of A[m] = x -- false, A[m] = A[2] = 1.5 != 2 A[lo] < x < A[m] -- also false A[m] < x < A[hi] -- also false
Not a counter example:
You invalidate the start condition A[lo] < x < A[hi]
and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.
Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller and, loses no functionality.
One still has to deal with passing in an empty sequence.
(I don't think this is an error).
and returning a valid position value in the case x is not found.
Those issues are determinate within bsearch.
Sounds like a win to me.
Regardless of of the details, I think that what I have shown is:
1 the logic was incorrect.
2 the handling of empty sequences, sp and ep was incorrect.
3 that you, jaygade and many others have trouble with the way bsearch in its original form works.
This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.
If the end result is that the library function is changed so it doesn't make assumptions I will be very happy. Just as I will be very happy when load_map() is fixed.