Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 2054 views
Just as I will be very happy when load_map() is fixed.
What makes you think that it is (still) broken? Do you have examples to demonstrate this?
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.
I said: if you haven't searched.
1 It is not possible to search an empty sequence.
I disagree. An empty sequence is the fastest to search. Just like searching an empty box.
2 Therefore you can't return a position except one arbitrarily decided.
Agreed. There's no existing position to return.
3 Therefore you should return NOT FOUND/NO POSITION.
Negative numbers are not valid positions. The existing function already conveys this information, one just has to check if the result is <= 0.
But, as per the jaygade point above, it also attempts to return extra information.
setting mid to sp is arbitrary,
Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.
-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.
It would seem to work best when sp = 1. Anyways, conceded.
Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.
Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION
Undecided != Don't know.
A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice.
Irrelevant. bsearch should not care about this.
If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.
Agreed ... but what of it?
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
GIGO?
[quote gimlet] No. Design by contract
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.
No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.
Sounds like GIGO.
Design by contract
No. Under design by contract, the called function should be checking those conditions using an assertion an throwing an error or something if they aren't validated.
Quoting wikipedia:
Many programming languages have facilities to make assertions like these. However, DbC considers these contracts to be so crucial to software correctness that they should be part of the design process. In effect, DbC advocates writing the assertions first.
When using contracts, a supplier should not try to verify that the contract conditions are satisfied; the general idea is that code should "fail hard", with contract verification being the safety net ... the supplier throws an exception to inform the client that the precondition has been broken,
Specifically, this
The called function is at liberty to
plough ahead or whatever.
violates design by contract.
2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.
I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.
That is last part is silly. You can't seriously mean that.
Yes, I do.
You are saying bsearch is a mind reader.
It's following certain rules to produce a result which logically follow from those rules.
3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.
Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.
I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.
In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.
In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.
Agreed. Other than the jaygade point (and the drop in slicing functionality), this works perfectly.
Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.
lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.
Have your way if you like. My premises stand.
Fine. Here are the premises:
1. Messing with parameters causes the culprit function to fail in MWP-style. 2. bsearch messes with lo/hi. 3. lo/hi are not parameters. 4. sp/ep are parameters.
How do you reach the conclusion:
1. bsearch fails in MWP-style?
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.
I find it odd that you say this but then have the callee check bounds in your later examples in the same post.
The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.
Yes, this makes sense and is reasonable. However, you have not resolved the contradiction or stated that this case is one (of many?) possible exceptions to your rule:
The point is that the callee checking creates problems.
2 Binary search does not require a sorted array.
If you consider something like:
A[lo] < x < A[hi] -- lo < hi m = mid(lo,hi) -- We have one of A[m] = x A[lo] < x < A[m] A[m] < x < A[hi]
and so on
That's only true if it's sorted in a particular direction. Here's a counterexample:
-- lo = 1 -- hi = 3 -- x = 2 -- A = {3, 1.5, 1} A[lo] < x < A[hi] -- lo < hi -- but this is not true! -- A[lo] = A[1] = 3 -- A[hi] = A[3] = 1 -- 3 < 2 -- false! -- 2 < 1 -- false! m = mid(lo,hi) -- m now equals 2 -- We have one of A[m] = x -- false, A[m] = A[2] = 1.5 != 2 A[lo] < x < A[m] -- also false A[m] < x < A[hi] -- also false
Not a counter example:
Yes it is. It is a counter-example to what you said:
2 Binary search does not require a sorted array.
You invalidate the start condition A[lo] < x < A[hi]
Yes, I did that. Because that start condition is a veiled form of saying that the array must be sorted. (I guess it's possible to have a sequence that's not completely sorted, which for the right x would still work correctly, as long as all other values touched coincidently happen to have the right order. But sorting is generally the only way to guarantee this for any arbiturary sequence/array/list/etc.)
and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!
I never said this, I just said that binary search fails with invalid start conditions (one of which is the array being sorted). Now, I never said that binary search would fail with valid start conditions.
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.
Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller
Agreed, so far.
and, loses no functionality.
I disagree. bsearch is losing the functionality to do slicing on behalf of the caller, and potentially the chance to do bounds checking and the life on behalf of the caller.
You may argue that bsearch should not be responsible for this sort of functionality ... and you may be right. We already have other stdlib functions that could do this anyways.
Anyways, I figured that there was probably a good reason for bsearch to work this way, but as original author of the code seems to have changed this mind about it, I'll defer to him: http://openeuphoria.org/forum/m/124611.wc
(Actually, I'm now wondering if the main reason for binary_search() to have slice() functionality was for next_prime() - http://scm.openeuphoria.org/hg/euphoria/file/d0ab9cc672b8/include/primes.e )
Regardless of of the details, I think that what I have shown is:
1 the logic was incorrect.
You have not demonstrated this, except in cases where sp/ep are invalid.
2 the handling of empty sequences, sp and ep was incorrect.
Again, only in cases where sp/ep are invalid.
3 that you, jaygade and many others have trouble with the way bsearch in its original form works.
Agreed. No one seems to like how the function handles the cases where sp/ep are invalid. In addition, there's some wavering on the support for the jaygade point as well.
This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.
If the end result is that the library function is changed so it doesn't make assumptions I will be very happy.
You are allowing impossible requirements to tie you in knots.
Actually, I think most of the disagreement stands on the jaygade point. This is a design/functionality issue rather than a logical point.
Now, I trust that there was a good reason to have it in the first place, but I'll let the original author explain: http://scm.openeuphoria.org/hg/euphoria/rev/2036d2a1151f
You have misunderstood.
No, I haven't.
I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.
This is just a veiled way of stating that I do not understand.
I contend that I have demonstrated that I understand.
In fact, I could go further, and claim that, by your missteps (not understanding binary search and sorting arrays, advocating and then contradicting design by contract) that you understand less.
But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.
I don't have a strong opinion either way, so I'm more than happy to let the original author make the call.