OpenEuphoria: Forum: Re: binary search with reference to append

Re: binary search with reference to append

new topic » goto parent » topic index » view thread » older message » newer message

Posted by jimcbrown (admin) Jun 15, 2014
2109 views

gimlet said...

Just as I will be very happy when load_map() is fixed.

What makes you think that it is (still) broken? Do you have examples to demonstrate this?

gimlet said...

The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,

jimbrown said...

This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.

gimlet said...

I said: if you haven't searched.

1 It is not possible to search an empty sequence.

I disagree. An empty sequence is the fastest to search. Just like searching an empty box.

gimlet said...

2 Therefore you can't return a position except one arbitrarily decided.

Agreed. There's no existing position to return.

gimlet said...

3 Therefore you should return NOT FOUND/NO POSITION.

Negative numbers are not valid positions. The existing function already conveys this information, one just has to check if the result is <= 0.

But, as per the jaygade point above, it also attempts to return extra information.

gimlet said...

setting mid to sp is arbitrary,

jimbrown said...

Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.

gimlet said...

-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.

It would seem to work best when sp = 1. Anyways, conceded.

jimbrown said...

Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.

gimlet said...

Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION

Undecided != Don't know.

gimlet said...

A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice.

Irrelevant. bsearch should not care about this.

gimlet said...

If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.

Agreed ... but what of it?

gimlet said...

The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.

jimbrown said...

GIGO?

[quote gimlet] No. Design by contract

jaygade said...

If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.

gimlet said...

No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.

jimbrown said...

Sounds like GIGO.

gimlet said...

Design by contract

No. Under design by contract, the called function should be checking those conditions using an assertion an throwing an error or something if they aren't validated.

Quoting wikipedia:

wikipedia said...

Many programming languages have facilities to make assertions like these. However, DbC considers these contracts to be so crucial to software correctness that they should be part of the design process. In effect, DbC advocates writing the assertions first.

When using contracts, a supplier should not try to verify that the contract conditions are satisfied; the general idea is that code should "fail hard", with contract verification being the safety net ... the supplier throws an exception to inform the client that the precondition has been broken,

Specifically, this

gimlet said...

The called function is at liberty to
plough ahead or whatever.

violates design by contract.

gimlet said...

2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.

jimbrown said...

I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.

gimlet said...

That is last part is silly. You can't seriously mean that.

Yes, I do.

gimlet said...

You are saying bsearch is a mind reader.

It's following certain rules to produce a result which logically follow from those rules.

gimlet said...

3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.

jimbrown said...

Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.

I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.

In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.

gimlet said...

In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.

Agreed. Other than the jaygade point (and the drop in slicing functionality), this works perfectly.

gimlet said...

Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.

jimbrown said...

lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.

gimlet said...

Have your way if you like. My premises stand.

Fine. Here are the premises:

1. Messing with parameters causes the culprit function to fail in MWP-style. 2. bsearch messes with lo/hi. 3. lo/hi are not parameters. 4. sp/ep are parameters.

How do you reach the conclusion:

1. bsearch fails in MWP-style?

jaygade said...

Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?

gimlet said...

The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.

jimbrown said...

I find it odd that you say this but then have the callee check bounds in your later examples in the same post.

gimlet said...

The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.

Yes, this makes sense and is reasonable. However, you have not resolved the contradiction or stated that this case is one (of many?) possible exceptions to your rule:

gimlet said...

The point is that the callee checking creates problems.

gimlet said...

2 Binary search does not require a sorted array.
If you consider something like:

   A[lo] < x < A[hi]         -- lo < hi 
   m = mid(lo,hi) 
-- We have one of 
   A[m] = x 
   A[lo] < x < A[m] 
   A[m] < x < A[hi]

and so on

jimbrown said...

That's only true if it's sorted in a particular direction. Here's a counterexample:

   -- lo = 1 
   -- hi = 3 
   -- x = 2 
   -- A = {3, 1.5, 1} 
   A[lo] < x < A[hi]         -- lo < hi 
   -- but this is not true! 
   -- A[lo] = A[1] = 3 
   -- A[hi] = A[3] = 1 
   -- 3 < 2 -- false! 
   -- 2 < 1 -- false! 
   m = mid(lo,hi) 
   -- m now equals 2 
-- We have one of 
   A[m] = x -- false, A[m] = A[2] = 1.5 != 2 
   A[lo] < x < A[m] -- also false 
   A[m] < x < A[hi] -- also false

gimlet said...

Not a counter example:

Yes it is. It is a counter-example to what you said:

gimlet said...

2 Binary search does not require a sorted array.

gimlet said...

You invalidate the start condition A[lo] < x < A[hi]

Yes, I did that. Because that start condition is a veiled form of saying that the array must be sorted. (I guess it's possible to have a sequence that's not completely sorted, which for the right x would still work correctly, as long as all other values touched coincidently happen to have the right order. But sorting is generally the only way to guarantee this for any arbiturary sequence/array/list/etc.)

gimlet said...

and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!

I never said this, I just said that binary search fails with invalid start conditions (one of which is the array being sorted). Now, I never said that binary search would fail with valid start conditions.

gimlet said...

Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.

jimbrown said...

I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.

gimlet said...

Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller

Agreed, so far.

gimlet said...

and, loses no functionality.

I disagree. bsearch is losing the functionality to do slicing on behalf of the caller, and potentially the chance to do bounds checking and the life on behalf of the caller.

You may argue that bsearch should not be responsible for this sort of functionality ... and you may be right. We already have other stdlib functions that could do this anyways.

Anyways, I figured that there was probably a good reason for bsearch to work this way, but as original author of the code seems to have changed this mind about it, I'll defer to him: http://openeuphoria.org/forum/m/124611.wc

(Actually, I'm now wondering if the main reason for binary_search() to have slice() functionality was for next_prime() - http://scm.openeuphoria.org/hg/euphoria/file/d0ab9cc672b8/include/primes.e )

gimlet said...

Regardless of of the details, I think that what I have shown is:

1 the logic was incorrect.

You have not demonstrated this, except in cases where sp/ep are invalid.

gimlet said...

2 the handling of empty sequences, sp and ep was incorrect.

Again, only in cases where sp/ep are invalid.

gimlet said...

3 that you, jaygade and many others have trouble with the way bsearch in its original form works.

Agreed. No one seems to like how the function handles the cases where sp/ep are invalid. In addition, there's some wavering on the support for the jaygade point as well.

gimlet said...

This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.

If the end result is that the library function is changed so it doesn't make assumptions I will be very happy.

You are allowing impossible requirements to tie you in knots.

Actually, I think most of the disagreement stands on the jaygade point. This is a design/functionality issue rather than a logical point.

Now, I trust that there was a good reason to have it in the first place, but I'll let the original author explain: http://scm.openeuphoria.org/hg/euphoria/rev/2036d2a1151f

gimlet said...

You have misunderstood.

jimcbrown said...

No, I haven't.

gimlet said...

I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.

This is just a veiled way of stating that I do not understand.

I contend that I have demonstrated that I understand.

In fact, I could go further, and claim that, by your missteps (not understanding binary search and sorting arrays, advocating and then contradicting design by contract) that you understand less.

But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.

I don't have a strong opinion either way, so I'm more than happy to let the original author make the call.

new topic » goto parent » topic index » view thread » older message » newer message

OpenEuphoria

Re: binary search with reference to append

Search

Include:

Quick Links

User menu

Misc Menu