1. binary search with reference to append

This is an example of why you should require the caller validates the parameters of a called function.

This is a summarised version of the binary_search function in /std/search.e

The notes make interesting reading:

The function messes with its parameters. The result is not pretty.

function bsearch(obj x, seq A, int sp=1, int ep=0) 
	int N, lo, hi, mid, c  
	--------------------- 
	N = length(A) 
	lo = max(sp, 1)              -- lo could be > N 
	hi = min(N, ep)              -- hi could be < lo 
	hi = (lo > hi && N) ? N : hi -- hi could be > N  
	--------------------- 
	mid = sp                     -- this is unwise 
	c = 0                        -- the match (set to 0 - found) 
	 
--  at this point consider 
--     if lo > hi   we have mid = sp, c = 0, a return of -sp (sp may be -ve) 
--     if N = 0 then  
--        if lo <= hi     we have a crash. 
--        if lo > hi      we have mid = sp, c = 0, a return of -sp (sp may be -ve) 
--     if N != 0 then 
--        if lo <= hi  then   the loop executes. 
--           if lo > N     we have a crash. 
--           if lo <= N then 
--              if hi > N  we have a potential crash. 
--              else       we are OK. 
-- 
--  Perhaps this is complicated enough for you to believe there is a problem! 
-- 
--  Perhaps you may say this is all so remote because 'almost always' it suceeds  
--  (code for only a countable infinty of failures). 
 
	while lo <= hi do 
		mid = floor((lo + hi) / 2) 
		c = eu:compare(needle, haystack[mid]) 
		if c < 0 then 
			hi = mid - 1 
		elsif c > 0 then 
			lo = mid + 1 
		else 
			return mid 
		end if 
	end while 
	-- return the position it would have, if inserted now 
	if c > 0 then 
		mid += 1 
	end if 
	return -mid 
	 
--  if A was null then any return was nonsense 
--  if A was not empty was the range sp..ep valid? 
 
end function 

Tell me, in how many ways is that function wrong?

new topic     » topic index » view message » categorize

2. Re: binary search with reference to append

gimlet said...

This is an example of why you should require the caller validates the parameters of a called function.

While I agree with "you should require the caller validates the parameters of a called function", I don't think this is a good example.

gimlet said...

The function messes with its parameters.

No, it does not modify it's parameters.

gimlet said...

	lo = max(sp, 1)              -- lo could be > N 
	hi = min(N, ep)              -- hi could be < lo 
	hi = (lo > hi && N) ? N : hi -- hi could be > N  

Agreed, a sanity bounds check here would not hurt - but if sp or ep is out of bounds, it's an error. (Ideally, it'd be an exception or a flag, not an uncatchable run-time error.)

gimlet said...

	mid = sp                     -- this is unwise 

That is harmless, as in the case where lo <= hi, mid is immediately reset without the old value being used. So this is for the case where lo > hi, in which case the function returns immediately, stating that the record would be inserted into the first position. (Actually, maybe this only makes sense when dealing with the empty sequence?)

gimlet said...

	c = 0                        -- the match (set to 0 - found) 

Again, harmless. In the case where lo <= hi, c is immediately reset without the old value being used as well. Where lo > hi, c=0 is still harmless, it just means we don't adjust the offset returned by 1 (as we would do in the case where lo<=hi but the item was not found and the last object in the haystack search range < needle).

gimlet said...

--  at this point consider 
--     if lo > hi   we have mid = sp, c = 0, a return of -sp (sp may be -ve) 

What the heck is ve? Anyways, this case is designed for the empty sequence. I concede that if sp and ep are both longer than the length of the sequence, the return value here may be weird, causing logic errors or a run-time crash later on.

gimlet said...

--     if N = 0 then  
--        if lo <= hi     we have a crash. 

If both sp and ep are out of bounds. Conceded. Maybe this is the right thing to do, though, since we normally crash on out of bounds errors and it's better to fail early.

gimlet said...

--     if N != 0 then 
--           if lo > N     we have a crash. 
--           if lo <= N then 
--              if hi > N  we have a potential crash. 

So if either sp or ep are out of bounds, we could have a crash. I don't like the 'potential crash' here, I think if it's out of bounds, we should detect it early and report back on it somehow.

So, that's definitely a bug. Conceded.

gimlet said...

--     if N = 0 then  
--        if lo > hi      we have mid = sp, c = 0, a return of -sp (sp may be -ve) 

Didn't you already say this?

gimlet said...

Perhaps you may say this is all so remote because 'almost always' it suceeds

No, you've definitely found something here. 'almost always' is not good enough in this case.

gimlet said...

if A was null then any return was nonsense

"A was null" means "A = {}" ? Assuming that this is what you mean,

Yeah, I need to think about this one some more.

gimlet said...

if A was not empty was the range sp..ep valid?

Yes, that definitely needs to be checked. But what to do if there's an out-of-bounds error?

gimlet said...

Tell me, in how many ways is that function wrong?

Just one - lack of proper bounds checking.

new topic     » goto parent     » topic index » view message » categorize

3. Re: binary search with reference to append

Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?

Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?

If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions. If the function will cause a crash with bad inputs, then I would suggest that it's the caller's responsibility to ensure that it calls the function with the correct parameters.

I mean, we can fail if sequence A isn't sorted in ascending order. Whose responsibility is it to check that condition? The caller or the called function?

At the very least, however, inside the function I would assert that lo <= hi <= N else crash.

	hi = (lo > hi && N) ? N : hi -- hi could be > N   

Wait -- did Euphoria get a ternary operator when I wasn't looking? I just did a search and I couldn't find anything on it. Jim seems to be the search master, however.

There are other bugs in this code too, but I can't correct them right now.

new topic     » goto parent     » topic index » view message » categorize

4. Re: binary search with reference to append

Untested:

-- Precondition: haystack is a list sorted in ascending order. 
--               If provided, startp and endp are less than or equal to 
--               haystack length and startp is less than or equal to endp. 
-- Returns: Index of needle in haystack, or 0 if not found. 
 
function bsearch(object needle, 
                 sequence haystack, 
                 integer startp=1, 
                 integer endp=0) 
 
    integer nItems, lo, hi, midp 
    --------------------- 
    nItems = length(haystack) 
    lo = max({1, startp}) 
 
    if endp > 0 and endp <= nItems then 
        hi = min({endp, nItems}) 
    else 
        hi = nItems 
    end if 
 
    -- We should provide a useful error message here, but crash is good 
    -- enough. 
    if hi > nItems or lo > hi then abort(1) end if 
 
    --------------------- 
 
    while lo <= hi do 
        midp = floor((lo + hi) / 2) 
        switch eu:compare(needle, haystack[midp]) do 
          case -1 then hi = midp - 1 
          case 0 then return midp 
          case +1 then lo = midp + 1 
        end case 
    end while 
 
    -- Not found. 
    return 0 
 
end function 
new topic     » goto parent     » topic index » view message » categorize

5. Re: binary search with reference to append

jaygade said...
	hi = (lo > hi && N) ? N : hi -- hi could be > N   

Wait -- did Euphoria get a ternary operator when I wasn't looking? I just did a search and I couldn't find anything on it. Jim seems to be the search master, however.

Nope, we don't. That's probably meant to be the abridged version of this code:

        if lo > hi and length(haystack) > 0 then 
                hi = length(haystack) 
        end if 
new topic     » goto parent     » topic index » view message » categorize

6. Re: binary search with reference to append

I've edited my code several times but I think it's still not perfect.

This is better. Still untested.

-- Precondition: haystack is a list sorted in ascending order. 
--               If provided, startp and endp are less than or equal to 
--               haystack length and startp is less than or equal to endp. 
-- Returns: Index of needle in haystack, or 0 if not found. 
 
function bsearch(object needle, 
                 sequence haystack, 
                 integer startp=1, 
                 integer endp=0) 
 
    integer nItems, lo, hi, midp 
 
    nItems = length(haystack) 
 
    if endp = 0 then 
        hi = nItems 
    else 
        hi = endp 
    end if 
 
    if hi <= nItems and startp <= hi then 
        lo = startp 
    else 
        abort(1) 
    end if 
 
    while lo <= hi do 
        midp = floor((lo + hi) / 2) 
        switch eu:compare(needle, haystack[midp]) do 
          case -1 then hi = midp - 1 
          case 0 then return midp 
          case +1 then lo = midp + 1 
        end case 
    end while 
 
    -- Not found 
    return 0 
 
end function 

If startp > endp, but both are less than nItems, I think that either failure OR returning 0 is a valid response so long as it's documented. Failure could be better during debugging in order to catch invalid inputs when reasonable.

In this case it is reasonable to check the boundaries of the search, but it is not necessarily reasonable to check to see if the haystack is sorted.

This illustrates that the caller should validate the data before sending it to the bsearch function, and that the bsearch function should reject invalid inputs.

new topic     » goto parent     » topic index » view message » categorize

7. Re: binary search with reference to append

I think I finally got the logic right.

new topic     » goto parent     » topic index » view message » categorize

8. Re: binary search with reference to append

jaygade said...

I think I finally got the logic right.

For the usage case that you've written this for, it looks good.

However, binary_search() has an extra requirement:

-- # a negative integer, ##-i##, with ##i## between adjusted start and end 
--    points. This means that ##needle## is not in the searched slice of 
--    ##haystack##, but would be at index ##i## if it were there. 
-- # a negative integer ##-i## with ##i## out of the searched range. This 
--   means than ##needle##might be either below the start point if ##i## 
--   is below the start point, or above the end point if ##i## is. 

So it might return, say, -5, if the needle is not in the haystack but index 5 is where the needle should be sorted into.

I guess, sometimes you just can't have it all.

new topic     » goto parent     » topic index » view message » categorize

9. Re: binary search with reference to append

Okay, I didn't see the requirements in the OP. I was wondering what the original return -mid was for. That had me scratching my head.

new topic     » goto parent     » topic index » view message » categorize

10. Re: binary search with reference to append

I'd never seen this before. Can someone please enlighten me. In std/search.e I find:

public function binary_search(object needle, sequence haystack, integer start_point = 1, integer end_point = 0) 

In std/primes.e I find:

public function next_prime(integer n, object fail_signal_p = -1, atom time_out_p = 1) 
    ... 
    -- Assumes that most searches will be less than about 1000 
    if n<1009 and 1009<=list_of_primes[$] then 
        i = binary_search(n, list_of_primes, 1 ,169) 
    else 
        i = binary_search(n, list_of_primes) 
    end if 

Is that like a major win in some benchmark or something? Even if list_of_primes almost fills available memory, we're talking like 32 or 64 chops max, so my gut feeling is those "1009" tests are an overhead many more times than a saving, though I could well be wrong. Otherwise I cannot see why there is an end_point, and given that the above is all I could find, I see absolutely no reason whatsoever to have a start point.

Pete

new topic     » goto parent     » topic index » view message » categorize

11. Re: binary search with reference to append

petelomax said...

I'd never seen this before. Can someone please enlighten me.

That's not how I remember first writing the primes routine. I thought that if the next prime was a small one, a sequential search of the first 'X' would be faster than doing a binary search of a large table. I don't know why a ranged binary search is there.

new topic     » goto parent     » topic index » view message » categorize

12. Re: binary search with reference to append

DerekParnell said...
petelomax said...

I'd never seen this before. Can someone please enlighten me.

That's not how I remember first writing the primes routine. I thought that if the next prime was a small one, a sequential search of the first 'X' would be faster than doing a binary search of a large table. I don't know why a ranged binary search is there.

I attempted to trace the history of primes.e back, to find the culprit who changed this from the sequential search to a ranged search.

But, this is the earliest version I can find: http://scm.openeuphoria.org/hg/euphoria/file/d0ab9cc672b8/include/primes.e

new topic     » goto parent     » topic index » view message » categorize

13. Re: binary search with reference to append

You have misunderstood.

The issues are:

The logic of the input checks is wrong.

The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
setting mid to sp is arbitrary,
something is a bit funny at the caller's end)

The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.

Therefore

1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.

2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.

3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.

Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.

jaygade said...

Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?

The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.

said...

Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?

Yes: there are.

said...

If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.

No. that is in contradiction of
the caller should check its inputs.
The called function is at liberty to
plough ahead or whatever.

said...

If the function will cause a crash with bad inputs, then I would suggest that it's the caller's responsibility to ensure that it calls the function with the correct parameters.

Quite so.

said...

I mean, we can fail if sequence A isn't sorted in ascending order. Whose responsibility is it to check that condition? The caller or the called function?

1 Well it would have to be the caller as otherwise there would be
no point in using binary search at all.

2 Binary search does not require a sorted array.
If you consider something like:

   A[lo] < x < A[hi]         -- lo < hi 
   m = mid(lo,hi) 
-- We have one of 
   A[m] = x 
   A[lo] < x < A[m] 
   A[m] < x < A[hi] 

and so on

Binary search is the mean value theorem. Ref:
van Gasteren, Netty; Feijen, Wim (1995).
"The Binary Search Revisited"

    nItems = length(haystack)  
  
    if endp = 0 then  
        hi = nItems  
    else  
        hi = endp  
    end if  
  
    if hi <= nItems and startp <= hi then  
        lo = startp  
    else  
        abort(1)  
    end if  
  
    while lo <= hi do  
        midp = floor((lo + hi) / 2)  
        switch eu:compare(needle, haystack[midp]) do  
          case -1 then hi = midp - 1  
          case 0 then return midp  
          case +1 then lo = midp + 1  
        end case  
    end while  
  
    -- Not found  
    return 0  
said...

If startp > endp, but both are less than nItems, I think that either failure OR returning 0 is a valid response so long as it's documented. Failure could be better during debugging in order to catch invalid inputs when reasonable.

In this case it is reasonable to check the boundaries of the search, but it is not necessarily reasonable to check to see if the haystack is sorted.

This illustrates that the caller should validate the data before sending it to the bsearch function, and that the bsearch function should reject invalid inputs.

You say the caller should validate.
Yet you still insist on bsearch rejecting invalid inputs.
Surely all bsearch needs to do is return correct results for
valid input.

function bsearch(seq A, obj x, int sp, int ep) 
   int lo = sp 
   int hi = ep 
   int m, c 
 
   if length(A) = 0 or sp > ep then return 0 end if 
 
   while lo <= hi do  
      m = floor((lo+hi)/2)  
      c = compare(x, A[m]) 
      switch c do  
      case -1 then hi = m-1  
      case  0 then return m  
      case +1 then lo = m+1  
      end switch  
    end while  
    -- Not found 
    if c = 1 then m = m+1 end if  
    if m > sp and m <= ep then  
       return -m 
    else 
       return 0               -- no position found 
    end if 

Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.

function bsearch(seq A, obj x) 
   int lo = 1 
   int hi = length(A) 
   int m, c 
 
   if hi < lo then return 0 end if 
 
   while lo <= hi do  
      m = floor((lo+hi)/2)  
      c = compare(x, A[m]) 
      switch c do  
      case -1 then hi = m-1  
      case  0 then return m  
      case +1 then lo = m+1  
      end switch  
   end while  
   -- Not found  
   if c > 0 then m = m+1 end if 
   if m > 1 and m <= length(A) then  
      return -m  
    else 
       return 0               -- no position found 
    end if 
new topic     » goto parent     » topic index » view message » categorize

14. Re: binary search with reference to append

gimlet said...

You have misunderstood.

No, I haven't.

gimlet said...

The issues are:

The logic of the input checks is wrong.

Agreed.

gimlet said...

The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,

This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.

gimlet said...

setting mid to sp is arbitrary,

Agreed. Thinking it over, for the empty sequence case, the result should laways be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.

Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.

gimlet said...

something is a bit funny at the caller's end)

Non sequitor?

gimlet said...

The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.

GIGO?

gimlet said...
jaygade said...

If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.

No. that is in contradiction of
the caller should check its inputs.
The called function is at liberty to
plough ahead or whatever.

Sounds like GIGO.

gimlet said...

Therefore

1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.

See the jaygade point above.

gimlet said...

2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.

You might have a point here. I have to think about this some more.

Perhaps in this case, a NOTFOUND/INVALID/NOTARESULT/NOTAPOSITION return value would make sense. Like

public constant NOTAPOSITION = ""

I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.

gimlet said...

3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.

Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.

I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.

In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.

gimlet said...

Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.

lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.

gimlet said...
jaygade said...

Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?

Yes: there are.

I also concur.

gimlet said...
jaygade said...

Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?

The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.

I find it odd that you say this but then have the callee check bounds in your later examples in the same post.

gimlet said...

2 Binary search does not require a sorted array.
If you consider something like:

   A[lo] < x < A[hi]         -- lo < hi 
   m = mid(lo,hi) 
-- We have one of 
   A[m] = x 
   A[lo] < x < A[m] 
   A[m] < x < A[hi] 

and so on

That's only true if it's sorted in a particular direction. Here's a counterexample:

   -- lo = 1 
   -- hi = 3 
   -- x = 2 
   -- A = {3, 1.5, 1} 
   A[lo] < x < A[hi]         -- lo < hi 
   -- but this is not true! 
   -- A[lo] = A[1] = 3 
   -- A[hi] = A[3] = 1 
   -- 3 < 2 -- false! 
   -- 2 < 1 -- false! 
   m = mid(lo,hi) 
   -- m now equals 2 
-- We have one of 
   A[m] = x -- false, A[m] = A[2] = 1.5 != 2 
   A[lo] < x < A[m] -- also false 
   A[m] < x < A[hi] -- also false 
gimlet said...

You say the caller should validate.
Yet you still insist on bsearch rejecting invalid inputs.
Surely all bsearch needs to do is return correct results for
valid input.

But it already does! It's only the invalid input cases that are being argued!

gimlet said...

Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.

I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.

new topic     » goto parent     » topic index » view message » categorize

15. Re: binary search with reference to append

gimlet said...

You say the caller should validate.
Yet you still insist on bsearch rejecting invalid inputs.
Surely all bsearch needs to do is return correct results for
valid input.

I agree that the caller should be responsible for checking the data for validity, which is why I wrote the function to crash on bad input instead of silently continuing.

The reason for that is that when you write your first rough draft, it is easy to miss such conditions. Therefore the check in the callee is an aid to debugging -- when it crashes then it says to the programmer that there is a logic error upstream. We don't want a logic error to put our program or our output data into an unknown state with a bug which could turn intermittent and/or very difficult to track down otherwise.

new topic     » goto parent     » topic index » view message » categorize

16. Re: binary search with reference to append

I can't argue for or against sending a slice to the function. However, sp and ep ARE optional arguments which will probably not be used most of the time.

In addition, nothing prevents you from sending a slice to the function as written. It should work just fine with a slice.

So call the function with a slice and skip using sp and ep. Problem solved.

I'm not entirely convinced on the added functionality of returning a "possible" index as a negative number though. It almost seems like the job of another function, that should return a euphoria tuple sequence such as {index, FOUND} or {index, NOT_FOUND}.

The more I read about other languages and learn about tuples, the more I can relate them to Euphoria's sequences and find the long-proposed idea of the {A, B} = {val1, val2} syntax.

[EDIT] Nice with the edit history.

new topic     » goto parent     » topic index » view message » categorize

17. Re: binary search with reference to append

gimlet said...

You have misunderstood.

The issues are:

The logic of the input checks is wrong.

The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
setting mid to sp is arbitrary,
something is a bit funny at the caller's end)

The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.

I think you are exactly correct with this.

The case where the table is empty, the starting point is beyond the end of the table and the ending point is before the starting point, all should be treated as an error condition and a simple zero returned.

gimlet said...

Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.

Yes. This is a much better approach.

new topic     » goto parent     » topic index » view message » categorize

18. Re: binary search with reference to append

jimcbrown said...
gimlet said...

You have misunderstood.

No, I haven't.

I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.

jimbrown said...
gimlet said...

The issues are:

The logic of the input checks is wrong.

Agreed.

gimlet said...

The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,

This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.

I said: if you haven't searched.

1 It is not possible to search an empty sequence.
2 Therefore you can't return a position except one arbitrarily decided.
3 Therefore you should return NOT FOUND/NO POSITION.

jimbrown said...
gimlet said...

setting mid to sp is arbitrary,

Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.

-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.

jimbrown said...

Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.

Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION

A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice. If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.

You are allowing impossible requirements to tie you in knots.

jimbrown said...
gimlet said...

The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.

GIGO?

No. Design by contract

jimbrown said...
gimlet said...
jaygade said...

If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.

No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.

Sounds like GIGO.

Design by contract

jimbrown said...
gimlet said...

Therefore

1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.

See the jaygade point above.

gimlet said...

2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.

You might have a point here. I have to think about this some more.

Perhaps in this case, a NOTFOUND/INVALID/NOTARESULT/NOTAPOSITION return value would make sense. Like

public constant NOTAPOSITION = ""

I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.

That is last part is silly. You can't seriously mean that.
You are saying bsearch is a mind reader.

jimbrown said...
gimlet said...

3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.

Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.

I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.

In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.

In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.

jimbrown said...
gimlet said...

Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.

lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.

Have your way if you like. My premises stand.

jimbrown said...
gimlet said...
jaygade said...

Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?

Yes: there are.

I also concur.

gimlet said...
jaygade said...

Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?

The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.

I find it odd that you say this but then have the callee check bounds in your later examples in the same post.

The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.

jimbrown said...
gimlet said...

2 Binary search does not require a sorted array.
If you consider something like:

   A[lo] < x < A[hi]         -- lo < hi 
   m = mid(lo,hi) 
-- We have one of 
   A[m] = x 
   A[lo] < x < A[m] 
   A[m] < x < A[hi] 

and so on

That's only true if it's sorted in a particular direction. Here's a counterexample:

   -- lo = 1 
   -- hi = 3 
   -- x = 2 
   -- A = {3, 1.5, 1} 
   A[lo] < x < A[hi]         -- lo < hi 
   -- but this is not true! 
   -- A[lo] = A[1] = 3 
   -- A[hi] = A[3] = 1 
   -- 3 < 2 -- false! 
   -- 2 < 1 -- false! 
   m = mid(lo,hi) 
   -- m now equals 2 
-- We have one of 
   A[m] = x -- false, A[m] = A[2] = 1.5 != 2 
   A[lo] < x < A[m] -- also false 
   A[m] < x < A[hi] -- also false 

Not a counter example:

You invalidate the start condition A[lo] < x < A[hi]
and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!

gimlet said...

Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.

said...

I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.

Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller and, loses no functionality.

One still has to deal with passing in an empty sequence.
(I don't think this is an error).

and returning a valid position value in the case x is not found.

Those issues are determinate within bsearch.

Sounds like a win to me.

Regardless of of the details, I think that what I have shown is:

1 the logic was incorrect.

2 the handling of empty sequences, sp and ep was incorrect.

3 that you, jaygade and many others have trouble with the way bsearch in its original form works.

This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.

If the end result is that the library function is changed so it doesn't make assumptions I will be very happy. Just as I will be very happy when load_map() is fixed.

new topic     » goto parent     » topic index » view message » categorize

19. Re: binary search with reference to append

I have (little) opinion on how binary search is implemented in the standard library. Before this thread I didn't even know.

You just presented an interesting problem to me, as well as a theory question. Heck, your own code was some kind of weird bastardization between C and Euphoria. I've also expressed my opinion on the theory question -- that the caller should validate the data, but that the callee should crash for known bad values or ranges.

If you want to propose changes to the standard library, then propose them.

new topic     » goto parent     » topic index » view message » categorize

20. Re: binary search with reference to append

gimlet said...

Just as I will be very happy when load_map() is fixed.

What makes you think that it is (still) broken? Do you have examples to demonstrate this?

gimlet said...

The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,

jimbrown said...

This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.

gimlet said...

I said: if you haven't searched.

1 It is not possible to search an empty sequence.

I disagree. An empty sequence is the fastest to search. Just like searching an empty box.

gimlet said...

2 Therefore you can't return a position except one arbitrarily decided.

Agreed. There's no existing position to return.

gimlet said...

3 Therefore you should return NOT FOUND/NO POSITION.

Negative numbers are not valid positions. The existing function already conveys this information, one just has to check if the result is <= 0.

But, as per the jaygade point above, it also attempts to return extra information.

gimlet said...

setting mid to sp is arbitrary,

jimbrown said...

Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.

gimlet said...

-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.

It would seem to work best when sp = 1. Anyways, conceded.

jimbrown said...

Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.

gimlet said...

Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION

Undecided != Don't know.

gimlet said...

A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice.

Irrelevant. bsearch should not care about this.

gimlet said...

If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.

Agreed ... but what of it?

gimlet said...

The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.

jimbrown said...

GIGO?

[quote gimlet] No. Design by contract

jaygade said...

If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.

gimlet said...

No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.

jimbrown said...

Sounds like GIGO.

gimlet said...

Design by contract

No. Under design by contract, the called function should be checking those conditions using an assertion an throwing an error or something if they aren't validated.

Quoting wikipedia:

wikipedia said...

Many programming languages have facilities to make assertions like these. However, DbC considers these contracts to be so crucial to software correctness that they should be part of the design process. In effect, DbC advocates writing the assertions first.

When using contracts, a supplier should not try to verify that the contract conditions are satisfied; the general idea is that code should "fail hard", with contract verification being the safety net ... the supplier throws an exception to inform the client that the precondition has been broken,

Specifically, this

gimlet said...

The called function is at liberty to
plough ahead or whatever.

violates design by contract.

gimlet said...

2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.

jimbrown said...

I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.

gimlet said...

That is last part is silly. You can't seriously mean that.

Yes, I do.

gimlet said...

You are saying bsearch is a mind reader.

It's following certain rules to produce a result which logically follow from those rules.

gimlet said...

3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.

jimbrown said...

Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.

I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.

In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.

gimlet said...

In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.

Agreed. Other than the jaygade point (and the drop in slicing functionality), this works perfectly.

gimlet said...

Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.

jimbrown said...

lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.

gimlet said...

Have your way if you like. My premises stand.

Fine. Here are the premises:

1. Messing with parameters causes the culprit function to fail in MWP-style. 2. bsearch messes with lo/hi. 3. lo/hi are not parameters. 4. sp/ep are parameters.

How do you reach the conclusion:

1. bsearch fails in MWP-style?

jaygade said...

Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?

gimlet said...

The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.

jimbrown said...

I find it odd that you say this but then have the callee check bounds in your later examples in the same post.

gimlet said...

The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.

Yes, this makes sense and is reasonable. However, you have not resolved the contradiction or stated that this case is one (of many?) possible exceptions to your rule:

gimlet said...

The point is that the callee checking creates problems.

gimlet said...

2 Binary search does not require a sorted array.
If you consider something like:

   A[lo] < x < A[hi]         -- lo < hi 
   m = mid(lo,hi) 
-- We have one of 
   A[m] = x 
   A[lo] < x < A[m] 
   A[m] < x < A[hi] 

and so on

jimbrown said...

That's only true if it's sorted in a particular direction. Here's a counterexample:

   -- lo = 1 
   -- hi = 3 
   -- x = 2 
   -- A = {3, 1.5, 1} 
   A[lo] < x < A[hi]         -- lo < hi 
   -- but this is not true! 
   -- A[lo] = A[1] = 3 
   -- A[hi] = A[3] = 1 
   -- 3 < 2 -- false! 
   -- 2 < 1 -- false! 
   m = mid(lo,hi) 
   -- m now equals 2 
-- We have one of 
   A[m] = x -- false, A[m] = A[2] = 1.5 != 2 
   A[lo] < x < A[m] -- also false 
   A[m] < x < A[hi] -- also false 
gimlet said...

Not a counter example:

Yes it is. It is a counter-example to what you said:

gimlet said...

2 Binary search does not require a sorted array.

gimlet said...

You invalidate the start condition A[lo] < x < A[hi]

Yes, I did that. Because that start condition is a veiled form of saying that the array must be sorted. (I guess it's possible to have a sequence that's not completely sorted, which for the right x would still work correctly, as long as all other values touched coincidently happen to have the right order. But sorting is generally the only way to guarantee this for any arbiturary sequence/array/list/etc.)

gimlet said...

and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!

I never said this, I just said that binary search fails with invalid start conditions (one of which is the array being sorted). Now, I never said that binary search would fail with valid start conditions.

gimlet said...

Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.

jimbrown said...

I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.

gimlet said...

Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller

Agreed, so far.

gimlet said...

and, loses no functionality.

I disagree. bsearch is losing the functionality to do slicing on behalf of the caller, and potentially the chance to do bounds checking and the life on behalf of the caller.

You may argue that bsearch should not be responsible for this sort of functionality ... and you may be right. We already have other stdlib functions that could do this anyways.

Anyways, I figured that there was probably a good reason for bsearch to work this way, but as original author of the code seems to have changed this mind about it, I'll defer to him: http://openeuphoria.org/forum/m/124611.wc

(Actually, I'm now wondering if the main reason for binary_search() to have slice() functionality was for next_prime() - http://scm.openeuphoria.org/hg/euphoria/file/d0ab9cc672b8/include/primes.e )

gimlet said...

Regardless of of the details, I think that what I have shown is:

1 the logic was incorrect.

You have not demonstrated this, except in cases where sp/ep are invalid.

gimlet said...

2 the handling of empty sequences, sp and ep was incorrect.

Again, only in cases where sp/ep are invalid.

gimlet said...

3 that you, jaygade and many others have trouble with the way bsearch in its original form works.

Agreed. No one seems to like how the function handles the cases where sp/ep are invalid. In addition, there's some wavering on the support for the jaygade point as well.

gimlet said...

This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.

If the end result is that the library function is changed so it doesn't make assumptions I will be very happy.

You are allowing impossible requirements to tie you in knots.

Actually, I think most of the disagreement stands on the jaygade point. This is a design/functionality issue rather than a logical point.

Now, I trust that there was a good reason to have it in the first place, but I'll let the original author explain: http://scm.openeuphoria.org/hg/euphoria/rev/2036d2a1151f

gimlet said...

You have misunderstood.

jimcbrown said...

No, I haven't.

gimlet said...

I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.

This is just a veiled way of stating that I do not understand.

I contend that I have demonstrated that I understand.

In fact, I could go further, and claim that, by your missteps (not understanding binary search and sorting arrays, advocating and then contradicting design by contract) that you understand less.

But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.

I don't have a strong opinion either way, so I'm more than happy to let the original author make the call.

new topic     » goto parent     » topic index » view message » categorize

21. Re: binary search with reference to append

Ok .. ok .. ok ... I get the hint.

Given that we shouldn't change the API, here might be a cleaner version of binary search.

ifdef DEBUG_MODE then 
include std/error.e 
with trace 
end ifdef 
 
--** 
-- finds a "needle" in an ordered "haystack". Start and end point can be given for the search. 
-- This assumes that the sequence to search through is already in strict ascending order. 
-- 
-- Parameters: 
--      # ##needle## : an object to look for 
--      # ##haystack## : a sequence to search in 
--      # ##start_point## : an integer, the index at which to start searching. Defaults to 1. 
--      # ##end_point## : an integer, the end point of the search. Defaults to 0, ie search to end. 
-- 
-- Returns: 
-- An **integer**, either: 
-- # zero ##i##, which means that the slice specified by the ##start_point## 
-- and ##end_point## does not exist. 
-- # a positive integer ##i##, which means ##haystack[i]## equals ##needle##. 
-- # a negative integer, ##-i##, which means that the ##needle## is not in the  
-- part of the ##haystack## that was searched. The absolute value returned is  
-- an indication of where the item might have been in the search area, had it existed. 
-- If the value is the same as the starting point, then the item would have existed 
-- somewhere before the first item. If the value is after the last item then 
-- it would have existed somewhere after the last item searched. Otherwise 
-- the value is the index of where you could insert the ##needle## into the ##haystack##. 
-- 
-- Comments: 
-- * If ##end_point## is less than zero, it represents an offset from the end 
-- of ##haystack##. For example -1 is one back from the last item. 
-- * If the starting point is less than 1, it is assumed to be 1. 
-- * If the ending point is after the last item in the ##haystack##, it is assumed to be the end of the ##haystack##. 
-- * The way this function returns is very similar to what [[:db_find_key]] 
--   does. They use variants of the same algorithm. The latter is all the 
--   more efficient as ##haystack## is long. 
-- * ##haystack## is assumed to be in ascending order. Results are undefined 
--   if it is not.  
-- * If duplicate copies of ##needle## exist in the range searched on 
--   ##haystack##, the lowest index for the ##needle## is always returned. 
-- 
-- See Also: 
-- [[:find]], [[:db_find_key]] 
 
public function binary_search(object needle, sequence haystack, integer start_point = 1, integer end_point = 0) 
    integer lo, hi, mid, c 
    integer sp, ep 
 
     
    sp = start_point 
    if end_point <= 0 then 
        -- End point specified is relative to the end of the sequence. 
        ep = length(haystack) + end_point 
    else 
        ep = end_point 
    end if 
     
    if ep > length(haystack) then 
        ep = length(haystack) 
    end if 
    if sp < 1 then 
        sp = 1 
    end if 
    if sp > ep then 
        if sp = ep + 1 then 
            return -sp  -- Obviously the needle is never in an empty sequence/slice. 
        end if 
        return 0  -- Slice starts after the end. 
    end if 
     
    lo = sp 
    hi = ep 
    while lo <= hi do 
        mid = floor((lo + hi) / 2) 
        c = eu:compare(needle, haystack[mid]) 
        if c < 0 then 
            -- Needle is before current item so adjust the highest point. 
            hi = mid - 1 
        elsif c > 0 then 
            -- Needle is after current item so adjust the lowest point. 
            lo = mid + 1 
        else 
            -- Found it! Now check if its the first occurance. 
            for i = mid-1 to 1 by -1 do 
                if eu:equal(needle, haystack[i]) then 
                    mid  = i 
                end if 
            end for 
            ifdef DEBUG_MODE then 
               if mid != find(needle, haystack) then crash("ASSERT: find() failed") end if 
            end ifdef 
            return mid 
        end if 
    end while 
     
    -- return the position it would have, if inserted now 
    if c > 0 then 
        mid += 1 
    end if 
    ifdef DEBUG_MODE then 
 
        if mid > ep then 
            if mid > ep + 1 then 
                crash("ASSERT: Final index beyond haystack") 
            end if 
           if eu:compare(needle, haystack[ep]) <= 0 then crash("ASSERT: needle not found but lower than last item") end if 
        elsif mid < sp then 
            crash("ASSERT: Final index before haystack") 
        else 
           if eu:compare(needle, haystack[mid]) >= 0 then crash("ASSERT: needle not found but higher than insertion point.") end if 
           if mid > sp and eu:compare(needle, haystack[mid-1]) <= 0 then crash("ASSERT: needle not found but lower than insertion point.") end if 
             
        end if 
         
    end ifdef 
     
    return -mid 
end function 
 
ifdef DEBUG_MODE then 
include std/unittest.e 
constant haystack = "012345678ABCDEFGHIJKLMMMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 
test_equal("binary_search missing #0", -1, binary_search(' ',haystack)) 
test_equal("binary_search missing #1", -10, binary_search('9',haystack)) 
test_equal("binary_search missing #2", -1, binary_search(0,haystack)) 
test_equal("binary_search missing #3", -64, binary_search("rubbish",haystack)) 
test_equal("binary_search missing #4", -64, binary_search(9999,haystack)) 
 
test_equal("binary_search missing subset #1", -5, binary_search('0',haystack, 5, 17)) 
test_equal("binary_search missing subset #2", -18, binary_search('p',haystack, 5, 17)) 
test_equal("binary_search missing subset #3", -17, binary_search('A',haystack, 17)) 
test_equal("binary_search missing subset #4", -8, binary_search('A',haystack, 1, 7)) 
test_equal("binary_search missing subset #5", -62, binary_search('z',haystack, 5, -2)) 
test_equal("binary_search missing subset #6 empty slice", -10, binary_search('z',haystack, 10, 9)) 
 
test_equal("binary_search found first", 1, binary_search('0',haystack)) 
test_equal("binary_search found last", 63, binary_search('z',haystack)) 
test_equal("binary_search found middle#1", 32, binary_search('U',haystack)) 
test_equal("binary_search found middle#2", 33, binary_search('V',haystack)) 
test_equal("binary_search found middle#3", 22, binary_search('M',haystack)) 
test_equal("binary_search found auto sp", 32, binary_search('U',haystack, -100)) 
test_equal("binary_search found auto ep", 32, binary_search('U',haystack, 1, 100)) 
 
test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, 17)) 
test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, -2)) 
 
test_equal("binary_search bad slice #1", 0, binary_search('A',haystack, 17, 1)) 
 
 
test_equal("binary_search empty input", -1, binary_search('A',{})) 
 
test_equal("binary_search strings", 5, binary_search("cat",{"apple", "bat", "car", "cast", "cat", "category", "dog"})) 
 
test_report() 
end ifdef 
new topic     » goto parent     » topic index » view message » categorize

22. Re: binary search with reference to append

Derek,

This is the last I will say on this as it is getting personal.

What I objected to in the first place was the logical prelude in binary_search which:

1 was erroneous.

2 was making inferences about the call which went beyond what was reasonable.

It seems to me that the prelude code is defensive programming with a vengeance. It seems to be trying to make sense of start_points and end_points which are invalid, returning positions which do not make good sense and so on.

Your code is not correct, sorry.

I think what Jaygade wrote or the first of mine (with sp and ep) are both simpler and better.

sp = start_point  
    if end_point <= 0 then  
        -- End point specified is relative to the end of the sequence.  
        ep = length(haystack) + end_point  
    else  
        ep = end_point             -- ep=3 => length?? (G) 
    end if  
      
    if ep > length(haystack) then  -- this cannot happen (G) 
        ep = length(haystack)  
    end if  
    if sp < 1 then  
        sp = 1  
    end if  
    if sp > ep then  
        if sp = ep + 1 then        -- if ep = length??   (G) 
            return -sp  -- Obviously the needle is never in an empty sequence/slice.  
                        -- if sp > ep then why assume the slice starts at sp? 
                        -- with ep = length -sp is out of range. (G) 
        end if  
        return 0  -- Slice starts after the end.  
    end if  

Look it's hard to write this kind of code because you are forever faced with questions:

What do I do in this particular situation?,
Have I covered every possible case?,
What is the meaning of all this?
Should I even try to cover this?

new topic     » goto parent     » topic index » view message » categorize

23. Re: binary search with reference to append

gimlet said...

This is the last I will say on this as it is getting personal.

What is getting personal? I don't see that myself. I'm not offended by any of these exchanges. I welcome being challenged by my peers to do better.

gimlet said...

What I objected to in the first place was the logical prelude in binary_search which:

1 was erroneous.

2 was making inferences about the call which went beyond what was reasonable.

It seems to me that the prelude code is defensive programming with a vengeance. It seems to be trying to make sense of start_points and end_points which are invalid, returning positions which do not make good sense and so on.

Yes, I understand that these are you concerns with the current library code. However, I'm sure you also recognise that some of your arguments, like some of mine, are based on opinion rather than empirical evidence.

gimlet said...

Your code is not correct, sorry.

I thank you for your concern. I cannot see the errors in the code I've just presented, but I'd like to have them pointed out to me so I can improve my coding abilities. (I've been learning how to be a better programmer since 1972, by the way).

gimlet said...

I think what Jaygade wrote or the first of mine (with sp and ep) are both simpler and better.

Whether that code is "both simpler and better" is a point of view. In any case, the published API needs to be respected, so therefore we are stuck with having to deal with the optional parameters for starting_point and ending_point. I think that if we were writing this function for inclusion into the library now, we would not have these optional parameters. In hindsight, they are probably a poor design decision. But here we are ... we are stuck with them. So let's just manage the best we can.

gimlet said...
sp = start_point  
    if end_point <= 0 then  
        -- End point specified is relative to the end of the sequence.  
        ep = length(haystack) + end_point  
    else  
        ep = end_point             -- ep=3 => length?? (G) 
    end if  
      
    if ep > length(haystack) then  -- this cannot happen (G) 
        ep = length(haystack)  
    end if  
    if sp < 1 then  
        sp = 1  
    end if  
    if sp > ep then  
        if sp = ep + 1 then        -- if ep = length??   (G) 
            return -sp  -- Obviously the needle is never in an empty sequence/slice.  
                        -- if sp > ep then why assume the slice starts at sp? 
                        -- with ep = length -sp is out of range. (G) 
        end if  
        return 0  -- Slice starts after the end.  
    end if  

I'm assuming that the code extract above is some sort of attempt to demonstrate the errors in my code. I'm guessing this as you don't actually say what its here for. So sorry if I'm getting that wrong.

I have no idea what the comment "ep=3 => length?? " is meant to be saying to me.

Again, the comment "this cannot happen" doesn't make a lot of sense to me. As far as I can make out, it is quite possible to call this function with an end point specified that is greater than the length of the physical sequence to be searched, and I'm sure you agree with that, so I'm unsure as to the meaning of this comment.

if sp > ep then why assume the slice starts at sp? 
with ep = length -sp is out of range. (G) 

I 'assume' the slice starts at 'sp' because that what the caller told me it started at. That's the entire point of having this optional parameter --- to tell me where the slice starts.

I know that "with ep = length -sp is out of range.", and that again is the point of the code. This function is to return a negative value when the caller specified slice is valid AND the requested item in not in that slice. The magnitude of the returned value is an indication of where the item might have been located if it had of been in the slice. That is what the published API says (though not very well I have to admit).

gimlet said...

Look it's hard to write this kind of code because you are forever faced with questions:

What do I do in this particular situation?,
Have I covered every possible case?,
What is the meaning of all this?
Should I even try to cover this?

No argument here. That is why we strive to write better code and to have it peer reviewed. Just because writing good code is difficult, that is no reason to shirk our responsibility to do exactly that: to write good code.

Thank for for helping me to be a better programmer.

new topic     » goto parent     » topic index » view message » categorize

24. Re: binary search with reference to append

DerekParnell said...

I've been learning how to be a better programmer since 1972, by the way.

Wow! Your skills are older than BSD UNIX!

  • jaw and pants drop to the floor*
new topic     » goto parent     » topic index » view message » categorize

25. Re: binary search with reference to append

DerekParnell said...

I have no idea what the comment "ep=3 => length?? " is meant to be saying to me.

Could it be a typo for "ep=0 => length?? " ?

In that case, I'd read the meaning as "why the heck do you treat ep=0 to be the same as the length of the haystack, and why the heck do you treat ep=-3 to be the same as the length of the haystack minus 3 elements, or 3 elements up from the end of the haystack?" or something.

But of course, the reason we do that is straightforward and obvious.

new topic     » goto parent     » topic index » view message » categorize

26. Re: binary search with reference to append

DerekParnell said...

I think that if we were writing this function for inclusion into the library now, we would not have these optional parameters. In hindsight, they are probably a poor design decision. But here we are ... we are stuck with them.

It seems perfectly reasonable to me, if everyone else agrees, to remove them. Of course that would be with a suitable entry in the release notes and both the manual and bsearch.e containing something like

-- Compatibility Note: 
--  Prior to 4.2 bsearch() allowed the start and end points to be specified. This was agreed to be removed (http://openeuphoria.org/forum/124592.wc),  
--   if necessary replace eg res = bsearch(needle,haystack,5,10) with res = bsearch(needle,haystack[5..10]), remembering to add 4 to abs(res). 

Obviously, that is, if appropriate code changes to manage without them do not adversely affect the performance of next_prime().

Pete

new topic     » goto parent     » topic index » view message » categorize

27. Re: binary search with reference to append

petelomax said...
DerekParnell said...

I think that if we were writing this function for inclusion into the library now, we would not have these optional parameters. In hindsight, they are probably a poor design decision. But here we are ... we are stuck with them.

It seems perfectly reasonable to me, if everyone else agrees, to remove them. Of course that would be with a suitable entry in the release notes and both the manual and bsearch.e containing something like

Well, maybe. But this violates a Linus Torvald's maxim: We don't break user apis.

If it's in alpha or beta, then maybe. This made it to a bunch of 4.0.x releases, though.

Otherwise, the only time we chose to break this rule was with namespaces, and that was because we thought there was no way to make sane namespaces with the then-existing rules.

petelomax said...

remembering to add 4 to abs(res).

Hmm, actually that's a good reason to keep it, same reason we have find_from()/match_from(). I always have trouble with that math when I have to do it myself.

petelomax said...

Obviously, that is, if appropriate code changes to manage without them do not adversely affect the performance of next_prime().

I don't think we make a full shallow copy of a sequence (let alone a deep one) when we make a slice, so I doubt we'd see any performance issues here because of just this.

new topic     » goto parent     » topic index » view message » categorize

28. Re: binary search with reference to append

I'm okay with breaking user APIs; we're talking about Euphoria not Unix. Programming languages do this all the time with major releases.

The key would be to deprecate the parameters for at least one point release, then ignore them going forward. If that's what you wanted to do.

However, the function works as-is, the documentation can state that the parameters are deprecated and to pass a slice instead, and defer the code change indefinitely.

new topic     » goto parent     » topic index » view message » categorize

29. Re: binary search with reference to append

jaygade said...

I'm okay with breaking user APIs; we're talking about Euphoria not Unix. Programming languages do this all the time with major releases.

I thought the way we did things was.

X.Y.Z

Z - patchlevel release. bugfixes and new routines in stdlib.

Y - minor release. new language features, keywords, syntax, etc.

X - major release, backwards incompatibility.

So this would have to be pushed back to 5.0.0 if we follow this.

jaygade said...

The key would be to deprecate the parameters for at least one point release, then ignore them going forward. If that's what you wanted to do.

I'm not sure I like this - there's a chance that legacy code would not break, but return a logic error (it found the match, but it's outside the range of the slice, and thus should not have reported the match), raising hard to debug issues later on...

jaygade said...

However, the function works as-is, the documentation can state that the parameters are deprecated and to pass a slice instead, and defer the code change indefinitely.

Good point. This works.

new topic     » goto parent     » topic index » view message » categorize

30. Re: binary search with reference to append

jaygade said...

I'm okay with breaking user APIs; we're talking about Euphoria not Unix. Programming languages do this all the time with major releases.

The key would be to deprecate the parameters for at least one point release, then ignore them going forward. If that's what you wanted to do.

However, the function works as-is, the documentation can state that the parameters are deprecated and to pass a slice instead, and defer the code change indefinitely.

Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?

deprecate public procedure say_hello(sequence name) 
    printf(1, "Hello, %s\n", { name }) 
end procedure 

Isn't this a good reason to release 4.1 as fast as possible?

Andreas

new topic     » goto parent     » topic index » view message » categorize

31. Re: binary search with reference to append

andi49 said...

Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?

No, but we could probably add one.

andi49 said...
deprecate public procedure say_hello(sequence name) 
    printf(1, "Hello, %s\n", { name }) 
end procedure 

Isn't this a good reason to release 4.1 as fast as possible?

Andreas

Yes!

new topic     » goto parent     » topic index » view message » categorize

32. Re: binary search with reference to append

jimcbrown said...
andi49 said...

Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?

No, but we could probably add one.

It's a kind of funny. My Version of Eu4.1 has support for the 'deprecated' keyword and it works. It's also in the Doc's for 4.1.

Section: 4.2.4 Deprecation

http://euphoria.indonesianet.de/eu4zip/doc41/html/lang_decl.html#_140_deprecation

Andreas

new topic     » goto parent     » topic index » view message » categorize

33. Re: binary search with reference to append

andi49 said...
jimcbrown said...
andi49 said...

Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?

No, but we could probably add one.

It's a kind of funny. My Version of Eu4.1 has support for the 'deprecated' keyword and it works. It's also in the Doc's for 4.1.

Section: 4.2.4 Deprecation

http://euphoria.indonesianet.de/eu4zip/doc41/html/lang_decl.html#_140_deprecation

Andreas

Oops. I stand corrected.

new topic     » goto parent     » topic index » view message » categorize

34. Re: binary search with reference to append

I don't think that the deprecate keyword would work in this case however, since we're deprecating optional parameters and not an entire routine.

Unless deprecate works in the formal parameter declaration?

I would think that editing the doc comment of the function to mention that the sp and ep parameters are deprecated would be sufficient. "Don't use these, use a slice instead. These optional parameters will no longer be valid at some future point."

new topic     » goto parent     » topic index » view message » categorize

35. Re: binary search with reference to append

jimcbrown said...
gimlet said...

Just as I will be very happy when load_map() is fixed.

What makes you think that it is (still) broken? Do you have examples to demonstrate this?

Wow!

I wrote a long article about what was wrong with load_map and you fix a few warts.

Then you say this??

Jim,

You do not like what I have to say,

You disregard what Dijkstra, van Gasteren, Feijin, Simon Peyton Jones, Lennart Augustsson, Ralf Hinze, Paul Hudak, John Hughes, Erik Meijer, Philip Wadler

have to say because you know better??

Really?

new topic     » goto parent     » topic index » view message » categorize

36. Re: binary search with reference to append

gimlet said...
jimcbrown said...
gimlet said...

Just as I will be very happy when load_map() is fixed.

What makes you think that it is (still) broken? Do you have examples to demonstrate this?

Wow!

I wrote a long article about what was wrong with load_map and you fix a few warts.

Then you say this??

I made a few quick fixes to deal with all the warts I knew of. You're right that libraries shouldn't ship with broken functions.

gimlet said...

Jim,

You do not like what I have to say,

You disregard what Dijkstra, van Gasteren, Feijin, Simon Peyton Jones, Lennart Augustsson, Ralf Hinze, Paul Hudak, John Hughes, Erik Meijer, Philip Wadler

have to say because you know better??

Really?

I think Derek has a better handle on the internals of the library and he is working on a proper rewrite. I trust him to resolve any potential issues the cosmetic fixes I appled would not (e.g. any performance issues, for example). In the meantime, I applied those quick fixes to prevent the issues you reported from holding up the release.

Do you have any examples where load_map() still breaks? This is the third time I have asked you for this.

new topic     » goto parent     » topic index » view message » categorize

37. Re: binary search with reference to append

Jim,

To you no.

They are in what I wrote, if you don't want to check fine.

You fixed the few issues I pointed out to Matt. You ignored everything else. The whole thing is a disaster.

If you imagine that the function actually works then use it for something that matters to you. I certainly won't until it's fixed and probably not then as I distrust more and more of the Euphoria library functions.

I have no confidence that you want to fix things. You want to defend Euphoria against all criticism. You want to drive critics away.

So, why should I help you?

new topic     » goto parent     » topic index » view message » categorize

38. Re: binary search with reference to append

gimlet said...

You fixed the few issues I pointed out to Matt.

Like I said, I fixed the issues in functionality. Most of what you wrote is a critique of the algorithms used in the code. These I left to Derek to address.

gimlet said...

You ignored everything else. The whole thing is a disaster.

I didn't ignore anything. I just took the easy part and left the hard stuff to Derek. blink

gimlet said...

If you imagine that the function actually works then use it for something that matters to you.

I have and it works fine for what I need.

gimlet said...

I certainly won't until it's fixed and probably not then as I distrust more and more of the Euphoria library functions.

This sounds more like a personal problem than anything to do with us. Still, whatever floats your boat.

gimlet said...

I have no confidence that you want to fix things.

I applied a fix to load_map(), and then you say I don't want to fix things?

gimlet said...

You want to defend Euphoria against all criticism.

I agree with Derek here - constructive criticism is good. It improves the language and the standard library.

However, I confess that I do want to defend against unconstructive and unproductive criticism.

gimlet said...

Jim,

To you no.

They are in what I wrote, if you don't want to check fine.

So, why should I help you?

I won't ask again. Since you can't point to a single example where load_map(), given valid input, fails to produce valid output, I maintain my confidence that load_map() is not functionally broken.

Anyways, this load_map() argument is becoming quite repetitive. Unless you (or anyone else) has any new evidence, I think we should just agree to disagree and drop the subject.

gimlet said...

You want to drive critics away.

I welcome constructive criticism. Generally, I only want to drive trolls away.

new topic     » goto parent     » topic index » view message » categorize

39. Re: binary search with reference to append

gimlet said...

I wasn't addressing that to you, more to Jaygade. However you have not demonstrated that you understand.

This is just a veiled way of stating that I do not understand.

I contend that I have demonstrated that I understand.

not veiled Jim.

In fact, I could go further, and claim that, by your missteps (not understanding binary search and sorting arrays, advocating and then contradicting design by contract) that you understand less.

than??
I never said you did not understand binary search.
I did say the array does not have to be sorted.
Your 'counter-example' is no counter example.
You did not check my sources.

and yet You are right as always.

But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.

and wasn't this accepted?

Jim,

If you are going to ban everyone who disagrees Euphoria will die.

You want to ban me because I doubted your qualifications. Perhaps I was wrong, perhaps I was not. Banning me doesn't prove anything other than that you can ban people (for whatever reason you provide).

new topic     » goto parent     » topic index » view message » categorize

40. Re: binary search with reference to append

gimlet said...

I never said you did not understand binary search.

Agreed. Where did that come from?

gimlet said...

I did say the array does not have to be sorted.

But it does! http://algorithms.openmymind.net/search/binarysearch.html http://en.wikipedia.org/wiki/Binary_search_algorithm

gimlet said...

Your 'counter-example' is no counter example.

Yes, it is.

jimcbrown said...

But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.

gimlet said...

and wasn't this accepted?

Sort of. Everyone seems to agree with it theoretically, but due solely to backwards compatibility concerns, this change won't happen.

gimlet said...

Jim,

If you are going to ban everyone who disagrees Euphoria will die.

If I ban everyone who's ever disagreed with me? Agreed. I'd have to ban Derek and Matt and Shawn and euphoric and Jeremy and ... dang, probably the entire community. Almost everyone who has ever posted on the forum!

Of course, I have no desire to ban people who merely disagree with me on an academic subject. That would just be plain silly. So, what's your point?

gimlet said...

You want to ban me

When did I say this? I did suggest extended moderation may be in the offing, but that's not the same thing as a ban.

gimlet said...

because I doubted your qualifications.

This has nothing to do with it.

new topic     » goto parent     » topic index » view message » categorize

41. behavior of slice

jimcbrown said...

I don't think we make a full shallow copy of a sequence (let alone a deep one) when we make a slice, so I doubt we'd see any performance issues here because of just this.

I would be very surprised if this was not the case. A result of a slice operation has a distinct length, if the base pointer points inside of another sequence's base pointer's members we are not protected by ref counting scheme employed by EUPHORIA. There must be a shallow copy of the members of the passed sequence in a slice. We must create a new sequence when ever we do a slice operation.

Shawn

new topic     » goto parent     » topic index » view message » categorize

42. version policy of EUPHORIA

jimcbrown said...

I thought the way we did things was.

X.Y.Z

Z - patchlevel release. bugfixes and new routines in stdlib.

Y - minor release. new language features, keywords, syntax, etc.

X - major release, backwards incompatibility.

So this would have to be pushed back to 5.0.0 if we follow this.

This is exactly the way we have been doing things. When it comes to deciding what the required version of a library you are using, you cannot count on others to behave so well. Another dev group can decide to change the meaning of alpha channel from 1 means full transparency to 1 means full opacity in a patch version. This kind of thing happened in a game library I tried. I believe we should look to this example, as a guide of what not to do.

Patch level changes mean you may have to put namespaces on symbols that conflict with new library routines. Minor release changes mean you may have rename a symbol because it may conflict with a builtin. Major release means you will have to overhaul the code to get it up to the newest version.

Shawn

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu