1. binary search with reference to append
- Posted by gimlet Jun 13, 2014
- 2029 views
This is an example of why you should require the caller validates the parameters of a called function.
This is a summarised version of the binary_search function in /std/search.e
The notes make interesting reading:
The function messes with its parameters. The result is not pretty.
function bsearch(obj x, seq A, int sp=1, int ep=0) int N, lo, hi, mid, c --------------------- N = length(A) lo = max(sp, 1) -- lo could be > N hi = min(N, ep) -- hi could be < lo hi = (lo > hi && N) ? N : hi -- hi could be > N --------------------- mid = sp -- this is unwise c = 0 -- the match (set to 0 - found) -- at this point consider -- if lo > hi we have mid = sp, c = 0, a return of -sp (sp may be -ve) -- if N = 0 then -- if lo <= hi we have a crash. -- if lo > hi we have mid = sp, c = 0, a return of -sp (sp may be -ve) -- if N != 0 then -- if lo <= hi then the loop executes. -- if lo > N we have a crash. -- if lo <= N then -- if hi > N we have a potential crash. -- else we are OK. -- -- Perhaps this is complicated enough for you to believe there is a problem! -- -- Perhaps you may say this is all so remote because 'almost always' it suceeds -- (code for only a countable infinty of failures). while lo <= hi do mid = floor((lo + hi) / 2) c = eu:compare(needle, haystack[mid]) if c < 0 then hi = mid - 1 elsif c > 0 then lo = mid + 1 else return mid end if end while -- return the position it would have, if inserted now if c > 0 then mid += 1 end if return -mid -- if A was null then any return was nonsense -- if A was not empty was the range sp..ep valid? end function
Tell me, in how many ways is that function wrong?
2. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 13, 2014
- 1992 views
This is an example of why you should require the caller validates the parameters of a called function.
While I agree with "you should require the caller validates the parameters of a called function", I don't think this is a good example.
The function messes with its parameters.
No, it does not modify it's parameters.
lo = max(sp, 1) -- lo could be > N hi = min(N, ep) -- hi could be < lo hi = (lo > hi && N) ? N : hi -- hi could be > N
Agreed, a sanity bounds check here would not hurt - but if sp or ep is out of bounds, it's an error. (Ideally, it'd be an exception or a flag, not an uncatchable run-time error.)
mid = sp -- this is unwise
That is harmless, as in the case where lo <= hi, mid is immediately reset without the old value being used. So this is for the case where lo > hi, in which case the function returns immediately, stating that the record would be inserted into the first position. (Actually, maybe this only makes sense when dealing with the empty sequence?)
c = 0 -- the match (set to 0 - found)
Again, harmless. In the case where lo <= hi, c is immediately reset without the old value being used as well. Where lo > hi, c=0 is still harmless, it just means we don't adjust the offset returned by 1 (as we would do in the case where lo<=hi but the item was not found and the last object in the haystack search range < needle).
-- at this point consider -- if lo > hi we have mid = sp, c = 0, a return of -sp (sp may be -ve)
What the heck is ve? Anyways, this case is designed for the empty sequence. I concede that if sp and ep are both longer than the length of the sequence, the return value here may be weird, causing logic errors or a run-time crash later on.
-- if N = 0 then -- if lo <= hi we have a crash.
If both sp and ep are out of bounds. Conceded. Maybe this is the right thing to do, though, since we normally crash on out of bounds errors and it's better to fail early.
-- if N != 0 then -- if lo > N we have a crash. -- if lo <= N then -- if hi > N we have a potential crash.
So if either sp or ep are out of bounds, we could have a crash. I don't like the 'potential crash' here, I think if it's out of bounds, we should detect it early and report back on it somehow.
So, that's definitely a bug. Conceded.
-- if N = 0 then -- if lo > hi we have mid = sp, c = 0, a return of -sp (sp may be -ve)
Didn't you already say this?
Perhaps you may say this is all so remote because 'almost always' it suceeds
No, you've definitely found something here. 'almost always' is not good enough in this case.
if A was null then any return was nonsense
"A was null" means "A = {}" ? Assuming that this is what you mean,
Yeah, I need to think about this one some more.
if A was not empty was the range sp..ep valid?
Yes, that definitely needs to be checked. But what to do if there's an out-of-bounds error?
Tell me, in how many ways is that function wrong?
Just one - lack of proper bounds checking.
3. Re: binary search with reference to append
- Posted by jaygade Jun 13, 2014
- 1965 views
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions. If the function will cause a crash with bad inputs, then I would suggest that it's the caller's responsibility to ensure that it calls the function with the correct parameters.
I mean, we can fail if sequence A isn't sorted in ascending order. Whose responsibility is it to check that condition? The caller or the called function?
At the very least, however, inside the function I would assert that lo <= hi <= N else crash.
hi = (lo > hi && N) ? N : hi -- hi could be > N
Wait -- did Euphoria get a ternary operator when I wasn't looking? I just did a search and I couldn't find anything on it. Jim seems to be the search master, however.
There are other bugs in this code too, but I can't correct them right now.
4. Re: binary search with reference to append
- Posted by jaygade Jun 13, 2014
- 1971 views
Untested:
-- Precondition: haystack is a list sorted in ascending order. -- If provided, startp and endp are less than or equal to -- haystack length and startp is less than or equal to endp. -- Returns: Index of needle in haystack, or 0 if not found. function bsearch(object needle, sequence haystack, integer startp=1, integer endp=0) integer nItems, lo, hi, midp --------------------- nItems = length(haystack) lo = max({1, startp}) if endp > 0 and endp <= nItems then hi = min({endp, nItems}) else hi = nItems end if -- We should provide a useful error message here, but crash is good -- enough. if hi > nItems or lo > hi then abort(1) end if --------------------- while lo <= hi do midp = floor((lo + hi) / 2) switch eu:compare(needle, haystack[midp]) do case -1 then hi = midp - 1 case 0 then return midp case +1 then lo = midp + 1 end case end while -- Not found. return 0 end function
5. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 13, 2014
- 1974 views
hi = (lo > hi && N) ? N : hi -- hi could be > N
Wait -- did Euphoria get a ternary operator when I wasn't looking? I just did a search and I couldn't find anything on it. Jim seems to be the search master, however.
Nope, we don't. That's probably meant to be the abridged version of this code:
if lo > hi and length(haystack) > 0 then hi = length(haystack) end if
6. Re: binary search with reference to append
- Posted by jaygade Jun 13, 2014
- 1963 views
I've edited my code several times but I think it's still not perfect.
This is better. Still untested.
-- Precondition: haystack is a list sorted in ascending order. -- If provided, startp and endp are less than or equal to -- haystack length and startp is less than or equal to endp. -- Returns: Index of needle in haystack, or 0 if not found. function bsearch(object needle, sequence haystack, integer startp=1, integer endp=0) integer nItems, lo, hi, midp nItems = length(haystack) if endp = 0 then hi = nItems else hi = endp end if if hi <= nItems and startp <= hi then lo = startp else abort(1) end if while lo <= hi do midp = floor((lo + hi) / 2) switch eu:compare(needle, haystack[midp]) do case -1 then hi = midp - 1 case 0 then return midp case +1 then lo = midp + 1 end case end while -- Not found return 0 end function
If startp > endp, but both are less than nItems, I think that either failure OR returning 0 is a valid response so long as it's documented. Failure could be better during debugging in order to catch invalid inputs when reasonable.
In this case it is reasonable to check the boundaries of the search, but it is not necessarily reasonable to check to see if the haystack is sorted.
This illustrates that the caller should validate the data before sending it to the bsearch function, and that the bsearch function should reject invalid inputs.
7. Re: binary search with reference to append
- Posted by jaygade Jun 13, 2014
- 1946 views
I think I finally got the logic right.
8. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 13, 2014
- 1947 views
I think I finally got the logic right.
For the usage case that you've written this for, it looks good.
However, binary_search() has an extra requirement:
-- # a negative integer, ##-i##, with ##i## between adjusted start and end -- points. This means that ##needle## is not in the searched slice of -- ##haystack##, but would be at index ##i## if it were there. -- # a negative integer ##-i## with ##i## out of the searched range. This -- means than ##needle##might be either below the start point if ##i## -- is below the start point, or above the end point if ##i## is.
So it might return, say, -5, if the needle is not in the haystack but index 5 is where the needle should be sorted into.
I guess, sometimes you just can't have it all.
9. Re: binary search with reference to append
- Posted by jaygade Jun 13, 2014
- 1954 views
Okay, I didn't see the requirements in the OP. I was wondering what the original return -mid was for. That had me scratching my head.
10. Re: binary search with reference to append
- Posted by petelomax Jun 13, 2014
- 1916 views
I'd never seen this before. Can someone please enlighten me. In std/search.e I find:
public function binary_search(object needle, sequence haystack, integer start_point = 1, integer end_point = 0)
In std/primes.e I find:
public function next_prime(integer n, object fail_signal_p = -1, atom time_out_p = 1) ... -- Assumes that most searches will be less than about 1000 if n<1009 and 1009<=list_of_primes[$] then i = binary_search(n, list_of_primes, 1 ,169) else i = binary_search(n, list_of_primes) end if
Is that like a major win in some benchmark or something? Even if list_of_primes almost fills available memory, we're talking like 32 or 64 chops max, so my gut feeling is those "1009" tests are an overhead many more times than a saving, though I could well be wrong. Otherwise I cannot see why there is an end_point, and given that the above is all I could find, I see absolutely no reason whatsoever to have a start point.
Pete
11. Re: binary search with reference to append
- Posted by DerekParnell (admin) Jun 14, 2014
- 1917 views
I'd never seen this before. Can someone please enlighten me.
That's not how I remember first writing the primes routine. I thought that if the next prime was a small one, a sequential search of the first 'X' would be faster than doing a binary search of a large table. I don't know why a ranged binary search is there.
12. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 14, 2014
- 1873 views
I'd never seen this before. Can someone please enlighten me.
That's not how I remember first writing the primes routine. I thought that if the next prime was a small one, a sequential search of the first 'X' would be faster than doing a binary search of a large table. I don't know why a ranged binary search is there.
I attempted to trace the history of primes.e back, to find the culprit who changed this from the sequential search to a ranged search.
But, this is the earliest version I can find: http://scm.openeuphoria.org/hg/euphoria/file/d0ab9cc672b8/include/primes.e
13. Re: binary search with reference to append
- Posted by gimlet Jun 14, 2014
- 1889 views
You have misunderstood.
The issues are:
The logic of the input checks is wrong.
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
setting mid to sp is arbitrary,
something is a bit funny at the caller's end)
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
Therefore
1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.
2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.
3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.
Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.
Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?
Yes: there are.
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.
No. that is in contradiction of
the caller should check its inputs.
The called function is at liberty to
plough ahead or whatever.
If the function will cause a crash with bad inputs, then I would suggest that it's the caller's responsibility to ensure that it calls the function with the correct parameters.
Quite so.
I mean, we can fail if sequence A isn't sorted in ascending order. Whose responsibility is it to check that condition? The caller or the called function?
1 Well it would have to be the caller as otherwise there would be
no point in using binary search at all.
2 Binary search does not require a sorted array.
If you consider something like:
A[lo] < x < A[hi] -- lo < hi m = mid(lo,hi) -- We have one of A[m] = x A[lo] < x < A[m] A[m] < x < A[hi]
and so on
Binary search is the mean value theorem. Ref:
van Gasteren, Netty; Feijen, Wim (1995).
"The Binary Search Revisited"
nItems = length(haystack) if endp = 0 then hi = nItems else hi = endp end if if hi <= nItems and startp <= hi then lo = startp else abort(1) end if while lo <= hi do midp = floor((lo + hi) / 2) switch eu:compare(needle, haystack[midp]) do case -1 then hi = midp - 1 case 0 then return midp case +1 then lo = midp + 1 end case end while -- Not found return 0
If startp > endp, but both are less than nItems, I think that either failure OR returning 0 is a valid response so long as it's documented. Failure could be better during debugging in order to catch invalid inputs when reasonable.
In this case it is reasonable to check the boundaries of the search, but it is not necessarily reasonable to check to see if the haystack is sorted.
This illustrates that the caller should validate the data before sending it to the bsearch function, and that the bsearch function should reject invalid inputs.
You say the caller should validate.
Yet you still insist on bsearch rejecting invalid inputs.
Surely all bsearch needs to do is return correct results for
valid input.
function bsearch(seq A, obj x, int sp, int ep) int lo = sp int hi = ep int m, c if length(A) = 0 or sp > ep then return 0 end if while lo <= hi do m = floor((lo+hi)/2) c = compare(x, A[m]) switch c do case -1 then hi = m-1 case 0 then return m case +1 then lo = m+1 end switch end while -- Not found if c = 1 then m = m+1 end if if m > sp and m <= ep then return -m else return 0 -- no position found end if
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
function bsearch(seq A, obj x) int lo = 1 int hi = length(A) int m, c if hi < lo then return 0 end if while lo <= hi do m = floor((lo+hi)/2) c = compare(x, A[m]) switch c do case -1 then hi = m-1 case 0 then return m case +1 then lo = m+1 end switch end while -- Not found if c > 0 then m = m+1 end if if m > 1 and m <= length(A) then return -m else return 0 -- no position found end if
14. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 14, 2014
- 1851 views
You have misunderstood.
No, I haven't.
The issues are:
The logic of the input checks is wrong.
Agreed.
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.
setting mid to sp is arbitrary,
Agreed. Thinking it over, for the empty sequence case, the result should laways be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.
Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.
something is a bit funny at the caller's end)
Non sequitor?
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
GIGO?
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.
No. that is in contradiction of
the caller should check its inputs.
The called function is at liberty to
plough ahead or whatever.
Sounds like GIGO.
Therefore
1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.
See the jaygade point above.
2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.
You might have a point here. I have to think about this some more.
Perhaps in this case, a NOTFOUND/INVALID/NOTARESULT/NOTAPOSITION return value would make sense. Like
public constant NOTAPOSITION = ""
I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.
3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.
Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.
I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.
In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.
Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.
lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.
Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?
Yes: there are.
I also concur.
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.
I find it odd that you say this but then have the callee check bounds in your later examples in the same post.
2 Binary search does not require a sorted array.
If you consider something like:
A[lo] < x < A[hi] -- lo < hi m = mid(lo,hi) -- We have one of A[m] = x A[lo] < x < A[m] A[m] < x < A[hi]
and so on
That's only true if it's sorted in a particular direction. Here's a counterexample:
-- lo = 1 -- hi = 3 -- x = 2 -- A = {3, 1.5, 1} A[lo] < x < A[hi] -- lo < hi -- but this is not true! -- A[lo] = A[1] = 3 -- A[hi] = A[3] = 1 -- 3 < 2 -- false! -- 2 < 1 -- false! m = mid(lo,hi) -- m now equals 2 -- We have one of A[m] = x -- false, A[m] = A[2] = 1.5 != 2 A[lo] < x < A[m] -- also false A[m] < x < A[hi] -- also false
You say the caller should validate.
Yet you still insist on bsearch rejecting invalid inputs.
Surely all bsearch needs to do is return correct results for
valid input.
But it already does! It's only the invalid input cases that are being argued!
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.
15. Re: binary search with reference to append
- Posted by jaygade Jun 14, 2014
- 1822 views
You say the caller should validate.
Yet you still insist on bsearch rejecting invalid inputs.
Surely all bsearch needs to do is return correct results for
valid input.
I agree that the caller should be responsible for checking the data for validity, which is why I wrote the function to crash on bad input instead of silently continuing.
The reason for that is that when you write your first rough draft, it is easy to miss such conditions. Therefore the check in the callee is an aid to debugging -- when it crashes then it says to the programmer that there is a logic error upstream. We don't want a logic error to put our program or our output data into an unknown state with a bug which could turn intermittent and/or very difficult to track down otherwise.
16. Re: binary search with reference to append
- Posted by jaygade Jun 14, 2014
- 1827 views
I can't argue for or against sending a slice to the function. However, sp and ep ARE optional arguments which will probably not be used most of the time.
In addition, nothing prevents you from sending a slice to the function as written. It should work just fine with a slice.
So call the function with a slice and skip using sp and ep. Problem solved.
I'm not entirely convinced on the added functionality of returning a "possible" index as a negative number though. It almost seems like the job of another function, that should return a euphoria tuple sequence such as {index, FOUND} or {index, NOT_FOUND}.
The more I read about other languages and learn about tuples, the more I can relate them to Euphoria's sequences and find the long-proposed idea of the {A, B} = {val1, val2} syntax.
[EDIT] Nice with the edit history.
17. Re: binary search with reference to append
- Posted by DerekParnell (admin) Jun 14, 2014
- 1875 views
You have misunderstood.
The issues are:
The logic of the input checks is wrong.
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
setting mid to sp is arbitrary,
something is a bit funny at the caller's end)
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
I think you are exactly correct with this.
The case where the table is empty, the starting point is beyond the end of the table and the ending point is before the starting point, all should be treated as an error condition and a simple zero returned.
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
Yes. This is a much better approach.
18. Re: binary search with reference to append
- Posted by gimlet Jun 14, 2014
- 1841 views
You have misunderstood.
No, I haven't.
I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.
The issues are:
The logic of the input checks is wrong.
Agreed.
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.
I said: if you haven't searched.
1 It is not possible to search an empty sequence.
2 Therefore you can't return a position except one arbitrarily decided.
3 Therefore you should return NOT FOUND/NO POSITION.
setting mid to sp is arbitrary,
Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.
-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.
Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.
Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION
A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice. If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.
You are allowing impossible requirements to tie you in knots.
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
GIGO?
No. Design by contract
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.
No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.
Sounds like GIGO.
Design by contract
Therefore
1 Calling bsearch with an empty sequence should return NOT FOUND.
No search was made.
See the jaygade point above.
2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.
You might have a point here. I have to think about this some more.
Perhaps in this case, a NOTFOUND/INVALID/NOTARESULT/NOTAPOSITION return value would make sense. Like
public constant NOTAPOSITION = ""
I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.
That is last part is silly. You can't seriously mean that.
You are saying bsearch is a mind reader.
3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.
Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.
I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.
In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.
In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.
Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.
lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.
Have your way if you like. My premises stand.
Are there any cases where wrong parameters will return erroneous or indeterminate information instead of crashing?
Yes: there are.
I also concur.
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.
I find it odd that you say this but then have the callee check bounds in your later examples in the same post.
The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.
2 Binary search does not require a sorted array.
If you consider something like:
A[lo] < x < A[hi] -- lo < hi m = mid(lo,hi) -- We have one of A[m] = x A[lo] < x < A[m] A[m] < x < A[hi]
and so on
That's only true if it's sorted in a particular direction. Here's a counterexample:
-- lo = 1 -- hi = 3 -- x = 2 -- A = {3, 1.5, 1} A[lo] < x < A[hi] -- lo < hi -- but this is not true! -- A[lo] = A[1] = 3 -- A[hi] = A[3] = 1 -- 3 < 2 -- false! -- 2 < 1 -- false! m = mid(lo,hi) -- m now equals 2 -- We have one of A[m] = x -- false, A[m] = A[2] = 1.5 != 2 A[lo] < x < A[m] -- also false A[m] < x < A[hi] -- also false
Not a counter example:
You invalidate the start condition A[lo] < x < A[hi]
and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.
Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller and, loses no functionality.
One still has to deal with passing in an empty sequence.
(I don't think this is an error).
and returning a valid position value in the case x is not found.
Those issues are determinate within bsearch.
Sounds like a win to me.
Regardless of of the details, I think that what I have shown is:
1 the logic was incorrect.
2 the handling of empty sequences, sp and ep was incorrect.
3 that you, jaygade and many others have trouble with the way bsearch in its original form works.
This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.
If the end result is that the library function is changed so it doesn't make assumptions I will be very happy. Just as I will be very happy when load_map() is fixed.
19. Re: binary search with reference to append
- Posted by jaygade Jun 15, 2014
- 1826 views
I have (little) opinion on how binary search is implemented in the standard library. Before this thread I didn't even know.
You just presented an interesting problem to me, as well as a theory question. Heck, your own code was some kind of weird bastardization between C and Euphoria. I've also expressed my opinion on the theory question -- that the caller should validate the data, but that the callee should crash for known bad values or ranges.
If you want to propose changes to the standard library, then propose them.
20. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 1996 views
Just as I will be very happy when load_map() is fixed.
What makes you think that it is (still) broken? Do you have examples to demonstrate this?
The handling of empty sequences is wrong.
(if you haven't searched you haven't got a position,
This is complicated by the requirement to return a "potential" position for the new item. I'll call this the jaygade point, in honor of the recent work jaygade has done in this area.
I said: if you haven't searched.
1 It is not possible to search an empty sequence.
I disagree. An empty sequence is the fastest to search. Just like searching an empty box.
2 Therefore you can't return a position except one arbitrarily decided.
Agreed. There's no existing position to return.
3 Therefore you should return NOT FOUND/NO POSITION.
Negative numbers are not valid positions. The existing function already conveys this information, one just has to check if the result is <= 0.
But, as per the jaygade point above, it also attempts to return extra information.
setting mid to sp is arbitrary,
Agreed. Thinking it over, for the empty sequence case, the result should always be -1, as any item inserted into an empty sequence will be inserted into the first position. The way the default parameters are set up, this is what actually happens, but it could be made more explicit I suppose.
-1 is as arbitrary as any other value. Consider what the logic may have done. What relation there might be between lo and sp. -1 doesn't work any better than -sp.
It would seem to work best when sp = 1. Anyways, conceded.
Now, if the user specifies an empty sequence and an invalid range (e.g. 5..9) then it gets weirder. I'm still not sure what to do here.
Exactly. Just accept that if you don't know, a library function can't know. Arbitrary decisions are in general wrong. So accept empty sequence or sp > ep => NOT FOUND/NO POSITION
Undecided != Don't know.
A point which you seem to have missed. If an empty sequence is passed to bsearch: besearch does not know anything about the sequence, whether it is the whole thing or a slice.
Irrelevant. bsearch should not care about this.
If sp and ep have values bsearch cannot know anything about those values except in relation to the (empty) sequence and each other.
Agreed ... but what of it?
The principles of design involved are:
caller contracts to provide valid input.
called contracts to provide valid output for valid input.
GIGO?
[quote gimlet] No. Design by contract
If the called function can return bad output for bad inputs, then the called function should be responsible for checking those conditions.
No. that is in contradiction of
the caller should check the inputs.
The called function is at liberty to
plough ahead or whatever.
Sounds like GIGO.
Design by contract
No. Under design by contract, the called function should be checking those conditions using an assertion an throwing an error or something if they aren't validated.
Quoting wikipedia:
Many programming languages have facilities to make assertions like these. However, DbC considers these contracts to be so crucial to software correctness that they should be part of the design process. In effect, DbC advocates writing the assertions first.
When using contracts, a supplier should not try to verify that the contract conditions are satisfied; the general idea is that code should "fail hard", with contract verification being the safety net ... the supplier throws an exception to inform the client that the precondition has been broken,
Specifically, this
The called function is at liberty to
plough ahead or whatever.
violates design by contract.
2 Calling bsearch with sp > ep is the same as calling with an
empty sequence.
Returning -sp is simply wrong.
I see where the idea to return -sp came from though. If we return -1 for an empty sequence searching over the entire range, then conceptually {}[5..4] is still an empty sequence, and returning -sp (or -5 in this example) just signifies that if you filled the sequence up to the starting point of the range, then you'd want to insert this item at position 5.
That is last part is silly. You can't seriously mean that.
Yes, I do.
You are saying bsearch is a mind reader.
It's following certain rules to produce a result which logically follow from those rules.
3 Calling bsearch with out of bounds sp, ep should result in
'whatever'. It could crash return a value.
It's the caller's problem.
Again you have a good point here. If only for one minor hiccup, I'd completely agree with you here.
I hate how euphoria's error handling works. No way to catch exceptions. It's embarrassing. Hence, I like to promote the idea that the stdlib return a INVALID_RESULT in place of crashing.
In lieu of being able to throw a catchable ArrayOutOfBoundsException here, I think I've settled on favoring returning NOTFOUND/NOTAPOSITION/INVALIDPARAMETERS or something in this case.
In the examples at the end I am returning 0 to stand for NOTFOUND/NOPOSITION. This functions perfectly well.
Agreed. Other than the jaygade point (and the drop in slicing functionality), this works perfectly.
Most of what I wrote was itemising how things go wrong if bsearch messes with its parameters. And fiddling lo and hi in arbitrary ways is most definitely messing even though sp and ep are not changed.
lo/hi are not parameters. sp and ep are parameters. Thus your basic premises are wrong.
Have your way if you like. My premises stand.
Fine. Here are the premises:
1. Messing with parameters causes the culprit function to fail in MWP-style. 2. bsearch messes with lo/hi. 3. lo/hi are not parameters. 4. sp/ep are parameters.
How do you reach the conclusion:
1. bsearch fails in MWP-style?
Did you mean the caller should check bounds (which is true, IMO) or that the called function should check bounds (which seems to be your example)?
The caller should check.
Here the callee is checking.
The point is that the callee checking creates problems.
I find it odd that you say this but then have the callee check bounds in your later examples in the same post.
The boundary checking is to ensure that a valid position is returned. A position value outside of the searched range is not acceptable as it implies the value was not in the range A[sp] <= x <=A[ep]. It only happens in the case of a failed search.
Yes, this makes sense and is reasonable. However, you have not resolved the contradiction or stated that this case is one (of many?) possible exceptions to your rule:
The point is that the callee checking creates problems.
2 Binary search does not require a sorted array.
If you consider something like:
A[lo] < x < A[hi] -- lo < hi m = mid(lo,hi) -- We have one of A[m] = x A[lo] < x < A[m] A[m] < x < A[hi]
and so on
That's only true if it's sorted in a particular direction. Here's a counterexample:
-- lo = 1 -- hi = 3 -- x = 2 -- A = {3, 1.5, 1} A[lo] < x < A[hi] -- lo < hi -- but this is not true! -- A[lo] = A[1] = 3 -- A[hi] = A[3] = 1 -- 3 < 2 -- false! -- 2 < 1 -- false! m = mid(lo,hi) -- m now equals 2 -- We have one of A[m] = x -- false, A[m] = A[2] = 1.5 != 2 A[lo] < x < A[m] -- also false A[m] < x < A[hi] -- also false
Not a counter example:
Yes it is. It is a counter-example to what you said:
2 Binary search does not require a sorted array.
You invalidate the start condition A[lo] < x < A[hi]
Yes, I did that. Because that start condition is a veiled form of saying that the array must be sorted. (I guess it's possible to have a sequence that's not completely sorted, which for the right x would still work correctly, as long as all other values touched coincidently happen to have the right order. But sorting is generally the only way to guarantee this for any arbiturary sequence/array/list/etc.)
and then claim that because binary search fails with an
invalid start condition that it would fail with valid
start conditions!
I never said this, I just said that binary search fails with invalid start conditions (one of which is the array being sorted). Now, I never said that binary search would fail with valid start conditions.
Even simpler would be to drop parameters sp, ep altogether
and pass the slice A[sp..ep] and x.
I wonder if you have a legitmate point, or you are simply just trying to get functionality removed from the stdlib.
Well: passing a slice rather than a sequence and start and end points is cleaner, lays the responsibility for bounds checking on the caller
Agreed, so far.
and, loses no functionality.
I disagree. bsearch is losing the functionality to do slicing on behalf of the caller, and potentially the chance to do bounds checking and the life on behalf of the caller.
You may argue that bsearch should not be responsible for this sort of functionality ... and you may be right. We already have other stdlib functions that could do this anyways.
Anyways, I figured that there was probably a good reason for bsearch to work this way, but as original author of the code seems to have changed this mind about it, I'll defer to him: http://openeuphoria.org/forum/m/124611.wc
(Actually, I'm now wondering if the main reason for binary_search() to have slice() functionality was for next_prime() - http://scm.openeuphoria.org/hg/euphoria/file/d0ab9cc672b8/include/primes.e )
Regardless of of the details, I think that what I have shown is:
1 the logic was incorrect.
You have not demonstrated this, except in cases where sp/ep are invalid.
2 the handling of empty sequences, sp and ep was incorrect.
Again, only in cases where sp/ep are invalid.
3 that you, jaygade and many others have trouble with the way bsearch in its original form works.
Agreed. No one seems to like how the function handles the cases where sp/ep are invalid. In addition, there's some wavering on the support for the jaygade point as well.
This is not surprising as bsearch was written with assumptions about all kinds of things built in which need to untangled.
If the end result is that the library function is changed so it doesn't make assumptions I will be very happy.
You are allowing impossible requirements to tie you in knots.
Actually, I think most of the disagreement stands on the jaygade point. This is a design/functionality issue rather than a logical point.
Now, I trust that there was a good reason to have it in the first place, but I'll let the original author explain: http://scm.openeuphoria.org/hg/euphoria/rev/2036d2a1151f
You have misunderstood.
No, I haven't.
I wasn't addressing that to you, more to Jaygade.
However you have not demonstrated that you understand.
This is just a veiled way of stating that I do not understand.
I contend that I have demonstrated that I understand.
In fact, I could go further, and claim that, by your missteps (not understanding binary search and sorting arrays, advocating and then contradicting design by contract) that you understand less.
But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.
I don't have a strong opinion either way, so I'm more than happy to let the original author make the call.
21. Re: binary search with reference to append
- Posted by DerekParnell (admin) Jun 15, 2014
- 1862 views
Ok .. ok .. ok ... I get the hint.
Given that we shouldn't change the API, here might be a cleaner version of binary search.
ifdef DEBUG_MODE then include std/error.e with trace end ifdef --** -- finds a "needle" in an ordered "haystack". Start and end point can be given for the search. -- This assumes that the sequence to search through is already in strict ascending order. -- -- Parameters: -- # ##needle## : an object to look for -- # ##haystack## : a sequence to search in -- # ##start_point## : an integer, the index at which to start searching. Defaults to 1. -- # ##end_point## : an integer, the end point of the search. Defaults to 0, ie search to end. -- -- Returns: -- An **integer**, either: -- # zero ##i##, which means that the slice specified by the ##start_point## -- and ##end_point## does not exist. -- # a positive integer ##i##, which means ##haystack[i]## equals ##needle##. -- # a negative integer, ##-i##, which means that the ##needle## is not in the -- part of the ##haystack## that was searched. The absolute value returned is -- an indication of where the item might have been in the search area, had it existed. -- If the value is the same as the starting point, then the item would have existed -- somewhere before the first item. If the value is after the last item then -- it would have existed somewhere after the last item searched. Otherwise -- the value is the index of where you could insert the ##needle## into the ##haystack##. -- -- Comments: -- * If ##end_point## is less than zero, it represents an offset from the end -- of ##haystack##. For example -1 is one back from the last item. -- * If the starting point is less than 1, it is assumed to be 1. -- * If the ending point is after the last item in the ##haystack##, it is assumed to be the end of the ##haystack##. -- * The way this function returns is very similar to what [[:db_find_key]] -- does. They use variants of the same algorithm. The latter is all the -- more efficient as ##haystack## is long. -- * ##haystack## is assumed to be in ascending order. Results are undefined -- if it is not. -- * If duplicate copies of ##needle## exist in the range searched on -- ##haystack##, the lowest index for the ##needle## is always returned. -- -- See Also: -- [[:find]], [[:db_find_key]] public function binary_search(object needle, sequence haystack, integer start_point = 1, integer end_point = 0) integer lo, hi, mid, c integer sp, ep sp = start_point if end_point <= 0 then -- End point specified is relative to the end of the sequence. ep = length(haystack) + end_point else ep = end_point end if if ep > length(haystack) then ep = length(haystack) end if if sp < 1 then sp = 1 end if if sp > ep then if sp = ep + 1 then return -sp -- Obviously the needle is never in an empty sequence/slice. end if return 0 -- Slice starts after the end. end if lo = sp hi = ep while lo <= hi do mid = floor((lo + hi) / 2) c = eu:compare(needle, haystack[mid]) if c < 0 then -- Needle is before current item so adjust the highest point. hi = mid - 1 elsif c > 0 then -- Needle is after current item so adjust the lowest point. lo = mid + 1 else -- Found it! Now check if its the first occurance. for i = mid-1 to 1 by -1 do if eu:equal(needle, haystack[i]) then mid = i end if end for ifdef DEBUG_MODE then if mid != find(needle, haystack) then crash("ASSERT: find() failed") end if end ifdef return mid end if end while -- return the position it would have, if inserted now if c > 0 then mid += 1 end if ifdef DEBUG_MODE then if mid > ep then if mid > ep + 1 then crash("ASSERT: Final index beyond haystack") end if if eu:compare(needle, haystack[ep]) <= 0 then crash("ASSERT: needle not found but lower than last item") end if elsif mid < sp then crash("ASSERT: Final index before haystack") else if eu:compare(needle, haystack[mid]) >= 0 then crash("ASSERT: needle not found but higher than insertion point.") end if if mid > sp and eu:compare(needle, haystack[mid-1]) <= 0 then crash("ASSERT: needle not found but lower than insertion point.") end if end if end ifdef return -mid end function ifdef DEBUG_MODE then include std/unittest.e constant haystack = "012345678ABCDEFGHIJKLMMMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" test_equal("binary_search missing #0", -1, binary_search(' ',haystack)) test_equal("binary_search missing #1", -10, binary_search('9',haystack)) test_equal("binary_search missing #2", -1, binary_search(0,haystack)) test_equal("binary_search missing #3", -64, binary_search("rubbish",haystack)) test_equal("binary_search missing #4", -64, binary_search(9999,haystack)) test_equal("binary_search missing subset #1", -5, binary_search('0',haystack, 5, 17)) test_equal("binary_search missing subset #2", -18, binary_search('p',haystack, 5, 17)) test_equal("binary_search missing subset #3", -17, binary_search('A',haystack, 17)) test_equal("binary_search missing subset #4", -8, binary_search('A',haystack, 1, 7)) test_equal("binary_search missing subset #5", -62, binary_search('z',haystack, 5, -2)) test_equal("binary_search missing subset #6 empty slice", -10, binary_search('z',haystack, 10, 9)) test_equal("binary_search found first", 1, binary_search('0',haystack)) test_equal("binary_search found last", 63, binary_search('z',haystack)) test_equal("binary_search found middle#1", 32, binary_search('U',haystack)) test_equal("binary_search found middle#2", 33, binary_search('V',haystack)) test_equal("binary_search found middle#3", 22, binary_search('M',haystack)) test_equal("binary_search found auto sp", 32, binary_search('U',haystack, -100)) test_equal("binary_search found auto ep", 32, binary_search('U',haystack, 1, 100)) test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, 17)) test_equal("binary_search found subset #1", 10, binary_search('A',haystack, 5, -2)) test_equal("binary_search bad slice #1", 0, binary_search('A',haystack, 17, 1)) test_equal("binary_search empty input", -1, binary_search('A',{})) test_equal("binary_search strings", 5, binary_search("cat",{"apple", "bat", "car", "cast", "cat", "category", "dog"})) test_report() end ifdef
22. Re: binary search with reference to append
- Posted by gimlet Jun 15, 2014
- 1788 views
Derek,
This is the last I will say on this as it is getting personal.
What I objected to in the first place was the logical prelude in binary_search which:
1 was erroneous.
2 was making inferences about the call which went beyond what was reasonable.
It seems to me that the prelude code is defensive programming with a vengeance. It seems to be trying to make sense of start_points and end_points which are invalid, returning positions which do not make good sense and so on.
Your code is not correct, sorry.
I think what Jaygade wrote or the first of mine (with sp and ep) are both simpler and better.
sp = start_point if end_point <= 0 then -- End point specified is relative to the end of the sequence. ep = length(haystack) + end_point else ep = end_point -- ep=3 => length?? (G) end if if ep > length(haystack) then -- this cannot happen (G) ep = length(haystack) end if if sp < 1 then sp = 1 end if if sp > ep then if sp = ep + 1 then -- if ep = length?? (G) return -sp -- Obviously the needle is never in an empty sequence/slice. -- if sp > ep then why assume the slice starts at sp? -- with ep = length -sp is out of range. (G) end if return 0 -- Slice starts after the end. end if
Look it's hard to write this kind of code because you are forever faced with questions:
What do I do in this particular situation?,
Have I covered every possible case?,
What is the meaning of all this?
Should I even try to cover this?
23. Re: binary search with reference to append
- Posted by DerekParnell (admin) Jun 15, 2014
- 1770 views
This is the last I will say on this as it is getting personal.
What is getting personal? I don't see that myself. I'm not offended by any of these exchanges. I welcome being challenged by my peers to do better.
What I objected to in the first place was the logical prelude in binary_search which:
1 was erroneous.
2 was making inferences about the call which went beyond what was reasonable.
It seems to me that the prelude code is defensive programming with a vengeance. It seems to be trying to make sense of start_points and end_points which are invalid, returning positions which do not make good sense and so on.
Yes, I understand that these are you concerns with the current library code. However, I'm sure you also recognise that some of your arguments, like some of mine, are based on opinion rather than empirical evidence.
Your code is not correct, sorry.
I thank you for your concern. I cannot see the errors in the code I've just presented, but I'd like to have them pointed out to me so I can improve my coding abilities. (I've been learning how to be a better programmer since 1972, by the way).
I think what Jaygade wrote or the first of mine (with sp and ep) are both simpler and better.
Whether that code is "both simpler and better" is a point of view. In any case, the published API needs to be respected, so therefore we are stuck with having to deal with the optional parameters for starting_point and ending_point. I think that if we were writing this function for inclusion into the library now, we would not have these optional parameters. In hindsight, they are probably a poor design decision. But here we are ... we are stuck with them. So let's just manage the best we can.
sp = start_point if end_point <= 0 then -- End point specified is relative to the end of the sequence. ep = length(haystack) + end_point else ep = end_point -- ep=3 => length?? (G) end if if ep > length(haystack) then -- this cannot happen (G) ep = length(haystack) end if if sp < 1 then sp = 1 end if if sp > ep then if sp = ep + 1 then -- if ep = length?? (G) return -sp -- Obviously the needle is never in an empty sequence/slice. -- if sp > ep then why assume the slice starts at sp? -- with ep = length -sp is out of range. (G) end if return 0 -- Slice starts after the end. end if
I'm assuming that the code extract above is some sort of attempt to demonstrate the errors in my code. I'm guessing this as you don't actually say what its here for. So sorry if I'm getting that wrong.
I have no idea what the comment "ep=3 => length?? " is meant to be saying to me.
Again, the comment "this cannot happen" doesn't make a lot of sense to me. As far as I can make out, it is quite possible to call this function with an end point specified that is greater than the length of the physical sequence to be searched, and I'm sure you agree with that, so I'm unsure as to the meaning of this comment.
if sp > ep then why assume the slice starts at sp? with ep = length -sp is out of range. (G)
I 'assume' the slice starts at 'sp' because that what the caller told me it started at. That's the entire point of having this optional parameter --- to tell me where the slice starts.
I know that "with ep = length -sp is out of range.", and that again is the point of the code. This function is to return a negative value when the caller specified slice is valid AND the requested item in not in that slice. The magnitude of the returned value is an indication of where the item might have been located if it had of been in the slice. That is what the published API says (though not very well I have to admit).
Look it's hard to write this kind of code because you are forever faced with questions:
What do I do in this particular situation?,
Have I covered every possible case?,
What is the meaning of all this?
Should I even try to cover this?
No argument here. That is why we strive to write better code and to have it peer reviewed. Just because writing good code is difficult, that is no reason to shirk our responsibility to do exactly that: to write good code.
Thank for for helping me to be a better programmer.
24. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 1769 views
I've been learning how to be a better programmer since 1972, by the way.
Wow! Your skills are older than BSD UNIX!
- jaw and pants drop to the floor*
25. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 1758 views
I have no idea what the comment "ep=3 => length?? " is meant to be saying to me.
Could it be a typo for "ep=0 => length?? " ?
In that case, I'd read the meaning as "why the heck do you treat ep=0 to be the same as the length of the haystack, and why the heck do you treat ep=-3 to be the same as the length of the haystack minus 3 elements, or 3 elements up from the end of the haystack?" or something.
But of course, the reason we do that is straightforward and obvious.
26. Re: binary search with reference to append
- Posted by petelomax Jun 15, 2014
- 1765 views
I think that if we were writing this function for inclusion into the library now, we would not have these optional parameters. In hindsight, they are probably a poor design decision. But here we are ... we are stuck with them.
It seems perfectly reasonable to me, if everyone else agrees, to remove them. Of course that would be with a suitable entry in the release notes and both the manual and bsearch.e containing something like
-- Compatibility Note: -- Prior to 4.2 bsearch() allowed the start and end points to be specified. This was agreed to be removed (http://openeuphoria.org/forum/124592.wc), -- if necessary replace eg res = bsearch(needle,haystack,5,10) with res = bsearch(needle,haystack[5..10]), remembering to add 4 to abs(res).
Obviously, that is, if appropriate code changes to manage without them do not adversely affect the performance of next_prime().
Pete
27. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 1767 views
I think that if we were writing this function for inclusion into the library now, we would not have these optional parameters. In hindsight, they are probably a poor design decision. But here we are ... we are stuck with them.
It seems perfectly reasonable to me, if everyone else agrees, to remove them. Of course that would be with a suitable entry in the release notes and both the manual and bsearch.e containing something like
Well, maybe. But this violates a Linus Torvald's maxim: We don't break user apis.
If it's in alpha or beta, then maybe. This made it to a bunch of 4.0.x releases, though.
Otherwise, the only time we chose to break this rule was with namespaces, and that was because we thought there was no way to make sane namespaces with the then-existing rules.
remembering to add 4 to abs(res).
Hmm, actually that's a good reason to keep it, same reason we have find_from()/match_from(). I always have trouble with that math when I have to do it myself.
Obviously, that is, if appropriate code changes to manage without them do not adversely affect the performance of next_prime().
I don't think we make a full shallow copy of a sequence (let alone a deep one) when we make a slice, so I doubt we'd see any performance issues here because of just this.
28. Re: binary search with reference to append
- Posted by jaygade Jun 15, 2014
- 1774 views
I'm okay with breaking user APIs; we're talking about Euphoria not Unix. Programming languages do this all the time with major releases.
The key would be to deprecate the parameters for at least one point release, then ignore them going forward. If that's what you wanted to do.
However, the function works as-is, the documentation can state that the parameters are deprecated and to pass a slice instead, and defer the code change indefinitely.
29. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 1750 views
I'm okay with breaking user APIs; we're talking about Euphoria not Unix. Programming languages do this all the time with major releases.
I thought the way we did things was.
X.Y.Z
Z - patchlevel release. bugfixes and new routines in stdlib.
Y - minor release. new language features, keywords, syntax, etc.
X - major release, backwards incompatibility.
So this would have to be pushed back to 5.0.0 if we follow this.
The key would be to deprecate the parameters for at least one point release, then ignore them going forward. If that's what you wanted to do.
I'm not sure I like this - there's a chance that legacy code would not break, but return a logic error (it found the match, but it's outside the range of the slice, and thus should not have reported the match), raising hard to debug issues later on...
However, the function works as-is, the documentation can state that the parameters are deprecated and to pass a slice instead, and defer the code change indefinitely.
Good point. This works.
30. Re: binary search with reference to append
- Posted by andi49 Jun 15, 2014
- 1699 views
I'm okay with breaking user APIs; we're talking about Euphoria not Unix. Programming languages do this all the time with major releases.
The key would be to deprecate the parameters for at least one point release, then ignore them going forward. If that's what you wanted to do.
However, the function works as-is, the documentation can state that the parameters are deprecated and to pass a slice instead, and defer the code change indefinitely.
Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?
deprecate public procedure say_hello(sequence name) printf(1, "Hello, %s\n", { name }) end procedure
Isn't this a good reason to release 4.1 as fast as possible?
Andreas
31. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 15, 2014
- 1692 views
Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?
No, but we could probably add one.
deprecate public procedure say_hello(sequence name) printf(1, "Hello, %s\n", { name }) end procedure
Isn't this a good reason to release 4.1 as fast as possible?
Andreas
Yes!
32. Re: binary search with reference to append
- Posted by andi49 Jun 16, 2014
- 1665 views
Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?
No, but we could probably add one.
It's a kind of funny. My Version of Eu4.1 has support for the 'deprecated' keyword and it works. It's also in the Doc's for 4.1.
Section: 4.2.4 Deprecation
http://euphoria.indonesianet.de/eu4zip/doc41/html/lang_decl.html#_140_deprecation
Andreas
33. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 16, 2014
- 1661 views
Isn't there a 'deprecated' keyword in eu4.1 for this kind of situation?
No, but we could probably add one.
It's a kind of funny. My Version of Eu4.1 has support for the 'deprecated' keyword and it works. It's also in the Doc's for 4.1.
Section: 4.2.4 Deprecation
http://euphoria.indonesianet.de/eu4zip/doc41/html/lang_decl.html#_140_deprecation
Andreas
Oops. I stand corrected.
34. Re: binary search with reference to append
- Posted by jaygade Jun 16, 2014
- 1664 views
I don't think that the deprecate keyword would work in this case however, since we're deprecating optional parameters and not an entire routine.
Unless deprecate works in the formal parameter declaration?
I would think that editing the doc comment of the function to mention that the sp and ep parameters are deprecated would be sufficient. "Don't use these, use a slice instead. These optional parameters will no longer be valid at some future point."
35. Re: binary search with reference to append
- Posted by gimlet Jun 22, 2014
- 1652 views
Just as I will be very happy when load_map() is fixed.
What makes you think that it is (still) broken? Do you have examples to demonstrate this?
Wow!
I wrote a long article about what was wrong with load_map and you fix a few warts.
Then you say this??
Jim,
You do not like what I have to say,
You disregard what Dijkstra, van Gasteren, Feijin, Simon Peyton Jones, Lennart Augustsson, Ralf Hinze, Paul Hudak, John Hughes, Erik Meijer, Philip Wadler
have to say because you know better??
Really?
36. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 22, 2014
- 1805 views
Just as I will be very happy when load_map() is fixed.
What makes you think that it is (still) broken? Do you have examples to demonstrate this?
Wow!
I wrote a long article about what was wrong with load_map and you fix a few warts.
Then you say this??
I made a few quick fixes to deal with all the warts I knew of. You're right that libraries shouldn't ship with broken functions.
Jim,
You do not like what I have to say,
You disregard what Dijkstra, van Gasteren, Feijin, Simon Peyton Jones, Lennart Augustsson, Ralf Hinze, Paul Hudak, John Hughes, Erik Meijer, Philip Wadler
have to say because you know better??
Really?
I think Derek has a better handle on the internals of the library and he is working on a proper rewrite. I trust him to resolve any potential issues the cosmetic fixes I appled would not (e.g. any performance issues, for example). In the meantime, I applied those quick fixes to prevent the issues you reported from holding up the release.
Do you have any examples where load_map() still breaks? This is the third time I have asked you for this.
37. Re: binary search with reference to append
- Posted by gimlet Jun 22, 2014
- 1648 views
Jim,
To you no.
They are in what I wrote, if you don't want to check fine.
You fixed the few issues I pointed out to Matt. You ignored everything else. The whole thing is a disaster.
If you imagine that the function actually works then use it for something that matters to you. I certainly won't until it's fixed and probably not then as I distrust more and more of the Euphoria library functions.
I have no confidence that you want to fix things. You want to defend Euphoria against all criticism. You want to drive critics away.
So, why should I help you?
38. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 22, 2014
- 1672 views
You fixed the few issues I pointed out to Matt.
Like I said, I fixed the issues in functionality. Most of what you wrote is a critique of the algorithms used in the code. These I left to Derek to address.
You ignored everything else. The whole thing is a disaster.
I didn't ignore anything. I just took the easy part and left the hard stuff to Derek.
If you imagine that the function actually works then use it for something that matters to you.
I have and it works fine for what I need.
I certainly won't until it's fixed and probably not then as I distrust more and more of the Euphoria library functions.
This sounds more like a personal problem than anything to do with us. Still, whatever floats your boat.
I have no confidence that you want to fix things.
I applied a fix to load_map(), and then you say I don't want to fix things?
You want to defend Euphoria against all criticism.
I agree with Derek here - constructive criticism is good. It improves the language and the standard library.
However, I confess that I do want to defend against unconstructive and unproductive criticism.
Jim,
To you no.
They are in what I wrote, if you don't want to check fine.
So, why should I help you?
I won't ask again. Since you can't point to a single example where load_map(), given valid input, fails to produce valid output, I maintain my confidence that load_map() is not functionally broken.
Anyways, this load_map() argument is becoming quite repetitive. Unless you (or anyone else) has any new evidence, I think we should just agree to disagree and drop the subject.
You want to drive critics away.
I welcome constructive criticism. Generally, I only want to drive trolls away.
39. Re: binary search with reference to append
- Posted by gimlet Jun 22, 2014
- 1599 views
gimlet said...
I wasn't addressing that to you, more to Jaygade. However you have not demonstrated that you understand.
This is just a veiled way of stating that I do not understand.
I contend that I have demonstrated that I understand.
In fact, I could go further, and claim that, by your missteps (not understanding binary search and sorting arrays, advocating and then contradicting design by contract) that you understand less.
I never said you did not understand binary search.
I did say the array does not have to be sorted.
Your 'counter-example' is no counter example.
You did not check my sources.
But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.
Jim,
If you are going to ban everyone who disagrees Euphoria will die.
You want to ban me because I doubted your qualifications. Perhaps I was wrong, perhaps I was not. Banning me doesn't prove anything other than that you can ban people (for whatever reason you provide).
40. Re: binary search with reference to append
- Posted by jimcbrown (admin) Jun 22, 2014
- 1665 views
I never said you did not understand binary search.
Agreed. Where did that come from?
I did say the array does not have to be sorted.
But it does! http://algorithms.openmymind.net/search/binarysearch.html http://en.wikipedia.org/wiki/Binary_search_algorithm
Your 'counter-example' is no counter example.
Yes, it is.
But this is all besides the point. You are saying that binary_search() is fine, except we should drop the slicing functionality and the jaygade point, as you think that seem to overcomplicate things.
and wasn't this accepted?
Sort of. Everyone seems to agree with it theoretically, but due solely to backwards compatibility concerns, this change won't happen.
Jim,
If you are going to ban everyone who disagrees Euphoria will die.
If I ban everyone who's ever disagreed with me? Agreed. I'd have to ban Derek and Matt and Shawn and euphoric and Jeremy and ... dang, probably the entire community. Almost everyone who has ever posted on the forum!
Of course, I have no desire to ban people who merely disagree with me on an academic subject. That would just be plain silly. So, what's your point?
You want to ban me
When did I say this? I did suggest extended moderation may be in the offing, but that's not the same thing as a ban.
because I doubted your qualifications.
This has nothing to do with it.
41. behavior of slice
- Posted by SDPringle Jun 24, 2014
- 1515 views
I don't think we make a full shallow copy of a sequence (let alone a deep one) when we make a slice, so I doubt we'd see any performance issues here because of just this.
I would be very surprised if this was not the case. A result of a slice operation has a distinct length, if the base pointer points inside of another sequence's base pointer's members we are not protected by ref counting scheme employed by EUPHORIA. There must be a shallow copy of the members of the passed sequence in a slice. We must create a new sequence when ever we do a slice operation.
Shawn
42. version policy of EUPHORIA
- Posted by SDPringle Jun 24, 2014
- 1445 views
I thought the way we did things was.
X.Y.Z
Z - patchlevel release. bugfixes and new routines in stdlib.
Y - minor release. new language features, keywords, syntax, etc.
X - major release, backwards incompatibility.
So this would have to be pushed back to 5.0.0 if we follow this.
This is exactly the way we have been doing things. When it comes to deciding what the required version of a library you are using, you cannot count on others to behave so well. Another dev group can decide to change the meaning of alpha channel from 1 means full transparency to 1 means full opacity in a patch version. This kind of thing happened in a game library I tried. I believe we should look to this example, as a guide of what not to do.
Patch level changes mean you may have to put namespaces on symbols that conflict with new library routines. Minor release changes mean you may have rename a symbol because it may conflict with a builtin. Major release means you will have to overhaul the code to get it up to the newest version.
Shawn