1. stdlib: Binary search function

This has been an oft requested functionality, and was in principle agreed, even
by Rob:
http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find
http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind

Some implementations have been mentioned already (G. Boehme, J. Babor,...).

Now, the devil is in the details. What shall the function do?
* accept only strictly increasing sequences
* accept only strictly monotoni sequences
* accept loosely monotonic sequences (with possible duplicates)
* only cope with <any selection above>, not checking

I think everyone agrees it should return like db_find_key(), ie n>0 when found,
n<0 with ordered insertion resulting in item lading at position -n.

And another point which wasn't agreed upon: would this go to sort.e or
sequence.e?

The version I use in sets.e has a from and to extra parameters. They could be
easily defaulted. Monoony could be passed as a parameter too, depending on the
option chosen.

CChris

new topic     » topic index » view message » categorize

2. Re: stdlib: Binary search function

CChris wrote:
> 
> 
> This has been an oft requested functionality, and was in principle agreed,
> even
> by Rob:
> <a
> href="http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find">http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find</a>
> <a
> href="http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind">http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind</a>
> 
> Some implementations have been mentioned already (G. Boehme, J. Babor,...).
> 
> Now, the devil is in the details. What shall the function do?
> * accept only strictly increasing sequences
> * accept only strictly monotoni sequences
> * accept loosely monotonic sequences (with possible duplicates)
> * only cope with <any selection above>, not checking
> 
> I think everyone agrees it should return like db_find_key(), ie n>0 when
> found,
> n<0 with ordered insertion resulting in item lading at position -n.
> 
> And another point which wasn't agreed upon: would this go to sort.e or
> sequence.e?
> 
> The version I use in sets.e has a from and to extra parameters. They could be
> easily defaulted. Monoony could be passed as a parameter too, depending on the
> option chosen.
> 
> CChris

Well, if the Devil is in the details, lets see the details. 
Babor, Boehme, and Otto's bfind is short sweet and simple.



global function bfind(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
-- Gabriel Boehme's code
-- Earlier Joe Otto's routine is almost identical.
integer lo, hi, mid, c
   lo = 1
   hi = length(s)

   while lo <= hi do
      mid = floor((lo + hi) / 2)
      c = compare(x, s[mid])
      if    c < 0 then  -- x < s[mid]
         hi = mid - 1
      elsif c > 0 then  -- x > s[mid]
         lo = mid + 1
      else              -- x = s[mid]
         return mid
      end if
   end while

   return 0
end function


It sounds as though your binary find may be a bit of
a swiss army knife approach. Please post your bfind
code from sets.e and let us play arround with it--
But if there are many details, there might be more Devils!



Ken Rhodes
Folding at Home: http://folding.stanford.edu/
100% MicroSoft Free
SuSE Linux 10.3
No AdWare, SpyWare, or Viruses!
Life is Good,  smile

new topic     » goto parent     » topic index » view message » categorize

3. Re: stdlib: Binary search function

integer found   --flag

function findPos(sequence s, sequence ss)
  --bynary search in a sorted array of strings
  --it returns position of found string or position where it could be iserted
  --which one is found we know by the flag

  integer lo,hi,mid,c
  lo = 1
  hi = length(ss)
  
  c = compare(s,ss[1])
  if c<0 then
    found = 0
    return 1
  elsif not c then
    found = 1
    return 1
  end if
  
  c = compare(s,ss[hi]) 
    if c>0 then
    found = 0
    return hi+1
  elsif not c then
    found = 1
    return hi
  end if  

  while 1 do
    w = hi - lo
    if not w then
      found = 0
      return hi
    elsif w<10 then
      lo += 1
      while 1 do
        c = compare(s,ss[lo])
        if not c then
          found = 1
          return lo
        elsif c<0 then
          found = 0
          return lo
        else
          lo += 1
        end if
      end while
    else  --(hi-lo)>=10
      mid = floor((lo + hi) / 2)
      c = compare(s,ss[mid])
      if c<0 then -- x < ss[mid]
        hi = mid
      elsif c>0 then  -- x > ss[mid]
        lo = mid
      else      -- x = ss[mid]
        found = 1
        return mid
      end if
    end if
  end while
end function  --findPos

new topic     » goto parent     » topic index » view message » categorize

4. Re: stdlib: Binary search function

Kenneth Rhodes wrote:
> 
> CChris wrote:
> > 
> > 
> > This has been an oft requested functionality, and was in principle agreed,
> > even
> > by Rob:
> > <a
> > href="http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find">http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find</a>
> > <a
> > href="http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind">http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind</a>
> > 
> > Some implementations have been mentioned already (G. Boehme, J. Babor,...).
> > 
> > Now, the devil is in the details. What shall the function do?
> > * accept only strictly increasing sequences
> > * accept only strictly monotoni sequences
> > * accept loosely monotonic sequences (with possible duplicates)
> > * only cope with <any selection above>, not checking
> > 
> > I think everyone agrees it should return like db_find_key(), ie n>0 when
> > found,
> > n<0 with ordered insertion resulting in item lading at position -n.
> > 
> > And another point which wasn't agreed upon: would this go to sort.e or
> > sequence.e?
> > 
> > The version I use in sets.e has a from and to extra parameters. They could
> > be
> > easily defaulted. Monoony could be passed as a parameter too, depending on
> > the
> > option chosen.
> > 
> > CChris
> 
> Well, if the Devil is in the details, lets see the details. 
> Babor, Boehme, and Otto's bfind is short sweet and simple.
> 
> 
> }}}
<eucode>
> global function bfind(object x, sequence s)
> -- does a binary search on a sorted sequence
> -- returns index location if found, 0 if not
> -- assumes that sequence s has been sorted prior to this call
> -- Gabriel Boehme's code
> -- Earlier Joe Otto's routine is almost identical.
> integer lo, hi, mid, c
>    lo = 1
>    hi = length(s)
> 
>    while lo <= hi do
>       mid = floor((lo + hi) / 2)
>       c = compare(x, s[mid])
>       if    c < 0 then  -- x < s[mid]
>          hi = mid - 1
>       elsif c > 0 then  -- x > s[mid]
>          lo = mid + 1
>       else              -- x = s[mid]
>          return mid
>       end if
>    end while
> 
>    return 0
> end function
> </eucode>
{{{

> 
> It sounds as though your binary find may be a bit of
> a swiss army knife approach. Please post your bfind
> code from sets.e and let us play arround with it--
> But if there are many details, there might be more Devils!
> 
> 
> Ken Rhodes
> Folding at Home: <a
> href="http://folding.stanford.edu/">http://folding.stanford.edu/</a>
> 100% MicroSoft Free
> SuSE Linux 10.3
> No AdWare, SpyWare, or Viruses!
> Life is Good,  smile

The bfind() I use (can posy it only tonight) only assumes sorted sequences
without duplicates, as that's all it needs, and haas extra parms (lo and hi in
the above code).

Given the discussions referenced above, is this enough?

I forgot to mention: according to whether you expect hits to be more frequent
than misses, or the other way round, you can tweak the algorithm differently for
more overall speed.

In the routine you posted, if the sequence is not sorted, which is not checked,
results are undefined (but there is a result).

CChris

new topic     » goto parent     » topic index » view message » categorize

5. Re: stdlib: Binary search function

Kenneth Rhodes wrote:
> 
> CChris wrote:
> > 
> > 
> > This has been an oft requested functionality, and was in principle agreed,
> > even
> > by Rob:
> > <a
> > href="http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find">http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=binary_find</a>
> > <a
> > href="http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind">http://www.openeuphoria.org/cgi-bin/esearch.exu?fromMonth=6&fromYear=1&toMonth=5&toYear=D&postedBy=&keywords=bfind</a>
> > 
> > Some implementations have been mentioned already (G. Boehme, J. Babor,...).
> > 
> > Now, the devil is in the details. What shall the function do?
> > * accept only strictly increasing sequences
> > * accept only strictly monotoni sequences
> > * accept loosely monotonic sequences (with possible duplicates)
> > * only cope with <any selection above>, not checking
> > 
> > I think everyone agrees it should return like db_find_key(), ie n>0 when
> > found,
> > n<0 with ordered insertion resulting in item lading at position -n.
> > 
> > And another point which wasn't agreed upon: would this go to sort.e or
> > sequence.e?
> > 
> > The version I use in sets.e has a from and to extra parameters. They could
> > be
> > easily defaulted. Monoony could be passed as a parameter too, depending on
> > the
> > option chosen.
> > 
> > CChris
> 
> Well, if the Devil is in the details, lets see the details. 
> Babor, Boehme, and Otto's bfind is short sweet and simple.
> 
> 
> }}}
<eucode>
> global function bfind(object x, sequence s)
> -- does a binary search on a sorted sequence
> -- returns index location if found, 0 if not
> -- assumes that sequence s has been sorted prior to this call
> -- Gabriel Boehme's code
> -- Earlier Joe Otto's routine is almost identical.
> integer lo, hi, mid, c
>    lo = 1
>    hi = length(s)
> 
>    while lo <= hi do
>       mid = floor((lo + hi) / 2)
>       c = compare(x, s[mid])
>       if    c < 0 then  -- x < s[mid]
>          hi = mid - 1
>       elsif c > 0 then  -- x > s[mid]
>          lo = mid + 1
>       else              -- x = s[mid]
>          return mid
>       end if
>    end while
> 
>    return 0
> end function
> </eucode>
{{{

> 
> It sounds as though your binary find may be a bit of
> a swiss army knife approach. Please post your bfind
> code from sets.e and let us play arround with it--
> But if there are many details, there might be more Devils!
> 
> 
> Ken Rhodes
> Folding at Home: <a
> href="http://folding.stanford.edu/">http://folding.stanford.edu/</a>
> 100% MicroSoft Free
> SuSE Linux 10.3
> No AdWare, SpyWare, or Viruses!
> Life is Good,  smile

I realised I forgot about your request:
constant SETS_1 = {-1,1,-2} -- endgame for length 1
function bfind_(object x,sequence s,integer startpoint,integer endpoint)
-- the actul bfind() sanitizes args, then calls this
-- assumes s sorted with no duplicates, length(s)>=startpoint<=endpoint>0. No
checking.
  integer mid,c

  if length(s)=0 then
    return -1
  end if
  if startpoint<=0 then
    startpoint=1
  end if
  if endpoint>length(s) then
    endpoint=length(s)
  end if
-- check for outliers, which are worst cases 
  c=compare(x,s[startpoint])
  if c=-1 then
    return -1
  elsif c=0 then
    return 1
  end if
  c=compare(x,s[endpoint])
  if c=1 then
    return -endpoint-1
  elsif c=0 then
    return endpoint
  end if
  -- main loop
  while endpoint-startpoint>1 do
-- handling of duplicates
-- if equal(s[startpoint],s[endpoint]) then
--   return startpoint
-- end if
    mid=floor((startpoint+endpoint)/2)
    c=compare(x,s[mid])
    if c=1 then
      startpoint=mid
    elsif c=-1 then
      endpoint=mid
    else
      return mid
    end if
  end while
  return -startpoint
end function


One can change the order of the tests if it is thought that there will be more
hits than misses or the other way round.

Results are undefined is s is not increasing. When x has more than one instance
in s, the commented out handling code causes the choice of the exac index to be
undefined amongst the several possible answers. If s is decreasing, and the slope
is knon somehow, then some 1 and -1 may be replaced by ±slope.

CChris

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu