1. find function

Below is a function that can be used in place of Euphoria's find() IF the
list to be searched is sorted ( in ascending order).
I have written it to manage a word list for my windows editor ( which is
progressing slowly sad  ) but I thought that some may find it useful and
maybe someone could optimize it especially for a small list where find()
currently has the advantage. For any list the search time for almost all
elements is fairly constant regardless of where they are located. In the
case of a massive list where the element to be located is NOT near the start
of the list then this function really shows it's colours. I did have a look
at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and
may be be less "clean" to implement. I don't know if the name is appropriate
but there is a 'integer divide by 2' part in the code so I thought "What the
heck, why not!".

Any feedback is appreciated.

Yours Truly
Mike

vulcan at win.co.nz

global function binary_find(object element, sequence list)
 -- find element in list, same as find()
 integer start, finish, low, high, range, jump, d

 start=1
 finish=length(list)

 low=start
 high=finish

 while 1 do
  range=high - low

  if range < 2 then
   d = find(element, list[low..high])
   if d then d += low - 1 end if
   return d
  else
   jump=floor(range/2)
   d=compare(element, list[low + jump])

   if d=1 then -- search higher
    low += jump
    if low > finish then
     return 0
    end if
   elsif d=-1 then -- search lower
    high=low + jump
   else -- a hit!
    return low + jump
   end if
  end if
 end while
 return 0
 end function

new topic     » topic index » view message » categorize

2. Re: find function

Mike,

Don't have the time to benchmark, but here's the binary search routine I
always use. I believe I got the basic idea from a QuickBasic textbook,
although the Euphoria implementation is my own. Enjoy!

Gabriel


global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
integer lo, hi, mid, c
   lo = 1
   hi = length(s)

   while lo <= hi do
      mid = floor((lo + hi) / 2)
      c = compare(x, s[mid])
      if    c < 0 then  -- x < s[mid]
         hi = mid - 1
      elsif c > 0 then  -- x > s[mid]
         lo = mid + 1
      else              -- x = s[mid]
         return mid
      end if
   end while

   return 0
end function


-----Original Message-----
From: Mike [mailto:vulcan at WIN.CO.NZ]
Sent: Monday, October 18, 1999 2:29 PM
To: EUPHORIA at LISTSERV.MUOHIO.EDU
Subject: find function


Below is a function that can be used in place of Euphoria's find() IF the
list to be searched is sorted ( in ascending order).
I have written it to manage a word list for my windows editor ( which is
progressing slowly sad  ) but I thought that some may find it useful and
maybe someone could optimize it especially for a small list where find()
currently has the advantage. For any list the search time for almost all
elements is fairly constant regardless of where they are located. In the
case of a massive list where the element to be located is NOT near the start
of the list then this function really shows it's colours. I did have a look
at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and
may be be less "clean" to implement. I don't know if the name is appropriate
but there is a 'integer divide by 2' part in the code so I thought "What the
heck, why not!".

Any feedback is appreciated.

Yours Truly
Mike

vulcan at win.co.nz

global function binary_find(object element, sequence list)
 -- find element in list, same as find()
 integer start, finish, low, high, range, jump, d

 start=1
 finish=length(list)

 low=start
 high=finish

 while 1 do
  range=high - low

  if range < 2 then
   d = find(element, list[low..high])
   if d then d += low - 1 end if
   return d
  else
   jump=floor(range/2)
   d=compare(element, list[low + jump])

   if d=1 then -- search higher
    low += jump
    if low > finish then
     return 0
    end if
   elsif d=-1 then -- search lower
    high=low + jump
   else -- a hit!
    return low + jump
   end if
  end if
 end while
 return 0
 end function

new topic     » goto parent     » topic index » view message » categorize

3. Re: find function

I was curious to know at what point it becomes
better to use a binary search instead of the built-in
find() function.

I used Gabriel Boehme's version and compared using,
in one case, a sequence of integers, and in another case,
a sequence of strings.

It seems that for sorted sequences of up to 530 integers,
it's better to use the built-in find(). For sorted sequences
of up to 35 strings, the built-in find() also wins. I look up each
element in the sequence once. comparing strings
takes longer than comparing integers, and a binary search
needs fewer compares.

Below, all look-ups are successful.
I didn't test the case where a lot of look-ups fail.
The regular find() would not do quite as well in that kind of test.

global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
integer lo, hi, mid, c
   lo = 1
   hi = length(s)

   while lo <= hi do
      mid = floor((lo + hi) / 2)
      c = compare(x, s[mid])
      if  c < 0 then  -- x < s[mid]
           hi = mid - 1
      elsif c > 0 then  -- x > s[mid]
           lo = mid + 1
      else              -- x = s[mid]
           return mid
      end if
   end while

   return 0
end function

include sort.e

-- constant N = 530  -- find is faster up to 530 integers
constant N = 35      -- find is faster up to  35 strings

constant ITER = 5000
sequence test

-- integers up to 1000:
-- test = sort(rand(repeat(1000, N)))

-- 5-character strings:
test = sort(rand(repeat({255,255,255,255,255}, N)))

object x
atom t
t = time()
for i = 1 to ITER do
    for j = 1 to length(test) do
         x = find_sorted(test[j], test)
    end for
end for
printf(1, "find_sorted: %.2f\n", time()-t)

t = time()
for i = 1 to ITER do
    for j = 1 to length(test) do
         x = find(test[j], test)
    end for
end for
printf(1, "find: %.2f\n", time()-t)


Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

4. Re: find function

Rob,

Cool stats! Your curiosity on this point helped me to remember some
benchmarking *I* did a couple years ago on this very same thing. I seem to
recall that using the built-in find() on the *unsorted* list of strings
produced much faster numbers than the built-in find() on the sorted list. My
theory (at the time) was that the internal address values (or whatever) get
all shuffled around when you sort(), and are much faster to access
sequentially in their original, unsorted state. Is this true? Also, are the
stats you give us below in any way dependent on computer speed, or are they
true regardless of what machine they're run on?

Thanks,
Gabriel

-----Original Message-----
From: Robert Craig [mailto:rds at ATTCANADA.NET]
Sent: Monday, October 18, 1999 11:23 PM
To: EUPHORIA at LISTSERV.MUOHIO.EDU
Subject: Re: find function


I was curious to know at what point it becomes
better to use a binary search instead of the built-in
find() function.

I used Gabriel Boehme's version and compared using,
in one case, a sequence of integers, and in another case,
a sequence of strings.

It seems that for sorted sequences of up to 530 integers,
it's better to use the built-in find(). For sorted sequences
of up to 35 strings, the built-in find() also wins. I look up each
element in the sequence once. comparing strings
takes longer than comparing integers, and a binary search
needs fewer compares.

Below, all look-ups are successful.
I didn't test the case where a lot of look-ups fail.
The regular find() would not do quite as well in that kind of test.

global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
integer lo, hi, mid, c
   lo = 1
   hi = length(s)

   while lo <= hi do
      mid = floor((lo + hi) / 2)
      c = compare(x, s[mid])
      if  c < 0 then  -- x < s[mid]
           hi = mid - 1
      elsif c > 0 then  -- x > s[mid]
           lo = mid + 1
      else              -- x = s[mid]
           return mid
      end if
   end while

   return 0
end function

include sort.e

-- constant N = 530  -- find is faster up to 530 integers
constant N = 35      -- find is faster up to  35 strings

constant ITER = 5000
sequence test

-- integers up to 1000:
-- test = sort(rand(repeat(1000, N)))

-- 5-character strings:
test = sort(rand(repeat({255,255,255,255,255}, N)))

object x
atom t
t = time()
for i = 1 to ITER do
    for j = 1 to length(test) do
         x = find_sorted(test[j], test)
    end for
end for
printf(1, "find_sorted: %.2f\n", time()-t)

t = time()
for i = 1 to ITER do
    for j = 1 to length(test) do
         x = find(test[j], test)
    end for
end for
printf(1, "find: %.2f\n", time()-t)


Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

5. Re: find function

Gabriel Boehme writes:
> I seem to recall that using the built-in find() on
> the *unsorted* list of strings produced much faster
> numbers than the built-in find() on the sorted list.
> My theory (at the time) was that the internal address
> values (or whatever) get all shuffled around when
> you sort(), and are much faster to access sequentially
> in their original, unsorted state. Is this true?

I can't imagine that a big difference in speed
would be caused just by that.

> Also, are the stats you give us below in any way
> dependent on computer speed, or are they
> true regardless of what machine they're run on?

I'd expect the sequence-length where binary search
becomes faster than find() to be roughly the same
on all machines, although find() might suffer more
than find_sorted() if the on-chip data cache could not
hold the whole sequence.

Regards,
     Rob Craig
     Rapid Deployment Software
     http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

6. Re: find function

Mike,
Here's a set of routines that I use.  They assume that the s is an
ascending sequence of unique elements.  I haven't gotten around to
optimizing them yet, but you're welcome to use them if you want.
Joe


-- Searches for object x in sorted ascending sequence s
-- Returns the position of x if found or the insertion point for x if not
found

function binarySearch (object x, sequence s)
    integer hi, lo, mid, res

    hi = length (s)
    if hi = 0 then return 1 end if
    lo = 1

    while lo <= hi do
        mid = floor ((lo + hi) / 2)
        res = compare (x, s [mid])

        if    res < 0 then hi = mid - 1
        elsif res > 0 then lo = mid + 1
        else               return mid
        end if
    end while

    return mid + (lo > mid)
end function


-- Use binarySearch () to locate x

global function findSorted (object x, sequence s)
    integer res
    res = binarySearch (x, s)
    if res > length (s) or (not equal (s [res], x)) then
        res = 0
    end if
    return res
end function


-- Inserts object x into sorted ascending sequence s

global function insertSorted (sequence s, object x)
    integer res
    res = binarySearch (x, s)
    if res > length (s) then
        return append (s, x)
    end if
    if equal (s [res], x) then
        return s
    end if
    return s [1 .. res - 1] & prepend (s [res .. length (s)], x)
end function



-----Original Message-----
From:   Mike [SMTP:vulcan at WIN.CO.NZ]
Sent:   Monday, October 18, 1999 2:29 PM
Subject:        find function

Below is a function that can be used in place of Euphoria's find() IF the
list to be searched is sorted ( in ascending order).
I have written it to manage a word list for my windows editor ( which is
progressing slowly sad  ) but I thought that some may find it useful and
maybe someone could optimize it especially for a small list where find()
currently has the advantage. For any list the search time for almost all
elements is fairly constant regardless of where they are located. In the
case of a massive list where the element to be located is NOT near the
start
of the list then this function really shows it's colours. I did have a look
at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose
and
may be be less "clean" to implement. I don't know if the name is
appropriate
but there is a 'integer divide by 2' part in the code so I thought "What
the
heck, why not!".

Any feedback is appreciated.

Yours Truly
Mike

vulcan at win.co.nz

global function binary_find(object element, sequence list)
 -- find element in list, same as find()
 integer start, finish, low, high, range, jump, d

 start=1
 finish=length(list)

 low=start
 high=finish

 while 1 do
  range=high - low

  if range < 2 then
   d = find(element, list[low..high])
   if d then d += low - 1 end if
   return d
  else
   jump=floor(range/2)
   d=compare(element, list[low + jump])

   if d=1 then -- search higher
    low += jump
    if low > finish then
     return 0
    end if
   elsif d=-1 then -- search lower
    high=low + jump
   else -- a hit!
    return low + jump
   end if
  end if
 end while
 return 0
 end function

__________________________________________
NetZero - Defenders of the Free World
Get your FREE Internet Access and Email at
http://www.netzero.net/download/index.html

new topic     » goto parent     » topic index » view message » categorize

7. Re: find function

Thanks to Gabriel & Rob & Joe for replying. I can see now that my original
search
routine was a little bit inefficient. I did some time tests on two
components that make up the function and I would like to share some
interesting (Well, I thought they were!) findings.
With the while loop it seems that..

while 1 do
if <condition> then exit end if
..some code
end while

..is faster than..

while <condition> do
..some code
end while

This is useful to know because a coder can adopt a fixed format for a while
loop AND have the flexibility to emulate a 'do while' loop. All this safe in
the knowledge of having optimal speed.

The other thing was this:

mid=low+high
mid=floor(mid/2)

..is significantly faster than..

mid=floor((low+high)/2)

Probably in the binary search function there may be little or no noticeable
difference because the number of times the function loops is very small.
However...including both optimizations yields the result as below.

Yours Truly
Mike

vulcan at win.co.nz


global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted in ascending order prior to this
call


integer low, high, mid, c
low = 1
high = length(s)
while 1 do
  mid=low+high
  mid=floor(mid/2)
  c = compare(x, s[mid])
  if c < 0 then  -- search lower
   high = mid - 1
  elsif c > 0 then -- search higher
   low = mid + 1
  else    -- found!
   return mid
  end if
  if low > high then exit end if
end while
return 0
end function



>

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu