1. find function
Below is a function that can be used in place of Euphoria's find() IF the
list to be searched is sorted ( in ascending order).
I have written it to manage a word list for my windows editor ( which is
progressing slowly
) but I thought that some may find it useful and
maybe someone could optimize it especially for a small list where find()
currently has the advantage. For any list the search time for almost all
elements is fairly constant regardless of where they are located. In the
case of a massive list where the element to be located is NOT near the start
of the list then this function really shows it's colours. I did have a look
at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and
may be be less "clean" to implement. I don't know if the name is appropriate
but there is a 'integer divide by 2' part in the code so I thought "What the
heck, why not!".
Any feedback is appreciated.
Yours Truly
Mike
vulcan at win.co.nz
global function binary_find(object element, sequence list)
-- find element in list, same as find()
integer start, finish, low, high, range, jump, d
start=1
finish=length(list)
low=start
high=finish
while 1 do
range=high - low
if range < 2 then
d = find(element, list[low..high])
if d then d += low - 1 end if
return d
else
jump=floor(range/2)
d=compare(element, list[low + jump])
if d=1 then -- search higher
low += jump
if low > finish then
return 0
end if
elsif d=-1 then -- search lower
high=low + jump
else -- a hit!
return low + jump
end if
end if
end while
return 0
end function
2. Re: find function
Mike,
Don't have the time to benchmark, but here's the binary search routine I
always use. I believe I got the basic idea from a QuickBasic textbook,
although the Euphoria implementation is my own. Enjoy!
Gabriel
global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
integer lo, hi, mid, c
lo = 1
hi = length(s)
while lo <= hi do
mid = floor((lo + hi) / 2)
c = compare(x, s[mid])
if c < 0 then -- x < s[mid]
hi = mid - 1
elsif c > 0 then -- x > s[mid]
lo = mid + 1
else -- x = s[mid]
return mid
end if
end while
return 0
end function
-----Original Message-----
From: Mike [mailto:vulcan at WIN.CO.NZ]
Sent: Monday, October 18, 1999 2:29 PM
To: EUPHORIA at LISTSERV.MUOHIO.EDU
Subject: find function
Below is a function that can be used in place of Euphoria's find() IF the
list to be searched is sorted ( in ascending order).
I have written it to manage a word list for my windows editor ( which is
progressing slowly
) but I thought that some may find it useful and
maybe someone could optimize it especially for a small list where find()
currently has the advantage. For any list the search time for almost all
elements is fairly constant regardless of where they are located. In the
case of a massive list where the element to be located is NOT near the start
of the list then this function really shows it's colours. I did have a look
at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and
may be be less "clean" to implement. I don't know if the name is appropriate
but there is a 'integer divide by 2' part in the code so I thought "What the
heck, why not!".
Any feedback is appreciated.
Yours Truly
Mike
vulcan at win.co.nz
global function binary_find(object element, sequence list)
-- find element in list, same as find()
integer start, finish, low, high, range, jump, d
start=1
finish=length(list)
low=start
high=finish
while 1 do
range=high - low
if range < 2 then
d = find(element, list[low..high])
if d then d += low - 1 end if
return d
else
jump=floor(range/2)
d=compare(element, list[low + jump])
if d=1 then -- search higher
low += jump
if low > finish then
return 0
end if
elsif d=-1 then -- search lower
high=low + jump
else -- a hit!
return low + jump
end if
end if
end while
return 0
end function
3. Re: find function
I was curious to know at what point it becomes
better to use a binary search instead of the built-in
find() function.
I used Gabriel Boehme's version and compared using,
in one case, a sequence of integers, and in another case,
a sequence of strings.
It seems that for sorted sequences of up to 530 integers,
it's better to use the built-in find(). For sorted sequences
of up to 35 strings, the built-in find() also wins. I look up each
element in the sequence once. comparing strings
takes longer than comparing integers, and a binary search
needs fewer compares.
Below, all look-ups are successful.
I didn't test the case where a lot of look-ups fail.
The regular find() would not do quite as well in that kind of test.
global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
integer lo, hi, mid, c
lo = 1
hi = length(s)
while lo <= hi do
mid = floor((lo + hi) / 2)
c = compare(x, s[mid])
if c < 0 then -- x < s[mid]
hi = mid - 1
elsif c > 0 then -- x > s[mid]
lo = mid + 1
else -- x = s[mid]
return mid
end if
end while
return 0
end function
include sort.e
-- constant N = 530 -- find is faster up to 530 integers
constant N = 35 -- find is faster up to 35 strings
constant ITER = 5000
sequence test
-- integers up to 1000:
-- test = sort(rand(repeat(1000, N)))
-- 5-character strings:
test = sort(rand(repeat({255,255,255,255,255}, N)))
object x
atom t
t = time()
for i = 1 to ITER do
for j = 1 to length(test) do
x = find_sorted(test[j], test)
end for
end for
printf(1, "find_sorted: %.2f\n", time()-t)
t = time()
for i = 1 to ITER do
for j = 1 to length(test) do
x = find(test[j], test)
end for
end for
printf(1, "find: %.2f\n", time()-t)
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
4. Re: find function
Rob,
Cool stats! Your curiosity on this point helped me to remember some
benchmarking *I* did a couple years ago on this very same thing. I seem to
recall that using the built-in find() on the *unsorted* list of strings
produced much faster numbers than the built-in find() on the sorted list. My
theory (at the time) was that the internal address values (or whatever) get
all shuffled around when you sort(), and are much faster to access
sequentially in their original, unsorted state. Is this true? Also, are the
stats you give us below in any way dependent on computer speed, or are they
true regardless of what machine they're run on?
Thanks,
Gabriel
-----Original Message-----
From: Robert Craig [mailto:rds at ATTCANADA.NET]
Sent: Monday, October 18, 1999 11:23 PM
To: EUPHORIA at LISTSERV.MUOHIO.EDU
Subject: Re: find function
I was curious to know at what point it becomes
better to use a binary search instead of the built-in
find() function.
I used Gabriel Boehme's version and compared using,
in one case, a sequence of integers, and in another case,
a sequence of strings.
It seems that for sorted sequences of up to 530 integers,
it's better to use the built-in find(). For sorted sequences
of up to 35 strings, the built-in find() also wins. I look up each
element in the sequence once. comparing strings
takes longer than comparing integers, and a binary search
needs fewer compares.
Below, all look-ups are successful.
I didn't test the case where a lot of look-ups fail.
The regular find() would not do quite as well in that kind of test.
global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted prior to this call
integer lo, hi, mid, c
lo = 1
hi = length(s)
while lo <= hi do
mid = floor((lo + hi) / 2)
c = compare(x, s[mid])
if c < 0 then -- x < s[mid]
hi = mid - 1
elsif c > 0 then -- x > s[mid]
lo = mid + 1
else -- x = s[mid]
return mid
end if
end while
return 0
end function
include sort.e
-- constant N = 530 -- find is faster up to 530 integers
constant N = 35 -- find is faster up to 35 strings
constant ITER = 5000
sequence test
-- integers up to 1000:
-- test = sort(rand(repeat(1000, N)))
-- 5-character strings:
test = sort(rand(repeat({255,255,255,255,255}, N)))
object x
atom t
t = time()
for i = 1 to ITER do
for j = 1 to length(test) do
x = find_sorted(test[j], test)
end for
end for
printf(1, "find_sorted: %.2f\n", time()-t)
t = time()
for i = 1 to ITER do
for j = 1 to length(test) do
x = find(test[j], test)
end for
end for
printf(1, "find: %.2f\n", time()-t)
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
5. Re: find function
Gabriel Boehme writes:
> I seem to recall that using the built-in find() on
> the *unsorted* list of strings produced much faster
> numbers than the built-in find() on the sorted list.
> My theory (at the time) was that the internal address
> values (or whatever) get all shuffled around when
> you sort(), and are much faster to access sequentially
> in their original, unsorted state. Is this true?
I can't imagine that a big difference in speed
would be caused just by that.
> Also, are the stats you give us below in any way
> dependent on computer speed, or are they
> true regardless of what machine they're run on?
I'd expect the sequence-length where binary search
becomes faster than find() to be roughly the same
on all machines, although find() might suffer more
than find_sorted() if the on-chip data cache could not
hold the whole sequence.
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
6. Re: find function
- Posted by Joe Otto <jotto at NETZERO.NET>
Oct 20, 1999
-
Last edited Oct 21, 1999
Mike,
Here's a set of routines that I use. They assume that the s is an
ascending sequence of unique elements. I haven't gotten around to
optimizing them yet, but you're welcome to use them if you want.
Joe
-- Searches for object x in sorted ascending sequence s
-- Returns the position of x if found or the insertion point for x if not
found
function binarySearch (object x, sequence s)
integer hi, lo, mid, res
hi = length (s)
if hi = 0 then return 1 end if
lo = 1
while lo <= hi do
mid = floor ((lo + hi) / 2)
res = compare (x, s [mid])
if res < 0 then hi = mid - 1
elsif res > 0 then lo = mid + 1
else return mid
end if
end while
return mid + (lo > mid)
end function
-- Use binarySearch () to locate x
global function findSorted (object x, sequence s)
integer res
res = binarySearch (x, s)
if res > length (s) or (not equal (s [res], x)) then
res = 0
end if
return res
end function
-- Inserts object x into sorted ascending sequence s
global function insertSorted (sequence s, object x)
integer res
res = binarySearch (x, s)
if res > length (s) then
return append (s, x)
end if
if equal (s [res], x) then
return s
end if
return s [1 .. res - 1] & prepend (s [res .. length (s)], x)
end function
-----Original Message-----
From: Mike [SMTP:vulcan at WIN.CO.NZ]
Sent: Monday, October 18, 1999 2:29 PM
Subject: find function
Below is a function that can be used in place of Euphoria's find() IF the
list to be searched is sorted ( in ascending order).
I have written it to manage a word list for my windows editor ( which is
progressing slowly
) but I thought that some may find it useful and
maybe someone could optimize it especially for a small list where find()
currently has the advantage. For any list the search time for almost all
elements is fairly constant regardless of where they are located. In the
case of a massive list where the element to be located is NOT near the
start
of the list then this function really shows it's colours. I did have a look
at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose
and
may be be less "clean" to implement. I don't know if the name is
appropriate
but there is a 'integer divide by 2' part in the code so I thought "What
the
heck, why not!".
Any feedback is appreciated.
Yours Truly
Mike
vulcan at win.co.nz
global function binary_find(object element, sequence list)
-- find element in list, same as find()
integer start, finish, low, high, range, jump, d
start=1
finish=length(list)
low=start
high=finish
while 1 do
range=high - low
if range < 2 then
d = find(element, list[low..high])
if d then d += low - 1 end if
return d
else
jump=floor(range/2)
d=compare(element, list[low + jump])
if d=1 then -- search higher
low += jump
if low > finish then
return 0
end if
elsif d=-1 then -- search lower
high=low + jump
else -- a hit!
return low + jump
end if
end if
end while
return 0
end function
__________________________________________
NetZero - Defenders of the Free World
Get your FREE Internet Access and Email at
http://www.netzero.net/download/index.html
7. Re: find function
Thanks to Gabriel & Rob & Joe for replying. I can see now that my original
search
routine was a little bit inefficient. I did some time tests on two
components that make up the function and I would like to share some
interesting (Well, I thought they were!) findings.
With the while loop it seems that..
while 1 do
if <condition> then exit end if
..some code
end while
..is faster than..
while <condition> do
..some code
end while
This is useful to know because a coder can adopt a fixed format for a while
loop AND have the flexibility to emulate a 'do while' loop. All this safe in
the knowledge of having optimal speed.
The other thing was this:
mid=low+high
mid=floor(mid/2)
..is significantly faster than..
mid=floor((low+high)/2)
Probably in the binary search function there may be little or no noticeable
difference because the number of times the function loops is very small.
However...including both optimizations yields the result as below.
Yours Truly
Mike
vulcan at win.co.nz
global function find_sorted(object x, sequence s)
-- does a binary search on a sorted sequence
-- returns index location if found, 0 if not
-- assumes that sequence s has been sorted in ascending order prior to this
call
integer low, high, mid, c
low = 1
high = length(s)
while 1 do
mid=low+high
mid=floor(mid/2)
c = compare(x, s[mid])
if c < 0 then -- search lower
high = mid - 1
elsif c > 0 then -- search higher
low = mid + 1
else -- found!
return mid
end if
if low > high then exit end if
end while
return 0
end function
>