1. find function
- Posted by Mike <vulcan at WIN.CO.NZ> Oct 19, 1999
- 483 views
Below is a function that can be used in place of Euphoria's find() IF the list to be searched is sorted ( in ascending order). I have written it to manage a word list for my windows editor ( which is progressing slowly ) but I thought that some may find it useful and maybe someone could optimize it especially for a small list where find() currently has the advantage. For any list the search time for almost all elements is fairly constant regardless of where they are located. In the case of a massive list where the element to be located is NOT near the start of the list then this function really shows it's colours. I did have a look at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and may be be less "clean" to implement. I don't know if the name is appropriate but there is a 'integer divide by 2' part in the code so I thought "What the heck, why not!". Any feedback is appreciated. Yours Truly Mike vulcan at win.co.nz global function binary_find(object element, sequence list) -- find element in list, same as find() integer start, finish, low, high, range, jump, d start=1 finish=length(list) low=start high=finish while 1 do range=high - low if range < 2 then d = find(element, list[low..high]) if d then d += low - 1 end if return d else jump=floor(range/2) d=compare(element, list[low + jump]) if d=1 then -- search higher low += jump if low > finish then return 0 end if elsif d=-1 then -- search lower high=low + jump else -- a hit! return low + jump end if end if end while return 0 end function
2. Re: find function
- Posted by "Boehme, Gabriel" <gboehme at POSTOFFICE.MUSICLAND.COM> Oct 18, 1999
- 464 views
- Last edited Oct 19, 1999
Mike, Don't have the time to benchmark, but here's the binary search routine I always use. I believe I got the basic idea from a QuickBasic textbook, although the Euphoria implementation is my own. Enjoy! Gabriel global function find_sorted(object x, sequence s) -- does a binary search on a sorted sequence -- returns index location if found, 0 if not -- assumes that sequence s has been sorted prior to this call integer lo, hi, mid, c lo = 1 hi = length(s) while lo <= hi do mid = floor((lo + hi) / 2) c = compare(x, s[mid]) if c < 0 then -- x < s[mid] hi = mid - 1 elsif c > 0 then -- x > s[mid] lo = mid + 1 else -- x = s[mid] return mid end if end while return 0 end function -----Original Message----- From: Mike [mailto:vulcan at WIN.CO.NZ] Sent: Monday, October 18, 1999 2:29 PM To: EUPHORIA at LISTSERV.MUOHIO.EDU Subject: find function Below is a function that can be used in place of Euphoria's find() IF the list to be searched is sorted ( in ascending order). I have written it to manage a word list for my windows editor ( which is progressing slowly ) but I thought that some may find it useful and maybe someone could optimize it especially for a small list where find() currently has the advantage. For any list the search time for almost all elements is fairly constant regardless of where they are located. In the case of a massive list where the element to be located is NOT near the start of the list then this function really shows it's colours. I did have a look at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and may be be less "clean" to implement. I don't know if the name is appropriate but there is a 'integer divide by 2' part in the code so I thought "What the heck, why not!". Any feedback is appreciated. Yours Truly Mike vulcan at win.co.nz global function binary_find(object element, sequence list) -- find element in list, same as find() integer start, finish, low, high, range, jump, d start=1 finish=length(list) low=start high=finish while 1 do range=high - low if range < 2 then d = find(element, list[low..high]) if d then d += low - 1 end if return d else jump=floor(range/2) d=compare(element, list[low + jump]) if d=1 then -- search higher low += jump if low > finish then return 0 end if elsif d=-1 then -- search lower high=low + jump else -- a hit! return low + jump end if end if end while return 0 end function
3. Re: find function
- Posted by Robert Craig <rds at ATTCANADA.NET> Oct 19, 1999
- 464 views
I was curious to know at what point it becomes better to use a binary search instead of the built-in find() function. I used Gabriel Boehme's version and compared using, in one case, a sequence of integers, and in another case, a sequence of strings. It seems that for sorted sequences of up to 530 integers, it's better to use the built-in find(). For sorted sequences of up to 35 strings, the built-in find() also wins. I look up each element in the sequence once. comparing strings takes longer than comparing integers, and a binary search needs fewer compares. Below, all look-ups are successful. I didn't test the case where a lot of look-ups fail. The regular find() would not do quite as well in that kind of test. global function find_sorted(object x, sequence s) -- does a binary search on a sorted sequence -- returns index location if found, 0 if not -- assumes that sequence s has been sorted prior to this call integer lo, hi, mid, c lo = 1 hi = length(s) while lo <= hi do mid = floor((lo + hi) / 2) c = compare(x, s[mid]) if c < 0 then -- x < s[mid] hi = mid - 1 elsif c > 0 then -- x > s[mid] lo = mid + 1 else -- x = s[mid] return mid end if end while return 0 end function include sort.e -- constant N = 530 -- find is faster up to 530 integers constant N = 35 -- find is faster up to 35 strings constant ITER = 5000 sequence test -- integers up to 1000: -- test = sort(rand(repeat(1000, N))) -- 5-character strings: test = sort(rand(repeat({255,255,255,255,255}, N))) object x atom t t = time() for i = 1 to ITER do for j = 1 to length(test) do x = find_sorted(test[j], test) end for end for printf(1, "find_sorted: %.2f\n", time()-t) t = time() for i = 1 to ITER do for j = 1 to length(test) do x = find(test[j], test) end for end for printf(1, "find: %.2f\n", time()-t) Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
4. Re: find function
- Posted by "Boehme, Gabriel" <gboehme at POSTOFFICE.MUSICLAND.COM> Oct 19, 1999
- 453 views
Rob, Cool stats! Your curiosity on this point helped me to remember some benchmarking *I* did a couple years ago on this very same thing. I seem to recall that using the built-in find() on the *unsorted* list of strings produced much faster numbers than the built-in find() on the sorted list. My theory (at the time) was that the internal address values (or whatever) get all shuffled around when you sort(), and are much faster to access sequentially in their original, unsorted state. Is this true? Also, are the stats you give us below in any way dependent on computer speed, or are they true regardless of what machine they're run on? Thanks, Gabriel -----Original Message----- From: Robert Craig [mailto:rds at ATTCANADA.NET] Sent: Monday, October 18, 1999 11:23 PM To: EUPHORIA at LISTSERV.MUOHIO.EDU Subject: Re: find function I was curious to know at what point it becomes better to use a binary search instead of the built-in find() function. I used Gabriel Boehme's version and compared using, in one case, a sequence of integers, and in another case, a sequence of strings. It seems that for sorted sequences of up to 530 integers, it's better to use the built-in find(). For sorted sequences of up to 35 strings, the built-in find() also wins. I look up each element in the sequence once. comparing strings takes longer than comparing integers, and a binary search needs fewer compares. Below, all look-ups are successful. I didn't test the case where a lot of look-ups fail. The regular find() would not do quite as well in that kind of test. global function find_sorted(object x, sequence s) -- does a binary search on a sorted sequence -- returns index location if found, 0 if not -- assumes that sequence s has been sorted prior to this call integer lo, hi, mid, c lo = 1 hi = length(s) while lo <= hi do mid = floor((lo + hi) / 2) c = compare(x, s[mid]) if c < 0 then -- x < s[mid] hi = mid - 1 elsif c > 0 then -- x > s[mid] lo = mid + 1 else -- x = s[mid] return mid end if end while return 0 end function include sort.e -- constant N = 530 -- find is faster up to 530 integers constant N = 35 -- find is faster up to 35 strings constant ITER = 5000 sequence test -- integers up to 1000: -- test = sort(rand(repeat(1000, N))) -- 5-character strings: test = sort(rand(repeat({255,255,255,255,255}, N))) object x atom t t = time() for i = 1 to ITER do for j = 1 to length(test) do x = find_sorted(test[j], test) end for end for printf(1, "find_sorted: %.2f\n", time()-t) t = time() for i = 1 to ITER do for j = 1 to length(test) do x = find(test[j], test) end for end for printf(1, "find: %.2f\n", time()-t) Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
5. Re: find function
- Posted by Robert Craig <rds at ATTCANADA.NET> Oct 20, 1999
- 464 views
Gabriel Boehme writes: > I seem to recall that using the built-in find() on > the *unsorted* list of strings produced much faster > numbers than the built-in find() on the sorted list. > My theory (at the time) was that the internal address > values (or whatever) get all shuffled around when > you sort(), and are much faster to access sequentially > in their original, unsorted state. Is this true? I can't imagine that a big difference in speed would be caused just by that. > Also, are the stats you give us below in any way > dependent on computer speed, or are they > true regardless of what machine they're run on? I'd expect the sequence-length where binary search becomes faster than find() to be roughly the same on all machines, although find() might suffer more than find_sorted() if the on-chip data cache could not hold the whole sequence. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
6. Re: find function
- Posted by Joe Otto <jotto at NETZERO.NET> Oct 20, 1999
- 538 views
- Last edited Oct 21, 1999
Mike, Here's a set of routines that I use. They assume that the s is an ascending sequence of unique elements. I haven't gotten around to optimizing them yet, but you're welcome to use them if you want. Joe -- Searches for object x in sorted ascending sequence s -- Returns the position of x if found or the insertion point for x if not found function binarySearch (object x, sequence s) integer hi, lo, mid, res hi = length (s) if hi = 0 then return 1 end if lo = 1 while lo <= hi do mid = floor ((lo + hi) / 2) res = compare (x, s [mid]) if res < 0 then hi = mid - 1 elsif res > 0 then lo = mid + 1 else return mid end if end while return mid + (lo > mid) end function -- Use binarySearch () to locate x global function findSorted (object x, sequence s) integer res res = binarySearch (x, s) if res > length (s) or (not equal (s [res], x)) then res = 0 end if return res end function -- Inserts object x into sorted ascending sequence s global function insertSorted (sequence s, object x) integer res res = binarySearch (x, s) if res > length (s) then return append (s, x) end if if equal (s [res], x) then return s end if return s [1 .. res - 1] & prepend (s [res .. length (s)], x) end function -----Original Message----- From: Mike [SMTP:vulcan at WIN.CO.NZ] Sent: Monday, October 18, 1999 2:29 PM Subject: find function Below is a function that can be used in place of Euphoria's find() IF the list to be searched is sorted ( in ascending order). I have written it to manage a word list for my windows editor ( which is progressing slowly ) but I thought that some may find it useful and maybe someone could optimize it especially for a small list where find() currently has the advantage. For any list the search time for almost all elements is fairly constant regardless of where they are located. In the case of a massive list where the element to be located is NOT near the start of the list then this function really shows it's colours. I did have a look at hash.ex and tree.ex in euphoria\demo but these do not suit my purpose and may be be less "clean" to implement. I don't know if the name is appropriate but there is a 'integer divide by 2' part in the code so I thought "What the heck, why not!". Any feedback is appreciated. Yours Truly Mike vulcan at win.co.nz global function binary_find(object element, sequence list) -- find element in list, same as find() integer start, finish, low, high, range, jump, d start=1 finish=length(list) low=start high=finish while 1 do range=high - low if range < 2 then d = find(element, list[low..high]) if d then d += low - 1 end if return d else jump=floor(range/2) d=compare(element, list[low + jump]) if d=1 then -- search higher low += jump if low > finish then return 0 end if elsif d=-1 then -- search lower high=low + jump else -- a hit! return low + jump end if end if end while return 0 end function __________________________________________ NetZero - Defenders of the Free World Get your FREE Internet Access and Email at http://www.netzero.net/download/index.html
7. Re: find function
- Posted by Mike <vulcan at WIN.CO.NZ> Oct 24, 1999
- 486 views
Thanks to Gabriel & Rob & Joe for replying. I can see now that my original search routine was a little bit inefficient. I did some time tests on two components that make up the function and I would like to share some interesting (Well, I thought they were!) findings. With the while loop it seems that.. while 1 do if <condition> then exit end if ..some code end while ..is faster than.. while <condition> do ..some code end while This is useful to know because a coder can adopt a fixed format for a while loop AND have the flexibility to emulate a 'do while' loop. All this safe in the knowledge of having optimal speed. The other thing was this: mid=low+high mid=floor(mid/2) ..is significantly faster than.. mid=floor((low+high)/2) Probably in the binary search function there may be little or no noticeable difference because the number of times the function loops is very small. However...including both optimizations yields the result as below. Yours Truly Mike vulcan at win.co.nz global function find_sorted(object x, sequence s) -- does a binary search on a sorted sequence -- returns index location if found, 0 if not -- assumes that sequence s has been sorted in ascending order prior to this call integer low, high, mid, c low = 1 high = length(s) while 1 do mid=low+high mid=floor(mid/2) c = compare(x, s[mid]) if c < 0 then -- search lower high = mid - 1 elsif c > 0 then -- search higher low = mid + 1 else -- found! return mid end if if low > high then exit end if end while return 0 end function >