1. Which is faster ?

On a long sequence of items which is faster using the
find() function or searching with a for loop ???

new topic     » topic index » view message » categorize

2. Re: Which is faster ?

bernie said...

On a long sequence of items which is faster using the
find() function or searching with a for loop ???

find() should always be faster than a manual loop. In fact, find() is even faster in 4.0 than it was in 3.1. If the list gets too long, however, other methods may be able to beat it, such as a binary search (of which there is an implementation in the 4.0 std library). But of course, you have to make sure that you sort your data first.

Matt

new topic     » goto parent     » topic index » view message » categorize

3. Re: Which is faster ?

mattlewis said...
bernie said...

On a long sequence of items which is faster using the
find() function or searching with a for loop ???

find() should always be faster than a manual loop. In fact, find() is even faster in 4.0 than it was in 3.1. If the list gets too long, however, other methods may be able to beat it, such as a binary search (of which there is an implementation in the 4.0 std library). But of course, you have to make sure that you sort your data first.

Matt

Just a bit more clarification, find() is faster for long sequences. If the sequence is, say less than 10-15 items, then a loop can be faster. Also, the type of data in the sequence can influence the speed too. Generally, comparing integers (using find() or '=') is faster than comparing atoms, which is in turn fast than comparing sequences.

However, following Matt's advice, if you know something about the contents of the sequence before seeking, you can often come up with a smarter way of searching it.

  • A sorted sequence would benefit from a binary search.
  • Or if there are mostly only a few items that are found much more than other items, arrange to have them at the front of the sequence.
  • Or if the sequence is fairly static (doesn't change much), you might consider a hash map instead.
  • Maybe you can arrange the sequence to contain a set of sub-sequences based of the length of the item. Such that all 1-element items are in the first sub-sequence, 2-element items in the second sub-sequence etc.... Then when doing a search, you can quick work out which sub-sequence your item is contained it.
new topic     » goto parent     » topic index » view message » categorize

4. Re: Which is faster ?

DerekParnell said...

Just a bit more clarification, find() is faster for long sequences. If the sequence is, say less than 10-15 items, then a loop can be faster.

Do you have any benchmarks that demonstrate this? I ran the following on r1380, and find() always won. I changed the length of the sequence, changed the values to be all integers (and used = instead of equal()), etc. Actually, the loop won one time, when s = {""}. There's just a lot more overhead in a euphoria loop than the native C loop in the backend find function (not to mention your optimizations for non-integer data).

 
sequence s = { 1,"2",3,4.4,5,6,7} 
constant REPS = 1000000 
 
atom t = time() 
for i = 1 to REPS do 
	for j = 1 to length(s) do 
		object x = s[j] 
		integer ix = find( j, s ) 
	end for 
end for 
atom find_time = time() - t 
 
t = time() 
for i = 1 to REPS do 
	for j = 1 to length(s) do 
		for k = 1 to length(s) do 
			object x = s[j] 
			if equal( s[k], j ) then 
				exit 
			end if 
		end for 
	end for 
end for 
atom loop_time = time() - t 
 
printf(1, "%d repetitions:\nfind: %1.3f\nloop: %1.3f\n", { REPS, find_time, loop_time }) 

Matt

new topic     » goto parent     » topic index » view message » categorize

5. Re: Which is faster ?

mattlewis said...
DerekParnell said...

Just a bit more clarification, find() is faster for long sequences. If the sequence is, say less than 10-15 items, then a loop can be faster.

Do you have any benchmarks that demonstrate this?

No. Actually I mis-remembered. What I was thinking about was the difference between find() and map() - there is a point at which map() becomes faster than find(). But you are correct, find() is always faster than looping if you have an unordered sequence to scan through.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Which is faster ?

DerekParnell said...

No. Actually I mis-remembered. What I was thinking about was the difference between find() and map() - there is a point at which map() becomes faster than find(). But you are correct, find() is always faster than looping if you have an unordered sequence to scan through.

Ok. That I believe. smile

Matt

new topic     » goto parent     » topic index » view message » categorize

7. Re: Which is faster ?

mattlewis said...
DerekParnell said...

No. Actually I mis-remembered. What I was thinking about was the difference between find() and map() - there is a point at which map() becomes faster than find(). But you are correct, find() is always faster than looping if you have an unordered sequence to scan through.

Ok. That I believe. smile

Matt

Thanks Derek and Matt for the information.
I gave up using maps because i kept running
into problems that I could not trace the origin of
and the maps seemed to be slower because of the overhead of calling them.

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu