1. match() speed test


I wrote a short test to determine the fastest way to locate substrings in a huge file. The results are unexpected. The code is at http://openeuphoria.org/pastey/175.wc , typical results are at http://openeuphoria.org/pastey/176.wc . Often, time2 is a clean sweep winner. I alternated the test order.

It's unexpected that lookng for the longer term is faster than the shorter term.

Looking for "eleven" was faster even if "x" was before "eleven", and even if i put an "e" in every needle (to attempt tripping up the search for "eleven" (the "e" would hit on every needle)).

How can this be? Is my test defective?

All that said, the speeds are very impressive: average is 260 nanoseconds per each needle!

useless

new topic     » topic index » view message » categorize

2. Re: match() speed test

useless_ said...

Looking for "eleven" was faster even if "x" was before "eleven", and even if i put an "e" in every needle (to attempt tripping up the search for "eleven" (the "e" would hit on every needle)).

How can this be? Is my test defective?

I think that the answer is that with a longer needle, match() stops looking sooner, since there's no point in looking past the last element where the needle could possibly start. Since most of your tests have match() looking at non-matches, the longer needle does many fewer comparisons.

Try shorter haystacks for test1, and you should see the longer needle taking longer to find.

Matt

new topic     » goto parent     » topic index » view message » categorize

3. Re: match() speed test

mattlewis said...
useless_ said...

Looking for "eleven" was faster even if "x" was before "eleven", and even if i put an "e" in every needle (to attempt tripping up the search for "eleven" (the "e" would hit on every needle)).

How can this be? Is my test defective?

I think that the answer is that with a longer needle, match() stops looking sooner, since there's no point in looking past the last element where the needle could possibly start. Since most of your tests have match() looking at non-matches, the longer needle does many fewer comparisons.

Try shorter haystacks for test1, and you should see the longer needle taking longer to find.

Matt


So match does a length test also?

useless

new topic     » goto parent     » topic index » view message » categorize

4. Re: match() speed test


Ah, it must be. I changed "ppppp" (5 chars long) to "eppppp" (same len as "eleven") on every needle, causing the match for "eleven" to take a speed hit.

useless

new topic     » goto parent     » topic index » view message » categorize

5. Re: match() speed test

useless_ said...
mattlewis said...
useless_ said...

Looking for "eleven" was faster even if "x" was before "eleven", and even if i put an "e" in every needle (to attempt tripping up the search for "eleven" (the "e" would hit on every needle)).

How can this be? Is my test defective?

I think that the answer is that with a longer needle, match() stops looking sooner, since there's no point in looking past the last element where the needle could possibly start. Since most of your tests have match() looking at non-matches, the longer needle does many fewer comparisons.

Try shorter haystacks for test1, and you should see the longer needle taking longer to find.

So match does a length test also?

Sort of. It uses a while loop, but stops the loop at the length of the sequence plus one minus the length, since there's no point in looking for the needle to start if there aren't enough elements left from where it's looking.

Matt

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu