1. routine_ids and benchmarking

I have a question. I've been running a benchmark program which relies upon
the use of routine_ids. I have recently discovered that close results can be
swayed significantly, simply by moving the test routines to different
physical locations in the program. I have even managed to obtain results
*consistently* showing one routine to be faster than another, when both
routines are identical!

I do not believe I have seen anything about this mentioned either here on
the list, or in the official Euphoria documentation. Is this truly a
"feature" of Euphoria, or is my computer just playing with my mind?


Thanks in advance,
   Gabriel Boehme


 ------
A dead thing can go with the stream, but only a living thing can go against
it.

G.K. Chesterton
 ------


2. Re: routine_ids and benchmarking

Gabriel Boehme writes:
> I have recently discovered that close results can be
> swayed significantly, simply by moving the test routines
> to different physical locations in the program...
> ...Is this truly a "feature" of Euphoria, or is my computer just
> playing with my mind?

The short answer is: Your computer is playing with your mind.

Modern CPUs depend heavily on caching. The gap in speed
between the CPU chip, with its on-chip cache, and the main
DRAM memory has been widening steadily
for many years. (Most machines also have a secondary cache
between the on-chip cache and DRAM.) A cache
"miss" causes a severe penalty.

Generally speaking, the most recently accessed data
is kept in cache, while older data is kicked out. In some
cases however, recent data may be kicked out if the
address of that data conflicts with the addresses of other
data in the cache. For example, on a Pentium
there is a constraint that allows a maximum of 2 addresses with
the same 2nd and 3rd-last hex digits in the address (+++++xx+).
When a third such address is fetched, one of the existing two
must be kicked out, even if it was accessed recently. The
(original) Pentium data cache is said to be "2-way set associative".
(I believe the Pentium II and III data cache is 4-way set associative.)
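
To make the conflict rule concrete, here is a minimal sketch in Python (not Euphoria) of how a set-associative cache might pick a slot for an address. The geometry is an assumption for illustration: 32-byte lines and 128 sets, which would give an 8 KB 2-way cache. With those numbers the set is chosen by address bits 5 to 11, which roughly matches the "2nd and 3rd-last hex digits" described above. The addresses are made up.

```python
# Simplified model of a set-associative data cache.
# ASSUMED geometry (for illustration only): 32-byte lines, 128 sets.
LINE_SIZE = 32
NUM_SETS = 128


def cache_set(address):
    """Return the index of the set that this address maps to."""
    return (address // LINE_SIZE) % NUM_SETS


# Hypothetical addresses that differ only in their upper hex digits.
# Their last three hex digits (A40) are identical, so they share a set.
same_set = [0x00401A40, 0x0052FA40, 0x00733A40]

# A hypothetical address whose 2nd-last hex digit differs.
other = 0x00401A80

for addr in same_set + [other]:
    print(f"{addr:08X} -> set {cache_set(addr)}")
```

In this model the first three addresses all land in one set, so a 2-way cache could hold only two of them at a time, while the fourth address lands in a different set and does not compete with them.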

If I've lost you, don't worry, the point is that by varying the
address of Euphoria code and data in a trivial way, you
can sometimes cause a significant change in the speed of the code.
This is true for any language running on a Pentium. A 486 is
less susceptible to this, but has other *sensitivities* to code
alignment etc.

In one extreme case that I checked out very carefully, a small for-loop
in a subroutine started running *6* times slower when I declared
a new variable in an *unrelated* routine. I couldn't believe it at first,
but after a day or so of machine-level debugging I managed to
clearly show that there were 3 variables in the for-loop
competing for two available slots in the cache. As each
variable was fetched, it bumped one of the others out of cache,
so that none of the 3 variables was *ever* in cache when it was
needed. Almost *any* trivial change to the program, prior to the
for-loop, would make it run fast again.
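
The thrashing described above can be reproduced in a small simulation. This is only a sketch under stated assumptions, written in Python rather than Euphoria: the cache geometry (32-byte lines, 128 sets), the least-recently-used replacement policy, and all addresses are assumptions chosen for illustration, not a reconstruction of the actual program.

```python
from collections import OrderedDict

# ASSUMED geometry (for illustration only): 32-byte lines, 128 sets.
LINE_SIZE = 32
NUM_SETS = 128


def count_misses(addresses, ways=2, passes=1000):
    """Access each address in turn, `passes` times, and count cache misses.

    Each set holds at most `ways` lines and evicts the least recently
    used line when full (LRU is an assumption of this sketch).
    """
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    misses = 0
    for _ in range(passes):
        for addr in addresses:
            line = addr // LINE_SIZE
            cache_set = sets[line % NUM_SETS]
            if line in cache_set:
                cache_set.move_to_end(line)         # hit: mark as most recent
            else:
                misses += 1
                if len(cache_set) >= ways:
                    cache_set.popitem(last=False)   # evict least recent line
                cache_set[line] = True
    return misses


# Three hypothetical loop variables that all map to the same set.
conflicting = [0x00401A40, 0x0052FA40, 0x00733A40]
# The same variables after one of them moves by 32 bytes.
shifted = [0x00401A40, 0x0052FA40, 0x00733A60]

print("conflicting, 2-way:", count_misses(conflicting))
print("shifted,     2-way:", count_misses(shifted))
print("conflicting, 4-way:", count_misses(conflicting, ways=4))
```

In this model the three conflicting addresses miss on every single access with a 2-way cache, because each fetch evicts the line that will be needed next. Moving one variable by a single cache line, or giving the set four ways, leaves only the three initial misses.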

Regards,
     Rob Craig
     Rapid Deployment Software
     http://members.aol.com/FilesEu/


3. Re: routine_ids and benchmarking

Rob, thanks for the enlightenment. Art

At 09:27 PM 4/21/99 -0400, you wrote:

>Modern CPUs depend heavily on caching. The gap in speed
>between the CPU chip, with its on-chip cache, and the main
>DRAM memory has been widening steadily
>for many years. (Most machines also have a secondary cache
>between the on-chip cache and DRAM.) A cache
>"miss" causes a severe penalty.

Art Adamson, The Cincinnati Engine Man, permanent address euclid2 at email.com
