1. Interesting benchmark results - Euphoria vs. Euphoria

I wrote an integer intensive benchmark program (actually based on a rosetta code entry). I've been using Euphoria 4.04 for a while, and was impressed with how fast it was on the benchmark, especially compared to some other interpreters. But then someone challenged me, claiming that it wasn't the speed-demon I claimed, e.g., not quite as fast as Ruby, and only a hair faster than Lua.

I found out that we were using different versions. So I downloaded the latest, and sure enough, it is a good bit slower. Below is the benchmark code, and below that are the results with 3.1, 4.04, 4.05, and 4.1 beta. The Euphoria interpreter has gone from taking 12 seconds in 3.1 to 64 seconds in 4.1 to run the benchmark code. Not sure if anything can be done, but it is interesting to say the least.

without type_check 
 
integer left_edge, right_edge, top_edge, bottom_edge, max_iter, x_step, 
    y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char, accum, count 
atom t 
 
t = time() 
 
accum = 0 
count = 0 
while count < 1545 do 
    left_edge   = -420 
    right_edge  =  300 
    top_edge    =  300 
    bottom_edge = -300 
    x_step      =  7 
    y_step      =  15 
 
    max_iter    =  200 
 
    y0 = top_edge 
    while y0 > bottom_edge do 
        x0 = left_edge 
        while x0 < right_edge do 
            y = 0 
            x = 0 
            the_char = 32 
            x_x = 0 
            y_y = 0 
            i = 0 
            while i < max_iter and x_x + y_y <= 800 do 
                x_x = floor((x * x) / 200) 
                y_y = floor((y * y) / 200) 
                if x_x + y_y > 800 then 
                    the_char = 48 + i 
                    if i > 9 then 
                        the_char = 64 
                    end if 
                else 
                    temp = x_x - y_y + x0 
                    if (x < 0 and y > 0) or (x > 0 and y < 0) then 
                        y = (-1 * floor((-1 * x * y) / 100)) + y0 
                    else 
                        y = floor(x * y / 100) + y0 
                    end if 
                    x = temp 
                end if 
 
                i = i + 1 
            end while 
            accum = accum + the_char 
 
            x0 = x0 + x_step 
        end while 
 
        y0 = y0 - y_step 
    end while 
 
    if remainder(count, 300) = 0 then 
        printf(1, "%d\n", accum) 
    end if 
 
    count = count + 1 
end while 
 
printf(1, "%d\n\nCompleted in %d seconds\n\n", {accum, time() - t}) 

http://semware.com/images/mandel-eu-all.png

new topic     » topic index » view message » categorize

2. Re: Interesting benchmark results - Euphoria vs. Euphoria

ed_davis said...

... The Euphoria interpreter has gone from taking 12 seconds in 3.1 to 64 seconds in 4.1 to run the benchmark code. Not sure if anything can be done ...

The primary culprit has been the introduction of checking of GLOBAL variables and "better" dereferencing of temporary results.

As a general rules-of-thumb ...

  • Never use variables that are declared outside of a routine.
  • Explicitly use your own temporary result variable. This is not a big winner but may help sometimes, especially inside deeply nested loops.
  • Avoid combinations of 'and' and 'or' in a loop exit condition. This is one area that Euphoria needs some optimising work done.
  • Use the X op= Y operators instead of X = X op Y where possible

You can use the dis utility to view the generated Intermediate Language code generated by Euphoria's parser. This is the actual code that will be executed.

Here is your code with these suggestions included. On my system I got it down from 51 seconds to 21 seconds.

without type_check  
 
procedure main() -- Avoiding variables declared outside of routines. 
integer left_edge, right_edge, top_edge, bottom_edge, max_iter, x_step,  
    y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char, accum, count  
 
atom t1    -- Explicit temporary used 
integer t2 -- Explicit temporary used 
   
atom t  
  
t = time()  
  
accum = 0  
count = 0  
while count < 1545 do  
    left_edge   = -420  
    right_edge  =  300  
    top_edge    =  300  
    bottom_edge = -300  
    x_step      =  7  
    y_step      =  15  
  
    max_iter    =  200  
  
    y0 = top_edge  
    while 1 do 
        if y0 <= bottom_edge then exit end if 
        x0 = left_edge  
        while 1 do 
            if x0 >= right_edge then exit end if  
            y = 0  
            x = 0  
            the_char = 32  
            x_x = 0  
            y_y = 0  
            i = 0 
            t1 = 0 
            while 1 do 
                if i >= max_iter then exit end if 
                if t1 > 800 then exit end if  
            	t1 = x * x 
                x_x = floor(t1 / 200)  
                t1 = y * y 
                y_y = floor(t1 / 200)  
                t1 = x_x + y_y 
                if t1 > 800 then  
                    the_char = 48 + i  
                    if i > 9 then  
                        the_char = 64  
                    end if  
                else  
                    temp = x_x - y_y 
                    temp = temp + x0  
                    t2 = x * y 
                    --if (x < 0 and y > 0) or (x > 0 and y < 0) then  
                    if t2 < 0 then 
                        t2 *= -1 
                        t1 = floor(t2 / 100) 
                        t1 *= -1 
                        y = t1 + y0 
                        --y = (-1 * floor((-1 * x * y) / 100)) + y0  
                    else  
                    	--t1 = x * y 
                    	t1 = floor(t2 / 100) 
                    	y = t1 + y0 
                        --y = floor(x * y / 100) + y0  
                    end if  
                    x = temp  
                end if  
  
                i = i + 1 
                t1 = x_x + y_y   
            end while  
            accum += the_char  
  
            x0 += x_step  
        end while  
  
        y0 -= y_step  
    end while  
  
    if remainder(count, 300) = 0 then  
        printf(1, "%d\n", accum)  
    end if  
  
    count += 1  
end while  
  
printf(1, "%d\n\nCompleted in %d seconds\n\n", {accum, time() - t})  
end procedure 
main() 
new topic     » goto parent     » topic index » view message » categorize

3. Re: Interesting benchmark results - Euphoria vs. Euphoria

Hi Ed,

I finally finished my virtual machine interpreter, started back in December when you published that primes benchmark. I now have it working on the primes and this new benchmark. The interpreter is written in C and is register-based (instead of stack-based), and it goes really fast. The parser and bytecode compiler is written in Euphoria and it supports only a small subset of Euphoria statements, mainly integer arithmetic and loops. If you don't mind, could you benchmark it on your machine and let me know how it stacks up? The source code is here: https://github.com/peberlein/interpreter

new topic     » goto parent     » topic index » view message » categorize

4. Re: Interesting benchmark results - Euphoria vs. Euphoria

PeteE said...

Hi Ed,

I finally finished my virtual machine interpreter, started back in December when you published that primes benchmark. I now have it working on the primes and this new benchmark. The interpreter is written in C and is register-based (instead of stack-based), and it goes really fast. The parser and bytecode compiler is written in Euphoria and it supports only a small subset of Euphoria statements, mainly integer arithmetic and loops. If you don't mind, could you benchmark it on your machine and let me know how it stacks up? The source code is here: https://github.com/peberlein/interpreter

Hello Pete!

It has been a long time since PEU! I still have peu08 from August 1998. smile

Your new interpreter looks really promising. But (sorry about that), I had a problem with while loops. I could not get bench.ex or primes.ex to run - they both crashed. I traced it down to the while loop. For instance, this appears to work:

integer pc, k, p, lim, n 
 
lim = 5000000 
n = 1 
pc = 0 
 
--while n < lim do 
    k = 3 
    p = 1 
    n = n + 2 
--    while k * k < n and p do 
        p = floor(n / k) * k != n 
        k = k + 2 
--    end while 
    if p then 
        pc = pc + 1 
    end if 
--end while 
 
? pc 

But this crashes:

integer i 
i = 0 
while i < 10 do 
    i = i + 1 
    ? i 
end while 

The above generated:

00000101	0: load   r1, 0 
000A0201	1: load   r2, 10 
0301020A	2: jle    r2, r1, 3 (6) 
01010105	3: addu8  r1, r1, 1 
00000114	4: qprint r1 
FFFB000D	5: jmp    -5 (1) 
00000000	6: end 

I'm running Windows 7. I'm using: gcc (tdm64-2) 4.8.1

I compiled interpret.c with:

-m32 -O3 -march=native -mtune=native

I also tried it with:

-m64 -O3 -march=native -mtune=native

I also tried it without any options, save forcing 32-bit/64-bit compilation:

-m32

and

-m64

Let me know if there is something you'd like me try.

new topic     » goto parent     » topic index » view message » categorize

5. Re: Interesting benchmark results - Euphoria vs. Euphoria

Hi Ed, sorry about that - I only tested Linux and Mac. I'm installing MinGW on Win 7 now...

Edit: Fixed now and pushed to github. Needed "strtoul" instead of "strtol" when reading the bytecodes.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Interesting benchmark results - Euphoria vs. Euphoria

ed_davis said...

The Euphoria interpreter has gone from taking 12 seconds in 3.1 to 64 seconds in 4.1 to run the benchmark code. Not sure if anything can be done, but it is interesting to say the least.

64 seconds! Euphoria seems to be losing the edge of speed. Is there any option in the v4.1 interpreter that could allow for better speed? Anything that could bring back the old speed, at the cost of having a limited feature set?

new topic     » goto parent     » topic index » view message » categorize

7. Re: Interesting benchmark results - Euphoria vs. Euphoria

I get (running each twice):

Interpreter Time
exu (3.1) 23, 20
eui (4.0.5) 19, 19
eui (4.1, 32-bits) 25, 25
eui (4.1, 64-bits) 29, 30

My results don't show as dramatic a difference, and in fact, 4.0.5 was the best.

I haven't dug into the details, but Derek is right, the difference is really the way we handle temporary values. This was really a consequence of being able to have things get cleaned up (memory freed, files closed, etc) when a value loses all reference counts.

We could probably do more work on figuring out when that's not necessary (e.g., when it's an integer).

OTOH, the translators are consistent:

Translator Time
ecu (3.1) 2, 2
eui (4.05) 2, 2
euc (4.1, 32-bits) 2, 2
euc (4.1, 64-bits) 2, 2

Matt

new topic     » goto parent     » topic index » view message » categorize

8. Re: Interesting benchmark results - Euphoria vs. Euphoria

PeteE said...

Hi Ed, sorry about that - I only tested Linux and Mac. I'm installing MinGW on Win 7 now...

Edit: Fixed now and pushed to github. Needed "strtoul" instead of "strtol" when reading the bytecodes.

Works much better now smile

Compiled for 32-bit under 64-bit Windows 7: 17.96 seconds

Compiled for 64-bit under 64-bit Windows 7: 14.26

And here is how it compares to some other language processors.

Processor Time Type From
C -O3 or -O2 1.00 Native
BCX 1.12 C bcxbasic.com
C -O or -O1 1.13 Native
Java 1.23 JIT
C# 1.34 JIT
Visual Basic .Net 1.52 JIT
Free Pascal -O3 1.92 Native freepascal.org
C, no options 1.96 Native
C -Os 2.38 Native
Borland C 2.80 Native
TinyC 3.23 Native bellard.org/tcc
Free Basic 3.36 Native freebasic.net
euc -gcc -con 3.58 C Euphoria to C (gcc)
Oxygen basic 3.77 Native oxygenbasic.org
Power Basic 4.03 Native powerbasic.com
euc -wat -con 6.52 C Euphoria to C (watcom)
Euphoria v3.1 12.45 VM rapideuphoria.com
PE 64-bit 14.26 VM github.com/peberlein/interpreter
PE 32-bit 17.96 VM github.com/peberlein/interpreter
Euphoria v4.04 24.36 VM openeuphoria.com
QB64 32.43 C++ qb64.net
Pike 44.39 VM pike.lysator.liu.se
calc3a 54.87 AST epaperpress.com/lexandyacc/
toy2.c 58.39 VM groups.yahoo.com/neo/groups/QDepartment/files/Ed-Davis-stuff
Tinypas.c 58.97 VM Pascal-S converted to C
Ruby 60.25 VM ruby-lang.org
Euphoria v4.1 beta 61.40 VM openeuphoria.com
new topic     » goto parent     » topic index » view message » categorize

9. Re: Interesting benchmark results - Euphoria vs. Euphoria

(Hmmm. I actually had about a dozen more entries, but for some reason Creol won't display them, so splitting into two messages. I wonder if there is a limit on the table size?)

Processor Time Type From
tiny-c.c 65.96 VM iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c
Lua 80.14 VM lua.org
jwillia basic 109.23 VM github.com/jwillia3/BASIC
hoc 140.37 VM netlib.bell-labs.com/~bwk/hoc.sh
NaaLaa 168.90 VM NaaLaa.com
CInt 201.01 VM root.cern.ch/drupal/content/cint
Python 274.79 VM python.org
Yabasic 278.77 VM yabasic.de
SpecBAS 358.74 VM sites.google.com/site/pauldunn
Chip Munk Basic 442.09 Interpreter nicholson.com/rhn/basic
BBCBasic 531.89 Interpreter bbcbasic.co.uk
Thin Basic 543.00 Interpreter thinbasic.com
CH 582.00 Interpreter softintegration.com
Scriba 618.00 Interpreter scriptbasic.org
SI 1020.73 Interpreter drdobbs.com/cpp/si-a-c-like-script-interpreter/184408141
LittleC 2428.74 Interpreter hbp.iconbar.com/download/microC.arc
PicoC 2478.18 Interpreter code.google.com/p/picoc
DDS5 3033.35 Interpreter ioccc.org/1990/dds.c

For the VM interpreters I tested, only Euphoria 3.1 was faster.

new topic     » goto parent     » topic index » view message » categorize

10. Re: Interesting benchmark results - Euphoria vs. Euphoria

mattlewis said...

I get (running each twice):

...

My results don't show as dramatic a difference, and in fact, 4.0.5 was the best.

Matt

Which benchmark program did you run, my original, or Derek's modified version?

I also get much better times when I run Derek's modified version - not quite as good/consistent as yours though. But I consider Derek's version cheating smile

new topic     » goto parent     » topic index » view message » categorize

11. Re: Interesting benchmark results - Euphoria vs. Euphoria

ed_davis said...

But I consider Derek's version cheating smile

Why? The algorithm is the same, isn't it? He just changed some of the syntax (A = A op B to A op= B) and moved some temporary variables around.

new topic     » goto parent     » topic index » view message » categorize

12. Re: Interesting benchmark results - Euphoria vs. Euphoria

jimcbrown said...
ed_davis said...

But I consider Derek's version cheating smile

Why? The algorithm is the same, isn't it? He just changed some of the syntax (A = A op B to A op= B) and moved some temporary variables around.

Yes, but I could do the same for many of the languages tested. Several of the simpler processors would greatly benefit from less complicated expressions. The goal was to have the same (as much as possible) code for each processor, without any programmer optimizations.

new topic     » goto parent     » topic index » view message » categorize

13. Re: Interesting benchmark results - Euphoria vs. Euphoria

ed_davis said...
jimcbrown said...
ed_davis said...

But I consider Derek's version cheating smile

Why? The algorithm is the same, isn't it? He just changed some of the syntax (A = A op B to A op= B) and moved some temporary variables around.

Yes, but I could do the same for many of the languages tested. Several of the simpler processors would greatly benefit from less complicated expressions. The goal was to have the same (as much as possible) code for each processor, without any programmer optimizations.

Fair enough.

new topic     » goto parent     » topic index » view message » categorize

14. Re: Interesting benchmark results - Euphoria vs. Euphoria

ed_davis said...
mattlewis said...

I get (running each twice):

...

My results don't show as dramatic a difference, and in fact, 4.0.5 was the best.

Matt

Which benchmark program did you run, my original, or Derek's modified version?

I also get much better times when I run Derek's modified version - not quite as good/consistent as yours though. But I consider Derek's version cheating smile

My times were with your code. Honestly, just putting it all in a main() function was the biggest difference. I did do some similar things as Derek's code in playing with it, and 4.0.5 and 3.1 were about the same (12-15s) and the 64 bit version of 4.1 was a little bit slower (~17s).

I think wrapping it in a main() function isn't much cheating, since many languages make you do that anyways. The top level variable checking is there due to the ability to forward reference and jump around in code. The interpreter doesn't / can't do the level of checking that the translator does, and in this case, the feature vastly outweighs the performance degradation.

The temp variable stuff is definitely something we need to work on, but again, it's a huge feature win, in my book.

Matt

new topic     » goto parent     » topic index » view message » categorize

15. Re: Interesting benchmark results - Euphoria vs. Euphoria

mattlewis said...

I think wrapping it in a main() function isn't much cheating, since many languages make you do that anyways.

I do this all the time, simply as a measure of proper practice. It sure would be nice if Euphoria ran a main() routine automatically. (wink wink, nudge nudge) blink

-Greg
Forked into: Automatic entry point preprocessor

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu