1. Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ed_davis May 15, 2014
- 2791 views
I wrote an integer intensive benchmark program (actually based on a rosetta code entry). I've been using Euphoria 4.04 for a while, and was impressed with how fast it was on the benchmark, especially compared to some other interpreters. But then someone challenged me, claiming that it wasn't the speed-demon I claimed, e.g., not quite as fast as Ruby, and only a hair faster than Lua.
I found out that we were using different versions. So I downloaded the latest, and sure enough, it is a good bit slower. Below is the benchmark code, and below that are the results with 3.1, 4.04, 4.05, and 4.1 beta. The Euphoria interpreter has gone from taking 12 seconds in 3.1 to 64 seconds in 4.1 to run the benchmark code. Not sure if anything can be done, but it is interesting to say the least.
without type_check integer left_edge, right_edge, top_edge, bottom_edge, max_iter, x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char, accum, count atom t t = time() accum = 0 count = 0 while count < 1545 do left_edge = -420 right_edge = 300 top_edge = 300 bottom_edge = -300 x_step = 7 y_step = 15 max_iter = 200 y0 = top_edge while y0 > bottom_edge do x0 = left_edge while x0 < right_edge do y = 0 x = 0 the_char = 32 x_x = 0 y_y = 0 i = 0 while i < max_iter and x_x + y_y <= 800 do x_x = floor((x * x) / 200) y_y = floor((y * y) / 200) if x_x + y_y > 800 then the_char = 48 + i if i > 9 then the_char = 64 end if else temp = x_x - y_y + x0 if (x < 0 and y > 0) or (x > 0 and y < 0) then y = (-1 * floor((-1 * x * y) / 100)) + y0 else y = floor(x * y / 100) + y0 end if x = temp end if i = i + 1 end while accum = accum + the_char x0 = x0 + x_step end while y0 = y0 - y_step end while if remainder(count, 300) = 0 then printf(1, "%d\n", accum) end if count = count + 1 end while printf(1, "%d\n\nCompleted in %d seconds\n\n", {accum, time() - t})
2. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by DerekParnell (admin) May 15, 2014
- 2737 views
... The Euphoria interpreter has gone from taking 12 seconds in 3.1 to 64 seconds in 4.1 to run the benchmark code. Not sure if anything can be done ...
The primary culprit has been the introduction of checking of GLOBAL variables and "better" dereferencing of temporary results.
As a general rules-of-thumb ...
- Never use variables that are declared outside of a routine.
- Explicitly use your own temporary result variable. This is not a big winner but may help sometimes, especially inside deeply nested loops.
- Avoid combinations of 'and' and 'or' in a loop exit condition. This is one area that Euphoria needs some optimising work done.
- Use the X op= Y operators instead of X = X op Y where possible
You can use the dis utility to view the generated Intermediate Language code generated by Euphoria's parser. This is the actual code that will be executed.
Here is your code with these suggestions included. On my system I got it down from 51 seconds to 21 seconds.
without type_check procedure main() -- Avoiding variables declared outside of routines. integer left_edge, right_edge, top_edge, bottom_edge, max_iter, x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char, accum, count atom t1 -- Explicit temporary used integer t2 -- Explicit temporary used atom t t = time() accum = 0 count = 0 while count < 1545 do left_edge = -420 right_edge = 300 top_edge = 300 bottom_edge = -300 x_step = 7 y_step = 15 max_iter = 200 y0 = top_edge while 1 do if y0 <= bottom_edge then exit end if x0 = left_edge while 1 do if x0 >= right_edge then exit end if y = 0 x = 0 the_char = 32 x_x = 0 y_y = 0 i = 0 t1 = 0 while 1 do if i >= max_iter then exit end if if t1 > 800 then exit end if t1 = x * x x_x = floor(t1 / 200) t1 = y * y y_y = floor(t1 / 200) t1 = x_x + y_y if t1 > 800 then the_char = 48 + i if i > 9 then the_char = 64 end if else temp = x_x - y_y temp = temp + x0 t2 = x * y --if (x < 0 and y > 0) or (x > 0 and y < 0) then if t2 < 0 then t2 *= -1 t1 = floor(t2 / 100) t1 *= -1 y = t1 + y0 --y = (-1 * floor((-1 * x * y) / 100)) + y0 else --t1 = x * y t1 = floor(t2 / 100) y = t1 + y0 --y = floor(x * y / 100) + y0 end if x = temp end if i = i + 1 t1 = x_x + y_y end while accum += the_char x0 += x_step end while y0 -= y_step end while if remainder(count, 300) = 0 then printf(1, "%d\n", accum) end if count += 1 end while printf(1, "%d\n\nCompleted in %d seconds\n\n", {accum, time() - t}) end procedure main()
3. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by PeteE May 21, 2014
- 2613 views
Hi Ed,
I finally finished my virtual machine interpreter, started back in December when you published that primes benchmark. I now have it working on the primes and this new benchmark. The interpreter is written in C and is register-based (instead of stack-based), and it goes really fast. The parser and bytecode compiler is written in Euphoria and it supports only a small subset of Euphoria statements, mainly integer arithmetic and loops. If you don't mind, could you benchmark it on your machine and let me know how it stacks up? The source code is here: https://github.com/peberlein/interpreter
4. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ed_davis May 21, 2014
- 2605 views
Hi Ed,
I finally finished my virtual machine interpreter, started back in December when you published that primes benchmark. I now have it working on the primes and this new benchmark. The interpreter is written in C and is register-based (instead of stack-based), and it goes really fast. The parser and bytecode compiler is written in Euphoria and it supports only a small subset of Euphoria statements, mainly integer arithmetic and loops. If you don't mind, could you benchmark it on your machine and let me know how it stacks up? The source code is here: https://github.com/peberlein/interpreter
Hello Pete!
It has been a long time since PEU! I still have peu08 from August 1998.
Your new interpreter looks really promising. But (sorry about that), I had a problem with while loops. I could not get bench.ex or primes.ex to run - they both crashed. I traced it down to the while loop. For instance, this appears to work:
integer pc, k, p, lim, n lim = 5000000 n = 1 pc = 0 --while n < lim do k = 3 p = 1 n = n + 2 -- while k * k < n and p do p = floor(n / k) * k != n k = k + 2 -- end while if p then pc = pc + 1 end if --end while ? pc
But this crashes:
integer i i = 0 while i < 10 do i = i + 1 ? i end while
The above generated:
00000101 0: load r1, 0 000A0201 1: load r2, 10 0301020A 2: jle r2, r1, 3 (6) 01010105 3: addu8 r1, r1, 1 00000114 4: qprint r1 FFFB000D 5: jmp -5 (1) 00000000 6: end
I'm running Windows 7. I'm using: gcc (tdm64-2) 4.8.1
I compiled interpret.c with:
-m32 -O3 -march=native -mtune=native
I also tried it with:
-m64 -O3 -march=native -mtune=native
I also tried it without any options, save forcing 32-bit/64-bit compilation:
-m32
and
-m64
Let me know if there is something you'd like me try.
5. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by PeteE May 21, 2014
- 2553 views
- Last edited May 22, 2014
Hi Ed, sorry about that - I only tested Linux and Mac. I'm installing MinGW on Win 7 now...
Edit: Fixed now and pushed to github. Needed "strtoul" instead of "strtol" when reading the bytecodes.
6. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by GreenEuphorian May 22, 2014
- 2508 views
The Euphoria interpreter has gone from taking 12 seconds in 3.1 to 64 seconds in 4.1 to run the benchmark code. Not sure if anything can be done, but it is interesting to say the least.
64 seconds! Euphoria seems to be losing the edge of speed. Is there any option in the v4.1 interpreter that could allow for better speed? Anything that could bring back the old speed, at the cost of having a limited feature set?
7. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by mattlewis (admin) May 22, 2014
- 2503 views
I get (running each twice):
Interpreter | Time |
---|---|
exu (3.1) | 23, 20 |
eui (4.0.5) | 19, 19 |
eui (4.1, 32-bits) | 25, 25 |
eui (4.1, 64-bits) | 29, 30 |
My results don't show as dramatic a difference, and in fact, 4.0.5 was the best.
I haven't dug into the details, but Derek is right, the difference is really the way we handle temporary values. This was really a consequence of being able to have things get cleaned up (memory freed, files closed, etc) when a value loses all reference counts.
We could probably do more work on figuring out when that's not necessary (e.g., when it's an integer).
OTOH, the translators are consistent:
Translator | Time |
---|---|
ecu (3.1) | 2, 2 |
eui (4.05) | 2, 2 |
euc (4.1, 32-bits) | 2, 2 |
euc (4.1, 64-bits) | 2, 2 |
Matt
8. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ed_davis May 22, 2014
- 2720 views
Hi Ed, sorry about that - I only tested Linux and Mac. I'm installing MinGW on Win 7 now...
Edit: Fixed now and pushed to github. Needed "strtoul" instead of "strtol" when reading the bytecodes.
Works much better now
Compiled for 32-bit under 64-bit Windows 7: 17.96 seconds
Compiled for 64-bit under 64-bit Windows 7: 14.26
And here is how it compares to some other language processors.
Processor | Time | Type | From |
---|---|---|---|
C -O3 or -O2 | 1.00 | Native | |
BCX | 1.12 | C | bcxbasic.com |
C -O or -O1 | 1.13 | Native | |
Java | 1.23 | JIT | |
C# | 1.34 | JIT | |
Visual Basic .Net | 1.52 | JIT | |
Free Pascal -O3 | 1.92 | Native | freepascal.org |
C, no options | 1.96 | Native | |
C -Os | 2.38 | Native | |
Borland C | 2.80 | Native | |
TinyC | 3.23 | Native | bellard.org/tcc |
Free Basic | 3.36 | Native | freebasic.net |
euc -gcc -con | 3.58 | C | Euphoria to C (gcc) |
Oxygen basic | 3.77 | Native | oxygenbasic.org |
Power Basic | 4.03 | Native | powerbasic.com |
euc -wat -con | 6.52 | C | Euphoria to C (watcom) |
Euphoria v3.1 | 12.45 | VM | rapideuphoria.com |
PE 64-bit | 14.26 | VM | github.com/peberlein/interpreter |
PE 32-bit | 17.96 | VM | github.com/peberlein/interpreter |
Euphoria v4.04 | 24.36 | VM | openeuphoria.com |
QB64 | 32.43 | C++ | qb64.net |
Pike | 44.39 | VM | pike.lysator.liu.se |
calc3a | 54.87 | AST | epaperpress.com/lexandyacc/ |
toy2.c | 58.39 | VM | groups.yahoo.com/neo/groups/QDepartment/files/Ed-Davis-stuff |
Tinypas.c | 58.97 | VM | Pascal-S converted to C |
Ruby | 60.25 | VM | ruby-lang.org |
Euphoria v4.1 beta | 61.40 | VM | openeuphoria.com |
9. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ed_davis May 22, 2014
- 2561 views
(Hmmm. I actually had about a dozen more entries, but for some reason Creol won't display them, so splitting into two messages. I wonder if there is a limit on the table size?)
Processor | Time | Type | From |
---|---|---|---|
tiny-c.c | 65.96 | VM | iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c |
Lua | 80.14 | VM | lua.org |
jwillia basic | 109.23 | VM | github.com/jwillia3/BASIC |
hoc | 140.37 | VM | netlib.bell-labs.com/~bwk/hoc.sh |
NaaLaa | 168.90 | VM | NaaLaa.com |
CInt | 201.01 | VM | root.cern.ch/drupal/content/cint |
Python | 274.79 | VM | python.org |
Yabasic | 278.77 | VM | yabasic.de |
SpecBAS | 358.74 | VM | sites.google.com/site/pauldunn |
Chip Munk Basic | 442.09 | Interpreter | nicholson.com/rhn/basic |
BBCBasic | 531.89 | Interpreter | bbcbasic.co.uk |
Thin Basic | 543.00 | Interpreter | thinbasic.com |
CH | 582.00 | Interpreter | softintegration.com |
Scriba | 618.00 | Interpreter | scriptbasic.org |
SI | 1020.73 | Interpreter | drdobbs.com/cpp/si-a-c-like-script-interpreter/184408141 |
LittleC | 2428.74 | Interpreter | hbp.iconbar.com/download/microC.arc |
PicoC | 2478.18 | Interpreter | code.google.com/p/picoc |
DDS5 | 3033.35 | Interpreter | ioccc.org/1990/dds.c |
For the VM interpreters I tested, only Euphoria 3.1 was faster.
10. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ed_davis May 22, 2014
- 2521 views
I get (running each twice):
...
My results don't show as dramatic a difference, and in fact, 4.0.5 was the best.
Matt
Which benchmark program did you run, my original, or Derek's modified version?
I also get much better times when I run Derek's modified version - not quite as good/consistent as yours though. But I consider Derek's version cheating
11. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by jimcbrown (admin) May 22, 2014
- 2496 views
But I consider Derek's version cheating
Why? The algorithm is the same, isn't it? He just changed some of the syntax (A = A op B to A op= B) and moved some temporary variables around.
12. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ed_davis May 22, 2014
- 2503 views
But I consider Derek's version cheating
Why? The algorithm is the same, isn't it? He just changed some of the syntax (A = A op B to A op= B) and moved some temporary variables around.
Yes, but I could do the same for many of the languages tested. Several of the simpler processors would greatly benefit from less complicated expressions. The goal was to have the same (as much as possible) code for each processor, without any programmer optimizations.
13. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by jimcbrown (admin) May 22, 2014
- 2470 views
But I consider Derek's version cheating
Why? The algorithm is the same, isn't it? He just changed some of the syntax (A = A op B to A op= B) and moved some temporary variables around.
Yes, but I could do the same for many of the languages tested. Several of the simpler processors would greatly benefit from less complicated expressions. The goal was to have the same (as much as possible) code for each processor, without any programmer optimizations.
Fair enough.
14. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by mattlewis (admin) May 22, 2014
- 2494 views
I get (running each twice):
...
My results don't show as dramatic a difference, and in fact, 4.0.5 was the best.
Matt
Which benchmark program did you run, my original, or Derek's modified version?
I also get much better times when I run Derek's modified version - not quite as good/consistent as yours though. But I consider Derek's version cheating
My times were with your code. Honestly, just putting it all in a main() function was the biggest difference. I did do some similar things as Derek's code in playing with it, and 4.0.5 and 3.1 were about the same (12-15s) and the 64 bit version of 4.1 was a little bit slower (~17s).
I think wrapping it in a main() function isn't much cheating, since many languages make you do that anyways. The top level variable checking is there due to the ability to forward reference and jump around in code. The interpreter doesn't / can't do the level of checking that the translator does, and in this case, the feature vastly outweighs the performance degradation.
The temp variable stuff is definitely something we need to work on, but again, it's a huge feature win, in my book.
Matt
15. Re: Interesting benchmark results - Euphoria vs. Euphoria
- Posted by ghaberek (admin) May 22, 2014
- 2495 views
I think wrapping it in a main() function isn't much cheating, since many languages make you do that anyways.
I do this all the time, simply as a measure of proper practice. It sure would be nice if Euphoria ran a main() routine automatically. (wink wink, nudge nudge)
-Greg
Forked into: Automatic entry point preprocessor