6.11 Performance Tips

6.11.1 General Tips

  • If your program is fast enough, forget about speeding it up. Just make it simple and readable.
  • If your program is way too slow, the tips below will probably not solve your problem. You should find a better overall algorithm.
  • The easiest way to gain a bit of speed is to turn off run-time type-checking. Insert the line:
without type_check

at the top of your main .ex file, ahead of any include statements. You'll typically gain between 0 and 20 percent depending on the types you have defined, and the files that you are including. Most of the standard include files do some user-defined type-checking. A program that is completely without user-defined type-checking might still be speeded up slightly.
Also, be sure to remove, or comment-out, any

with trace
with profile
with profile_time

statements. with trace (even without any calls to trace), and with profile can easily slow you down by 10% or more. with profile_time might slow you down by 1%. Each of these options will consume extra memory as well.

  • Calculations using integer values are faster than calculations using floating-point numbers
  • Declare variables as integer rather than atom where possible, and as sequence rather than object where possible. This usually gains you a few percent in speed.
  • In an expression involving floating-point calculations, it's usually faster to write constant numbers in floating point form, e.g. when x has a floating-point value, say, x = 9.9

    change:
x = x * 5

to:

x = x * 5.0

This saves the interpreter from having to convert integer 5 to floating-point 5.0 each time.

  • Euphoria does short-circuit evaluation of if, elsif, and while conditions involving and and or. Euphoria will stop evaluating any condition once it determines if the condition is true or not. For instance in the if-statement:
if x > 20 and y = 0 then
    ...
end if

The "y = 0" test will only be made when "x > 20" is true.
For maximum speed, you can order your tests. Do "x > 20" first if it is more likely to be false than "y = 0".
In general, with a condition "A and B", Euphoria will not evaluate the expression B, when A is false (zero). Similarly, with a condition like "A or B", B will not be evaluated when A is true (non-zero).
Simple if-statements are highly optimized. With the current version of the interpreter, nested simple if's that compare integers are usually a bit faster than a single short-circuit if-statement e.g.:

if x > 20 then
    if y = 0 then
       ...
    end if
end if
  • The speed of access to private variables, local variables and global variables is the same.
  • There is no performance penalty for defining constants versus plugging in hard-coded literal numbers. The speed of:
y = x * MAX

is exactly the same as:

y = x * 1000

where you've previously defined:

constant MAX = 1000
  • There is no performance penalty for having lots of comments in your program. Comments are completely ignored. They are not executed in any way. It might take a few milliseconds longer for the initial load of your program, but that's a very small price to pay for future maintainability, and when you bind your program, or translate your program to C, all comments are stripped out, so the cost becomes absolute zero.

6.11.2 Measuring Performance

In any programming language, and especially in Euphoria, you really have to make measurements before drawing conclusions about performance.

Euphoria provides both execution-count profiling, as well as time profiling. You will often be surprised by the results of these profiles. Concentrate your efforts on the places in your program that are using a high percentage of the total time (or at least are executed a large number of times.) There's no point to rewriting a section of code that uses 0.01% of the total time. Usually there will be one place, or just a few places where code tweaking will make a significant difference.

You can also measure the speed of code by using the time() function. e.g.

atom t = time()
for i = 1 to 10000 do
    -- small chunk of code here
end for
? time() - t

You might rewrite the small chunk of code in different ways to see which way is faster.

6.11.3 How to Speed-Up Loops

Profiling will show you the hot spots in your program. These are usually inside loops. Look at each calculation inside the loop and ask yourself if it really needs to happen every time through the loop, or could it be done just once, prior to the loop.

6.11.4 Converting Multiplies to Adds in a Loop

Addition is faster than multiplication. Sometimes you can replace a multiplication by the loop variable, with an addition. Something like:

for i = 0 to 199 do
    poke(screen_memory+i*320, 0)
end for

becomes:

x = screen_memory
for i = 0 to 199 do
    poke(x, 0)
    x = x + 320
end for

6.11.5 Saving Results in Variables

  • It's faster to save the result of a calculation in a variable, than it is to recalculate it later. Even something as simple as a subscript operation, or adding 1 to a variable is worth saving.
  • When you have a sequence with multiple levels of subscripting, it is faster to change code like:
for i = 1 to 1000 do
   y[a][i] = y[a][i]+1
end for

to:

ya = y[a]
for i = 1 to 1000 do
    ya[i] = ya[i] + 1
end for
y[a] = ya

So you are doing two subscript operations per iteration of the loop, rather than four. The operations, ya = y[a] and y[a] = ya are very cheap. They just copy a pointer. They don't copy a whole sequence.

  • There is a slight cost when you create a new sequence using {a,b,c}. If possible, move this operation out of a critical loop by storing it in a variable before the loop, and referencing the variable inside the loop.

6.11.6 In-lining of Routine Calls

If you have a routine that is rather small, the interpreter and translator will in-line it for you. Your code will remain as readable as before.

6.11.7 Operations on Sequences

Euphoria lets you operate on a large sequence of data using a single statement. This saves you from writing a loop where you process one element at-a-time. e.g.

x = {1,3,5,7,9}
y = {2,4,6,8,10}
z = x + y

versus:

z = repeat(0, 5)  -- if necessary
for i = 1 to 5 do
    z[i] = x[i] + y[i]
end for

In most interpreted languages, it is much faster to process a whole sequence (array) in one statement, than it is to perform scalar operations in a loop. This is because the interpreter has a large amount of overhead for each statement it executes.

Euphoria is different. Euphoria is very lean, with little interpretive overhead, so operations on sequences don't always win. The only solution is to time it both ways. The per-element cost is usually lower when you process a sequence in one statement, but there are overheads associated with allocation and deallocation of sequences that may tip the scale the other way.

6.11.8 Some Special Case Optimizations

Euphoria automatically optimizes certain special cases. x and y below could be variables or arbitrary expressions.

x + 1      -- faster than general x + y
1 + x      -- faster than general y + x
x * 2      -- faster than general x * y
2 * x      -- faster than general y * x
x / 2      -- faster than general x / y
floor(x/y) -- where x and y are integers, is faster than x/y
floor(x/2) -- faster than floor(x/y)

x below is a simple variable, y is any variable or expression:

x = append(x, y)   -- faster than general z = append(x, y)
x = prepend(x, y)  -- faster than general z = prepend(x, y)

x = x & y          -- where x is much larger than y,
                   -- is faster than general z = x & y

When you write a loop that "grows" a sequence, by appending or concatenating data onto it, the time will, in general, grow in proportion to the square of the number (N) of elements you are adding. However, if you can use one of the special optimized forms of append(), prepend() or concatenation listed above, the time will grow in proportion to just N (roughly). This could save you a huge amount of time when creating an extremely long sequence.

(You could also use repeat() to establish the maximum size of the sequence, and then fill in the elements in a loop, as discussed below.)

6.11.9 Assignment with Operators

For greater speed, convert:

**left-hand-side = left-hand-side op expression**
to:
**left-hand-side op= expression**
For example:

-- Instead of ...
some_val = some_val * 3
-- Use ...
some_val *= 3

whenever left-hand-side contains at least two subscripts, or at least one subscript and a slice. In all simpler cases the two forms run at the same speed (or very close to the same).

6.11.10 Library / Built-In Routines

Some common routines are extremely fast. You probably couldn't do the job faster any other way, even if you used C or assembly language. Some of these are:

6.11.10.1 Low Level Memory Manipulation

6.11.10.2 Sequence Manipulation

Other routines are reasonably fast, but you might be able to do the job faster in some cases if speed was crucial.

x = repeat(0,100) -- Pre-allocate all the elements first.
for i = 1 to 100 do
    x[i] = i
end for

is somewhat faster than:

x = {}
for i = 1 to 100 do
    x = append(x, i)
end for

because append() has to allocate and reallocate space as x grows in size. With repeat(), the space for x is allocated once at the beginning. (append() is smart enough not to allocate space with every append to x. It will allocate somewhat more than it needs, to reduce the number of reallocations.)

These built-in operations are also optimize to make changes in place (where possible), rather than creating copies of sequences via slices.

6.11.10.3 Bitwise operations vs Arithmetic

You can replace:

remainder(x, p)

with:

and_bits(x, p-1)

for greater speed when p is a positive power of 2. x must be a non-negative integer that fits in 32-bits.

arctan is faster than arccos or arcsin.

6.11.11 Searching

Euphoria's find is the fastest way to search for a value in a sequence up to about 50 elements. Beyond that, you might consider a map or other implementation of a hash table (demo\hash.ex) or a binary tree (demo\tree.ex).

6.11.12 Sorting

In most cases you can just use the shell sort routine in sort.e.

If you have a huge amount of data to sort, you might try one of the sorts in demo\allsorts.e (e.g. great sort). If your data is too big to fit in memory, don't rely on Euphoria's automatic memory swapping capability. Instead, sort a few thousand records at a time, and write them out to a series of temporary files. Then merge all the sorted temporary files into one big sorted file.

If your data consists of integers only, and they are all in a fairly narrow range, try the bucket sort in demo\allsorts.e.

6.11.13 Taking Advantage of Cache Memory

As CPU speeds increase, the gap between the speed of the on-chip cache memory and the speed of the main memory or DRAM (dynamic random access memory) becomes ever greater. You might have 256 Mb of DRAM on your computer, but the on-chip cache is likely to be only 8K (data) plus 8K (instructions) on a Pentium, or 16K (data) plus 16K (instructions) on a Pentium with MMX or a Pentium II/III. Most machines will also have a "level-2" cache of 256K or 512K.

An algorithm that steps through a long sequence of a couple of thousand elements or more, many times, from beginning to end, performing one small operation on each element, will not make good use of the on-chip data cache. It might be better to go through once, applying several operations to each element, before moving on to the next element. The same argument holds when your program starts swapping, and the least-recently-used data is moved out to disk.

These cache effects aren't as noticeable in Euphoria as they are in lower-level compiled languages, but they are measurable.

6.11.14 Using Machine Code and C

Euphoria lets you call routines written in machine code. You can call C routines in dynamically loaded library files, and these C routines can call your Euphoria routines. You might need to call C or machine code because of something that can not be done directly in Euphoria, or you might do it for improved speed.

To boost speed, the machine code or C routine needs to do a significant amount of work on each call, otherwise the overhead of setting up the arguments and making the call will dominate the time, and it might not gain you much.

Many programs have some inner core operation that consumes most of the CPU time. If you can code this in C or machine code, while leaving the bulk of the program in Euphoria, you might achieve a speed comparable to C, without sacrificing Euphoria's safety and flexibility.

6.11.15 Using The Euphoria To C Translator

The Euphoria To C Translator is included in the installation package. It will translate any Euphoria program into a set of C source files that you can compile using a C compiler.

The executable file that you get using the Translator should run the same, but faster than when you use the interpreter. The speed-up can be anywhere from a few percent to a factor of 5 or more.