passing sequences to procedures/functions


Mike Burrell writes:

> i obviously don't know all the internal workings of the euphoria
> compiler but it occurred to me that passing big sequences to procedures and
> functions must be an awful waste of memory...  because an entire sequence
> can be passed to a subprogram intact and any changes to it inside the
> subprogram don't affect the sequence externally, i figure that it must
> make a complete copy of that sequence when it is passed...  however, if you
> had like a billion element sequence, then it would have to make another
> billion element sequence when you passed it to a procedure or
> something, right??  in many cases, however, you wouldn't alter the sequence, just
> read from it, or you wouldn't mind altering the external form...  so
> wouldn't it be possible to pass a sequence to a procedure or function via
> pointer, so that the sequence used internally by the procedure/function
> and externally are one and the same...  this would prevent having to
> make a duplicate of the sequence by making just a duplicate of the pointer
> to that sequence...  i dunno just thinking :>

When you pass a sequence as an argument to a procedure or function, only
a pointer to the sequence is actually passed.

If the subroutine does not modify the sequence then a copy is never made.

If the subroutine (or a routine called by the subroutine) tries to modify
the sequence, then a copy must be made, because the caller's sequence must
not change. However, only the "top-level" sequence is copied (4 bytes per
element), NOT the elements of the sequence that are also sequences or
floating-point numbers.
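To make the bookkeeping visible, here is a toy model of the rules above in Python (Euphoria code alone cannot show whether a copy was made). It assumes a simple reference count per sequence; the class `Seq`, its methods, and the helpers `reader` and `writer` are hypothetical names, not Euphoria's actual implementation.

```python
class Seq:
    """Toy model (an assumption, not Euphoria's actual internals): a
    reference-counted sequence whose top level is a list of element references."""

    copies = 0  # top-level copies made so far, for illustration

    def __init__(self, items):
        self.items = list(items)  # the "top level": one reference per element
        self.refcount = 1

    def share(self):
        """Passing the sequence to a routine: same object, one more holder."""
        self.refcount += 1
        return self

    def release(self):
        """A holder (e.g. a returning routine's parameter) lets go."""
        self.refcount -= 1

    def get(self, i):
        """Reading never copies."""
        return self.items[i]

    def set(self, i, value):
        """Copy-on-write: if shared, copy only the top-level references
        (a shallow copy), then write into the private copy."""
        target = self
        if self.refcount > 1:
            self.refcount -= 1          # this holder gives up its share
            target = Seq(self.items)    # new top level; elements are shared
            Seq.copies += 1
        target.items[i] = value
        return target


def reader(s):
    first = s.get(0)       # read only: no copy
    s.release()
    return first


def writer(s):
    s = s.set(0, [7, 7])   # first write to a shared sequence: shallow copy
    s = s.set(1, [8, 8])   # now private (refcount 1): written in place
    return s


x = Seq([[1, 2], [3, 4], [5, 6]])
reader(x.share())
print(Seq.copies)                       # 0
local = writer(x.share())
print(Seq.copies)                       # 1
print(x.items)                          # [[1, 2], [3, 4], [5, 6]]
print(local.items)                      # [[7, 7], [8, 8], [5, 6]]
print(local.items[2] is x.items[2])     # True: element [5, 6] was not copied
```

The second write inside `writer` does not copy again: after the first write, the routine holds the only reference to its private copy.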

e.g. consider:

    procedure foo(sequence s)
       sequence t
       t = s
       t[1] = "Pascal"
    end procedure

    sequence x
    x = {"Euphoria", "Programming", "Language"}
    foo(x)

The statement  t = s  simply copies a 4-byte pointer, NOT a bunch of strings.
Because t still shares its sequence with s (and with x in the caller), the
statement  t[1] = "Pascal"  causes a new 3-element sequence to be created,
holding 3 pointers to the 3 strings; one of those pointers is then
overwritten with a pointer to "Pascal". In this example the strings
themselves are never copied.
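Python's plain assignment and its `list()` shallow copy happen to mirror the two steps traced above, so the trace can be checked directly. This is an analogy, not Euphoria itself: Python never copies on write by itself, so the copy is made by hand here, and Python indexes from 0 where Euphoria indexes from 1. The name `foo_model` is hypothetical, and it returns `t` only so the result can be inspected.

```python
def foo_model(s):
    """Python analogy of the Euphoria procedure foo() above."""
    t = s                  # like  t = s : copies one reference, no strings
    assert t is s
    # Euphoria copies at the first write because s (and the caller's x) still
    # share the sequence; here that copy-on-write step is done explicitly.
    t = list(t)            # new 3-element top level holding the same 3 string references
    t[0] = "Pascal"        # Euphoria's t[1]: overwrites one reference only
    return t


x = ["Euphoria", "Programming", "Language"]
t = foo_model(x)
print(x)                             # ['Euphoria', 'Programming', 'Language']
print(t)                             # ['Pascal', 'Programming', 'Language']
print(t[1] is x[1], t[2] is x[2])    # True True: the strings were not copied
```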

Conclusions:

   1. no modification --> no copy
   2. modification --> only the top-level set of pointers in a "multi-dimensional"
      sequence is actually copied.
   3. The per-element cost of copying 4 bytes at the machine level
      is very small compared to the Euphoria-level processing that
      you are likely to do per element inside the subroutine,
      e.g. subscripting, adding, comparing.

The worst case would be if you passed in a huge sequence, modified
one element, and did nothing else: the whole top level would be copied
just to change a single element. In that case you should
consider storing the sequence in a global variable, rather than
passing it as a parameter.
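A rough way to see the difference is to count how many element references the copy-on-write step copies in each case. This is a minimal Python sketch under the same assumed reference-counting model as above; `Seq`, `store`, `copied_refs`, `touch_param`, `touch_global`, `big` and the size `N` are all hypothetical.

```python
class Seq:
    """Toy model (assumed, not Euphoria's source): reference-counted top level."""
    def __init__(self, items):
        self.items = list(items)
        self.refcount = 1


copied_refs = 0  # total element references copied by copy-on-write


def store(s, i, value):
    """Copy-on-write store: copies every top-level reference if s is shared."""
    global copied_refs
    if s.refcount > 1:
        s.refcount -= 1
        s = Seq(s.items)
        copied_refs += len(s.items)
    s.items[i] = value
    return s


N = 100_000

# Worst case: a huge sequence passed as a parameter, one element modified.
def touch_param(s):
    s = store(s, 0, -1)   # the caller still holds the sequence, so N references are copied
    # the modified local copy is discarded when the routine returns

x = Seq(range(N))
x.refcount += 1           # passing x as an argument shares it with the routine
touch_param(x)
print(copied_refs)        # 100000

# Alternative: keep the sequence in a module-level ("global") variable.
big = Seq(range(N))

def touch_global():
    global big
    big = store(big, 0, -1)   # big is not shared (refcount 1), so it is written in place

touch_global()
print(copied_refs)        # still 100000: no further copy
print(big.items[0])       # -1
```

The trade-off is that the routine's change to the global is visible to every part of the program that uses it, which is exactly what a value parameter protects against.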

There are obviously cases where performance could be improved
by having "reference" parameters, that would never be copied.
There are currently no plans to allow such parameters.

Regards,
  Rob Craig
  Rapid Deployment Software
