Concatenation

Tip brought to us by DerekParnell.

Euphoria makes concatenation so easy, it seems a shame not to use it. However, this simplicity comes with a price. Concatenation is a time consuming thing to do and if your application needs to do lots of it, then it pays to find other ways to do things. So, to get better performance from your application, avoid as much concatenation as possible.

Why is it time consuming? Well, under the hood, whenever you join two objects together, Euphoria makes a new sequence. And if that new sequence is to replace a sequence, the old sequence's memory needs to be deallocated. Consider this...

sequence c  
  
c = "abc" & "def" -- Statement #1  
c &= "xyz" -- Statement #2  
c = append(c, "000") -- Statement #3  


  • Statement #1: Memory for 6 items is allocated. "abc" and "def" are copied to the new allocation. 'c' is then made to refer to the new allocation.
  • Statement #2: Memory is allocated for 9 items. All of 'c' gets copied and "xyz" gets copied to the new allocation. The memory that 'c' was referring to is deallocated and 'c' is next set to refer to the new allocation.
  • Statement #3: Memory is allocated for 12 items. All of 'c' gets copied and "000" gets copied to the new allocation. The memory that 'c' was referring to is deallocated and 'c' is next set to refer to the new allocation.

These is a simplified explanation of what happens because it is a bit more complicated than this (sometimes 'c' already points to enough memory so no extra allocation is needed), but you get the drift.

As you can see, the more you concatenate, the more Euphoria has to allocate RAM, copy stuff around, and in some cases deallocate now unused RAM. One way of preventing all this work is to pre-allocate the result sequence or to use an existing sequence with enough room.

Let's see this in action. Suppose you are writing a function to collect all the vowels in a sentence. (I don't know why but just bear with me for now). Here is a simple way, using concatenation.

function getvowels(sequence text)  
 sequence res  
  
 res = ""  
 for i = 1 to length(text) do  
  if find(text[i], "aeiou") then  
    res &= text[i]  
  end if  
 end for  
 return res  
end function  


That function does a concatenation for every vowel it finds, and each concatenation copies (again) all the vowels found so far.

So here is a faster way...

function getvowels(sequence text)  
 sequence res  
 integer p  
  
 res = repeat(' ', length(text))  
 p = 0  
 for i = 1 to length(text) do  
  if find(text[i], "aeiou") then  
    p += 1  
    res[p] = text[i]  
  end if  
 end for  
 return res[1 .. p]  
end function  


In this version, we allocate the maximum possible size of the return value (if text were all vowels then the length of text is the maximum size.). This allocation gets done once. Now, as we find a vowel, instead of concatenating, we just copy it to the next available spot in the result sequence. When all is done, we just return everything in the result from 1 up to the last one we copied there. This has a lot less RAM allocation and copying going on, but is a bit more complex. Programming is always an exercise in trade-offs.

There was a little controversy about this - check out http://openeuphoria.org/forum/113066.wc?last_id=113088 for the full post.

Search



Quick Links

User menu

Not signed in.

Misc Menu