Tip of the week - Concetenation
- Posted by ChrisB (moderator) Oct 17, 2010
- 2194 views
This weeks Tip of the week is brought to us by Derek Parnell.
Euphoria makes concatenation so easy, it seems a shame not to use it. However, this simplicity comes with a price. Concatenation is a time consuming thing to do and if your application needs to do lots of it, then it pays to find other ways to do things. So, to get better performance from your application, avoid as much concatenation as possible.
Why is it time consuming? Well, under the hood, whenever you join two objects together, Euphoria makes a new sequence. And if that new sequence is to replace a sequence, the old sequence's memory needs to be deallocated. Consider this...
sequence c c = "abc" & "def" -- Statement #1 c &= "xyz" -- Statement #2 c = append(c, "000") -- Statement #3
- Statement #1: Memory for 6 items is allocated. "abc" and "def" are copied to the new allocation. 'c' is then made to refer to the new allocation.
- Statement #2: Memory is allocated for 9 items. All of 'c' gets copied and "xyz" gets copied to the new allocation. The memory that 'c' was referring to is deallocated and 'c' is next set to refer to the new allocation.
- Statement #3: Memory is allocated for 12 items. All of 'c' gets copied and "000" gets copied to the new allocation. The memory that 'c' was referring to is deallocated and 'c' is next set to refer to the new allocation.
These is a simplified explanation of what happens because it is a bit more complicated than this (sometimes 'c' already points to enough memory so no extra allocation is needed), but you get the drift.
As you can see, the more you concatenate, the more Euphoria has to allocate RAM, copy stuff around, and in some cases deallocate now unused RAM. One way of preventing all this work is to pre-allocate the result sequence or to use an existing sequence with enough room.
Let's see this in action. Suppose you are writing a function to collect all the vowels in a sentence. (I don't know why but just bear with me for now). Here is a simple way, using concatenation.
function getvowels(sequence text) sequence res res = "" for i = 1 to length(text) do if find(text[i], "aeiou") then res &= text[i] end if end for return res end function
That function does a concatenation for every vowel it finds, and each concatenation copies (again) all the vowels found so far.
So here is a faster way...
function getvowels(sequence text) sequence res integer p res = repeat(' ', length(text)) p = 0 for i = 1 to length(text) do if find(text[i], "aeiou") then p += 1 res[p] = text[i] end if end for return res[1 .. p] end function
In this version, we allocate the maximum possible size of the return value (if text were all vowels then the length of text is the maximum size.). This allocation gets done once. Now, as we find a vowel, instead of concatenating, we just copy it to the next available spot in the result sequence. When all is done, we just return everything in the result from 1 up to the last one we copied there. This has a lot less RAM allocation and copying going on, but is a bit more complex. Programming is always an exercise in trade-offs.
By the way, that is not the fastest it could be yet, but that I'll leave that for another Tip-Of-The-Week entry.
--------------------------------------------------------------------------------------------------------------------------------
Things that may seem second nature to you, or if you've ever had an 'aha' moment, let us know, others may see things they have never seen before. Remember, send your tips to totw@openeuphoria.org.