Sequence Ops
- Posted by Robert Craig <rds at RapidEuphoria.com> Jul 15, 2007
Here's a bit of background on sequence operations in Euphoria.
When I started designing the Euphoria language (18 years ago!), I believed that the basic cycle time of the interpreter, i.e. the time required to process the simplest IL operation, such as adding two integers, would be similar to that of other interpreters around at the time. Euphoria might even be slower, since it was going to have great flexibility in its data structures, as well as lots of run-time error checking. I knew from my experience with APL that an interpreted language could gain speed by supporting operations on large aggregates of data (vectors, matrices, lists etc.), thereby reducing the number of statements that need to be interpreted, and shifting the workload to a fast routine written in C or assembly language.
In APL this was very true. Everyone tried hard to write code that used APL "vectors" (somewhat similar to Euphoria sequences, but more restricted), rather than APL scalars (atoms), since you could often speed up a program by a factor of 10x to 100x that way. APL also had a richer set of primitive operations on vectors than Euphoria has, e.g. rotate-right, matrix transpose etc. It also had a feature that let you apply an operation across a whole vector, e.g. +/x would add up all the elements in x. It also let you select elements from a vector by providing another vector of 1's and 0's, which typically was created by relational ops (< > = etc.). x = y, where x and y are vectors (sequences), was thus very useful in APL. Less so in Euphoria.
The first working version of the Euphoria interpreter was about 40 times slower than today's interpreter at simple integer:integer ops. However, sequence:sequence ops were not much slower than they are today, since they are handled by run-time routines written in C. So it was usually much faster to use sequence ops to do something than to write a Euphoria loop and do the job one atom at a time.
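For readers who have not used APL, the three vector features mentioned above (an element-by-element relational op that yields 1's and 0's, a reduction like +/x, and selection by a 1/0 mask) can be sketched roughly as follows. This is plain Python used only for illustration. It is neither APL nor Euphoria, and the helper names (elementwise, equals_mask, reduce_add, compress) and the sample data are made up for this sketch.

```python
# Illustrative sketch only: hypothetical helpers that mimic, in plain Python,
# the whole-vector operations described in the post.

def elementwise(op, x, y):
    """Apply a binary operation to two equal-length vectors, item by item."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [op(a, b) for a, b in zip(x, y)]

def equals_mask(x, y):
    """Like a vector 'x = y': one 1/0 result per pair of elements."""
    return elementwise(lambda a, b: 1 if a == b else 0, x, y)

def reduce_add(x):
    """Like APL's +/x: add up all the elements of x."""
    return sum(x)

def compress(mask, x):
    """Keep the elements of x whose matching mask entry is 1."""
    if len(mask) != len(x):
        raise ValueError("mask and vector must have the same length")
    return [value for keep, value in zip(mask, x) if keep]

# Made-up sample data.
x = [3, 1, 4, 1, 5]
y = [3, 0, 4, 2, 5]

mask = equals_mask(x, y)   # one relational result per position
total = reduce_add(x)      # sum of all elements of x
kept = compress(mask, x)   # elements of x where x and y agree
```

The point of the sketch is that a mask produced by a relational op is only really useful when there is a companion operation, such as compress here, that consumes it.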
However, I soon became obsessed with speed, and surprised even myself with how much faster I was able to make the interpreter, eventually reducing the overhead per IL op down to just a few machine instructions. I always compared Euphoria against C, never benchmarking against other interpreters until just before v1.0, when I set up some benchmarks against QBasic and was very surprised at how much faster Euphoria was. I later compared against other popular interpreters and was also surprised. It seemed like other developers just didn't care much about interpreter speed, or they added features to their language that were incompatible with fast execution. Many years later, the Euphoria to C translator boosted the speed of simple integer:integer ops even more.
So the situation today is that it's often a bit faster to write a Euphoria loop than it is to use sequence ops, mainly because sequence ops will have some storage allocation/deallocation overhead. Euphoria sequence ops are not used as much as I originally expected.
Back in my APL days, I worked (as a summer job in university) for a large stock broker's research department. I was constantly doing analysis of time series data, such as the closing daily stock price of some company going back many years. We had various theories to test, computing correlation coefficients on the data etc. It was very convenient to use APL vectors (Euphoria sequences). It eliminated a lot of loops, and made the code very concise, not to mention faster.
Later in my career I found myself working with a powerful parallel SIMD (single-instruction stream, multiple data stream) computer used in analyzing sonar signals bouncing off submarines. It had 8 CPUs working in lock-step, executing the same instruction on 8 streams of data. It was the fastest machine in the world at computing FFTs (Fast Fourier Transforms). I worked on a language for that machine, where, naturally, SIMD (i.e. sequence) operations predominated.
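As a rough illustration of the kind of time-series calculation mentioned above, here is a minimal sketch of a correlation coefficient written in a whole-vector style, where each step works on an entire series at once. It is plain Python, not APL or Euphoria. The function names and the price figures are made up, and the Pearson formula is assumed here since the post does not say which coefficient was used.

```python
import math

def mean(v):
    """Arithmetic mean of a non-empty series."""
    return sum(v) / len(v)

def correlation(x, y):
    """Pearson correlation of two equal-length series (assumed formula)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx, my = mean(x), mean(y)
    # Each line below treats a whole series as one unit of work.
    dx = [a - mx for a in x]              # deviations of x from its mean
    dy = [b - my for b in y]              # deviations of y from its mean
    cov = sum(a * b for a, b in zip(dx, dy))
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

# Made-up closing prices for two hypothetical stocks.
stock_a = [10.0, 10.5, 11.0, 10.8, 11.5]
stock_b = [20.0, 21.0, 22.0, 21.6, 23.0]   # exactly 2 * stock_a

r = correlation(stock_a, stock_b)
```

Because stock_b is an exact multiple of stock_a in this made-up data, r comes out at 1 up to floating-point rounding.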
I think the people on this list who are interested in language design may not have as much use for SIMD operations as other application writers who work in the areas of business data processing, statistics, or scientific computing, where the algorithms are often fairly simple (i.e. boring), but you are applying those algorithms to large amounts of real-world data. Hobbyists are not usually interested in applications like that, and don't have access to large streams of data that they care about.
So what should we do? That's up to you!
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com