1. Sequence Ops
- Posted by Robert Craig <rds at Ra?idEuphoria.com> Jul 15, 2007
- 612 views
Here's a bit of background on sequence operations in Euphoria.

When I started designing the Euphoria language (18 years ago!), I believed that the basic cycle time of the interpreter, i.e. the time required to process the simplest IL operation, such as adding two integers, would be similar to other interpreters around at that time, and Euphoria might even be slower, since Euphoria was going to have great flexibility in its data structures, as well as lots of run-time error checking.

I knew from my experience with APL that an interpreted language could gain speed by supporting operations on large aggregates of data (vectors, matrices, lists etc.), thereby reducing the number of statements that need to be interpreted, and shifting the workload to a fast routine written in C or assembly language. In APL this was very true. Everyone tried hard to write code that used APL "vectors" (somewhat similar to Euphoria sequences, but more restricted), rather than APL scalars (atoms), since you could often speed up a program by a factor of 10x to 100x that way.

APL also had a richer set of primitive operations on vectors than Euphoria has, e.g. it had things like rotate-right, matrix transpose etc. It also had a feature that let you apply an operation across a whole vector, e.g. +/x would add up all the elements in x. It also let you select elements from a vector by providing another vector of 1's and 0's, which typically was created by relational ops (< > = etc.). x = y, where x and y are vectors (sequences), was thus very useful in APL. Less so in Euphoria.

The first working version of the Euphoria interpreter was about 40 times slower than today's interpreter at simple integer:integer ops.

sequence:sequence ops were not much slower than today, however, since they are handled by run-time routines written in C, so it was usually much faster to use sequence ops to do something than to write a Euphoria loop and do the job one atom at a time.
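As a rough illustration of the two APL features mentioned above, here is a minimal Euphoria sketch. Relational operators on sequences already produce a sequence of 1's and 0's. The helper names `sum_all` and `select_by_mask` are hypothetical (they are not part of Euphoria) and only stand in for APL's `+/x` and mask selection. The comments show the resulting values.

```euphoria
-- Hypothetical helpers (not part of Euphoria) emulating two APL features.

-- Emulates APL's +/x: add up all the elements of a flat sequence of atoms.
function sum_all(sequence x)
    atom total
    total = 0
    for i = 1 to length(x) do
        total += x[i]
    end for
    return total
end function

-- Emulates APL mask selection: keep x[i] wherever mask[i] is non-zero.
function select_by_mask(sequence x, sequence mask)
    sequence result
    result = {}
    for i = 1 to length(x) do
        if mask[i] then
            result = append(result, x[i])
        end if
    end for
    return result
end function

sequence x, mask
x = {5, 12, 7, 20}
mask = x > 6                  -- relational op on a sequence: {0,1,1,1}
? mask
? sum_all(x)                  -- 44
? select_by_mask(x, mask)     -- {12,7,20}
```

Both helpers are ordinary loops, which is the point of the comparison: APL supplied these as primitives, while in Euphoria they have to be written by hand.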
However, I soon became obsessed with speed, and surprised even myself with how much faster I was able to make the interpreter, eventually reducing the overhead per IL op down to just a few machine instructions. I always compared Euphoria against C, never benchmarking against other interpreters until just before v1.0, when I set up some benchmarks against QBasic and was very surprised at how much faster Euphoria was. I later compared against other popular interpreters and was also surprised. It seemed like other developers just didn't care much about interpreter speed, or they added features to their language that were incompatible with fast execution.

Many years later, the Euphoria to C translator boosted the speed of simple integer:integer ops even more.

So the situation today is that it's often a bit faster to write a Euphoria loop than it is to use sequence ops, mainly because sequence ops will have some storage allocation/deallocation overhead.

Euphoria sequence ops are not used as much as I originally expected.

Back in my APL days, I worked (as a summer job in university) for a large stock broker's research department. I was constantly doing analysis of time series data, such as the closing daily stock price of some company going back many years. We had various theories to test, computing correlation coefficients on the data etc. It was very convenient to use APL vectors (Euphoria sequences). It eliminated a lot of loops, and made the code very concise, not to mention faster.

Later in my career I found myself working with a powerful parallel SIMD (single-instruction stream, multiple data stream) computer used in analyzing sonar signals bouncing off submarines. It had 8 CPUs working in lock-step, executing the same instruction on 8 streams of data. It was the fastest machine in the world at computing FFTs (Fast Fourier Transforms). I worked on a language for that machine, where, naturally, SIMD (i.e. sequence) operations predominated.
I think the people on this list who are interested in language design may not have as much use for SIMD operations as other application writers who work in the areas of business data processing, statistics, or scientific computing, where the algorithms are often fairly simple (i.e. boring), but you are applying those algorithms to large amounts of real-world data. Hobbyists are not usually interested in applications like that, and don't have access to large streams of data that they care about.

So what should we do?
That's up to you!

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
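Whether a loop or a sequence op is faster on a particular machine and Euphoria version is easy to measure. The following is only a timing sketch under stated assumptions: the sizes `N` and `REPS` are arbitrary, the variable names are made up for illustration, and the numbers it prints will vary from system to system, so no expected timings are given.

```euphoria
-- Timing sketch: add 1 to every element REPS times, once with a
-- sequence op and once with an explicit loop. N and REPS are arbitrary.
constant N = 100000, REPS = 100

sequence a, b
atom t0, t_op, t_loop

a = repeat(1, N)
b = a

t0 = time()
for r = 1 to REPS do
    a = a + 1                 -- one IL op; the run-time routine does the work
end for
t_op = time() - t0

t0 = time()
for r = 1 to REPS do
    for i = 1 to N do
        b[i] += 1             -- one interpreted statement per element
    end for
end for
t_loop = time() - t0

printf(1, "sequence op: %.2f s, loop: %.2f s, same result: %d\n",
       {t_op, t_loop, equal(a, b)})
```

Both variants end with the same data, so the final `equal()` check should report 1; only the two timings are of interest.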
2. Re: Sequence Ops
- Posted by Ricardo Forno <ricardoforno at tutopia.?om> Jul 15, 2007
- 593 views
- Last edited Jul 16, 2007
Robert Craig wrote:
> Here's a bit of background on sequence operations
> in Euphoria.
> [...]
> So the situation today is that it's often a bit faster to
> write a Euphoria loop, than it is to use sequence ops,
> mainly because sequence ops will have some storage
> allocation/deallocation overhead.
> [...]

Hi Rob.

I think it may be possible to increase the speed of sequence operations enough to beat loops. At least, this is how another APL-derived language (J) works. In J, sequence/array operations are much faster than their loop counterparts.

Regards.
3. Re: Sequence Ops
- Posted by CChris <christian.cuvier at agricultu?e.gouv.fr> Jul 15, 2007
- 597 views
- Last edited Jul 16, 2007
Let me state what I think we should do. Because "sequence operations" are diverse, I don't think an all-encompassing solution would work, whatever it is. The contexts where the various operations are used hardly overlap, so forcing a uniform meaning across them just doesn't make sense to me.

1/ [arithmetic op_]assignments: lhs=, +=, -=, *=, /= rhs

I have always worked in mathematics and statistics. As a result, I completely share Rob's view of these operations as extremely useful for keeping source files readable, as well as providing much-needed flexibility.

Note for Pete: I'd probably stop using Euphoria altogether if 1+{1,2,3} ever started to return anything other than {2,3,4}, even without an error, because that's one of its current strengths. Using sq_add() and friends just means more typing for no benefit. That's the single construct I use most when doing scientific/statistical computing.

However, because CPUs don't work exactly as they did 18 years ago, a good thing to do would be to assess whether, when the rhs is an atom, the above sequence ops are indeed faster than loops. If the benchmarks say no, then we might have to reconsider their being in the language. If the answer is no when the rhs is a sequence, I don't think it is much of a problem, because it is a less typical use.

There is a problem, however, with the semantics of "s[p..q] = x". When x is an atom, this is clear, and assigns a few contiguous elements the same value. But when x is a sequence, the semantics change to mean "set a subsequence of s to x", and the lengths had better match. Perhaps this point should be emphasized more in the docs, and the awkward, error-prone
<eucode>s[p..q]=repeat(x,q-p+1)</eucode>
be mentioned as the way to do the possibly more natural thing: assigning a sequence value to contiguous elements.

I had suggested s[p..q][]=x a few years ago; see the corresponding thread.

2/ The &= operator.
This one is a special case, because it doesn't apply a transformation to each element of a sequence as the above do, but takes the lhs as a whole. I wish the former could be done too, and it has been requested a couple of times on the list.

Since "seq[p..q] &= x" is always illegal (except when x is "", in which case seq remains unchanged), why not use this idiom to apply a "&" to each element of a sequence? We'd have
<eucode>
sequence s
s={1,2,3}
? s & 0 -- {1,2,3,0}
? s[1..3] & 0 -- {{1,0},{2,0},{3,0}}, currently errors out because lengths don't match
</eucode>
Alternatively, as above, s[]&=x could be used too. That syntax would also allow some useful stuff, like
<eucode>s=atom(x[])</eucode>
returning a sequence of 1s and 0s, more useful than the well-known 0.

3/ Relational ops: (lhs =, !=, <, <=, >, >= rhs).

Here comes the highly debated stuff.

Once again, I fully agree with the vector interpretation being by far the most useful.

However, inside an if or while block header, this interpretation wouldn't make sense. This is what Pete is griping about, with some reason, because the result cannot be a vector, contrary to what one would expect in the general case.

I don't see any issue in having the relational operators behave differently depending on whether they are in a rhs or not. Hence:
<eucode>
if "Pete" = "Lomax" then
</eucode>
would be optimised out because the conditional clause is known to be the FALSE constant, while
<eucode>
s= ("Pete" = "Lomax")
</eucode>
won't work because lengths don't match. compare() and equal() can still be used if you need the boolean in a rhs.

Note that appearing in a routine argument is the same as being a rhs, since the actual value is being used. So, if you really wish to crash your program, you could still do this:
<eucode>
function id(object x) return x end function
if id("Pete "="Lomax") then -- ...
</eucode>
Here, because the = operator defines part of a rhs, id() will return a sequence, and this triggers a run-time error because an atom is expected.

That's how I view sequence operations after using them a fair amount over the years and experiencing the strengths and weaknesses of the current scheme. There are improvements to make. But removing sequence operations from the language would certainly cripple it beyond repair, and Eu doesn't need that.

CChris
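For readers who want to try the current behaviour discussed in points 1/ to 3/, here is a minimal sketch that uses only existing Euphoria syntax. The proposed `s[p..q][]=x` and `s[]&=x` forms are not implemented, so the per-element "&" is written as a loop. The variable names are made up for illustration, and the comments show the resulting values.

```euphoria
sequence s, pairs

-- 1/ Arithmetic ops apply element by element.
? 1 + {1, 2, 3}               -- {2,3,4}

-- Slice assignment: an atom fills every element of the slice ...
s = {0, 0, 0, 0, 0}
s[2..4] = 9
? s                           -- {0,9,9,9,0}

-- ... while a sequence replaces the slice, so the lengths must match.
s[2..4] = {7, 8, 9}
? s                           -- {0,7,8,9,0}

-- To store the same sequence value in each slot, repeat() it first.
s[2..4] = repeat("ab", 3)
? s                           -- {0,"ab","ab","ab",0}

-- 2/ The result the per-element "&" proposal asks for, done with a loop.
pairs = {1, 2, 3}
for i = 1 to length(pairs) do
    pairs[i] = pairs[i] & 0
end for
? pairs                       -- {{1,0},{2,0},{3,0}}

-- 3/ "=" compares element by element (equal lengths required);
--    equal() and compare() always return a single atom.
? ("Pete" = "Pate")           -- {1,0,1,1}
? equal("Pete", "Lomax")      -- 0
? compare("Pete", "Lomax")    -- 1
```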
4. Re: Sequence Ops
- Posted by jxliv7 <jxliv7 at hotm?il.com> Jul 15, 2007
- 578 views
- Last edited Jul 16, 2007
to Rob and all the numerous others debating Euphoria:

i'm wondering if these sequence operations described might be useful in multiple processor applications. that leads to some interesting parallel processing code that might put Euphoria ahead of, and make it simpler than, what i see out there.

i say "see" because it's been too many years since i have wrapped my mind around machine code, assembler, and all those details to "do". i also look at the few 64-bit changes that are at the machine level/compiler level and again see where Euphoria could be more useful to more programmers.

i'm going to have to adjust my time so i spend less time painting canvases and get back into coding. Aside from sex, computers were my first love.

i do have a question: instead of changing the core syntax/operation of Euphoria, wouldn't it be better/easier to put together include/DLL/API apps that do the particular things you want...? couldn't enhancements (like == or +/x) be interpreted at that level, or would it have to be done at another, deeper level...?

regards, etc.

-- jon
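As a rough sketch of the include-file approach asked about above, an APL-style reduction such as +/x can be approximated without touching the core language by using the existing routine_id() and call_func() built-ins. The names `reduce`, `add` and `max2` are hypothetical and not part of Euphoria. This shows only that the functionality can live in a library; it says nothing about how its speed would compare with a built-in operator.

```euphoria
-- Hypothetical include-file helper (not part of Euphoria): fold a
-- two-argument function over a sequence, loosely like APL's f/x.
function reduce(integer rid, sequence x, object start)
    object acc
    acc = start
    for i = 1 to length(x) do
        acc = call_func(rid, {acc, x[i]})
    end for
    return acc
end function

function add(object a, object b)
    return a + b
end function

function max2(object a, object b)
    if compare(a, b) >= 0 then
        return a
    end if
    return b
end function

? reduce(routine_id("add"), {1, 2, 3, 4}, 0)     -- 10
? reduce(routine_id("max2"), {3, 9, 4}, 3)       -- 9
```

Because `add` uses the ordinary + operator, the same call also works element by element when the items being folded are themselves sequences of equal length.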
5. Re: Sequence Ops
- Posted by George Walters <gwalters at s?.rr.com> Jul 16, 2007
- 591 views
Geez, I never knew anyone who programmed in APL. I used to do that on a big IBM many years ago!!
6. Re: Sequence Ops
- Posted by Ricardo Forno <ricardoforno at tutopia.co?> Jul 16, 2007
- 549 views
George Walters wrote:
> Geez, I never knew anyone who programmed in APL. I used to do that on a big
> IBM many years ago!!

From about 1965 up to the mid-80s, many of my programs were written in APL.

Regards.
7. Re: Sequence Ops
- Posted by Al Getz <Xaxo at ao?.?om> Jul 16, 2007
- 573 views
CChris wrote:
> [...]
> Note for Pete: I'd probably stop using Euphoria altogether if 1+{1,2,3} ever
> started to return anything other than {2,3,4}, even without an error, because
> that's one of its current strengths.
> [...]

Hello,

I think i agree that 1+{1,2,3} should stay the same too, as changing it now would break code, but i'm not really sure just how much i actually used this in the past. I think a few times, but that's it. The reason is that i would seldom have the need to add two things whose types were not the same. Usually, when i add two things it's because they originated from two other sources, and both sources would be basically the same.

Still, it is very handy when you want to act on every element in the sequence.

Take care,
Al

And good luck with your Euphoria programming!

My bumper sticker: "I brake for LED's"

From "Black Knight": "I can live with losing the good fight, but i can not live without fighting it". "Well on second thought, maybe not."