1. Pass by Reference
- Posted by DerekParnell (admin) Sep 11, 2009
- 2491 views
- Last edited Sep 12, 2009
Forked from Re: switch design
Curious! Ok, let's take you up on your raison d'etre. You participate in the Euphoria forum because you want to help make Euphoria less of a "sucking programming language". Fine and good ...
Tell us one thing that you feel must change or improve in Euphoria right now. Something that really can't wait any longer. And tell us why it needs to change.
See our discussion about stacks. Some kind of reference semantics in addition to copying semantics. Because the current workaround of using a global sequence variable + index is everywhere in Eu code. It needs to change because:
- It's awfully inconvenient.
- It encourages memory leaks.
- Copy semantics easily introduces O(n) complexity where O(1) would suffice.
Lack of references is the biggest issue with Euphoria right now, it really can't wait any longer.
2. Re: Pass by Reference
- Posted by jaygade Sep 11, 2009
- 2460 views
- Last edited Sep 12, 2009
I know I've lobbied for some kind of "var_id" mechanism before, but good copy-on-write semantics (which I believe that Euphoria has?) and an optimization of the case "foo = func(foo)" should take care of most pass-by-reference needs, IMO.
3. Re: Pass by Reference
- Posted by mattlewis (admin) Sep 12, 2009
- 2374 views
I know I've lobbied for some kind of "var_id" mechanism before, but good copy-on-write semantics (which I believe that Euphoria has?) and an optimization of the case "foo = func(foo)" should take care of most pass-by-reference needs, IMO.
There are some cases where that's possible. I think one place where reference semantics are probably necessary is when you start dealing with OO. Consider the case where you have a member procedure that does something with private data. In this case, you may not even consider this to be passing by reference, but it really is, from a low level perspective. However, I think that it's perfectly natural, and not likely to confuse anyone. (This assumes, of course, that some sort of OO is eventually added to euphoria).
There are certainly benefits to easily passing references, and if you came from another language, it wouldn't surprise me that this was something you found extremely lacking. If we add something like this, we have to be careful to keep it 'euphorian'.
One aspect of that is to avoid making it too complicated, or difficult to use. That's a difficult task. While we've talked about the theoretical performance implications of COW, they appear to mainly be just that: theoretical. I'm sure we could create a situation where it was a real performance killer, but I think that in real applications, it's not.
So the real reason for adding it should be that it makes certain tasks easier (which it certainly will) without making other things more difficult, or causing drastic changes to the language as a whole.
Things like var_id(), and other reflection like techniques are more likely to surface when we get into dynamically evaluated code. It might be a reasonable workaround for a more native implementation of PBR.
Matt
4. Re: Pass by Reference
- Posted by DerekParnell (admin) Sep 12, 2009
- 2355 views
Lack of references is the biggest issue with Euphoria right now
Can I just concentrate on Pass by Reference (PBR) for now rather than references in general?
background
PBR means that when we make a function call, any argument that is passed by reference has a reference to the data given to the routine rather than the actual data itself. The routine can use the reference as if the data was really supplied to it.
The main reasons for doing this are ...
- speed
- allowing a routine to modify arguments whose scope is outside the routine
- defining routines that create data whose life continues after the routine has returned
In languages such as C/C++, passing argument data that is larger than what can be held in a CPU register involves copying the data to the call stack. And if that data is changed by the routine, it might have to copy the updated data back to the original location. To avoid this copying, programmers usually pass the address of the data (a reference) to the routine, thus giving the routine access to the data in its actual location in RAM. This is a lot faster than copying.
A routine that wants to modify any of its arguments, such that the change is still in effect after the routine returns, has to do one of two things.
- Accept a copy of the data, modify it and then return the modified data
- Accept a reference to the data's location and directly modify it in-place.
If a routine is creating new data and wants to return that to its caller, it also has two options...
- Create the data in the routine's stack space and return a copy of it. This slows things down for large data items.
- Create the data in the heap and return a reference to it. Much faster operation. Note: You shouldn't return a reference to the routine's stack space as this space disappears when the routine returns.
So in summary, references work much faster than copying larger data items and the only way a routine can update (anonymous) data is by getting a reference to it. By anonymous I mean data that is stored in a variable whose name is unknown to the routine.
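As a rough illustration of the two options for modifying an argument, here is a sketch in Python, used only as a stand-in because its lists are handed to routines by reference. The names `doubled_copy` and `double_in_place` are invented for this example.

```python
def doubled_copy(values):
    """Option 1: leave the argument alone and return a modified copy."""
    return [v * 2 for v in values]


def double_in_place(values):
    """Option 2: modify the caller's data through the reference."""
    for i in range(len(values)):
        values[i] *= 2


scores = [1, 2, 3]

# Option 1: the caller decides where the result goes; scores is untouched.
new_scores = doubled_copy(scores)
scores_after_copy = list(scores)

# Option 2: nothing is returned, yet the caller's list has changed.
double_in_place(scores)
```

The first form costs a copy but keeps the caller in control; the second avoids the copy at the price of changing data the caller already holds.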
issues
- Immutable Data
Can this be done completely during parse-time or must there be some action at run-time to ensure the integrity of immutable data? If done at run-time, we lose some, and possibly all, of the speed advantage.
- View-Only Data
How does a language prevent a routine from modifying such data if it is passed by reference?
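- Transitive Immutability
If data passed to a routine is deemed to be unchangeable by the routine, but that data contains references to other data, is the routine allowed to modify the other data?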
currently
Actually, Euphoria already uses PBR automatically. However, it always ensures that any modifications a routine makes to an argument do not affect any other part of the application. It does this by making a copy of the argument just before changing it, and this copy is never implicitly returned by the routine.
This means that you can call a routine, passing a sequence of any length, and it always takes the same time regardless of how big the sequence is. This is because a reference to the sequence is really passed, and if the routine never changes the argument, no copy is done either. Also, if a routine creates a sequence and returns it, what is really returned is a reference to the new sequence so no further copying is done.
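The behaviour described here can be modelled roughly as follows. This is a sketch in Python under stated assumptions, not Euphoria's actual internals: the `Buffer` and `Seq` classes, the explicit `share()` and `release()` calls, and the `copies_made` counter are all hypothetical stand-ins for what the interpreter does implicitly with reference counts.

```python
class Buffer:
    """Shared storage plus a count of the handles that point at it."""

    def __init__(self, items):
        self.items = list(items)
        self.refs = 0


class Seq:
    """Hypothetical copy-on-write handle (a model, not Euphoria internals)."""

    copies_made = 0  # how many real element copies have happened

    def __init__(self, items=(), _buffer=None):
        self._buf = _buffer if _buffer is not None else Buffer(items)
        self._buf.refs += 1

    def share(self):
        """Assignment or argument passing: a new handle, no element copy."""
        return Seq(_buffer=self._buf)

    def release(self):
        """The handle goes out of scope."""
        self._buf.refs -= 1

    def __getitem__(self, i):
        return self._buf.items[i]

    def __setitem__(self, i, value):
        if self._buf.refs > 1:
            # Another handle can still see this storage: copy before writing.
            self._buf.refs -= 1
            self._buf = Buffer(self._buf.items)
            self._buf.refs = 1
            Seq.copies_made += 1
        self._buf.items[i] = value

    def to_list(self):
        return list(self._buf.items)


def first_item(arg):   # only reads its argument
    return arg[0]


def zero_first(arg):   # writes to its argument
    arg[0] = 0
    return arg.to_list()


data = Seq([1, 2, 3, 4, 5])

param = data.share()          # passing the argument copies no elements
first = first_item(param)
param.release()
copies_after_read = Seq.copies_made

param = data.share()
changed = zero_first(param)   # the first write triggers the one and only copy
param.release()
copies_after_write = Seq.copies_made
```

In this model the read-only call never copies, and the writing call copies exactly once while leaving `data` as the caller last saw it.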
However, to enable explicit PBR in order for a routine to modify data that lives outside of the routine, we will have to add some complexity to both the internals and to the syntax, to ensure that Euphoria has enough knowledge about the coder's intentions.
I'm not saying this should be avoided, but just noting that there is no free lunch involved either.
One way to implement it might be to have all variables as view-only by default and for those ones that the coder wishes routines to be able to modify, mark them at declaration time as mutable. E.g.
mutable sequence FileName
sequence BaseFile
. . .
GetFile( FileName ) -- Fills in the value of FileName.
BaseFile = filebase(FileName) -- BaseFile can only be changed by assignment
Anyhow, there is a lot more discussion yet required on this topic.
5. Re: Pass by Reference
- Posted by jaygade Sep 12, 2009
- 2337 views
Wow. You covered that very well, Derek.
I've got nothing to add really, though I'll wait for further discussion. I like this suggestion at face value, though.
6. Re: Pass by Reference
- Posted by Critic Sep 12, 2009
- 2340 views
One aspect of that is to avoid making it too complicated, or difficult to use. That's a difficult task. While we've talked about the theoretical performance implications of COW, they appear to mainly be just that: theoretical. I'm sure we could create a situation where it was a real performance killer, but I think that in real applications, it's not.
Perhaps you're right. I am still suspicious.
Things like var_id(), and other reflection like techniques are more likely to surface when we get into dynamically evaluated code. It might be a reasonable workaround for a more native implementation of PBR.
var_id() has a potential problem that routine_id does not have: What if I take a reference to a variable that is allocated on the call stack? The variable disappears after the call, but how is the reference invalidated? Other pass by reference syntaxes do not necessarily have this issue.
7. Re: Pass by Reference
- Posted by Critic Sep 12, 2009
- 2334 views
Can I just concentrate on Pass by Reference (PBR) for now rather than references in general?
You can, but references in general are more powerful than just PBR. AFAIK the current Euphoria implementation avoids building an abstract syntax tree (AST). However an AST (or some other graph structure) would provide a much better infrastructure for various optimizations. But currently Euphoria lacks references and without them building an AST is tedious. PBR does not really help for graph structures.
Can this be done completely during parse-time or must there be some action at run-time to ensure the integrity of immutable data? If done at run-time, we lose some, and possibly all, of the speed advantage.
Not sure, if I understand you completely, but Pascal does it at compile-time: Constants cannot be passed to var/PBR parameters.
How does a language prevent a routine from modifying such data if it is passed by reference?
Pascal managed to do it at compile-time in the 70's.
If data passed to a routine is deemed to be unchangeable by the routine, but that data contains references to other data, is the routine allowed to modify the other data?
Yes.
However, to enable explicit PBR in order for a routine to modify data that lives outside of the routine, we will have to add some complexity to both the internals and to the syntax, to ensure that Euphoria has enough knowledge about the coder's intentions.
I'm not saying this should be avoided, but just noting that there is no free lunch involved either.
Another solution is to introduce a new object datatype that provides OOP features and are always references, just like in Java. Or copy Lua's tables (including its reference semantics) that are more dynamic.
9. Re: Pass by Reference
- Posted by mattlewis (admin) Sep 12, 2009
- 2304 views
Things like var_id(), and other reflection like techniques are more likely to surface when we get into dynamically evaluated code. It might be a reasonable workaround for a more native implementation of PBR.
var_id() has a potential problem that routine_id does not have: What if I take a reference to a variable that is allocated on the call stack? The variable disappears after the call, but how is the reference invalidated? Other pass by reference syntaxes do not necessarily have this issue.
True. That's another area that has to be thought out before it could happen, and just for reasons like that.
Matt
10. Re: Pass by Reference
- Posted by jaygade Sep 12, 2009
- 2293 views
True. That's another area that has to be thought out before it could happen, and just for reasons like that.
Matt
Fair point, but couldn't it just be limited to top-level variables that never go out of scope?
I guess I don't understand enough either about the difference between how the interpreter handles variables and how translated code handles variables.
The action of taking a var_id of something could even be used to make the variable permanent in scope, marking it as such and moving it to the heap if necessary?
11. Re: Pass by Reference
- Posted by DerekParnell (admin) Sep 12, 2009
- 2291 views
Can I just concentrate on Pass by Reference (PBR) for now rather than references in general?
You can, but references in general are more powerful than just PBR. AFAIK the current Euphoria implementation avoids building an abstract syntax tree (AST). However an AST (or some other graph structure) would provide a much better infrastructure for various optimizations. But currently Euphoria lacks references and without them building an AST is tedious. PBR does not really help for graph structures.
References do not have to be addresses. The new eumem.e module can be used to model virtual address spaces in which graphs and trees can be managed just as you would in C using RAM addresses. Plus with added safety and flexibility.
Can this be done completely during parse-time or must there be some action at run-time to ensure the integrity of immutable data? If done at run-time, we lose some, and possibly all, of the speed advantage.
Not sure, if I understand you completely, but Pascal does it at compile-time: Constants cannot be passed to var/PBR parameters.
That's right. And how does the Pascal compiler know which arguments are PBR and which are not? It knows because the coder specifically designates which are which. That means new syntax and more to learn and more complexity in coding. There are trade offs involved and we all still want Euphoria to be a simple-to-use language.
But let's go down that path ... using pretend syntax for now here is a routine that accepts a PBR argument.
procedure toUpper( byref sequence text)
    for i = 1 to length(text) do
        if t_character(text[i]) then
            text[i] = UniCode_Upper(text[i])
        end if
    end for
end procedure
Ok, so this procedure converts characters in the passed sequence into uppercase characters.
sequence Name
...
Name = db:fetch("name")
toUpper(Name) -- Ok as 'Name' is not immutable.
...
include defaults.e
Name = db:fetch("name")
if length(Name) = 0 then
    toUpper(DefaultName) -- Not ok as this is a constant in 'defaults.e'.
else
    toUpper(Name) -- Ok as 'Name' is not immutable.
end if
Sure, this is a poor way to do it. So how's this ...
if length(Name) = 0 then
    Name = DefaultName
end if
toUpper(Name) -- Ok? as 'Name' is not immutable.
Well not exactly. When one sequence is assigned to another, no actual copying of data takes place, just the reference to the original is copied and the reference counter incremented. So now, the call toUpper() would be modifying the constant value.
To avoid this we need to copy the constant data before calling the function. So now the coder must know which things are constants and which are not, know that they have to copy stuff explicitly (sometimes), and know which arguments are PBR ones. The complexity level rises.
I'm not saying that this is too hard to solve, just that by necessity, PBR will bring a little more complexity to the language, rather than make it simpler-to-use.
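The aliasing hazard in the last example can be reproduced in any language where assignment shares data and routines modify it in place. Here is a sketch in Python, chosen only because its lists behave that way; `to_upper` stands in for the pretend `byref` routine, and `shout` and its `copy_default` flag are invented for illustration.

```python
def to_upper(text):
    """Stand-in for the pretend byref toUpper: upper-cases in place."""
    for i, ch in enumerate(text):
        text[i] = ch.upper()


def shout(fetched, default, copy_default):
    """Fall back to a default name, then upper-case whatever we ended up with."""
    name = fetched
    if len(name) == 0:
        # Plain assignment shares the default; list(...) makes a private copy.
        name = list(default) if copy_default else default
    to_upper(name)
    return "".join(name)


# Without an explicit copy, the in-place change leaks into the "constant".
unsafe_default = list("guest")
unsafe_result = shout([], unsafe_default, copy_default=False)

# With an explicit copy, the default survives the call.
safe_default = list("guest")
safe_result = shout([], safe_default, copy_default=True)
```

Both calls return the same upper-cased name; the difference is only in what happens to the default afterwards, which is exactly the kind of detail the coder would have to keep in mind.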
If data passed to a routine is deemed to be unchangeable by the routine, but that data contains references to other data, is the routine allowed to modify the other data?
Yes.
Ok, but now you add more responsibility onto the coder and make things more complex. For example (and again using pretend syntax) ...
constant FOO = {"bar", "qwerty"}
procedure Xyzzy(sequence S)
    for i = 1 to length(S) do
        toUpper(S[i])
    end for
end procedure
Xyzzy( byref FOO )
Here we have a procedure 'Xyzzy' that does not modify the referenced data passed to it (FOO still contains two references and those references are not modified by Xyzzy). But the procedure modifies the data pointed to by FOO. So is FOO a constant or not? Most people would expect that displaying FOO before and after calling Xyzzy() will give the same result.
Again, this can be solved, but only by reducing the language's ease-of-use.
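Python happens to have exactly this kind of shallow immutability, so it can serve as a sketch of the question being asked. A tuple cannot have its elements replaced, but mutable lists stored inside it can still be changed. The names below mirror the pretend Euphoria code and are otherwise assumptions.

```python
def to_upper(text):
    """Stand-in for the pretend byref toUpper: upper-cases in place."""
    for i, ch in enumerate(text):
        text[i] = ch.upper()


def xyzzy(seq):
    """Never assigns to seq itself, only changes what its elements refer to."""
    for item in seq:
        to_upper(item)


# The outer tuple is immutable; the inner lists are not.
FOO = (list("bar"), list("qwerty"))

before = ["".join(s) for s in FOO]
ids_before = [id(s) for s in FOO]

xyzzy(FOO)

after = ["".join(s) for s in FOO]
same_elements = ids_before == [id(s) for s in FOO]
```

The tuple still holds the same two references, yet printing it before and after the call gives different text. Whether that counts as "constant" is the design decision a transitive immutability rule would have to settle.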
However, to enable explicit PBR in order for a routine to modify data that lives outside of the routine, we will have to add some complexity to both the internals and to the syntax, to ensure that Euphoria has enough knowledge about the coder's intentions.
I'm not saying this should be avoided, but just noting that there is no free lunch involved either.
Another solution is to introduce a new object datatype that provides OOP features and are always references, just like in Java. Or copy Lua's tables (including its reference semantics) that are more dynamic.
"Another solution"? - what was the first solution? And what was it solving?
Due to the way that Euphoria datatypes are recorded in RAM during run-time, adding any new datatype will cause some slowness to generally creep into every application, even if they do not use the new datatype.
Again, just noting that there could be trade-offs involved.
12. Re: Pass by Reference
- Posted by Critic Sep 12, 2009
- 2271 views
References do not have to be addresses.
True and irrelevant.
The new eumem.e module can be used to model virtual address spaces in which graphs and trees can be managed just as you would in C using RAM addresses.
I've not looked at eumem.e yet. But I guess C is more convenient in this case.
Plus with added safety and flexibility.
And less speed, I suppose.
That's right. And how does the Pascal compiler know which arguments are PBR and which are not? It knows because the coder specifically designates which are which.
Of course. But so what? Default is pass by value (like it is now), pass by ref can be done with "byref" (or whatever).
There are trade offs involved and we all still want Euphoria to be a simple-to-use language.
No. Lack of reference semantics makes the language easier to learn, but not easier to use, because coding workarounds for the lack of reference semantics is not simple at all. It's tedious and error-prone.
if length(Name) = 0 then
    Name = DefaultName
end if
toUpper(Name) -- Ok? as 'Name' is not immutable.
Well not exactly. When one sequence is assigned to another, no actual copying of data takes place, just the reference to the original is copied and the reference counter incremented. So now, the call toUpper() would be modifying the constant value.
Possible solution: The interpreter always copies constants directly. The programmer does not need to bother with this.
Ok, but now you add more responsibility onto the coder and make things more complex. For example (and again using pretend syntax) ...
constant FOO = {"bar", "qwerty"}
procedure Xyzzy(sequence S)
    for i = 1 to length(S) do
        toUpper(S[i])
    end for
end procedure
Xyzzy( byref FOO )
Here we have a procedure 'Xyzzy' that does not modify the referenced data passed to it (FOO still contains two references and those references are not modified by Xyzzy). But the procedure modifies the data pointed to by FOO. So is FOO a constant or not? Most people would expect that displaying FOO before and after calling Xyzzy() will give the same result.
I misunderstood your issue. Passing the constant FOO byref should be a compile time error. Same for FOO[1], etc. Everything in FOO is constant.
Again, this can be solved, but only by reducing the language's ease-of-use.
You're always talking about ease of learning, dammit!
"Another solution"? - what was the first solution? And what was it solving?
Look, I want "full reference semantics", including the ability to build graphs with cycles. PBR does not give me the full power, so it's not necessary if there'd be a datatype with reference semantics.
Due to the way that Euphoria datatypes are recorded in RAM during run-time, adding any new datatype will cause some slowness to generally creep into every application, even if they do not use the new datatype.
Again, just noting that there could be trade-offs involved.
Yes, probably true.
13. Re: Pass by Reference
- Posted by jacques_desch Sep 12, 2009
- 2253 views
- Last edited Sep 13, 2009
Things like var_id(), and other reflection like techniques are more likely to surface when we get into dynamically evaluated code. It might be a reasonable workaround for a more native implementation of PBR.
var_id() has a potential problem that routine_id does not have: What if I take a reference to a variable that is allocated on the call stack? The variable disappears after the call, but how is the reference invalidated? Other pass by reference syntaxes do not necessarily have this issue.
True. That's another area that has to be thought out before it could happen, and just for reasons like that.
Matt
Can a variable be on the stack if it is not a procedure argument or local variable? If not, I don't see how this can be a problem. At parse time we know that a variable is on stack and so can forbid reference to it. Please show me an example where it's not true.
Jacques
14. Re: Pass by Reference
- Posted by DerekParnell (admin) Sep 12, 2009
- 2269 views
- Last edited Sep 13, 2009
At parse time we know that a variable is on stack and so can forbid reference to it.
The problem is not so much that we make a reference to a stack variable, but that we allow that reference to be returned by the routine. For example, a routine could pass a reference to a local variable to another routine and that could be safe. So disallowing references to local vars or arguments is not what is required, but disallowing any such references to exist after the routine ends.
When passing references to stack vars to other routines, it is okay so long as those routines don't store the reference anywhere that will still exist after the original routine ends. This is difficult and requires flow analysis at run-time.
15. Re: Pass by Reference
- Posted by jacques_desch Sep 12, 2009
- 2232 views
- Last edited Sep 13, 2009
At parse time we know that a variable is on stack and so can forbid reference to it.
The problem is not so much that we make a reference to a stack variable, but that we allow that reference to be returned by the routine. For example, a routine could pass a reference to a local variable to another routine and that could be safe. So disallowing references to local vars or arguments is not what is required, but disallowing any such references to exist after the routine ends.
When passing references to stack vars to other routines, it is okay so long as those routines don't store the reference anywhere that will still exist after the original routine ends. This is difficult and requires flow analysis at run-time.
But don't we know at any time that a reference is pointing to stack? If so we can forbid storage or return of that reference without flow analysis?
16. Re: Pass by Reference
- Posted by mattlewis (admin) Sep 12, 2009
- 2248 views
- Last edited Sep 13, 2009
Can a variable be on the stack if it is not a procedure argument or local variable? If not, I don't see how this can be a problem. At parse time we know that a variable is on stack and so can forbid reference to it. Please show me an example where it's not true.
Yes, we'd know this at parse time, and simply only allowing var_id references to top level variables is the easiest, and probably least error prone way to implement this.
Matt
17. Re: Pass by Reference
- Posted by ghaberek (admin) Sep 13, 2009
- 2305 views
I really think PBR could be optimized into passing an object to a function and assigning its result to that same object, like:
foo = func(foo)
To me, this seems seamless, if not elegant in its simplicity. It's an inherent and almost understood behavior. To elaborate, I have never thought that passing an object by reference for modification was all that good of an idea...
proc(foo)
Here, it seems to me that the object just goes off into la-la land, never to be seen again. Something happens, but we don't care to know what, as we're not asking for a return value. If a function modifies an object, I should get that object back as the return. If I need to modify a sequence of objects, then I can pass multiple objects in a sequence:
sequence objs = {1, 2, 3, 4}
objs = func(objs)
-Greg
18. Re: Pass by Reference
- Posted by Critic Sep 13, 2009
- 2231 views
I really think PBR could be optimized into passing an object to a function and assigning its result to that same object, like:
foo = func(foo)
In practice, "foo" often is something more complex like "universe[x][y][z][t]":
universe[x][y][z][t] = func(universe[x][y][z][t])
Now you have to visually compare the left and right expressions and see if they are the same.
func(byref universe[x][y][z][t])
With this syntax it's clear something is modified too. This really is the same as universe[x][y][z][t] = universe[x][y][z][t] + 1 vs. universe[x][y][z][t] += 1.
Euphoria supports the += operator for a good reason IMHO.
If I need to modify a sequence of objects, then I can pass multiple objects in a sequence:
sequence objs = {1, 2, 3, 4}
objs = func(objs)
Except that the code probably goes on with:
object x = objs[1]
object y = objs[2]
...
It's tedious.
19. Re: Pass by Reference
- Posted by jemima Sep 13, 2009
- 2221 views
I really think PBR could be optimized into passing an object to a function and assigning its result to that same object, like:
foo = func(foo)
In practice, "foo" often is something more complex like "universe[x][y][z][t]":
universe[x][y][z][t] = func(universe[x][y][z][t])
Now you have to visually compare the left and right expressions and see if they are the same.
func(byref universe[x][y][z][t])
With this syntax it's clear something is modified too.
Unfortunately it might be clear when you are reading the function, but not when you are reading the call to the function.
This really is the same as universe[x][y][z][t] = universe[x][y][z][t] + 1 vs. universe[x][y][z][t] += 1.
Euphoria supports the += operator for a good reason IMHO.
IMO, the equivalent function call syntax would be something like
universe[x][y][z][t] = func($1)
Where $1 means the first (or only) item being assigned on the LHS.
I believe PBR can be achieved by playing with reference counts. Consider the following code:
function notPBR(object x)
    ...
end function
thing = notPBR(thing)

object x
procedure PBR()
    ...
end procedure
x = thing
thing = 0
PBR()
thing = x
x = 0
Clearly this is not an elegant way to actually code, but PBR() can modify x without COW overheads, whereas notPBR() cannot. Perhaps you may need to write an example based on the above to prove the performance difference.
Then you need to think about how to get the compiler to safely mimic this code from some much simpler shorthand syntax, in particular allow a variable to be passed as a parameter and cleared or not ref-counted in one step.
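The reference-count argument can be checked against a toy model. The sketch below is in Python and makes strong assumptions: `Shared`, `write`, and the `stats` counter are hypothetical, and the reference counts are adjusted by hand where a real interpreter would do it implicitly. It only shows why the second calling pattern avoids the copy; it says nothing about actual Euphoria timings.

```python
class Shared:
    """Hypothetical ref-counted value with copy-on-write semantics."""

    def __init__(self, items):
        self.items = list(items)
        self.refs = 1  # one owner at creation


stats = {"copies": 0}


def write(value, index, item):
    """Write through COW: copy first if any other reference exists."""
    if value.refs > 1:
        value.refs -= 1
        value = Shared(value.items)
        stats["copies"] += 1
    value.items[index] = item
    return value


def call_not_pbr(thing):
    """Models: thing = notPBR(thing). The caller keeps its reference during the call."""
    thing.refs += 1                # the parameter is a second reference
    result = write(thing, 0, -1)   # shared, so the write has to copy
    thing.refs -= 1                # old value dropped when the result is assigned
    return result


def call_pbr(thing):
    """Models: x = thing; thing = 0; PBR(); thing = x; x = 0."""
    x = thing
    x.refs += 1                    # x = thing
    x.refs -= 1                    # thing = 0 releases the caller's reference
    x = write(x, 0, -1)            # sole owner, so the write happens in place
    return x                       # thing = x; x = 0 leaves the count at one


first = call_not_pbr(Shared([1, 2, 3]))
copies_not_pbr = stats["copies"]

stats["copies"] = 0
second = call_pbr(Shared([1, 2, 3]))
copies_pbr = stats["copies"]
```

Both patterns end with the same data, but only the first one pays for a copy in this model, which is the effect a shorthand syntax would need the compiler to reproduce safely.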
20. Re: Pass by Reference
- Posted by Critic Sep 13, 2009
- 2176 views
In practice, "foo" often is something more complex like "universe[x][y][z][t]":
universe[x][y][z][t] = func(universe[x][y][z][t])
Now you have to visually compare the left and right expressions and see if they are the same.
func(byref universe[x][y][z][t])
With this syntax it's clear something is modified too.
Unfortunately it might be clear when you are reading the function, but not when you are reading the call to the function.
func(byref universe[x][y][z][t])
is the call to the function.
But much Eu code is full of the usage of global variables, so the whole argument "but I want to see immediately if something is modified" is a moot point anyway.
21. Re: Pass by Reference
- Posted by mattlewis (admin) Sep 13, 2009
- 2200 views
But much Eu code is full of the usage of global variables, so the whole argument "but I want to see immediately if something is modified" is a moot point anyway.
I'm not sure you understood the original point. It wasn't about "something is modified." It was specifically about the arguments passed to the routine. Any language (Ok, maybe excepting pure functional) will allow changing some variable not within the scope of the routine.
Also, I'd quibble about your use of the term "global variables." While it's true that there is lots of code using global variables (of course, prior to 4.0, it was either local or global), I think you mean top level variables.
Even so, I don't think this criticism makes a lot of sense. Those top level variables are often the procedural equivalent to an object's member data in OO languages.
Also, your concerns don't mean a whole lot (in terms of expectations of effects of code). We're talking here about what a euphoria programmer expects. Since you've reminded us many times, those are words that will never describe you. I guess you're more of a Pascal programmer, where reference semantics and therefore expectation have existed for a while. A euphoria programmer does not expect any arguments that are passed to a euphoria routine to be modified.
Even in languages like Java, there is a fair amount of confusion about how arguments are passed (hint: they are passed by value).
Matt
22. Re: Pass by Reference
- Posted by Critic Sep 13, 2009
- 2173 views
I'm not sure you understood the original point. It wasn't about "something is modified." It was specifically about the arguments passed to the routine. Any language (Ok, maybe excepting pure functional) will allow changing some variable not within the scope of the routine.
Fair enough.
Also, I'd quibble about your use of the term "global variables."
Everybody except some Euphorians calls them global variables. Of course it refers to lifetime, not necessarily scope.
Even so, I don't think this criticism makes a lot of sense. Those top level variables are often the procedural equivalent to an object's member data in OO languages.
Maybe, but in contrast to objects in OO languages, the content of global variables doesn't get garbage collected and Euphoria code is often not re-entrant due to globals. So it's clearly technically inferior.
Even in languages like Java, there is a fair amount of confusion about how arguments are passed (hint: they are passed by value).
I know. I've said multiple times, PBR is not necessary if objects have reference semantics.
23. Re: Pass by Reference
- Posted by jeremy (admin) Sep 13, 2009
- 2158 views
But much Eu code is full of the usage of global variables, so the whole argument "but I want to see immediately if something is modified" is a moot point anyway.
That is old Euphoria code, written the way Euphoria programmers were somehow taught. We are trying to set the right example now with 4.0. I think every single global that was in the Euphoria std libs and interpreter/translator code is gone.
Jeremy
24. Re: Pass by Reference
- Posted by mattlewis (admin) Sep 13, 2009
- 2205 views
Also, I'd quibble about your use of the term "global variables."
Everybody except some Euphorians calls them global variables. Of course it refers to lifetime, not necessarily scope.
I disagree, as does Wikipedia:
In computer programming, a global variable is a variable that is accessible in every scope.
I'll allow that other languages may refer to anything outside of an object as 'global', and in some languages, like C, there's not much difference. Nevertheless, and regardless of what your favorite language may be, this is a forum for discussing euphoria. A better analogy might be the static variable.
Matt