1. Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agr?culture.gouv.f?> Jul 25, 2007
- 633 views
The Euphoria documentation on value() is a bit fuzzy, so the bug may have gone unnoticed:
* In the description, it reads: "Read the string representation of a Euphoria object, and ..." The definite article "the" would suggest that there is only one string representation for a Euphoria object, and one would expect that this is the one returned by sprint().
* In the comments, it reads: "After reading one valid representation of a Euphoria object, ..." So now there is the possibility of several valid representations for an object; otherwise, "the" would have been used again instead.
And indeed, value() can cope with spaces inside a string that represents a sequence, so the interpretation from the Comments section would seem to be the one to take into account.
Since
constant cst={1, -- first
2}
is a valid Euphoria statement (it doesn't trigger a compile error, and ?cst displays {1,2} as expected), one would infer that "{1, -- first\n 2" is a valid representation of an Euphoria object, namely {1,2}. As a consequence, value("{1, -- first\n 2") should return {0,{1,2}} according to the documentation, the inconsistency mentioned above notwithstanding. Yet it returns {1,0}, or {1,0,7,0} using a recent extension. I'll commit a fix next weekend, or next week. It will correct both get() and value(), and requires only changes in get.e. CChris
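Editor's sketch of the behaviour under discussion. This is not the code in get.e; it is a small Python stand-in that reads only integers and nested {...} sequences, and the names read_object and allow_comments are hypothetical. The status values 0 and 1 mirror the {0, ...} and {1, 0} results quoted in the post. With comments disallowed, "--" is rejected as a malformed number; with comments allowed, it is skipped exactly like whitespace. The test string here includes the closing brace.

```python
GET_SUCCESS, GET_FAIL = 0, 1  # mirror the {0, value} / {1, 0} results quoted in the post


def read_object(text, allow_comments=False):
    """Parse integers and nested {...} sequences; return (status, value)."""
    pos = 0

    def skip_blank():
        nonlocal pos
        while pos < len(text):
            if text[pos] in " \t\r\n":
                pos += 1
            elif allow_comments and text.startswith("--", pos):
                # A comment runs to the end of the line and counts as whitespace.
                end = text.find("\n", pos)
                pos = len(text) if end == -1 else end + 1
            else:
                break

    def parse():
        nonlocal pos
        skip_blank()
        if pos >= len(text):
            raise ValueError("unexpected end of input")
        if text[pos] == "{":
            pos += 1
            items = []
            skip_blank()
            if pos < len(text) and text[pos] == "}":
                pos += 1
                return items
            while True:
                items.append(parse())
                skip_blank()
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                elif pos < len(text) and text[pos] == "}":
                    pos += 1
                    return items
                else:
                    raise ValueError("expected ',' or '}'")
        # Otherwise expect an optionally signed integer.
        start = pos
        if text[pos] in "+-":
            pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if pos == start or not text[pos - 1].isdigit():
            raise ValueError("expected a number")
        return int(text[start:pos])

    try:
        return (GET_SUCCESS, parse())
    except ValueError:
        return (GET_FAIL, 0)


print(read_object("{1, -- first\n 2}"))                       # comments rejected
print(read_object("{1, -- first\n 2}", allow_comments=True))  # comments skipped
```

The only difference between the two modes is one extra branch in the whitespace skipper, which is where the proposed change would presumably sit.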
2. Re: Bug in get() and value(): embedded comments
- Posted by Jason Gade <jaygade at y?hoo?com> Jul 25, 2007
- 587 views
What do comments have to do with value() or Euphoria objects? What do valid Euphoria statements have to do with converting between strings and objects?

I'm confused at the need for this.

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.
3. Re: Bug in get() and value(): embedded comments
- Posted by Jason Gade <jaygade at ya?o?.com> Jul 25, 2007
- 588 views
CChris wrote:
> * In the comments, it reads:
> "After reading one valid representation of a Euphoria object, ..."
> So now there is the possibility of several valid representations for an object;
> otherwise, "the" would have been used again instead.
>
> And indeed, value() can cope with spaces inside a string that represents a sequence,
> so that the interpretation from the Comments section would seem to be the one
> to take into account.

BTW, I couldn't find this comment in the source of get.e.

I would assume, though, that the word "one" above was used as opposed to the word "two" or "three" or whatever. I really don't think that it was intended to imply more than one valid representation of an object.

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.
4. Re: Bug in get() and value(): embedded comments
- Posted by Andy Serpa <ac at onehorses?y?com> Jul 25, 2007
- 616 views
CChris wrote:
> I'll commit a fix next weekend, or next week. It will correct both get() and
> value(), and requires only changes in get.e.

I don't think that is a bug, and if you "fix" it you'll be introducing a bug as far as I'm concerned. If you find the documentation confusing, fix the documentation.

value() and get() shouldn't be stripping out comments -- those functions read Euphoria objects, not Euphoria statements.
5. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agriculture.g?uv.?r> Jul 25, 2007
- 621 views
Jason Gade wrote:
>
> CChris wrote:
> > * In the comments, it reads:
> > "After reading one valid representation of a Euphoria object, ..."
> > So now there is the possibility of several valid representations for an object;
> > otherwise, "the" would have been used again instead.
> >
> > And indeed, value() can cope with spaces inside a string that represents a sequence,
> > so that the interpretation from the Comments section would seem to be the one
> > to take into account.
>
> BTW, I couldn't find this comment in the source of get.e.
>
> I would assume, though, that the word "one" above was used as opposed to the
> word "two" or "three" or whatever. I really don't think that it was intended
> to imply more than one valid representation of an object.
>
> --
> "Any programming problem can be solved by adding a level of indirection."
> --anonymous
> "Any performance problem can be solved by removing a level of indirection."
> --M. Haertel
> "Premature optimization is the root of all evil in programming."
> --C.A.R. Hoare
> j.

It is not in the source file, but in %EUDIR%\HTML\lib_u_z.htm.

CChris
6. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agricu??ure.gouv.fr> Jul 25, 2007
- 610 views
Jason Gade wrote:
>
> What do comments have to do with value() or Euphoria objects?

Comments may appear in string representations of Euphoria objects - more precisely, of sequences.

> What do valid
> Euphoria statements have to do with converting between strings and objects?
>

Euphoria statements are conveyed by strings, some of which represent Euphoria objects. When a statement is valid, all the substrings it uses are valid too as object representations - otherwise compilation would fail.

> I'm confused at the need for this.
>

This is needed for two kinds of reasons:

1/ It is valid to use comments inside Euphoria object representations, yet value() and get() can't read them properly. This is the very definition of a bug.

2/ Don't you think the following is the simplest .ini file format?

"{
-- first parameter, bla bla
127,
-- second parameter, for other purposes
\"myfile.txt\",
-- and on and on
}"

Making this the contents of a config file allows anyone to edit/customise it without the need for a specific interface. Reading the config file would be done by get(), followed by a few assignments from the sequence read to the variables intended to receive the persistent values. Writing is hardly more complex, using a sprint() and a list of the comment tags.

CChris

> --
> "Any programming problem can be solved by adding a level of indirection."
> --anonymous
> "Any performance problem can be solved by removing a level of indirection."
> --M. Haertel
> "Premature optimization is the root of all evil in programming."
> --C.A.R. Hoare
> j.
7. Re: Bug in get() and value(): embedded comments
- Posted by Jason Gade <jaygade at yahoo.?o?> Jul 25, 2007
- 598 views
I'm not so much opposed to the idea, it just seems like you are stretching the interpretation of what the docs say the functions are supposed to do. I don't think it's a bug in either the docs or in the implementation.

Maybe a separate function to do what you want would be a better idea... I dunno. I'll see what others have to say.

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.
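Editor's sketch of the "separate function" idea. One way it could look, assuming a pre-processing step is acceptable: strip the comments first and hand the cleaned text to an unmodified reader. The helper name strip_comments is hypothetical, and the sketch is in Python rather than Euphoria. It tracks double-quoted strings so that a "--" inside a file name survives, but it does not handle single-quoted character literals.

```python
def strip_comments(text):
    """Remove '--' comments that lie outside double-quoted strings."""
    out = []
    i = 0
    in_string = False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep an escaped character (such as \") together with its backslash.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("--", i):
            # Skip to the end of the line but keep the newline itself.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


config = '{\n-- first parameter\n127,\n-- second parameter\n"my--file.txt"\n}'
print(strip_comments(config))
```

Because the reader itself is untouched, this route costs nothing for callers that never use comments, at the price of a second pass over the input for those that do.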
8. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agriculture.gou?.?r> Jul 25, 2007
- 593 views
Jason Gade wrote:
>
> CChris wrote:
> > * In the comments, it reads:
> > "After reading one valid representation of a Euphoria object, ..."
> > So now there is the possibility of several valid representations for an object;
> > otherwise, "the" would have been used again instead.
> >
> > And indeed, value() can cope with spaces inside a string that represents a sequence,
> > so that the interpretation from the Comments section would seem to be the one
> > to take into account.
>
> BTW, I couldn't find this comment in the source of get.e.
>
> I would assume, though, that the word "one" above was used as opposed to the
> word "two" or "three" or whatever. I really don't think that it was intended
> to imply more than one valid representation of an object.
>

The number of valid representations of a sequence is almost infinite. Add spaces and tabs wherever you want, it's still valid. So "one" really stands for "one of many".

CChris

> --
> "Any programming problem can be solved by adding a level of indirection."
> --anonymous
> "Any performance problem can be solved by removing a level of indirection."
> --M. Haertel
> "Premature optimization is the root of all evil in programming."
> --C.A.R. Hoare
> j.
9. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agri??lture.gouv.fr> Jul 25, 2007
- 591 views
Andy Serpa wrote:
>
> CChris wrote:
> > I'll commit a fix next weekend, or next week. It will correct both get() and
> > value(), and requires only changes in get.e.
> >
>
> I don't think that is a bug, and if you "fix" it you'll be introducing a bug
> as far as I'm concerned.

The assertion has to be proved, and the second part is even more unsubstantiated.

> If you find the documentation confusing, fix the documentation.

Yes, the initial "the" is misleading.

> value() and get() shouldn't be stripping out comments -- those functions read
> Euphoria objects, not Euphoria statements.

They read strings into objects, and these strings may contain comments, as I showed in the code snippet in my original post.

CChris
10. Re: Bug in get() and value(): embedded comments
- Posted by Matt Lewis <matthewwalkerlewis at gmail.?om> Jul 25, 2007
- 589 views
- Last edited Jul 26, 2007
CChris wrote:
>
> Jason Gade wrote:
> >
> > CChris wrote:
> > > * In the comments, it reads:
> > > "After reading one valid representation of a Euphoria object, ..."
> > > So now there is the possibility of several valid representations for an object;
> > > otherwise, "the" would have been used again instead.
> > >
> > > And indeed, value() can cope with spaces inside a string that represents a sequence,
> > > so that the interpretation from the Comments section would seem to be the one
> > > to take into account.
> >
> > BTW, I couldn't find this comment in the source of get.e.
> >
> > I would assume, though, that the word "one" above was used as opposed to the
> > word "two" or "three" or whatever. I really don't think that it was intended
> > to imply more than one valid representation of an object.
> >
>
> The number of valid representations of a sequence is almost infinite. Add spaces
> and tabs wherever you want, it's still valid. So "one" really stands for "one
> of many".

I think you're parsing the sentence incorrectly (and it's somewhat vague).

I believe that the meaning of "one valid representation" was in the sense of, "it will only read one object, and then it will stop." It really has nothing to do with the number of ways that you could alter the representation using different combinations of whitespace.

While what you're doing is sort of interesting, I think you're getting away from the purpose of the function. It definitely is *not* meant to handle comments.

FTFM: "This works the same as get(),..."

And following the link to get:
"Multiple "top-level" objects in the input stream must be separated from each other with one or more "whitespace" characters (blank, tab, \r or \n). Whitespace is not necessary within a top-level object. A call to get() will read one entire top-level object, plus one additional (whitespace) character."

Note: no mention of comments. This would definitely fall under the category of enhancement, or feature request, and not bugs, AFAICT. It certainly sounds useful, but without analyzing the impact (speed, etc), I don't think we should necessarily include it. Others may have other reasons to exclude this, but I'll let them speak for themselves.

Matt
11. Re: Bug in get() and value(): embedded comments
- Posted by Andy Serpa <ac at on?ho?seshy.com> Jul 25, 2007
- 604 views
- Last edited Jul 26, 2007
Matt Lewis wrote:
>
> Others may have other reasons to
> exclude this, but I'll let them speak for themselves.
>

The functions are there to read text representations of Euphoria values/objects, not to parse Euphoria *code*. Now "string evaluation" of code would certainly be very useful (and I think Matt implemented this in an alternate version of the PD source, didn't you?) and that's an enhancement I'd like to see -- but that has nothing to do with these functions which were not designed to handle comments.

Chris, you're simply misunderstanding the docs...
12. Re: Bug in get() and value(): embedded comments
- Posted by Derek Parnell <ddparnell at big??nd.com> Jul 26, 2007
- 608 views
CChris wrote:
>
> The Euphoria documentation on value() is a bit fuzzy, so that the bug may have
> gone unnoticed:
> * In the description, it reads:
> "Read the string representation of a Euphoria object, and ..."
> The definite article "the" would suggest that there is only one string representation
> for an Euphoria object, and one would expect that this is the one returned by
> sprint().

Well, I suspect you are reading the English a bit too literally here. In this case the definite article "the" does not refer to the one and only possible representation of any given Euphoria object, but to the single representation of the object you are trying to get the value of. The value() function works on single objects, so if you happen to have several objects, the documentation is referring to the specific string representation you are passing to the function, the one you are trying to convert to an object.

> * In the comments, it reads:
> "After reading one valid representation of a Euphoria object, ..."
> So now there is the possibility of several valid representations for an object;
> otherwise, "the" would have been used again instead.

No, the value() function works with a single representation. The comments here are alerting you to the idea that you mustn't expect value() to deal with multiple representations in a single call.

> And indeed, value() can cope with spaces inside a string that represents a sequence,
> so that the interpretation from the Comments section would seem to be the one
> to take into account.
>
> Since
> constant cst={1, -- first
> 2}
> is a valid Euphoria statement (it doesn't trigger a compile error, and ?cst
> displays {1,2} as expected), one would infer that "{1, -- first\n 2" is a
> valid representation of an Euphoria object, namely {1,2}.

No, Euphoria source code is not the same as Euphoria objects.

> As a consequence, value("{1, -- first\n 2") should return {0,{1,2}} according
> to the documentation, the inconsistency mentioned above notwithstanding. Yet
> it returns {1,0}, or {1,0,7,0} using a recent extension.

No, the string "{1, -- first\n 2" is not a valid representation of a Euphoria object. Even if you had included the final ending brace, it is still not a valid object representation. The normal string representation of the cst identifier in your code would be "{1,2}"

> I'll commit a fix next weekend, or next week. It will correct both get() and
> value(), and requires only changes in get.e.

Please do not do that. It seems that you are misunderstanding the documentation and the concept of string representations.

--
Derek Parnell
Melbourne, Australia
Skype name: derek.j.parnell
13. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agriculture.g?uv.?r> Jul 26, 2007
- 594 views
Matt Lewis wrote:
>
> CChris wrote:
> >
> > Jason Gade wrote:
> > >
> > > CChris wrote:
> > > > * In the comments, it reads:
> > > > "After reading one valid representation of a Euphoria object, ..."
> > > > So now there is the possibility of several valid representations for an object;
> > > > otherwise, "the" would have been used again instead.
> > > >
> > > > And indeed, value() can cope with spaces inside a string that represents a sequence,
> > > > so that the interpretation from the Comments section would seem to be the one
> > > > to take into account.
> > >
> > > BTW, I couldn't find this comment in the source of get.e.
> > >
> > > I would assume, though, that the word "one" above was used as opposed to the
> > > word "two" or "three" or whatever. I really don't think that it was intended
> > > to imply more than one valid representation of an object.
> > >
> >
> > The number of valid representations of a sequence is almost infinite. Add spaces
> > and tabs wherever you want, it's still valid. So "one" really stands for "one
> > of many".
>
> I think you're parsing the sentence incorrectly (and it's somewhat vague).
>
> I believe that the meaning of "one valid representation" was in the sense of,
> "it will only read one object, and then it will stop." It really has nothing
> to do with the number of ways that you could alter the representation using
> different combinations of whitespace.
>

While you may be right, it is still true that adding syntactically unnecessary whitespace keeps the string/sequence of bytes valid and readable by get()/value(). Since comments are exactly as (un)necessary as whitespace, they should be treated the same, and currently are not.

> While what you're doing is sort of interesting, I think you're getting away
> from the purpose of the function. It definitely is *not* meant to handle
> comments.
>

This is debatable. Comments are not explicitly mentioned, but did the author mean they were excluded? I explained right above why I think they are not, but feedback from the author is needed here. Rob?

> FTFM: "This works the same as get(),..."
>
> And following the link to get:
> "Multiple "top-level" objects in the input stream must be separated from
> each other with one or more "whitespace" characters (blank, tab, \r or \n).
> Whitespace is not necessary within a top-level object. A call to get()
> will read one entire top-level object, plus one additional (whitespace)
> character."

Off the point, since I'm concerned with embedded comments appearing between non-top-level objects.

BTW did you try testing the code in the cchris_get branch? It fixes the extra space issue, and I couldn't measure any performance penalty. If this is confirmed, then it could be a good idea to merge that code, but this needs external confirmation. Executables are available at http://oedoc.free.fr/get_fixed/exw.exe (and ex.exe).

>
> Note: no mention of comments. This would definitely fall under the category
> of enhancement, or feature request, and not bugs, AFAICT. It certainly
> sounds useful, but without analyzing the impact (speed, etc), I don't think
> we should necessarily include it. Others may have other reasons to
> exclude this, but I'll let them speak for themselves.
>

Since the comment mark is two characters long, there will have to be a 1-character lookahead buffer somewhere. If it cannot be done without impacting performance, then we should leave the functions as they are and document the fact that comments are not covered. But this will be assessed from actual code, which isn't written yet.

CChris

> Matt
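Editor's sketch of the one-character lookahead mentioned above. It is a Python illustration, not the planned get.e change, and the names PeekReader and next_significant are hypothetical. A single pushback slot lets the reader tell "-5" from "-- comment" without ever calling seek() on the underlying stream.

```python
import io


class PeekReader:
    """Wrap a text stream with a one-character pushback slot, so no seek() is needed."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = None  # holds at most one character read ahead

    def next_char(self):
        if self._pending is not None:
            ch, self._pending = self._pending, None
            return ch
        return self._stream.read(1)  # '' at end of input

    def peek(self):
        if self._pending is None:
            self._pending = self._stream.read(1)
        return self._pending


def next_significant(reader):
    """Return the next character that is neither whitespace nor part of a '--' comment."""
    while True:
        ch = reader.next_char()
        if ch == "-" and reader.peek() == "-":
            # Comment: discard everything up to and including the newline.
            while ch not in ("\n", ""):
                ch = reader.next_char()
        elif ch in (" ", "\t", "\r", "\n"):
            continue
        else:
            return ch


reader = PeekReader(io.StringIO("  -- note\n -5"))
print(next_significant(reader), reader.next_char())
```

The lookahead is only consulted after a "-" has been read, so input without minus signs takes the same path as before; whether that holds for the real implementation is what the performance test would need to show.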
14. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at a?riculture.go?v.fr> Jul 26, 2007
- 599 views
Derek Parnell wrote:
>
> CChris wrote:
> >
> > The Euphoria documentation on value() is a bit fuzzy, so that the bug may have
> > gone unnoticed:
> > * In the description, it reads:
> > "Read the string representation of a Euphoria object, and ..."
> > The definite article "the" would suggest that there is only one string representation
> > for an Euphoria object, and one would expect that this is the one returned by
> > sprint().
>
> Well, I suspect you are reading the English a bit too literally here. In this
> case the definite article "the" does not refer to the one and only possible representation
> of any given Euphoria object, but to the single representation of the object
> you are trying to get the value of. The value() function works on single objects,
> so if you happen to have several objects, the documentation is referring to
> the specific string representation you are passing to the function, you are
> trying to convert to an object.
>
> > * In the comments, it reads:
> > "After reading one valid representation of a Euphoria object, ..."
> > So now there is the possibility of several valid representations for an object;
> > otherwise, "the" would have been used again instead.
>
> No, the value() function works with a single representation. The comments here
> are alerting you to the idea that you mustn't expect value() to deal with multiple
> representations in a single call.
>

Ok, I think the doc should lift the ambiguity here between "unique representation" and "representation of a single object". It still holds true that any Euphoria sequence has plenty of valid forms, adding whitespace at will. So, the documentation issue is separate (and minor).

> > And indeed, value() can cope with spaces inside a string that represents a sequence,
> > so that the interpretation from the Comments section would seem to be the one
> > to take into account.
> >
> > Since
> > constant cst={1, -- first
> > 2}
> > is a valid Euphoria statement (it doesn't trigger a compile error, and ?cst
> > displays {1,2} as expected), one would infer that "{1, -- first\n 2" is a
> > valid representation of an Euphoria object, namely {1,2}.
>
> No, Euphoria source code is not the same as Euphoria objects.

No one said that. Euphoria source code is a stream of characters, i.e. a string. Are you saying that an assignment lvalue = manifest rvalue may be valid even though the rvalue doesn't validly represent an object? This is definitely confusing.

> > As a consequence, value("{1, -- first\n 2") should return {0,{1,2}} according
> > to the documentation, the inconsistency mentioned above notwithstanding. Yet
> > it returns {1,0}, or {1,0,7,0} using a recent extension.
>
> No, the string "{1, -- first\n 2" is not a valid representation of a Euphoria
> object. Even if you had included the final ending brace, it is still not
> a valid object representation. The normal string representation of the cst identifier
> in your code would be "{1,2}"
>

Or {1, 2}, or { 1 ,\t2\n }, or ...

> > I'll commit a fix next weekend, or next week. It will correct both get() and
> > value(), and requires only changes in get.e.
>
> Please do not do that. It seems that you are misunderstanding the documentation
> and the concept of string representations.
>

If I am misunderstanding it, and if this concept is along the lines you suggest, then it is a little confusing, because a string would be valid in some contexts and not in others. In that case, let's not call it a bug, but then it should probably be changed as kludgy and inconsistent. Even more so if you consider that Euphoria is an interpreted language, which implies a very close relationship between a string and its evaluation in code.

Note however that this is still quite far from an eval()-like functionality, which has some issues of its own. It would be a timid step in that direction, at most/worst.

As I replied to Matt, if a performance impact is unavoidable, then the functions should be left alone and the doc made explicit, independently of the abovementioned ambiguity.

CChris

> --
> Derek Parnell
> Melbourne, Australia
> Skype name: derek.j.parnell
15. Re: Bug in get() and value(): embedded comments
- Posted by Robert Craig <rds at Rap?d?uphoria.com> Jul 26, 2007
- 627 views
CChris wrote:
> While you may be right, it is still true that adding syntactically unnecessary
> whitespace keeps the string/sequence of bytes valid and readable by get()/value().
> Since comments are exactly as (un)necessary as whitespace, they should be treated
> the same, and currently are not.
>
> > While what you're doing is sort of interesting, I think you're getting away
> > from the purpose of the function. It definitely is *not* meant to handle
> > comments.
>
> This is debatable. Comments are not explicitly mentioned, but did the author
> mean they were excluded? I explained right above why I think they are not, but
> feedback from the author is needed here. Rob?

The fact that comments are not currently supported is not a bug or oversight on my part, but I would not object to their being supported, if most people think it would be useful, and if there is no significant performance issue.

> Since the comment mark is two characters long, there will have to be a 1-character
> lookahead buffer somewhere. If it cannot be done without impacting performance,
> then we should leave the functions as they are and document the fact that comments
> are not covered. But this will be assessed from actual code, which isn't written
> yet.

You should test the performance. While optimizing an application a long time ago, I noticed that seek() can be more expensive than you might imagine.

If you want to improve the docs, pointing out that there can be many different string representations of the same object, please go ahead. e.g. 3.0 +3.0 3.00 3.000 3e0 are all the same object, and are considered by the language definition to be *exactly* the same as 3. The implementation *might* choose to store 3 differently than 3.0, internally, but it doesn't have to.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
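Editor's sketch of the many-to-one point about numeric text. Python's float() is used purely as a stand-in for a number reader; nothing here models how Euphoria parses or stores atoms. The spellings are the ones listed in the post, plus the plain 3 they are compared against.

```python
# Each spelling is a different string, yet all of them denote the same number.
forms = ["3", "3.0", "+3.0", "3.00", "3.000", "3e0"]
values = {float(text) for text in forms}  # a set keeps only the distinct results

print(len(forms), "spellings ->", len(values), "distinct value")
```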
16. Re: Bug in get() and value(): embedded comments
- Posted by CChris <christian.cuvier at agri??lture.gouv.fr> Jul 26, 2007
- 632 views
Robert Craig wrote:
>
> CChris wrote:
> > While you may be right, it is still true that adding syntactically unnecessary
> > whitespace keeps the string/sequence of bytes valid and readable by get()/value().
> > Since comments are exactly as (un)necessary as whitespace, they should be treated
> > the same, and currently are not.
> >
> > > While what you're doing is sort of interesting, I think you're getting away
> > > from the purpose of the function. It definitely is *not* meant to handle
> > > comments.
> >
> > This is debatable. Comments are not explicitly mentioned, but did the author
> > mean they were excluded? I explained right above why I think they are not, but
> > feedback from the author is needed here. Rob?
>
> The fact that comments are not currently supported is not
> a bug or oversight on my part, but I would not object to their
> being supported, if most people think it would be useful,
> and if there is no significant performance issue.
>
> > Since the comment mark is two characters long, there will have to be a 1-character
> > lookahead buffer somewhere. If it cannot be done without impacting performance,
> > then we should leave the functions as they are and document the fact that comments
> > are not covered. But this will be assessed from actual code, which isn't written
> > yet.
>
> You should test the performance.
> While optimizing an application a long time ago,
> I noticed that seek() can be more expensive
> than you might imagine.
>

Given that the performance loss would apply in a lot of cases that have no embedded comments at all, this mod is not allowed to fail the test, as far as I'm concerned. I won't code this before next week, as things stand.

Likewise, the code in the cchris_get branch on SVN has been isolated there because I want several people to test performance on different machines and platforms before it can be considered harmless, and the quirk removal worthwhile. No feedback so far.

Now that I think about it, if the proposed mod doesn't hurt performance, then I can probably remove the get() quirk at the same time for no additional cost. Needs actual coding.

> If you want to improve the docs, pointing out that
> there can be many different string representations of the same
> object, please go ahead. e.g. 3.0 +3.0 3.00 3.000 3e0 are all
> the same object, and are considered by the language definition
> to be *exactly* the same as 3. The implementation
> *might* choose to store 3 differently than 3.0, internally,
> but it doesn't have to.
>

This, or emphasizing that sprint() returns a shortest form which is not unique. I think I'll add a sentence about it to cross the t's.

> Regards,
> Rob Craig
> Rapid Deployment Software
> http://www.RapidEuphoria.com

CChris