1. Bug in get() and value(): embedded comments

The Euphoria documentation on value() is a bit fuzzy, so that the bug may have
gone unnoticed:
* In the description, it reads:
"Read the string representation of a Euphoria object, and ..."
The definite article "the" would suggest that there is only one string
representation for a Euphoria object, and one would expect that this is the one
returned by sprint().
* In the comments, it reads:
"After reading one valid representation of a Euphoria object, ..."
So now there is the possibility of several valid representations for an object;
otherwise, "the" would have been used again instead.

And indeed, value() can cope with spaces inside a string that represents a
sequence, so that the interpretation from the Comments section would seem to be
the one to take into account.

Since
constant cst={1,  -- first
   2}

is a valid Euphoria statement (it doesn't trigger a compile error, and ?cst
displays {1,2} as expected), one would infer that "{1,  -- first\n   2" is a
valid representation of a Euphoria object, namely {1,2}.
As a consequence, value("{1,  -- first\n   2") should return {0,{1,2}} according
to the documentation, the inconsistency mentioned above notwithstanding. Yet it
returns {1,0}, or {1,0,7,0} using a recent extension.

I'll commit a fix next weekend, or next week. It will correct both get() and
value(), and requires only changes in get.e.

CChris


2. Re: Bug in get() and value(): embedded comments

What do comments have to do with value() or Euphoria objects? What do valid
Euphoria statements have to do with converting between strings and objects?

I'm confused at the need for this.

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.


3. Re: Bug in get() and value(): embedded comments

CChris wrote:
> * In the comments, it reads:
> "After reading one valid representation of a Euphoria object, ..."
> So now there is the possibility of several valid representations for an
> object;
> otherwise, "the" would have been used again instead.
> 
> And indeed, value() can cope with spaces inside a string that represents a
> sequence,
> so that the interpretation from the Comments section would seem to be the one
> to take into account.

BTW, I couldn't find this comment in the source of get.e.

I would assume, though, that the word "one" above was used as opposed to the
word "two" or "three" or whatever. I really don't think that it was intended to
imply more than one valid representation of an object.

j.


4. Re: Bug in get() and value(): embedded comments

CChris wrote:
>
> 
> I'll commit a fix next weekend, or next week. It will correct both get() and
> value(), and requires only changes in get.e.
> 

I don't think that is a bug, and if you "fix" it you'll be introducing a bug as
far as I'm concerned.  If you find the documentation confusing, fix the
documentation.  value() and get() shouldn't be stripping out comments -- those
functions read Euphoria objects, not Euphoria statements.


5. Re: Bug in get() and value(): embedded comments

Jason Gade wrote:
> 
> CChris wrote:
> > * In the comments, it reads:
> > "After reading one valid representation of a Euphoria object, ..."
> > So now there is the possibility of several valid representations for an
> > object;
> > otherwise, "the" would have been used again instead.
> > 
> > And indeed, value() can cope with spaces inside a string that represents a
> > sequence,
> > so that the interpretation from the Comments section would seem to be the
> > one
> > to take into account.
> 
> BTW, I couldn't find this comment in the source of get.e.
> 
> I would assume, though, that the word "one" above was used as opposed to the
> word "two" or "three" or whatever. I really don't think that it was intended
> to imply more than one valid representation of an object.
> 

It is not in the source file, but in %EUDIR%\HTML\lib_u_z.htm.

CChris


6. Re: Bug in get() and value(): embedded comments

Jason Gade wrote:
> 
> What do comments have to do with value() or Euphoria objects?

Comments may appear in string representations of Euphoria objects - more
precisely, of sequences.

> What do valid
> Euphoria statements have to do with converting between strings and objects?
> 

Euphoria statements are conveyed by strings, some of which represent Euphoria
objects. When a statement is valid, all the substrings it uses are valid too as
object representations - otherwise compile would fail.

> I'm confused at the need for this.
> 

This is needed for two reasons:

1/ It is valid to use comments inside Euphoria object representations, yet
value() and get() can't read them properly. This is the very definition of a bug.

2/ Don't you think the following is the simplest .ini file format?
"{
-- first parameter, bla bla
127,
-- second parameter, for other purposes
\"myfile.txt\",
-- and on and on
}"

Making this the contents of a config file allows anyone to edit/customise it
without the need of a specific interface. Reading the config file would be done
by get(), followed by a few assignments from the sequence read to variables
intended to receive the persistent values. Writing is hardly more complex using a
sprint() and a list of the comment tags.
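
To make the request concrete, here is a minimal sketch of a reader that treats "--" comments exactly like whitespace. It is written in Python purely for illustration and is not the get.e code: skip_blanks(), parse_object() and the status constants are hypothetical names, strings are read without escape handling, and number parsing borrows Python's int()/float(), which are more lenient than Euphoria's rules.

```python
GET_SUCCESS, GET_FAIL = 0, 1   # status codes named after those in get.e


def skip_blanks(s, i):
    """Advance past whitespace and '--' comments; return the new index."""
    while i < len(s):
        if s[i] in " \t\r\n":
            i += 1
        elif s.startswith("--", i):
            # a comment runs to the end of the line
            while i < len(s) and s[i] != "\n":
                i += 1
        else:
            break
    return i


def parse_object(s, i):
    """Parse one object starting at s[i]; return (value, next_index)."""
    i = skip_blanks(s, i)
    if i >= len(s):
        raise ValueError("unexpected end of input")
    ch = s[i]
    if ch == "{":
        items = []
        i = skip_blanks(s, i + 1)
        if i < len(s) and s[i] == "}":
            return items, i + 1            # empty sequence
        while True:
            item, i = parse_object(s, i)
            items.append(item)
            i = skip_blanks(s, i)
            if i < len(s) and s[i] == ",":
                i += 1
            elif i < len(s) and s[i] == "}":
                return items, i + 1
            else:
                raise ValueError("expected ',' or '}'")
    if ch == '"':
        j = s.find('"', i + 1)             # no escape handling in this sketch
        if j < 0:
            raise ValueError("unterminated string")
        return s[i + 1:j], j + 1
    # number: read up to the next delimiter or comment start
    j = i
    while j < len(s) and s[j] not in " \t\r\n,}" and not s.startswith("--", j):
        j += 1
    text = s[i:j]
    try:
        return int(text), j
    except ValueError:
        return float(text), j              # raises ValueError if not a number


def value(s):
    """Return [status, object] in the style of value()."""
    try:
        obj, _ = parse_object(s, 0)
        return [GET_SUCCESS, obj]
    except ValueError:
        return [GET_FAIL, 0]


config_text = '''{
-- first parameter
127,
-- second parameter, for other purposes
"myfile.txt"
-- and so on
}'''

print(value("{1,  -- first\n   2}"))   # [0, [1, 2]]
print(value(config_text))              # [0, [127, 'myfile.txt']]
```

The sample config has no trailing comma before the closing brace, since this sketch does not accept one. Whether a real get() should is a separate question.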

CChris


7. Re: Bug in get() and value(): embedded comments

I'm not so much opposed to the idea, it just seems like you are stretching the
interpretation of what the docs say the functions are supposed to do.

I don't think it's a bug in either the docs or in the implementation.

Maybe a separate function to do what you want would be a better idea... I dunno.
I'll see what others have to say.

j.


8. Re: Bug in get() and value(): embedded comments

Jason Gade wrote:
> 
> CChris wrote:
> > * In the comments, it reads:
> > "After reading one valid representation of a Euphoria object, ..."
> > So now there is the possibility of several valid representations for an
> > object;
> > otherwise, "the" would have been used again instead.
> > 
> > And indeed, value() can cope with spaces inside a string that represents a
> > sequence,
> > so that the interpretation from the Comments section would seem to be the
> > one
> > to take into account.
> 
> BTW, I couldn't find this comment in the source of get.e.
> 
> I would assume, though, that the word "one" above was used as opposed to the
> word "two" or "three" or whatever. I really don't think that it was intended
> to imply more than one valid representation of an object.
> 

The number of valid representations of a sequence is almost infinite. Add spaces
and tabs wherever you want, it's still valid. So "one" really stands for "one of
many".

CChris



9. Re: Bug in get() and value(): embedded comments

Andy Serpa wrote:
> 
> CChris wrote:
> >
> > 
> > I'll commit a fix next weekend, or next week. It will correct both get() and
> > value(), and requires only changes in get.e.
> > 
> 
> I don't think that is a bug, and if you "fix" it you'll be introducing a bug
> as far as I'm concerned. 

That assertion has to be proved, and the second part is even less substantiated.

> If you find the documentation confusing, fix the documentation.

Yes, the initial "the" is misleading.

>  value() and get() shouldn't be stripping out comments -- those functions read
> Euphoria objects, not Euphoria statements.

They read strings into objects, and these strings may contain comments, as I 
showed in the code snippet in my original post.

CChris


10. Re: Bug in get() and value(): embedded comments

CChris wrote:
> 
> Jason Gade wrote:
> > 
> > CChris wrote:
> > > * In the comments, it reads:
> > > "After reading one valid representation of a Euphoria object, ..."
> > > So now there is the possibility of several valid representations for an
> > > object;
> > > otherwise, "the" would have been used again instead.
> > > 
> > > And indeed, value() can cope with spaces inside a string that represents a
> > > sequence,
> > > so that the interpretation from the Comments section would seem to be the
> > > one
> > > to take into account.
> > 
> > BTW, I couldn't find this comment in the source of get.e.
> > 
> > I would assume, though, that the word "one" above was used as opposed to the
> > word "two" or "three" or whatever. I really don't think that it was intended
> > to imply more than one valid representation of an object.
> > 
> 
> The number of valid representations of a sequence is almost infinite. Add
> spaces and tabs wherever you want, it's still valid. So "one" really stands
> for "one of many".

I think you're parsing the sentence incorrectly (and it's somewhat vague).  
I believe that the meaning of "one valid representation" was in the sense of,
"it will only read one object, and then it will stop."  It really has nothing
to do with the number of ways that you could alter the representation using
different combinations of whitespace.

While what you're doing is sort of interesting, I think you're getting away
from the purpose of the function.  It definitely is *not* meant to handle
comments.

FTFM: "This works the same as get(),..."

And following the link to get:
"Multiple "top-level" objects in the input stream must be separated from 
each other with one or more "whitespace" characters (blank, tab, \r or \n).
Whitespace is not necessary within a top-level object. A call to get() 
will read one entire top-level object, plus one additional (whitespace) 
character."

Note: no mention of comments.  This would definitely fall under the category
of enhancement, or feature request, and not bugs, AFAICT.  It certainly
sounds useful, but without analyzing the impact (speed, etc), I don't think
we should necessarily include it.  Others may have other reasons to 
exclude this, but I'll let them speak for themselves.

Matt


11. Re: Bug in get() and value(): embedded comments

Matt Lewis wrote:
> 
> Others may have other reasons to 
> exclude this, but I'll let them speak for themselves.
>

The functions are there to read text representations of Euphoria values/objects,
not to parse Euphoria *code*.

Now "string evaluation" of code would certainly be very useful (and I think Matt
implemented this in an alternate version of the PD source, didn't you?) and
that's an enhancement I'd like to see -- but that has nothing to do with these
functions which were not designed to handle comments.  Chris, you're simply
misunderstanding the docs...


12. Re: Bug in get() and value(): embedded comments

CChris wrote:
> 
> The Euphoria documentation on value() is a bit fuzzy, so that the bug may
> have gone unnoticed:
> * In the description, it reads:
> "Read the string representation of a Euphoria object, and ..."
> The definite article "the" would suggest that there is only one string
> representation for a Euphoria object, and one would expect that this is the
> one returned by sprint().


Well, I suspect you are reading the English a bit too literally here. In this
case the definite article "the" does not refer to the one and only possible
representation of any given Euphoria object, but to the single representation of
the object you are trying to get the value of. The value() function works on
single objects, so if you happen to have several objects, the documentation is
referring to the specific string representation that you are passing to the
function and trying to convert to an object.

> * In the comments, it reads:
> "After reading one valid representation of a Euphoria object, ..."
> So now there is the possibility of several valid representations for an
> object;
> otherwise, "the" would have been used again instead.

No, the value() function works with a single representation. The comments here
are alerting you to the idea that you mustn't expect value() to deal with
multiple representations in a single call.

> And indeed, value() can cope with spaces inside a string that represents a
> sequence,
> so that the interpretation from the Comments section would seem to be the one
> to take into account.
> 
> Since
> constant cst={1,  -- first
>    2}
>
> is a valid Euphoria statement (it doesn't trigger a compile error, and ?cst
> displays {1,2} as expected), one would infer that "{1,  -- first\n   2" is a
> valid representation of a Euphoria object, namely {1,2}.


No, Euphoria source code is not the same as Euphoria objects.

> As a consequence, value("{1,  -- first\n   2") should return {0,{1,2}}
> according
> to the documentation, the inconsistency mentioned above notwithstanding. Yet
> it returns {1,0}, or {1,0,7,0} using a recent extension.

No, the string "{1,  -- first\n   2" is not a valid representation of a Euphoria
object. Even if you had included the final closing brace, it would still not be
a valid object representation. The normal string representation of the cst
identifier in your code would be "{1,2}".

 
> I'll commit a fix next weekend, or next week. It will correct both get() and
> value(), and requires only changes in get.e.

Please do not do that. It seems that you are misunderstanding the documentation
and the concept of string representations.

-- 
Derek Parnell
Melbourne, Australia
Skype name: derek.j.parnell


13. Re: Bug in get() and value(): embedded comments

Matt Lewis wrote:
> 
> CChris wrote:
> > 
> > Jason Gade wrote:
> > > 
> > > CChris wrote:
> > > > * In the comments, it reads:
> > > > "After reading one valid representation of a Euphoria object, ..."
> > > > So now there is the possibility of several valid representations for an
> > > > object;
> > > > otherwise, "the" would have been used again instead.
> > > > 
> > > > And indeed, value() can cope with spaces inside a string that represents
> > > > a sequence, so that the interpretation from the Comments section would
> > > > seem to be the one to take into account.
> > > 
> > > BTW, I couldn't find this comment in the source of get.e.
> > > 
> > > I would assume, though, that the word "one" above was used as opposed to
> > > the
> > > word "two" or "three" or whatever. I really don't think that it was
> > > intended
> > > to imply more than one valid representation of an object.
> > > 
> > 
> > The number of valid representations of a sequence is almost infinite. Add
> > spaces and tabs wherever you want, it's still valid. So "one" really stands
> > for "one of many".
> 
> I think you're parsing the sentence incorrectly (and it's somewhat vague). 
> 
> I believe that the meaning of "one valid representation" was in the sense of,
> "it will only read one object, and then it will stop."  It really has nothing
> to do with the number of ways that you could alter the representation using
> different combinations of whitespace.
> 

While you may be right, it is still true that adding syntactically  unnecessary
whitespace keeps the string/sequence of bytes valid and readable by
get()/value(). Since comments are exactly as (un)necessary as whitespace, they
should be treated the same, and currently are not.

> While what you're doing is sort of interesting, I think you're getting away
> from the purpose of the function.  It definitely is *not* meant to handle
> comments.
> 

This is debatable. Comments are not explicitly mentioned, but did the author
mean they were excluded? I explained right above why I think they are not, but
feedback from the author is needed here. Rob?

> FTFM: "This works the same as get(),..."
> 
> And following the link to get:
> "Multiple "top-level" objects in the input stream must be separated from 
> each other with one or more "whitespace" characters (blank, tab, \r or \n).
> Whitespace is not necessary within a top-level object. A call to get() 
> will read one entire top-level object, plus one additional (whitespace) 
> character."

Off the point, since I'm concerned with embedded comments appearing between
non-top-level objects.

BTW did you try testing the code in the cchris_get branch? It fixes the extra
space issue, and I couldn't measure any performance penalty. If this is
confirmed, then it could be a good idea to merge that code, but this needs
external confirmation. Executables are available at
http://oedoc.free.fr/get_fixed/exw.exe (and ex.exe).

> 
> Note: no mention of comments.  This would definitely fall under the category
> of enhancement, or feature request, and not bugs, AFAICT.  It certainly
> sounds useful, but without analyzing the impact (speed, etc), I don't think
> we should necessarily include it.  Others may have other reasons to 
> exclude this, but I'll let them speak for themselves.
> 

Since the comment mark is two characters long, there will have to be a
one-character lookahead buffer somewhere. If it cannot be done without impacting
performance, then we should leave the functions as they are and document the fact
that comments are not covered. But this will be assessed from actual code, which
isn't written yet.
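
One way to get a one-character lookahead without repositioning the file is a single pushback slot held in memory. The following is a minimal Python sketch for illustration only: PeekReader and next_significant() are hypothetical names, not part of get.e, and nothing here says how an equivalent would perform in Euphoria.

```python
import io


class PeekReader:
    """Wrap a text stream with a one-character pushback slot."""

    def __init__(self, stream):
        self.stream = stream
        self.pending = ""            # at most one pushed-back character

    def getc(self):
        """Return the next character, or "" at end of input."""
        if self.pending:
            ch, self.pending = self.pending, ""
            return ch
        return self.stream.read(1)

    def ungetc(self, ch):
        """Push one character back so the next getc() returns it."""
        self.pending = ch


def next_significant(reader):
    """Return the next character that is neither whitespace nor comment text."""
    while True:
        ch = reader.getc()
        if ch == "":
            return ""                # end of input
        if ch in " \t\r\n":
            continue
        if ch == "-":
            nxt = reader.getc()      # the one-character lookahead
            if nxt == "-":
                # comment: discard everything through the end of the line
                while nxt not in ("", "\n"):
                    nxt = reader.getc()
                continue
            reader.ungetc(nxt)       # a lone '-' begins a negative number
            return "-"
        return ch


reader = PeekReader(io.StringIO("  -- note\n -5"))
print(next_significant(reader))      # -
print(reader.getc())                 # 5
```

Only a '-' ever triggers the extra read, so input without comments or negative numbers takes the same path as before.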

CChris


14. Re: Bug in get() and value(): embedded comments

Derek Parnell wrote:
> 
> CChris wrote:
> > 
> > The Euphoria documentation on value() is a bit fuzzy, so that the bug may
> > have gone unnoticed:
> > * In the description, it reads:
> > "Read the string representation of a Euphoria object, and ..."
> > The definite article "the" would suggest that there is only one string
> > representation for a Euphoria object, and one would expect that this is the
> > one returned by sprint().
> 
> 
> Well, I suspect you are reading the English a bit too literally here. In this
> case the definite article "the" does not refer to the one and only possible
> representation of any given Euphoria object, but to the single representation
> of the object you are trying to get the value of. The value() function works
> on single objects, so if you happen to have several objects, the documentation
> is referring to the specific string representation that you are passing to the
> function and trying to convert to an object.
> 
> > * In the comments, it reads:
> > "After reading one valid representation of a Euphoria object, ..."
> > So now there is the possibility of several valid representations for an
> > object;
> > otherwise, "the" would have been used again instead.
> 
> No, the value() function works with a single representation. The comments here
> are alerting you to the idea that you mustn't expect value() to deal with
> multiple
> representations in a single call.
> 

Ok, I think the doc should lift the ambiguity here between "unique
representation" and "representation of a single object".

It still holds true that any Euphoria sequence has plenty of valid forms, adding
whitespace at will. So, the documentation issue is separate (and minor).

> > And indeed, value() can cope with spaces inside a string that represents a
> > sequence,
> > so that the interpretation from the Comments section would seem to be the
> > one
> > to take into account.
> > 
> > Since
> > constant cst={1,  -- first
> >    2}
> >
> > is a valid Euphoria statement (it doesn't trigger a compile error, and ?cst
> > displays {1,2} as expected), one would infer that "{1,  -- first\n   2" is a
> > valid representation of a Euphoria object, namely {1,2}.
> 
> 
> No, Euphoria source code is not the same as Euphoria objects.

No one said that.
Euphoria source code is a stream of characters, i.e. a string. Are you saying
that an assignment
lvalue = manifest rvalue
may be valid even though the rvalue doesn't validly represent an object? This is
definitely confusing.

> 
> > As a consequence, value("{1,  -- first\n   2") should return {0,{1,2}}
> > according
> > to the documentation, the inconsistency mentioned above notwithstanding. Yet
> > it returns {1,0}, or {1,0,7,0} using a recent extension.
> 
> No, the string "{1,  -- first\n   2" is not a valid representation of a
> Euphoria object. Even if you had included the final closing brace, it would
> still not be a valid object representation. The normal string representation
> of the cst identifier in your code would be "{1,2}".
> 

Or {1, 2}, or { 1 ,\t2\n }, or ...

>  
> > I'll commit a fix next weekend, or next week. It will correct both get() and
> > value(), and requires only changes in get.e.
> 
> Please do not do that. It seems that you are misunderstanding the
> documentation
> and the concept of string representations. 
> 

If I am misunderstanding it, and if this concept is along the lines you suggest,
then it is a little confusing, because a string would be valid in some contexts
and not in others. In that case, let's not call it a bug, but then it should
probably be changed as kludgy and inconsistent. Even more so if you consider that
Euphoria is an interpreted language, which implies a very close relationship
between a string and its evaluation in code. Note however that this is still
quite far from an eval()-like functionality, which has some issues of its own. It
would be a timid step in that direction, at most/worst.

As I replied to Matt, if a performance impact is unavoidable, then the functions
should be left alone and the doc made explicit, independently of the
above-mentioned ambiguity.

CChris


15. Re: Bug in get() and value(): embedded comments

CChris wrote:
> While you may be right, it is still true that adding syntactically
> unnecessary whitespace keeps the string/sequence of bytes valid and readable
> by get()/value(). Since comments are exactly as (un)necessary as whitespace,
> they should be treated the same, and currently are not.
> 
> > While what you're doing is sort of interesting, I think you're getting away
> > from the purpose of the function.  It definitely is *not* meant to handle
> > comments.
> 
> This is debatable. Comments are not explicitly mentioned, but did the author
> mean they were excluded? I explained right above why I think they are not, but
> feedback from the author is needed here. Rob?

The fact that comments are not currently supported is not
a bug or oversight on my part, but I would not object to their
being supported, if most people think it would be useful,
and if there is no significant performance issue.
 
> Since the comment mark is two characters long, there will have to be a
> one-character lookahead buffer somewhere. If it cannot be done without
> impacting performance, then we should leave the functions as they are and
> document the fact that comments are not covered. But this will be assessed
> from actual code, which isn't written yet.

You should test the performance.
While optimizing an application a long time ago,
I noticed that seek() can be more expensive
than you might imagine. 

If you want to improve the docs, pointing out that
there can be many different string representations of the same
object, please go ahead. e.g. 3.0 +3.0 3.00 3.000 3e0 are all 
the same object, and are considered by the language definition 
to be *exactly* the same as 3. The implementation 
*might* choose to store 3 differently than 3.0, internally,
but it doesn't have to.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


16. Re: Bug in get() and value(): embedded comments

Robert Craig wrote:
> 
> CChris wrote:
> > While you may be right, it is still true that adding syntactically
> > unnecessary whitespace keeps the string/sequence of bytes valid and readable
> > by get()/value(). Since comments are exactly as (un)necessary as whitespace,
> > they should be treated the same, and currently are not.
> > 
> > > While what you're doing is sort of interesting, I think you're getting
> > > away
> > > from the purpose of the function.  It definitely is *not* meant to handle
> > > comments.
> > 
> > This is debatable. Comments are not explicitly mentioned, but did the author
> > mean they were excluded? I explained right above why I think they are not,
> > but feedback from the author is needed here. Rob?
> 
> The fact that comments are not currently supported is not
> a bug or oversight on my part, but I would not object to their
> being supported, if most people think it would be useful,
> and if there is no significant performance issue.
>  
> > Since the comment mark is two characters long, there will have to be a
> > one-character lookahead buffer somewhere. If it cannot be done without
> > impacting performance, then we should leave the functions as they are and
> > document the fact that comments are not covered. But this will be assessed
> > from actual code, which isn't written yet.
> 
> You should test the performance.
> While optimizing an application a long time ago,
> I noticed that seek() can be more expensive
> than you might imagine. 
> 

Given that the performance loss would apply in a lot of cases that wouldn't be
concerned by embedded comments, this mod is not allowed to fail the test, as far
as I'm concerned. I won't code this before next week, as things stand.

Likewise, the code in the cchris_get branch on SVN has been isolated there
because I want several people to test performance on different machines and
platforms before it can be considered harmless, and the quirk removal worthwhile.
No feedback so far.

Now that I think about it, if the proposed mod doesn't hurt performance, then I
probably can remove the get() quirk at the same time for no additional cost.
Needs actual coding.

> If you want to improve the docs, pointing out that
> there can be many different string representations of the same
> object, please go ahead. e.g. 3.0 +3.0 3.00 3.000 3e0 are all 
> the same object, and are considered by the language definition 
> to be *exactly* the same as 3. The implementation 
> *might* choose to store 3 differently than 3.0, internally,
> but it doesn't have to.
> 

This, or emphasizing that sprint() returns a shortest form which is not unique.
I think I'll add a sentence about it to cross the t's.


CChris

