1. Octal and Binary Literals

Currently Eu does not allow literal octal or binary numbers to be specified
(except perhaps as a string passed to a decode function), and I wondered what
everyone would consider to be the best way to allow them. At the same time,
I would like to solve the problem of not being able to specify "escape" chars in
strings.

Implementation should be based on the following, in order:
  1: Unambiguity and Least Surprise
  2: Zero impact on legacy code
  3: Minimal performance impact

Legacy:
  NN	-- decimal, N is 0..9
  #NN	-- hex, N is 0..9, A..F, a..f

Notes: While the NNh, 0xNN, and #xNN cases shown below are not strictly 
       necessary (assuming we keep #NN) they may help with consistency.

Option 1: Use a Suffix character.
  NNo	-- octal, N is 0..7
  NNh	-- hex, N is 0..9, A..F, a..f
  NNb	-- binary, N is 0 or 1
 Notes: the NNb case would need to be followed by a delimiter, ie in "1bh" 
        the 'b' does not terminate the token since it is followed by a valid 
        hex char, and the whole token is equal to 27 (#1B).
	This would probably yield the highest performance impact, simply
	because you have to check for o/h/b after every number.

Option 2: Use the leading 0 trick from C.
  NN	-- decimal (first char is not 0)
  0NN	-- octal
  0xNN	-- hex
  0bNN	-- binary
 Notes: this means that 012 == 10. Any legacy code with leading zeroes will 
        be broken.

Option 3: Use the 0X style notation.
  0oNN	-- octal
  0xNN	-- hex
  0bNN	-- binary
 Notes: This would be my personal choice.

Option 4: As 0X, but use #.
  #oNN	-- octal
  #xNN	-- hex
 Notes: There is no obvious way to specify binary numbers, as the 'b' (and to 
        a lesser extent a 'd' for explicitly decimal) clashes with legacy use.
	This would probably yield the lowest performance impact, since the 
        only compiler changes are needed after it has detected a # character.
	Options 2 and 3 occupy the middle ground, having to check for 
        leading 0.

It may also be sensible to allow a 'd' to explicitly state "this is decimal",
so that in option 1, NNd (with the same delimiter note) is equal to NN, or in 
options 2 and 3, 0dNN, but this is not apparently possible under option 4.
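To make the comparison concrete, here is how an Option 3 style prefix scanner
might classify literals. This is only a sketch in Python, not Eu's actual
scanner; the table and function name are hypothetical:

```python
# Hypothetical prefix table for Option 3 notation (0x/0o/0b, plus
# the optional explicit-decimal 0d discussed above).
PREFIXES = {"0x": 16, "0o": 8, "0b": 2, "0d": 10}

def scan_literal(token):
    """Classify a literal by its two-character prefix and return its value."""
    base = PREFIXES.get(token[:2].lower())
    if base is not None:
        return int(token[2:], base)
    return int(token)  # no prefix: plain decimal, so 012 still == 12

print(scan_literal("0o12"))    # 10
print(scan_literal("0b1101"))  # 13
print(scan_literal("0xFF"))    # 255
print(scan_literal("012"))     # 12 -- legacy leading zeroes unharmed
```

Note that unlike Option 2, a bare leading zero stays decimal, which satisfies
the zero-impact-on-legacy-code criterion.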

I have a need to specify octal numbers for an assembly project. I want to use
"bytewise octal" whereby each byte is represented by exactly 3 octal digits, ie
FFFFFFFF(hex) = 377377377377(octal).
Can anyone think of a problem with using "NNob", "0obNN", or "#obNN" for this?
If you take specific exception to that going into "official Eu", then please take
a moment to consider whether you could properly justify your objections and/or
what factors might lessen them.
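The bytewise-octal grouping above can be illustrated with a small Python
sketch (the helper name is hypothetical, purely for demonstration):

```python
def to_bytewise_octal(value, nbytes=4):
    """Hypothetical helper: render each byte of `value` as exactly
    three octal digits, most significant byte first."""
    out = ""
    for shift in range((nbytes - 1) * 8, -1, -8):
        out += format((value >> shift) & 0xFF, "03o")
    return out

print(to_bytewise_octal(0xFFFFFFFF))  # 377377377377
print(to_bytewise_octal(0x01020304))  # 001002003004
```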

For strings, I think the best option is "\#NN". I doubt there is much call for
octal codes anymore. I also think we should force it to be exactly two hex
digits, no more and no less. For example, suppose I want {7,'H','e','l','l','o'}
then I might code "\#07Hello". Any objections to that? Obviously someone is bound
to mention Unicode, so is there a sensible way to say "a two digit hex no" vs "a
four digit hex no" etc? Otherwise "\#0700000" becomes ambiguous, meaning either
{7,'0','0','0','0','0'} or {#700,'0','0','0'}.
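To illustrate the exactly-two-digits rule, here is a Python sketch of a
hypothetical \#NN decoder (not anything in Eu today):

```python
def decode_escapes(s):
    r"""Hypothetical decoder: expand \#NN (exactly two hex digits)
    into character codes; everything else passes through."""
    result = []
    i = 0
    while i < len(s):
        if s.startswith("\\#", i):
            result.append(int(s[i+2:i+4], 16))  # exactly two digits
            i += 4
        else:
            result.append(ord(s[i]))
            i += 1
    return result

print(decode_escapes(r"\#07Hello"))
# [7, 72, 101, 108, 108, 111]  i.e. {7,'H','e','l','l','o'}
print(decode_escapes(r"\#0700000"))
# [7, 48, 48, 48, 48, 48]  -- unambiguous under the two-digit rule
```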

Regards,
Pete


2. Re: Octal and Binary Literals

I like the prefix version best:

  #oNN	-- octal
  #xNN	-- hex

I'm sure I've seen '%' as a binary prefix specifier:
%01011101 -- #5D

I'm not sure about usage in strings...

--
A complex system that works is invariably found to have evolved from a simple
system that works.
--John Gall's 15th law of Systemantics.

"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare

j.


3. Re: Octal and Binary Literals

Pete Lomax wrote:
> 
> Currently Eu does not allow literal octal or binary numbers to be specified
> (except perhaps as a string passed to a decode function), and I wondered what
> everyone would consider to be the best way to allow such. At the same
> time, I would like to solve the problem of not being able to specify "escape"
> chars in strings.
> 
> Implementation should be based on the following, in order:
>   1: Unambiguity and Least Surprise
>   2: Zero impact on legacy code
>   3: Minimal performance impact
> 
> Legacy:
>   NN	-- decimal, N is 0..9
>   #NN	-- hex, N is 0..9, A..F, a..f
> 
> Notes: While the NNh, 0xNN, and #xNN cases shown below are not strictly 
>        necessary (assuming we keep #NN) they may help with consistency.
> 
> Option 1: Use a Suffix character.
>   NNo	-- octal, N is 0..7
>   NNh	-- hex, N is 0..9, A..F, a..f
>   NNb	-- binary, N is 0 or 1
>  Notes: the NNb case would need to be followed by a delimiter, ie in "1bh" 
>         the 'b' does not terminate the token since it is followed by a valid 
>         hex char, and is equal to "27".
> 	This would probably yield the highest performance impact, simply
> 	because you have to check for o/h/b after every number.
> 
> Option 2: Use the leading 0 trick from C.
>   NN	-- decimal (first char is not 0)
>   0NN	-- octal
>   0xNN	-- hex
>   0bNN	-- binary
>  Notes: this means that 012 == 10. Any legacy code with leading zeroes will 
>         be broken.
> 
> Option 3: Use the 0X style notation.
>   0oNN	-- octal
>   0xNN	-- hex
>   0bNN	-- binary
>  Notes: This would be my personal choice.
> 
> Option 4: As 0X, but use #.
>   #oNN	-- octal
>   #xNN	-- hex
>  Notes: There is no obvious way to specify binary numbers as the 'b' (and to 
>         a lesser extent a 'd' for explicitly decimal) clash with legacy use.
> 	This would probably yield the lowest performance impact, since the 
>         only compiler changes are needed after it has detected a # character.
> 	Options 2 and 3 occupy the middle ground, having to check for 
>         leading 0.
> 
> It may also be sensible to allow a 'd' to explicitly state "this is decimal",
> so that in option 1, NNd (with same delimiter note) is equal to NN, or in 
> option 2 and 3, 0dNN, but not apparently possible under option 4.
> 
> I have a need to specify octal numbers for an assembly project. I want to use
> "bytewise octal" whereby each byte is represented by exactly 3 octal digits,
> ie FFFFFFFF(hex) = 377377377377(octal). 
> Can anyone think of a problem with using "NNob", "0obNN", or "#obNN" for this?
> If you take specific exception to that going into "official Eu", then please
> take a moment to consider whether you could properly justify your objections
> and/or what factors might lessen them.
> 
> For strings, I think the best option is "\#NN". I doubt there is much call for
> octal codes anymore. I also think we should force it to be exactly two hex
> digits,
> no more and no less. For example, suppose I want {7,'H','e','l','l','o'} then
> I might code "\#07Hello". Any objections to that? Obviously someone is bound
> to mention Unicode, so is there a sensible way to say "a two digit hex no" vs
> "a four digit hex no" etc? Otherwise "\#0700000" becomes ambiguous, meaning
> either {7,'0','0','0','0','0'} or {#700,'0','0','0'}.
> 
> Regards,
> Pete

Why not simply:
#(2)1110010101
#(16)FFE4
#(10)365
#(8)755

and so on? Using both lower- and upper-case extra digits, the nn in #(nn) might
extend from 2 to 62. Not that I'd use base-53 numeration often, though.

Performance is not affected (just one extra if statement when scanning '#'), and
no existing code uses this notation, so none would be broken by this addition.
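Sketched in Python (hypothetical names, not Eu's scanner), #(nn) parsing up to
base 62 might look like this:

```python
import string

# Digit alphabet for bases up to 62: 0-9, then A-Z, then a-z.
# Case-sensitive, which is what lets bases above 36 work.
DIGITS = string.digits + string.ascii_uppercase + string.ascii_lowercase

def parse_based(base, digits):
    """Parse `digits` as a number in `base` (2..62)."""
    if not 2 <= base <= 62:
        raise ValueError("base out of range")
    value = 0
    for ch in digits:
        d = DIGITS.index(ch)
        if d >= base:
            raise ValueError("digit %r invalid for base %d" % (ch, base))
        value = value * base + d
    return value

print(parse_based(2, "1110010101"))  # 917
print(parse_based(16, "FFE4"))       # 65508 (#FFE4)
print(parse_based(8, "755"))         # 493
```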

CChris


4. Re: Octal and Binary Literals

Pete Lomax wrote:
> 
> Implementation should be based on the following, in order:
>   1: Unambiguity and Least Surprise
>   2: Zero impact on legacy code
>   3: Minimal performance impact

We should put this up on the wall somewhere. :)

> Option 1: Use a Suffix character.
>   NNo	-- octal, N is 0..7
>   NNh	-- hex, N is 0..9, A..F, a..f
>   NNb	-- binary, N is 0 or 1
> Notes: the NNb case would need to be followed by a delimiter, ie in "1bh" 
>         the 'b' does not terminate the token since it is followed by a valid
>         hex char, and is equal to "27".

Not if it's a lowercase b.  But that might be a bit more confusing than 
we'd really like.

> 	This would probably yield the highest performance impact, simply
> 	because you have to check for o/h/b after every number.

We'd have to change the scanner, to be sure, but the performance hit might
not be so bad.

> Option 2: Use the leading 0 trick from C.
>  Notes: this means that 012 == 10. Any legacy code with leading zeroes will
>         be broken.

Yes, it's a fairly common pattern, but I think we may end up breaking a
lot of stuff.

> Option 3: Use the 0X style notation.
>   0oNN	-- octal
>   0xNN	-- hex
>   0bNN	-- binary
>  Notes: This would be my personal choice.

I like this.   Presumably, now we'd have 2 ways to specify hexadecimal.  
What about using lowercase for hexadecimal numbers of this format?  
Probably too confusing, but would be consistent with C.

> Option 4: As 0X, but use #.
>   #oNN	-- octal
>   #xNN	-- hex
>  Notes: There is no obvious way to specify binary numbers as the 'b' (and to
>         a lesser extent a 'd' for explicitly decimal) clash with legacy use.
> 	This would probably yield the lowest performance impact, since the 
>         only compiler changes are needed after it has detected a # character.
> 	Options 2 and 3 occupy the middle ground, having to check for 
>         leading 0.

But all hexadecimal digits have to be uppercase, so while it may be opening
the door for typos, it's not ambiguous.

> It may also be sensible to allow a 'd' to explicitly state "this is decimal",
> so that in option 1, NNd (with same delimiter note) is equal to NN, or in 
> option 2 and 3, 0dNN, but not apparently possible under option 4.
> 
> I have a need to specify octal numbers for an assembly project. I want to use
> "bytewise octal" whereby each byte is represented by exactly 3 octal digits,
> ie FFFFFFFF(hex) = 377377377377(octal). 
> Can anyone think of a problem with using "NNob", "0obNN", or "#obNN" for this?
> If you take specific exception to that going into "official Eu", then please
> take a moment to consider whether you could properly justify your objections
> and/or what factors might lessen them.
> 
> For strings, I think the best option is "\#NN". I doubt there is much call for
> octal codes anymore. I also think we should force it to be exactly two hex
> digits,
> no more and no less. For example, suppose I want {7,'H','e','l','l','o'} then
> I might code "\#07Hello". Any objections to that? Obviously someone is bound
> to mention Unicode, so is there a sensible way to say "a two digit hex no" vs
> "a four digit hex no" etc? Otherwise "\#0700000" becomes ambiguous, meaning
> either {7,'0','0','0','0','0'} or {#700,'0','0','0'}.

I agree with "\#NN".  If you're putting things in quotes, then we're 
presuming that you're building a string.  Euphoria doesn't support 
Unicode (yet?). But if you were trying to build, say, a UTF-8 string,
then you're still working with bytes.  Even UTF-16/32 would lend 
themselves to this.  You'd just need to string multiple "\#NN"s 
together.
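For instance (a Python sketch, just to show the byte values involved): a
two-byte UTF-8 character is simply two \#NN escapes in a row:

```python
# 'é' (U+00E9) is two UTF-8 bytes, C3 A9, so under the \#NN proposal
# it would be written as two escapes in a row: "\#C3\#A9".
encoded = "é".encode("utf-8")
escapes = "".join("\\#%02X" % b for b in encoded)
print(escapes)  # \#C3\#A9
```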

Matt

