1. Octal and Binary Literals
- Posted by Pete Lomax <petelomax at bl?eyonder.?o.uk> Sep 07, 2007
Currently Eu does not allow literal octal or binary numbers to be specified (except perhaps as a string passed to a decode function), and I wondered what everyone would consider to be the best way to allow them. At the same time, I would like to solve the problem of not being able to specify "escape" chars in strings.

Implementation should be based on the following, in order:
1: Unambiguity and Least Surprise
2: Zero impact on legacy code
3: Minimal performance impact

Legacy:
NN -- decimal, N is 0..9
#NN -- hex, N is 0..9, A..F, a..f

Notes: While the NNh, 0xNN, and #xNN cases shown below are not strictly necessary (assuming we keep #NN), they may help with consistency.

Option 1: Use a suffix character.
NNo -- octal, N is 0..7
NNh -- hex, N is 0..9, A..F, a..f
NNb -- binary, N is 0 or 1
Notes: the NNb case would need to be followed by a delimiter, ie in "1bh" the 'b' does not terminate the token, since it is followed by a valid hex char; the whole token is hex and equal to 27.
This would probably yield the highest performance impact, simply because you have to check for o/h/b after every number.

Option 2: Use the leading-0 trick from C.
NN -- decimal (first char is not 0)
0NN -- octal
0xNN -- hex
0bNN -- binary
Notes: this means that 012 == 10. Any legacy code with leading zeroes will be broken.

Option 3: Use the 0X-style notation.
0oNN -- octal
0xNN -- hex
0bNN -- binary
Notes: This would be my personal choice.

Option 4: As 0X, but use #.
#oNN -- octal
#xNN -- hex
Notes: There is no obvious way to specify binary numbers, as the 'b' (and to a lesser extent a 'd' for explicitly decimal) clashes with legacy use.
This would probably yield the lowest performance impact, since the only compiler changes are needed after it has detected a # character.
Options 2 and 3 occupy the middle ground, having to check for a leading 0.
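The trade-off between the options comes down to what the scanner must check. As an illustration only (a Python sketch rather than Euphoria scanner code; `scan_number` and `BASE_PREFIX` are hypothetical names), the Option 3 leading-0 check might look like:

```python
# Hypothetical sketch of an Option 3 scanner: a leading "0" followed by
# x/o/b/d selects the base; anything else is treated as plain decimal.
BASE_PREFIX = {'x': 16, 'o': 8, 'b': 2, 'd': 10}

def scan_number(text):
    """Parse a numeric literal using the 0xNN/0oNN/0bNN/0dNN notation."""
    if text.startswith('0') and len(text) > 2 and text[1] in BASE_PREFIX:
        return int(text[2:], BASE_PREFIX[text[1]])
    return int(text, 10)  # plain decimal; a bare leading zero is harmless
```

Note that under this sketch a legacy literal such as 012 still parses as decimal 12, which is exactly why Option 3 avoids the legacy breakage of Option 2.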
It may also be sensible to allow a 'd' to explicitly state "this is decimal", so that in option 1, NNd (with the same delimiter note) is equal to NN, or in options 2 and 3, 0dNN; this does not appear to be possible under option 4.

I have a need to specify octal numbers for an assembly project. I want to use "bytewise octal", whereby each byte is represented by exactly 3 octal digits, ie FFFFFFFF(hex) = 377377377377(octal). Can anyone think of a problem with using "NNob", "0obNN", or "#obNN" for this? If you take specific exception to that going into "official Eu", then please take a moment to consider whether you could properly justify your objections and/or what factors might lessen them.

For strings, I think the best option is "\#NN". I doubt there is much call for octal codes anymore. I also think we should force it to be exactly two hex digits, no more and no less. For example, suppose I want {7,'H','e','l','l','o'}; then I might code "\#07Hello". Any objections to that? Obviously someone is bound to mention Unicode, so is there a sensible way to say "a two-digit hex number" vs "a four-digit hex number" etc? Otherwise "\#0700000" becomes ambiguous, meaning either {7,'0','0','0','0','0'} or {#700,'0','0','0'}.

Regards,
Pete
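For reference, the "bytewise octal" mapping described above can be sketched as follows (Python for illustration; `to_bytewise_octal` is a hypothetical helper name):

```python
def to_bytewise_octal(value):
    """Render a 32-bit value as 'bytewise octal': each byte becomes
    exactly three octal digits, so #FFFFFFFF -> 377377377377."""
    return ''.join('%03o' % b for b in value.to_bytes(4, 'big'))
```

Each byte is padded to a fixed three-digit field, so the byte boundaries stay visible in the printed form.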
2. Re: Octal and Binary Literals
- Posted by Jason Gade <jaygade at yaho?.co?> Sep 07, 2007
I like the prefix version best:
#oNN -- octal
#xNN -- hex

I'm sure I've seen '%' as a binary prefix specifier:
%01011101 -- #5D

I'm not sure about usage in strings...

--
A complex system that works is invariably found to have evolved from a simple system that works. --John Gall's 15th law of Systemantics.

"Premature optimization is the root of all evil in programming." --C.A.R. Hoare

j.
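The '%' binary prefix is familiar from several assemblers and BASIC dialects, and its value is easy to check; a quick sketch (Python, hypothetical parser name):

```python
def parse_percent_binary(literal):
    """Parse a %-prefixed binary literal, e.g. %01011101 -> 93 -> #5D."""
    if not literal.startswith('%'):
        raise ValueError('missing % prefix')
    return int(literal[1:], 2)
```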
3. Re: Octal and Binary Literals
- Posted by CChris <christian.cuvier at ?gricult?re.gouv.fr> Sep 07, 2007
Pete Lomax wrote:
> 
> Currently Eu does not allow literal octal or binary numbers to be specified
> (except perhaps as a string passed to a decode function), and I wondered what
> everyone would consider to be the best way to allow them. At the same
> time, I would like to solve the problem of not being able to specify "escape"
> chars in strings.
> 
> <snip options 1-4, bytewise octal, and string escapes>
> 
> Regards,
> Pete

Why not simply:

#(2)1110010101
#(16)FFE4
#(10)365
#(8)755

and so on? Using both lower- and upper-case extra digits, the nn in #(nn) might extend from 2 to 62. Not that I'd use base 53 numeration often, though.

Performance is not affected (just one extra if statement when scanning '#'), and no existing code contains this form, so nothing would be broken by the addition.

CChris
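A sketch of how this #(nn) form could be decoded (Python for illustration; the digit alphabet 0-9, then A-Z, then a-z for bases above 16 is an assumption, since the post doesn't pin down the ordering past hex):

```python
import string

# Assumed digit order for bases up to 62: 0-9, A-Z, a-z.
DIGITS = string.digits + string.ascii_uppercase + string.ascii_lowercase

def parse_based_literal(text):
    """Parse the proposed '#(base)digits' form, e.g. '#(8)755' -> 493."""
    if not text.startswith('#('):
        raise ValueError('missing #( prefix')
    close = text.index(')')
    base = int(text[2:close])
    if not 2 <= base <= 62:
        raise ValueError('base out of range')
    value = 0
    for ch in text[close + 1:]:
        d = DIGITS.index(ch)
        if d >= base:
            raise ValueError('digit %r invalid in base %d' % (ch, base))
        value = value * base + d
    return value
```

The digits are folded in by hand rather than via `int(s, base)`, since that built-in only accepts bases up to 36.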
4. Re: Octal and Binary Literals
- Posted by Matt Lewis <matthewwalkerlewis at gm?il?com> Sep 07, 2007
Pete Lomax wrote:
> 
> Implementation should be based on the following, in order:
> 1: Unambiguity and Least Surprise
> 2: Zero impact on legacy code
> 3: Minimal performance impact

We should put this up on the wall somewhere. :)

> Option 1: Use a suffix character.
> NNo -- octal, N is 0..7
> NNh -- hex, N is 0..9, A..F, a..f
> NNb -- binary, N is 0 or 1
> Notes: the NNb case would need to be followed by a delimiter, ie in "1bh"
> the 'b' does not terminate the token, since it is followed by a valid
> hex char; the whole token is hex and equal to 27.

Not if it's a lowercase b. But that might be a bit more confusing than we'd really like.

> This would probably yield the highest performance impact, simply
> because you have to check for o/h/b after every number.

We'd have to change the scanner, to be sure, but the performance hit might not be so bad.

> Option 2: Use the leading-0 trick from C.
> Notes: this means that 012 == 10. Any legacy code with leading zeroes will
> be broken.

Yes, it's a fairly common pattern, but I think we may end up breaking a lot of stuff.

> Option 3: Use the 0X-style notation.
> 0oNN -- octal
> 0xNN -- hex
> 0bNN -- binary
> Notes: This would be my personal choice.

I like this. Presumably we'd now have two ways to specify hexadecimal. What about using lowercase for hexadecimal numbers of this format? Probably too confusing, but it would be consistent with C.

> Option 4: As 0X, but use #.
> #oNN -- octal
> #xNN -- hex
> Notes: There is no obvious way to specify binary numbers, as the 'b' (and to
> a lesser extent a 'd' for explicitly decimal) clashes with legacy use.
> This would probably yield the lowest performance impact, since the
> only compiler changes are needed after it has detected a # character.
> Options 2 and 3 occupy the middle ground, having to check for a
> leading 0.

But all hexadecimal digits have to be uppercase, so while it may be opening the door for typos, it's not ambiguous.
> <snip 'd' suffix and bytewise octal>
> 
> For strings, I think the best option is "\#NN". I doubt there is much call
> for octal codes anymore. I also think we should force it to be exactly two
> hex digits, no more and no less. For example, suppose I want
> {7,'H','e','l','l','o'}; then I might code "\#07Hello". Any objections to
> that? Obviously someone is bound to mention Unicode, so is there a sensible
> way to say "a two-digit hex number" vs "a four-digit hex number" etc?
> Otherwise "\#0700000" becomes ambiguous, meaning either
> {7,'0','0','0','0','0'} or {#700,'0','0','0'}.

I agree with "\#NN". If you're putting things in quotes, then we're presuming that you're building a string. Euphoria doesn't support Unicode (yet?). But if you were trying to build, say, a UTF-8 string, then you're still working with bytes. Even UTF-16/32 would lend itself to this. You'd just need to string multiple "\#NN"s together.

Matt
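The byte-oriented escape discussed above can be illustrated with a sketch of the fixed-width "\#NN" expansion (Python for illustration; `expand_hash_escapes` is a hypothetical name):

```python
def expand_hash_escapes(s):
    """Expand the proposed '\\#NN' escape: exactly two hex digits, no
    more and no less, so the fixed width itself resolves the
    '\\#0700000' ambiguity."""
    out, i = [], 0
    while i < len(s):
        if s[i] == '\\' and s[i + 1:i + 2] == '#':
            out.append(int(s[i + 2:i + 4], 16))  # always two hex digits
            i += 4
        else:
            out.append(s[i])
            i += 1
    return out
```

Under this rule "\#07Hello" expands to {7,'H','e','l','l','o'}, and a multi-byte UTF-8 sequence is simply several "\#NN" escapes in a row.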