Re: Interpreter Mod We Can All Get Behind

CChris wrote:
>  
> Also U"<unicode string>" would be desirable too. Some variations on U would
> give more control over the unicode encoding that's desired. For instance U for
> UTF8, uU for UTF16LE, Uu for UTF16BE, uUU for UTF32LE and UUu for UTF32BE?
> Just suggestions.

Internally, I think you'd probably want to stick with the equivalent of
4-byte wide characters (like wchar_t on Linux; on Windows it's only 2 bytes).
That lines up pretty naturally with the sequence, since each element can
hold a whole code point.

The only encoding issues would be in the files themselves, at which point
you're correct that we probably need some way to identify how things are
encoded in the file.  I'd suggest UTF8 as the best way to encode Euphoria
source: most of the characters will be ASCII, which UTF8 stores in one
byte each, making it the most compact choice.  But if the scanner is UTF8
enabled, is there any real need to identify which strings are or are not
Unicode?  Using sequences, there's no need to distinguish between char
widths as there is in C/C++.
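
To put rough numbers on the efficiency point, here is a quick sketch. It's
in Python rather than Euphoria only because that makes it easy to run. The
sample line and the name encoded_sizes are made up for illustration, not
anything from the interpreter.

```python
# Hypothetical line of Euphoria source containing one non-ASCII character (e with acute).
sample = 'puts(1, "caf\u00e9")'

def encoded_sizes(text):
    """Return the number of bytes text occupies under each encoding."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}

sizes = encoded_sizes(sample)
print(sizes)  # {'utf-8': 16, 'utf-16-le': 30, 'utf-32-le': 60}

# Once decoded, the text is just one integer per character, which is
# roughly what a Euphoria sequence of code points would hold.
code_points = [ord(ch) for ch in sample]
print(len(code_points))  # 15
```

The 15 characters take 16 bytes in UTF8 but 60 in UTF32, and the list of
code points looks the same no matter which encoding the file used.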

This all assumes that the interpreter is capable of dealing with Unicode.
wxEuphoria now handles it pretty seamlessly, although, of course, if you
use any non-ASCII characters, your strings will look kind of funny in the
source, since the character has to be appended as a number:
string = "This uses a unicode character: " & 2015

If we decide to go with UTF8 as the standard, we'll need to have a
library (presumably written in Euphoria, for the front end) that is capable
of decoding UTF8.  And then, of course, there will be all sorts of decisions
about how to handle puts/printf/etc.  But if we go with straight wide chars,
then it might actually make things a bit simpler, like it did with
wxEuphoria (because I don't have to cast a long to a char).  Not sure
how all this would affect DOS, however.
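
For a sense of how much work the decoding side is, here is a minimal
sketch of a UTF8 decoder that turns bytes into a list of integer code
points. It's written in Python as a stand-in; a Euphoria version would
follow the same steps with and_bits and a sequence. The name utf8_decode
is hypothetical, and this version is deliberately incomplete: it checks
lead and continuation bytes but does not reject overlong encodings or
surrogate values.

```python
def utf8_decode(data):
    """Decode UTF-8 bytes into a list of integer code points (sketch only)."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        # The lead byte says how many continuation bytes follow.
        if b < 0x80:
            cp, extra = b, 0                # 0xxxxxxx: plain ASCII
        elif (b & 0xE0) == 0xC0:
            cp, extra = b & 0x1F, 1         # 110xxxxx
        elif (b & 0xF0) == 0xE0:
            cp, extra = b & 0x0F, 2         # 1110xxxx
        elif (b & 0xF8) == 0xF0:
            cp, extra = b & 0x07, 3         # 11110xxx
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, extra + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:          # continuation bytes are 10xxxxxx
                raise ValueError("invalid continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)     # append six payload bits
        out.append(cp)
        i += extra + 1
    return out

text = "A\u00e9\u20ac"                      # 1-, 2- and 3-byte characters
print(utf8_decode(text.encode("utf-8")))    # [65, 233, 8364]
```

The output is one integer per character, so once the scanner has done
this step the rest of the front end never sees the byte-level encoding.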

Matt
