Re: UTF-8

UTF-8 is the encoding Linux prefers, and so does the Internet. For all the older "languages", it is therefore tempting to stick to 8 bits and use Windows code pages.

Microsoft's internal default is LE16 (16-bit Unicode, little endian, i.e. UTF-16LE). I think they chose that because Intel CPUs store 16-bit words in memory in little-endian byte order. wxWidgets has also chosen LE16; wxWidgets 2.9 is entirely 16-bit Unicode. Qt is going the Unicode route, and of course VC is already there.
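
To see what the choice means in practice, it helps to compare the bytes each encoding produces for the same text. The sketch below uses Python's built-in `utf-8` and `utf-16-le` codecs purely for illustration; the helper name `encode_both` and the sample characters are arbitrary choices, not anything from Euphoria.

```python
def encode_both(text):
    """Return the UTF-8 and UTF-16LE encodings of text (illustrative helper)."""
    # "utf-16-le" writes little-endian code units and no byte order mark
    return text.encode("utf-8"), text.encode("utf-16-le")

# Arbitrary samples: ASCII, Latin-1 range, Devanagari KA, and a character outside the BMP
samples = ["A", "\u00e9", "\u0915", "\U0001D11E"]

for text in samples:
    utf8, utf16le = encode_both(text)
    print(f"U+{ord(text):04X}  utf-8: {utf8.hex(' ')}  utf-16-le: {utf16le.hex(' ')}")
```

In the UTF-16LE column the low byte comes first (U+0915 is stored as `15 09`), which is the little-endian layout Intel CPUs use for 16-bit words. The last sample needs a surrogate pair (four bytes) even in UTF-16, which is one reason 32-bit Unicode is sometimes preferred.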

I would urge you to take the LE16 route. It would require some rewriting of the core code, but Euphoria has a tremendous advantage over other languages: you never created a separate string type (strings are just sequences). Creating a true Unicode string type and converting everything to 16-bit Unicode should therefore not be too difficult.
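
Because a sequence can already hold arbitrary integers, converting text to 16-bit Unicode mostly comes down to mapping code points to 16-bit code units. Here is a minimal Python sketch of that mapping, assuming the input is a plain list of integer code points (roughly what a Euphoria sequence would hold). The function `to_utf16_units` is hypothetical, not part of Euphoria or any library.

```python
def to_utf16_units(codepoints):
    """Map a list of Unicode code points to UTF-16 code units (hypothetical helper)."""
    units = []
    for cp in codepoints:
        # Reject values that are not Unicode scalar values (out of range or surrogates)
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp < 0x10000:
            units.append(cp)  # fits in a single 16-bit unit
        else:
            cp -= 0x10000  # 20 bits remain, split across a surrogate pair
            units.append(0xD800 | (cp >> 10))   # high surrogate: top 10 bits
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate: bottom 10 bits
    return units

text = "A\u0915\U0001D11E"
units = to_utf16_units([ord(c) for c in text])
print([f"{u:04X}" for u in units])
```

Writing each unit as two little-endian bytes gives exactly the UTF-16LE byte stream, so the sequence-of-integers representation and the LE16 encoding stay interchangeable.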

For me, 16-bit Unicode still causes problems with Indic scripts, because Indic languages are syllabic. The decision makers in India accepted an ANSI-style character set (letters plus combining signs) instead of asking for a full syllabic character set of some 6000 characters for each Indic language. Had they done so, collation would be much easier to handle.
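
A small Python example makes the syllable problem concrete. It assumes the Devanagari conjunct क्ष (often transliterated "ksha") as the sample: a reader sees one syllable, but Unicode stores it as consonant + virama + consonant, and widening the code unit from 16 to 32 bits does not change that count.

```python
import unicodedata

# The conjunct "ksha": read as one syllable, stored as three code points
syllable = "\u0915\u094d\u0937"

for ch in syllable:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("code points:", len(syllable))
print("UTF-16 code units:", len(syllable.encode("utf-16-le")) // 2)
print("UTF-32 code units:", len(syllable.encode("utf-32-le")) // 4)
```

All three counts come out the same, so syllable-aware processing (including collation) has to work on clusters of code points whichever encoding width is chosen.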

I hope that Euphoria takes the 16-bit (or 32-bit) Unicode route, even in the next 4.# version. Look at AutoIt 3.6: they changed to 16-bit Unicode in version 3.4 (http://www.autoitscript.com/). Some of the good BASICs are also moving to 16-bit Unicode, e.g. REALbasic, PowerBASIC and PureBasic.
