1. Unicode in Source code

Problem: Euphoria only accepts ASCII source files. UTF-16 and UCS-2 files are flatly rejected because they contain embedded null (0) bytes, which the scanner treats as an illegal character. UTF-8 is widely used because it maintains compatibility with ASCII, but unfortunately bytes 128-255 are reserved for the (strange) shrouding scheme that was used during Eu's commercial days.
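To make the byte-level problem concrete, here is a small Python sketch (not Euphoria code; the helper names `has_null_bytes` and `high_bytes` are made up for illustration). It only shows what the encodings produce, not how the Euphoria scanner is actually implemented.

```python
def has_null_bytes(data: bytes) -> bool:
    """True if the byte string contains a 0 byte."""
    return 0 in data


def high_bytes(data: bytes) -> list:
    """Return every byte in the 128-255 range, in order."""
    return [b for b in data if b >= 128]


source = 'puts(1, "hi")'

# UTF-16 pads every ASCII character with a zero byte.
print(has_null_bytes(source.encode("utf-16-le")))   # True

# UTF-8 leaves pure ASCII text byte-for-byte unchanged.
print(source.encode("utf-8") == source.encode("ascii"))   # True

# Any non-ASCII character in UTF-8 becomes bytes in the 128-255 range.
print([hex(b) for b in high_bytes("é".encode("utf-8"))])   # ['0xc3', '0xa9']
```

So a UTF-8 file causes trouble only where it contains non-ASCII characters, while a UTF-16 file trips over the zero bytes right away.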

Solution: Why not write source code in RTF? It is 7-bit ASCII. Theoretically, RTF allows 8-bit data after the "\bin" control word, but that feature is rarely used, especially in source code.

RTF is a very simple format. You just need to remove all the tags to get pure 7-bit ASCII. Formally it is proprietary, but Microsoft has published the specification and imposes no restrictions on its use. Besides, the format becomes apparent after examining any RTF file. Last but not least, my preferred text editor, FocusWriter, uses RTF for saved files by default.
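As a rough illustration of the "remove all tags" step, here is a minimal Python sketch. It is not a complete RTF reader: the function name `strip_rtf` and the list of skipped groups are my own assumptions, and it ignores "\bin" data, "\u" Unicode escapes and code pages. Note that deleting the control words alone is not quite enough, because groups such as the font table contain text that is not part of the document, so the sketch skips those groups.

```python
import re

# Groups whose text is metadata rather than document content.
# Partial list, assumed to be enough for simple files.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"    # hex-escaped 8-bit character
    r"|\\([^a-z])"             # control symbol such as \\ \{ \} \*
    r"|([{}])"                 # group open / close
    r"|([^\\{}]+)",            # run of plain text
    re.S,
)


def strip_rtf(rtf: str) -> str:
    out = []
    stack = []          # saved "skipping" flag for each open group
    skipping = False
    for m in TOKEN.finditer(rtf):
        word, _param, hexch, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif symbol is not None:
            if symbol == "*":
                skipping = True            # optional destination, skip it
            elif not skipping and symbol in "\\{}":
                out.append(symbol)         # escaped literal character
        elif hexch is not None:
            if not skipping:
                out.append(chr(int(hexch, 16)))   # code page ignored
        elif text is not None and not skipping:
            # Raw line breaks in an RTF file carry no meaning.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


sample = (r'{\rtf1\ansi{\fonttbl{\f0 Courier;}}\f0\fs20 '
          r'puts(1, "hi")\par x = \{1, 2\}\par}')
print(strip_rtf(sample))
```

For the sample above this prints the two source lines `puts(1, "hi")` and `x = {1, 2}`. Braces in the source have to be escaped as `\{` and `\}` in the RTF file, which the editor does when saving.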


2. Re: Unicode in Source code

SocIoDim said...

Problem: Euphoria only accepts ASCII source files. UTF-16 and UCS-2 files are flatly rejected because they contain embedded null (0) bytes, which the scanner treats as an illegal character. UTF-8 is widely used because it maintains compatibility with ASCII, but unfortunately bytes 128-255 are reserved for the (strange) shrouding scheme that was used during Eu's commercial days.

Actually, this is no longer entirely true. Current versions of Euphoria no longer reserve bytes 128-255, and UTF-8 encoded files are supported as source code.
