OpenEuphoria: Forum: Re: Use a BOM to identify Unicode source files

Re: Use a BOM to identify Unicode source files


Posted by Vinoba Feb 17, 2011

In general, UTF-8 would add more complexity than using UTF-16 little-endian. In fact, the correct approach would be to go completely UTF-16 little-endian and make 8-bit characters an exception that can be easily handled. Whilst a lot of people would make Microsoft the excuse for going the little-endian route, mine is a slightly more thought-out approach. The Intel CPU which most of us use has 16- and 32-bit reads and writes; the 8-bit access is there because the 8088 processor worked that way. Most other processors on the market are also 16/32/64-bit. And, of course, everything related to Windows is 16-bit little-endian. In any case, I feel a BOM should be a requirement in all future string-related software.

ArthurCrump said...

In the recent Wiki article about Unicode plans, source files may be confused with early shrouded output. If a Unicode source file were required to begin with a Unicode Byte Order Mark (BOM), could this ever look like the beginning of a shrouded file?

Files with a BOM at the front would begin:

#EF,#BB,#BF if coded in UTF-8, probably the preferred encoding
#FE,#FF if coded in UTF-16 big-endian
#FF,#FE if coded in UTF-16 little-endian
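
For anyone who wants to experiment with the idea, here is a minimal sketch of checking the first bytes of a file against the three byte sequences listed above. It is written in Python rather than Euphoria purely for illustration, and the function name detect_bom is hypothetical, not part of any Euphoria or OpenEuphoria API. It only covers the three BOMs in the list; note that a UTF-32 little-endian BOM also starts with FF FE, so a fuller check would need to test for that first.

```python
import codecs

# The three BOMs from the list above, paired with an encoding name.
# (detect_bom and this table are illustrative assumptions, not an existing API.)
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no listed BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b'puts(1, "hello")\n'
    encoding, skip = detect_bom(sample)
    print(encoding, skip)
    # Decode the rest of the file once the BOM has been skipped.
    print(sample[skip:].decode(encoding or "latin-1"))
```

A file with none of these prefixes falls through to (None, 0), which is where a separate check for shrouded output or plain 8-bit source could go.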

