Re: Use a BOM to identify Unicode source files


In general, UTF-8 would add more complexity than using UTF-16 little-endian. In fact, the correct approach would be to go completely UTF-16 little-endian and make 8-bit characters an exception that can be easily handled. Whilst a lot of people would make Microsoft the excuse for going the little-endian route, mine is a slightly more thought-out approach. The Intel CPUs which most of us use have 16- and 32-bit reads and writes; the 8-bit access is there because the 8088 processor was 8-bit. Most other processors on the market are also 16/32/64-bit. And, of course, everything related to Windows is 16-bit little-endian. In any case, I feel a BOM should be a requirement in all future string-related software.

ArthurCrump said...

According to the recent Wiki article about Unicode plans, source files may be confused with early shrouded output. If a Unicode source file were required to begin with a Unicode Byte Order Mark (BOM), could this ever look like the beginning of a shrouded file?

Files with a BOM at the front would begin:

  1. #EF,#BB,#BF if coded in UTF-8, probably the preferred encoding
  2. #FE,#FF if coded in UTF-16 big-endian
  3. #FF,#FE if coded in UTF-16 little-endian
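
As a rough illustration of how a reader of such files could check for the three signatures listed above, here is a minimal sketch in Python. The function name `detect_bom` and the returned encoding labels are my own assumptions, not anything from the Wiki article, and UTF-32 is deliberately not handled since it is not in the list.

```python
import codecs

# BOM signatures from the list above, paired with an encoding label.
# The labels are illustrative; only the byte values come from the list.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)

def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no listed BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would read the first few bytes of the file, pass them to `detect_bom`, and skip `bom_length` bytes before decoding. A file with none of these prefixes would have to be handled by whatever default the reader chooses.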