OpenEuphoria: Forum: Use a BOM to identify Unicode source files

1. Use a BOM to identify Unicode source files

Posted by ArthurCrump Feb 17, 2011
1499 views

The recent Wiki article about Unicode plans mentions that Unicode source files may be confused with early shrouded output. If a Unicode source file were required to begin with a Unicode Byte Order Mark (BOM), could it ever look like the beginning of a shrouded file?

Files with a BOM at the front would begin:

#EF,#BB,#BF if coded in UTF-8, probably the preferred encoding
#FE,#FF if coded in UTF-16 big-endian
#FF,#FE if coded in UTF-16 little-endian
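
As a rough illustration of the check being proposed, here is a minimal Python sketch that looks at the first bytes of a file's contents and reports which of the three BOMs above it starts with. The function name `detect_bom` and its return shape are made up for this example; this is not how the Euphoria front end does or would do it, only one way the idea could be tried out.

```python
import codecs

# The three BOMs listed above, paired with an encoding label.
# The codecs constants hold exactly these byte sequences:
#   BOM_UTF8     = EF BB BF
#   BOM_UTF16_BE = FE FF
#   BOM_UTF16_LE = FF FE
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM,
    otherwise (None, 0). Hypothetical helper for illustration only."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A loader could call `detect_bom` on the first few bytes of a source file, skip `bom_length` bytes, and decode the rest with the reported encoding, treating `(None, 0)` as "no BOM present".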


2. Re: Use a BOM to identify Unicode source files

Posted by jimcbrown (admin) Feb 17, 2011
1517 views

ArthurCrump said...

The recent Wiki article about Unicode plans mentions that Unicode source files may be confused with early shrouded output. If a Unicode source file were required to begin with a Unicode Byte Order Mark (BOM), could it ever look like the beginning of a shrouded file?

Files with a BOM at the front would begin:

#EF,#BB,#BF if coded in UTF-8, probably the preferred encoding
#FE,#FF if coded in UTF-16 big-endian
#FF,#FE if coded in UTF-16 little-endian

I'm not sure, but I'd really doubt it.

Scrambling added another layer of complexity ... but even so, the odds...

In any case, support for these formats was dropped in 2.5, when "shrouded" was changed to mean IL bytecode files. So backwards compatibility is no longer an issue.


3. Re: Use a BOM to identify Unicode source files

Posted by Vinoba Feb 17, 2011
1469 views

In general, UTF-8 would add more complexity than UTF-16 little-endian. In fact, the correct approach would be to go completely UTF-16 little-endian and make 8-bit characters an exception that can be easily handled. While a lot of people would give Microsoft as the reason for going the little-endian route, mine is a slightly more thought-out approach. The Intel CPUs that most of us use have 16- and 32-bit reads and writes; the 8-bit access is there because the 8088 processor had it. Most other processors on the market are also 16/32/64-bit. And, of course, everything related to Windows is 16-bit little-endian. In any case, I feel a BOM should be a requirement in all future string-related software.

ArthurCrump said...

The recent Wiki article about Unicode plans mentions that Unicode source files may be confused with early shrouded output. If a Unicode source file were required to begin with a Unicode Byte Order Mark (BOM), could it ever look like the beginning of a shrouded file?

Files with a BOM at the front would begin:

#EF,#BB,#BF if coded in UTF-8, probably the preferred encoding
#FE,#FF if coded in UTF-16 big-endian
#FF,#FE if coded in UTF-16 little-endian
