1. Documentation of x"00 00 fe ff"
- Posted by petelomax Jun 28, 2010
- 1076 views
I think I know what this is supposed to mean, but is there any documentation for it? I had quite a good hunt but came up empty.
BTW, the "go" search buttons on the wiki and manual tabs go to google, which is not helpful, whereas the other 5 go to http://oe.cowgar.com/search/results.wc
2. Re: Documentation of x"00 00 fe ff"
- Posted by tinstaafl Jun 28, 2010
- 1066 views
I would guess that it's a Unicode BOM (Byte Order Mark), which is usually written at the beginning of a Unicode text.
thx, phil long
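For what it's worth, the four bytes in the subject line match the UTF-32 big-endian BOM. Below is a quick way to check that guess. It is written in Python purely as a neutral way to compare byte values (it is not Euphoria code), and the variable names are just for illustration.

```python
import codecs

# The four bytes from the thread title, written as a hex string.
title_bytes = bytes.fromhex("00 00 fe ff")

# Python's codecs module exposes the standard BOM constants,
# so we can compare against both UTF-32 byte orders.
is_utf32_be_bom = title_bytes == codecs.BOM_UTF32_BE
is_utf32_le_bom = title_bytes == codecs.BOM_UTF32_LE

print(is_utf32_be_bom, is_utf32_le_bom)
```

So the literal in the title is most likely meant as the BOM for a big-endian UTF-32 text, expressed as four separate byte values.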
3. Re: Documentation of x"00 00 fe ff"
- Posted by mattlewis (admin) Jun 28, 2010
- 1136 views
petelomax said: I think I know what this is supposed to mean, but is there any documentation for it? I had quite a good hunt but came up empty.
The manual up online here is a little out of date. Here is an excerpt from a more recent build of the docs:
4.1.1.2 Character Strings and Individual Characters
using hexadecimal byte strings e.g.
x"" -- ==> {#65,#66,#67,#AE}
using hexadecimal word strings (for UTF-16) and hexadecimal double-word strings (for UTF-32), e.g.
u"" -- ==> {#65,#66,#67,#AE}
U"" -- ==> {#65,#66,#67,#AE}
The values of the three strings above are equivalent. Spaces separate values into distinct elements. When you put too many hex digits together for the kind of string, they are split up appropriately for you:
x"" -- 8-bit ==> {#65,#66,#67,#AE}
u"" -- 16-bit ==> {#6566,#67AE}
U"" -- 32-bit ==> {#6566,#67AE}
U"" -- 32-bit ==> {#656667AE}
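The splitting rule above can be modelled roughly as follows. This is only a sketch in Python, not how Euphoria implements it: `split_hex_literal` is a hypothetical helper, and the input strings are assumptions chosen to reproduce the element lists shown in the excerpt. How a trailing group with too few digits is treated is not covered by the excerpt, so the sketch simply keeps the remainder as its own element.

```python
from typing import List


def split_hex_literal(text: str, bits: int) -> List[int]:
    """Model of the rule: spaces separate elements, and any run of hex
    digits longer than one element is cut into element-sized pieces."""
    digits_per_element = bits // 4  # 2 for 8-bit, 4 for 16-bit, 8 for 32-bit
    elements = []
    for group in text.split():  # spaces always start a new element
        for i in range(0, len(group), digits_per_element):
            elements.append(int(group[i:i + digits_per_element], 16))
    return elements


# Assumed inputs, picked to match the results listed in the excerpt.
as_bytes = split_hex_literal("656667AE", 8)
as_words = split_hex_literal("656667AE", 16)
as_dwords = split_hex_literal("656667AE", 32)
spaced_dwords = split_hex_literal("6566 67AE", 32)

print([hex(v) for v in as_bytes])
print([hex(v) for v in as_words])
print([hex(v) for v in as_dwords])
print([hex(v) for v in spaced_dwords])
```

Under these assumptions the same eight hex digits give four, two, or one element depending on the element width, and an explicit space forces a break regardless of width.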
String literals encoded as ASCII, UTF-8, UTF-16, UTF-32, or really any encoding whose elements are 32 bits long or shorter can be built with the U"" syntax. Literals of encodings whose elements are 16 bits or shorter, or 8 bits or shorter, can be built using the u"" or x"" syntax respectively. Use delimiters, such as spaces, to resolve ambiguity and improve readability.
The following is code with a valid UTF-8 encoded string:
sequence utf8_val = x"" -- This is ">e"
However, it is up to the coder to know the correct code-point values for these to make any sense in the encoding the coder is using. That is to say, it is possible for the coder to use the x"", u"", and U"" syntax to create literals that are not valid UTF strings.
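To illustrate that last point, here is a small sketch of checking whether a list of byte values is well-formed UTF-8. Again this is Python rather than Euphoria, `is_valid_utf8` is a hypothetical helper, and the byte values are my own picks: #3E and #65 are the ASCII codes for ">" and "e", while the second list reuses the {#65,#66,#67,#AE} values from the excerpt.

```python
def is_valid_utf8(elements):
    """Return True if the byte values decode cleanly as UTF-8."""
    try:
        bytes(elements).decode("utf-8")
    except ValueError:
        # Covers UnicodeDecodeError (malformed UTF-8) and also
        # values outside 0..255, which bytes() rejects.
        return False
    return True


greater_e = [0x3E, 0x65]               # ASCII ">" followed by "e"
stray_tail = [0x65, 0x66, 0x67, 0xAE]  # 0xAE is a continuation byte with no lead byte

print(is_valid_utf8(greater_e))
print(is_valid_utf8(stray_tail))
```

The second list builds fine as a byte sequence but is not a valid UTF-8 string, which is the kind of literal the syntax will let you write without complaint.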
Matt