Re: Documentation of x"00 00 fe ff"
- Posted by mattlewis (admin) Jun 28, 2010
I think I know what this is supposed to mean, but is there any documentation for it? I had quite a good hunt but came up empty.
The manual up online here is a little out of date. Here is an excerpt from a more recent build of the docs:
4.1.1.2 Character Strings and Individual Characters
using hexadecimal byte strings e.g.
x"65 66 67 AE" -- ==> {#65,#66,#67,#AE}
using hexadecimal word strings (for UTF-16) and hexadecimal double-word strings (for UTF-32) e.g.
u"65 66 67 AE" -- ==> {#65,#66,#67,#AE}
U"65 66 67 AE" -- ==> {#65,#66,#67,#AE}
The values of the three strings above are equivalent. Spaces separate one element's value from the next. When you run more hex digits together than a single element of that kind of string can hold, they are split up appropriately for you:
x"656667AE" -- 8-bit ==> {#65,#66,#67,#AE}
u"656667AE" -- 16-bit ==> {#6566,#67AE}
U"6566 67AE" -- 32-bit ==> {#6566,#67AE}
U"656667AE" -- 32-bit ==> {#656667AE}
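For anyone who wants to check the splitting rule by hand, here is a rough model of it in Python (chosen only so it can be run anywhere; it is not how the Euphoria parser is implemented). The name split_hex_literal and its bits parameter are made up for illustration. It also assumes that a leftover run shorter than a full element becomes an element of its own, which the excerpt does not spell out.

```python
def split_hex_literal(body, bits):
    """Model how the body of an x"", u"" or U"" literal maps to element values.

    bits is the element width: 8 for x"", 16 for u"", 32 for U"".
    """
    digits_per_element = bits // 4  # 2, 4 or 8 hex digits per element
    values = []
    for run in body.split():  # whitespace always ends an element
        # Chop each unbroken run of digits into element-sized pieces.
        # Assumption: a short final piece is kept as its own element.
        for start in range(0, len(run), digits_per_element):
            values.append(int(run[start:start + digits_per_element], 16))
    return values


# With spaces, all three widths give the same four elements.
print(split_hex_literal("65 66 67 AE", 8) == split_hex_literal("65 66 67 AE", 32))

# Without spaces, the element width decides where the digits are cut.
print(split_hex_literal("656667AE", 16) == [0x6566, 0x67AE])
print(split_hex_literal("656667AE", 32) == [0x656667AE])
```

All three print statements should report True if the model matches the results listed in the excerpt.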
String literals encoded as ASCII, UTF-8, UTF-16, UTF-32 or really any encoding whose elements are 32 bits long or shorter can be built with U"" syntax. Literals of encodings whose elements are 16 bits or shorter, or 8 bits or shorter, can be built using u"" syntax or x"" syntax respectively. Use delimiters, such as spaces, to remove ambiguity and improve readability.
The following is code with a valid UTF-8 encoded string:
sequence utf8_val = x"3e 65" -- This is ">e"
However, it is up to the coder to know the correct code-point values for these to make any sense in the encoding the coder is using. That is to say, it is possible for the coder to use the x"", u"", and U"" syntax to create literals that are not valid UTF strings.
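To make that last point concrete, here is a small Python sketch that checks whether a list of byte values, such as an x"" literal would produce, is well-formed UTF-8. The helper name is_valid_utf8 is hypothetical, and it simply leans on Python's own UTF-8 decoder as the judge; nothing like it is implied to exist in Euphoria.

```python
def is_valid_utf8(values):
    """Return True if the element values form a well-formed UTF-8 byte string."""
    try:
        # bytes() rejects anything outside 0..255; decode() rejects
        # malformed sequences. Both failures are ValueError subclasses.
        bytes(values).decode("utf-8")
        return True
    except ValueError:
        return False


# The two bytes #3E, #65 are the ASCII (and so UTF-8) codes for ">e".
print(is_valid_utf8([0x3E, 0x65]))

# {#65,#66,#67,#AE} is a perfectly legal literal, but #AE is a
# continuation byte with no lead byte before it, so it is not UTF-8.
print(is_valid_utf8([0x65, 0x66, 0x67, 0xAE]))
```

The first call should print True and the second False, which is exactly the situation described: the literal syntax accepts both, and only the coder knows whether the result means anything in the intended encoding.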
Matt