Re: Documentation of x"00 00 fe ff"

petelomax said...

I think I know what this is supposed to mean, but is there any documentation for it? I had quite a good hunt but came up empty.

The manual up online here is a little out of date. Here is an excerpt from a more recent build of the docs:

TFM said...

4.1.1.2 Character Strings and Individual Characters

using hexadecimal byte strings e.g.

x"65 66 67 AE" -- ==> {#65,#66,#67,#AE}

using hexadecimal word strings (for UTF-16) and hexadecimal double-word strings (for UTF-32) e.g.

u"65 66 67 AE" -- ==> {#65,#66,#67,#AE}
U"65 66 67 AE" -- ==> {#65,#66,#67,#AE}

The values of the three strings above are equivalent. Spaces separate the values into distinct elements. When you run together more hex digits than one element of that kind of string can hold, they are split into elements of the appropriate width for you:

x"656667AE"  -- 8-bit  ==> {#65,#66,#67,#AE}
u"656667AE"  -- 16-bit ==> {#6566,#67AE}
U"6566 67AE" -- 32-bit ==> {#6566,#67AE}
U"656667AE"  -- 32-bit ==> {#656667AE}

String literals encoded as ASCII, UTF-8, UTF-16, UTF-32, or any other encoding whose elements are 32 bits long or shorter can be built with the U"" syntax. Literals in encodings whose elements are 16 bits or shorter can be built with the u"" syntax, and those whose elements are 8 bits or shorter with the x"" syntax. Use delimiters, such as spaces, to remove ambiguity and improve readability.

The following is code with a valid UTF-8 encoded string:

sequence utf8_val = x"3E 65" -- This is ">e"

However, it is up to the coder to know the correct code-point values for these to make any sense in the encoding the coder is using. That is to say, it is possible for the coder to use the x"", u"", and U"" syntax to create literals that are not valid UTF strings.
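
If you want to check the grouping rule on your own build, something like the following should do it. This is only a sketch: it assumes an interpreter recent enough to accept the x"", u"" and U"" literals, and the hex digits are ones I picked to match the results listed in the excerpt, so treat the "docs say" comments as the documented behaviour rather than output I am vouching for.

```euphoria
-- Sketch only: assumes a build that accepts x"", u"" and U"" literals.
sequence bytes  = x"656667AE"   -- docs say {#65,#66,#67,#AE}
sequence words  = u"656667AE"   -- docs say {#6566,#67AE}
sequence dwords = U"656667AE"   -- docs say {#656667AE}

-- ? prints each sequence (in decimal), so the element count is easy to see
? bytes
? words
? dwords

-- With a space between every pair of digits, each pair becomes its own
-- element, so per the docs all three forms should compare equal
? equal(x"65 66 67 AE", u"65 66 67 AE")
? equal(u"65 66 67 AE", U"65 66 67 AE")
```

Comparing length(bytes), length(words) and length(dwords) is another quick way to see how many elements each literal ended up with.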

Matt
