Re: UTF-8 encoding vs UTF-32

Nevla said...

Thank you again, Irv, for the time you took to try it out. I really appreciate it.

So, to sum up this thread, it would seem that as long as the foreign language text is displayed and handled properly by EuGtk, I should not even worry about which encoding Euphoria is actually using. I just go with the "Euphoria encoding", whatever that may be.

It seems that the UTF-8 vs UTF-16 vs UTF-32 question has become irrelevant...

It would, perhaps, be nice if Eu had a UTF variable type, so that the 1 to 4 bytes which make up a UTF-8 character could be packed into one Eu atom. There'd be less wasted space that way, and things like sorting might be easier.
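To illustrate what "packing" would mean, here is a rough sketch in Python (not Euphoria, purely so it can be run as-is). The function name decode_utf8 is made up for this example, and it does no real validation of continuation bytes, so treat it as a sketch of the idea rather than a proper decoder. It turns each group of UTF-8 bytes into a single integer, the character's code point:

```python
def decode_utf8(data):
    """Pack each UTF-8 byte group into one integer (its code point).

    Hypothetical helper for illustration; continuation bytes are not validated.
    """
    code_points = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:                # 0xxxxxxx: 1 byte (plain ASCII)
            size, value = 1, first
        elif first >> 5 == 0b110:       # 110xxxxx: lead byte of a 2-byte group
            size, value = 2, first & 0x1F
        elif first >> 4 == 0b1110:      # 1110xxxx: lead byte of a 3-byte group
            size, value = 3, first & 0x0F
        elif first >> 3 == 0b11110:     # 11110xxx: lead byte of a 4-byte group
            size, value = 4, first & 0x07
        else:
            raise ValueError(f"invalid lead byte {first} at index {i}")
        for b in data[i + 1:i + size]:
            # Each continuation byte (10xxxxxx) contributes 6 more bits.
            value = (value << 6) | (b & 0x3F)
        code_points.append(value)
        i += size
    return code_points


raw = [230, 136, 145, 228, 184, 141, 231, 159, 165, 233, 129, 147]
print(decode_utf8(raw))  # [25105, 19981, 30693, 36947]
```

Those four integers are U+6211, U+4E0D, U+77E5 and U+9053, so each character ends up as one value that would fit in a single atom.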

Since apparently a UTF-8 character can be sent as 1, 2, 3 or 4 bytes, figuring out how to sort those strings is going to be tricky. Having every character 32 bits long (even though Unicode seems only to need 21 bits) would probably make that task easier.
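A small check, again in Python rather than Euphoria and only as a sketch: the largest Unicode code point, U+10FFFF, fits in 21 bits, and sorting by code point is easy once each character is one integer. The word list and the helper name code_points are invented for the example. Somewhat to my surprise, sorting the raw UTF-8 bytes gives the same order for this sample, so plain code point order may be less tricky than it looks (proper dictionary-style collation for a given language is a separate problem that neither approach solves).

```python
MAX_CODE_POINT = 0x10FFFF  # largest code point Unicode defines
print(MAX_CODE_POINT.bit_length())  # 21

# Sample words chosen for illustration only.
words = ["zebra", "école", "apple", "我"]


def code_points(word):
    """Return the word as a list of integers, one per character."""
    return [ord(ch) for ch in word]


# Sort with every character treated as one fixed-width integer.
by_code_point = sorted(words, key=code_points)

# Sort the same words by their variable-length UTF-8 bytes.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)                   # ['apple', 'zebra', 'école', '我']
print(by_code_point == by_utf8_bytes)  # True
```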

Also, length() doesn't give the results you'd expect when used with UTF-8 strings. For example, "I don't know" 我不知道 is {230,136,145,228,184,141,231,159,165,233,129,147} or 12 'bytes', even though the text is only 4 characters long.
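The mismatch is easy to reproduce. The sketch below is in Python, not Euphoria, and utf8_char_count is a made-up name. It counts characters by skipping continuation bytes (those of the form 10xxxxxx), which assumes the input is valid UTF-8:

```python
raw = bytes([230, 136, 145, 228, 184, 141, 231, 159, 165, 233, 129, 147])
text = raw.decode("utf-8")

print(len(raw))   # 12 (bytes)
print(len(text))  # 4  (characters)


def utf8_char_count(data):
    """Count characters by counting every byte that is not a continuation byte."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


print(utf8_char_count(raw))  # 4
```

The same bit test should carry over to a Euphoria sequence of bytes, although I have not tried it there.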
