Unicode and puts()
- Posted by Frank Dowling <frank at frankied.com> Jun 06, 2007
I'm having a problem outputting Unicode strings. It's no drama doing this with files, but it gets hairy for me when serving a page from my server.
I've never had this problem with UTF-8, probably because I only used it for the odd character, such as the Maori 'a' in New Zealand English, which has no all-zero bytes in it (maybe UTF-8 never encodes with leading or trailing zeros, I don't know, but it's irrelevant to Unicode). However, when I ran into a situation where the page was in Russian, with Lithuanian and English as alternate languages for viewing the same page, I decided to make the whole page straight Unicode, and ran into this:
Where s is the Unicode string "Hello World", the following statement:
puts(1,s)
will output only "H" to the screen, because puts() aborts when it reaches the null character.
Putting two bytes (one character) at a time will work, obviously for both big-endian and little-endian, but there is a performance hit, and it adds another layer between the output of your page and Euphoria's puts() procedure. The only way I can see to get around it is to implement the two-byte chunking technique with an alias of puts() (which I already use for my web CMS framework). Is there a *really quick and easy* way to get around this?
Also, as someone who uses Euphoria primarily for CGI, the last release was for me the most directly significant of all of them because of the "include" changes. I can now relax and spread my files out a bit :P
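On the null-byte question, one possible way around it is to skip the chunking and send the page as UTF-8 instead. This is a different technique from the two-byte chunking described in the post. UTF-8 only produces a zero byte for code point 0 itself, so puts() has nothing to stop on, and Russian, Lithuanian and English text can share one page. The following is a minimal sketch, assuming the text is held as a sequence of Unicode code points; utf8_encode is a hypothetical helper written for illustration, not a routine from Euphoria's standard library.

```euphoria
-- Hypothetical helper (not part of Euphoria's standard library):
-- converts a sequence of Unicode code points into UTF-8 bytes.
-- Assumes every element is an integer from 0 to #10FFFF.
function utf8_encode(sequence codepoints)
    sequence bytes
    integer c
    bytes = {}
    for i = 1 to length(codepoints) do
        c = codepoints[i]
        if c < #80 then
            -- 1 byte: plain ASCII
            bytes = bytes & c
        elsif c < #800 then
            -- 2 bytes: covers Cyrillic and the Lithuanian accented letters
            bytes = bytes & {#C0 + floor(c / #40),
                             #80 + and_bits(c, #3F)}
        elsif c < #10000 then
            -- 3 bytes: rest of the Basic Multilingual Plane
            bytes = bytes & {#E0 + floor(c / #1000),
                             #80 + and_bits(floor(c / #40), #3F),
                             #80 + and_bits(c, #3F)}
        else
            -- 4 bytes: code points above #FFFF
            bytes = bytes & {#F0 + floor(c / #40000),
                             #80 + and_bits(floor(c / #1000), #3F),
                             #80 + and_bits(floor(c / #40), #3F),
                             #80 + and_bits(c, #3F)}
        end if
    end for
    return bytes
end function

sequence s
s = {#41F, #440, #438, #432, #435, #442}  -- "Привет" as code points
puts(1, utf8_encode(s))  -- every byte is non-zero, so a single call is enough
```

For a CGI page, the response would also need to declare the encoding, for example with a charset=utf-8 parameter on the Content-Type header, so that the browser decodes the bytes correctly.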