Re: unicode and puts
- Posted by Craig Welch <euphoriah at cwelch.org> Jun 06, 2007
FD(censored) wrote:

> The only way I can see to get around it is implementing the two bytes chunking
> technique, with an alias of puts (which I already use for my web CMS Framework.)
>
> Is there a *really quick and easy* way to get around this?

Ah, Euphoria and Unicode. Don't you just love it?

I've taken a different approach. I use HTML encoding for all of my Unicode.

To demonstrate, go to http://www.wazu.jp/hosting/pricing.exu

Click on 'price this plan', and if you haven't changed any of the default numbers, the page will verify them and add the ordering section.

In any of the 'name' fields, enter your non-Latin characters. Russian, Japanese, whatever.

Do *not* click that you've agreed to the terms and conditions. That way, the page will fail with an error message, but your entered characters will be re-displayed.

That's the key ... how they were re-displayed.

1) The page is 'charset=utf-8'. That means that no matter how you enter the characters, the browser will POST them back to the CGI program as UTF-8. Let's say that {229,147,169} is input. That's how it's stored in the database.

2) The UTF-8 character (1 - 4 bytes) is converted to a hex number (its Unicode code point). This example would become 54E9.

3) The hex is turned into decimal. The above example would be 21737.

4) The number is turned into its HTML representation, &#21737; (which the browser displays as 哩).

5) Each non-ASCII character in the page buffer is replaced as above, and the buffer is written out.

6) <SHAMELESS PLUG> Then go to http://www.wazu.jp/ to get some Unicode fonts to test with </SHAMELESS PLUG>

HTH.

--
Craig

PS You should turn off directory listing.
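
For anyone who wants the arithmetic in steps 1 to 5 spelled out, here is a minimal sketch. It is written in Python rather than Euphoria, purely for illustration, and it is not the code behind the page above. The function name utf8_to_html_entities is made up for this example, and the sketch assumes well-formed UTF-8 input (it does not check for truncated or overlong sequences).

```python
def utf8_to_html_entities(data: bytes) -> str:
    """Replace each multi-byte UTF-8 character with a decimal HTML reference.

    Hypothetical helper for illustration only; assumes well-formed UTF-8.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII: one byte, copied through unchanged.
            out.append(chr(lead))
            i += 1
            continue
        # The lead byte tells us the sequence length and carries the top bits.
        if lead >= 0xF0:
            length, code = 4, lead & 0x07
        elif lead >= 0xE0:
            length, code = 3, lead & 0x0F
        elif lead >= 0xC0:
            length, code = 2, lead & 0x1F
        else:
            raise ValueError("stray continuation byte at offset %d" % i)
        # Each continuation byte contributes its low six bits.
        for b in data[i + 1:i + length]:
            code = (code << 6) | (b & 0x3F)
        # 'code' is now the Unicode number; emit it in decimal (steps 3 and 4).
        out.append("&#%d;" % code)
        i += length
    return "".join(out)


if __name__ == "__main__":
    print(utf8_to_html_entities(bytes([229, 147, 169])))
```

Feeding it the bytes {229,147,169} from step 1 gives &#21737;, which matches the worked example: 54E9 hex is 21737 decimal. ASCII bytes pass through untouched, so only the non-Latin characters in the buffer get rewritten.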