OpenEuphoria: Forum: UTF-8 in Windows

1. UTF-8 in Windows

Posted by rodoval Sep 13, 2008

Hello!

Can the Windows version of Euphoria output text in UTF-8 format?

I have finished a program in Linux and now I am testing it in Windows XP. In Linux, it seems that the encoding of the source file determines the encoding of the output; I have checked this with UTF-8 and ISO-8859-15 by redirecting the output to a file from a terminal. In Windows this works only for the DOS and ANSI (Windows-1252) encodings. I am using Euphoria 3.1.1 with exwc on Windows.

2. Re: UTF-8 in Windows

Posted by DerekParnell (admin) Sep 13, 2008

rodoval said...

Can the Windows version of Euphoria output text in UTF-8 format?

Yes, but not by default. It does not convert Extended ASCII text into UTF-8. Of course, plain ASCII (values 0 - 127) is already valid UTF-8, but the byte values 128 - 255, used in code pages, are not converted.

There will be a routine in Version 4 to convert many code pages into Unicode (UTF-32, UTF-16, and UTF-8).

By the way, I assume you are talking about sending text to a console window and not a Windows control object. In that case, you must also set the console code page to 65001 and use the "Lucida Console" font.

So, in summary, for Euphoria 3 programs, you have to convert your extended ASCII characters into their UTF-8 equivalents before sending them to a Windows console.
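
As a rough sketch of that conversion for Euphoria 3, assuming the input bytes are ISO-8859-1 (so that each byte value equals its Unicode code point), something like the following could be used. The function name latin1_to_utf8 is made up for illustration and is not part of any Euphoria library. Bytes 128 - 159 of Windows-1252, and the eight positions where ISO-8859-15 differs from ISO-8859-1 (the euro sign, for example), would need an extra lookup table that is not shown here.

```euphoria
-- Hypothetical helper, not part of Euphoria 3: converts a sequence of
-- ISO-8859-1 byte values (0 - 255) into the equivalent UTF-8 bytes.
function latin1_to_utf8(sequence s)
    sequence out
    integer c
    out = {}
    for i = 1 to length(s) do
        c = s[i]
        if c < 128 then
            -- plain ASCII is already valid UTF-8
            out &= c
        else
            -- code points 128 - 255 need two bytes: 110000xx 10xxxxxx
            out &= 192 + floor(c / 64)
            out &= 128 + remainder(c, 64)
        end if
    end for
    return out
end function

-- 241 is "n with tilde" in ISO-8859-1; it becomes the bytes 195, 177
puts(1, latin1_to_utf8({241, 'u', '\n'}))
```

This only covers the conversion step; the console still has to be on code page 65001 with the Lucida Console font for the result to display.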

3. Re: UTF-8 in Windows

Posted by rodoval Sep 14, 2008

DerekParnell said...

rodoval said...

Can the Windows version of Euphoria output text in UTF-8 format?

Yes, but not by default. It does not convert Extended ASCII text into UTF-8. Of course, plain ASCII (values 0 - 127) is already valid UTF-8, but the byte values 128 - 255, used in code pages, are not converted.

There will be a routine in Version 4 to convert many code pages into Unicode (UTF-32, UTF-16, and UTF-8).

By the way, I assume you are talking about sending text to a console window and not a Windows control object. In that case, you must also set the console code page to 65001 and use the "Lucida Console" font.

So, in summary, for Euphoria 3 programs, you have to convert your extended ASCII characters into their UTF-8 equivalents before sending them to a Windows console.

Thanks, Derek. It is strange that, in my tests, the output seems OK when redirected to a file, but it is not displayed correctly on the console. This is an example:

- Open a console, set the Lucida Console font and execute "chcp 65001".

- Create the file "test.ex" using Notepad in the "UTF-8 without BOM" format with this single line:

  puts(1, "ñu\n")

The first letter of the string is an "n with tilde", a Spanish letter outside the 0 - 127 range.

- When "exwc test.ex" is run from the console, the "extended" letter is not displayed (a rectangle appears instead).

- "exwc test.ex > test.txt" creates a UTF-8 file (checked with Notepad) but, surprisingly, "type test.txt" then shows the "n with tilde" correctly.

4. Re: UTF-8 in Windows

Posted by jacquesd Sep 14, 2008