Re: UTF-8

mattlewis said...

Euphoria atoms are stored as either 31-bit signed integers or doubles. So some 32-bit numbers (anything greater than 2^30-1) are stored as doubles, and so are less efficient. I'm not familiar enough with UTF-32 to know how much this affects anything. Of course, 64-bit Euphoria (already working in the 4.1 implementation) will use 63-bit signed integers and extended precision floating point numbers. ..... Matt

I think 31 bits is OK, but I will look at it again and report back in detail. As a quick comment, the absence of the highest bit in 4 bytes might only affect some (hopefully minor) East Asian languages. Of course, with 63 or 64 bits we will be able to accommodate all the planetary and many of the trans-universe languages.
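For what it's worth, here is a quick way to check the 31-bit question. This is only a sketch in Python (not Euphoria), and it assumes the integer range Matt describes above, i.e. a largest native integer of 2^30-1. The names EU_MIN_INT, EU_MAX_INT and fits_in_eu_integer are made up for this illustration. The one outside fact it relies on is that Unicode's highest code point is U+10FFFF.

```python
# Sketch only: assumes the 31-bit signed integer range quoted above.
# EU_MIN_INT / EU_MAX_INT / fits_in_eu_integer are hypothetical names.
EU_MIN_INT = -(2**30)
EU_MAX_INT = 2**30 - 1

# Highest code point defined by Unicode (U+10FFFF).
UNICODE_MAX = 0x10FFFF


def fits_in_eu_integer(value: int) -> bool:
    """True if value would fit in a 31-bit signed integer (no double needed)."""
    return EU_MIN_INT <= value <= EU_MAX_INT


if __name__ == "__main__":
    # A Latin letter, a CJK character, and a character outside the BMP.
    for ch in ["A", "\u8a9e", "\U0002000b"]:
        print(hex(ord(ch)), fits_in_eu_integer(ord(ch)))
    print("highest code point fits:", fits_in_eu_integer(UNICODE_MAX))
```

If that assumption about the range holds, every UTF-32 code point fits with room to spare, so the missing top bit would only matter for arbitrary 32-bit values, not for character data.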

mattlewis said...

Derek has done some work on Unicode routines for Euphoria (mostly standard library stuff, IIRC). There's a unicode branch in the repo if you're interested in taking a look. This may or may not make it into 4.1.

From what I did with wxEuphoria, I think it's mainly I/O stuff that needs updating. For instance, the built-in sprint() coerces things to C chars, so UTF-8 seems to work with it, but not UTF-16. For wxEuphoria, I had to write my own w_sprintf(), which was based on Euphoria's sprintf(), but using wxWidgets-style characters. Matt
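To illustrate the point about C chars: the sketch below (Python, not Euphoria, and not how sprint() is actually implemented) just compares the code units of the two encodings against an 8-bit limit. The helper names are hypothetical. UTF-8 code units are bytes, so they all fit in an unsigned 8-bit char; UTF-16 code units are 16 bits wide and generally do not.

```python
# Sketch only: models "coerced to a C char" as "must fit in 0..255".
# All function names here are hypothetical, for illustration.

def utf8_units(text: str) -> list[int]:
    """UTF-8 code units (one per byte)."""
    return list(text.encode("utf-8"))


def utf16_units(text: str) -> list[int]:
    """UTF-16 code units (one per 16-bit word, little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def survives_8bit_coercion(units: list[int]) -> bool:
    """True if every code unit fits in an unsigned 8-bit char."""
    return all(0 <= u <= 0xFF for u in units)


if __name__ == "__main__":
    text = "\u8a9e"  # a single CJK character, U+8A9E
    print([hex(u) for u in utf8_units(text)], survives_8bit_coercion(utf8_units(text)))
    print([hex(u) for u in utf16_units(text)], survives_8bit_coercion(utf16_units(text)))
```

The UTF-8 form passes through as three separate bytes, while the single UTF-16 unit would be truncated, which matches what Matt saw.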

I was wondering if you have looked at Microsoft's intermediate solution (TCHAR), and now wchar and things like ...MessageW() etc., as a good migration path.

I will try to look at the Unicode branch you mentioned above and see what goodies you have for me there. I want more than a kid gets going Halloweening!
