Re: UTF-8

Vinoba said...
LarryMiller said...

While UTF-8 is great for files, it is not very easy to work with internally. UTF-16, or even UTF-32, is much easier to work with as an internal format. UTF-32 is a more natural fit for Euphoria, which already uses 32 bits to store characters.
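To illustrate why a fixed-width internal format is easier, here is a minimal Python sketch (not Euphoria code, and not from anyone in this thread) that decodes UTF-8 bytes by hand into a list of integer code points. The result is effectively UTF-32: one element per character, so indexing and length are trivial, much as a Euphoria sequence of integers would behave. The function name `utf8_to_codepoints` is hypothetical, and it assumes well-formed input (no validation of overlong or invalid sequences).

```python
def utf8_to_codepoints(data: bytes) -> list:
    """Decode well-formed UTF-8 into a list of code points (no error checking)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:        # 1 byte:  0xxxxxxx
            cp, n = b, 0
        elif b < 0xE0:      # 2 bytes: 110xxxxx 10xxxxxx
            cp, n = b & 0x1F, 1
        elif b < 0xF0:      # 3 bytes: 1110xxxx + 2 continuation bytes
            cp, n = b & 0x0F, 2
        else:               # 4 bytes: 11110xxx + 3 continuation bytes
            cp, n = b & 0x07, 3
        for j in range(1, n + 1):
            # Each continuation byte contributes 6 payload bits.
            cp = (cp << 6) | (data[i + j] & 0x3F)
        out.append(cp)
        i += n + 1
    return out

text = "a\u00e9\u20ac\U0001F600"   # a, e-acute, euro sign, emoji
encoded = text.encode("utf-8")
cps = utf8_to_codepoints(encoded)
print(len(encoded), len(cps))      # 10 bytes, but only 4 characters
print(hex(cps[3]))                 # direct indexing of the 4th character
```

In UTF-8 the 4th character starts at byte offset 6, which you can only find by scanning; in the decoded list it is simply element 3.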

To the best of my knowledge, file names are in ANSI or in 16-bit or 32-bit Unicode. I do not know of them being in UTF-8. I had missed that Euphoria stores characters in 32 bits. If that is the case, implementing 32-bit Unicode should be very easy and would put Euphoria well ahead of most BASICs (and Clipper, Harbour, Lua, Agena, AutoIt) again, making it a great tool for international trade software. I would appreciate some pointers or samples regarding the 32-bit storage of characters in Euphoria. I am 100% comfortable with assembly/machine language, so a couple of hints is all I need to look at the storage.

I did NOT see that (32-bit storage) in the Unicode conversion software somebody has written for Euphoria.

Euphoria atoms are stored either as 31-bit signed integers or as doubles. So some 32-bit numbers (anything greater than 2^30-1) are stored as doubles, and are therefore less efficient. I'm not familiar enough with UTF-32 to know how much this affects anything. Of course, 64-bit Euphoria (already working in the 4.1 implementation) will use 63-bit signed integers and extended-precision floating-point numbers.
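One relevant fact: Unicode code points stop at U+10FFFF, which needs only 21 bits. The Python sketch below checks that value against the 31-bit signed range described above (-2^30 to 2^30-1). The constant and function names are hypothetical; this only compares numeric ranges and says nothing about how Euphoria actually lays out sequences in memory.

```python
# 31-bit signed integer range, as described for 32-bit Euphoria above.
EU_INT_MIN = -(2 ** 30)
EU_INT_MAX = 2 ** 30 - 1

# Largest code point defined by Unicode (the ceiling for UTF-32 values).
MAX_CODE_POINT = 0x10FFFF

def fits_in_eu_integer(n: int) -> bool:
    """True if n lies inside the 31-bit signed integer range."""
    return EU_INT_MIN <= n <= EU_INT_MAX

print(EU_INT_MAX)                          # 1073741823
print(MAX_CODE_POINT.bit_length())         # 21
print(fits_in_eu_integer(MAX_CODE_POINT))  # True
```

If that reasoning holds, valid UTF-32 character values would never be pushed into the slower double representation; only arbitrary 32-bit values above 2^30-1 would be.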

Derek has done some work on Unicode routines for Euphoria (mostly standard library stuff, IIRC). There's a unicode branch in the repo if you're interested in taking a look. This may or may not make it into 4.1.

From what I did with wxEuphoria, I think it's mainly the I/O routines that need updating. For instance, the built-in sprintf() coerces things to C chars, so UTF-8 seems to work with it, but UTF-16 does not. For wxEuphoria, I had to write my own w_sprintf(), which was based on Euphoria's sprintf() but uses wxWidgets-style characters.
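A rough Python model of why narrowing to C chars spares UTF-8 but breaks UTF-16. It assumes, as a simplification, that "coercing to a C char" keeps only the low 8 bits of each element; Euphoria's real sprintf() may handle out-of-range values differently. The helper name `coerce_to_c_chars` is hypothetical.

```python
def coerce_to_c_chars(units: list) -> bytes:
    """Model narrowing each element to an 8-bit C char by keeping the low byte.
    (An assumed simplification, not Euphoria's actual implementation.)"""
    return bytes(u & 0xFF for u in units)

text = "\u00e9\u20ac"  # e-acute and euro sign

# UTF-8: every element is already a byte value 0..255, so nothing is lost.
utf8_units = list(text.encode("utf-8"))
print(coerce_to_c_chars(utf8_units).decode("utf-8") == text)  # True

# UTF-16: elements are 16-bit code units; narrowing drops the high byte.
raw = text.encode("utf-16-le")
utf16_units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
print([hex(u) for u in utf16_units])         # ['0xe9', '0x20ac']
print(list(coerce_to_c_chars(utf16_units)))  # [233, 172]: the euro sign is mangled
```

This is the kind of gap a wide-character formatting routine such as w_sprintf() has to close: the elements must stay wider than a byte all the way through formatting.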

Matt
