Re: UTF-8 encoding vs UTF-32


UTF-8 and UTF-32 are very different.
UTF-8 is currently the standard on Linux and on the internet. That should be an incentive for Euphoria to go the UTF-8 route.
Storing integers in 4 bytes is standard in Euphoria, so in theory it should not be difficult to make a 32-bit character the standard for character sequences. However, I have tried to access the actual pointer to an integer sequence so that I could reach the individual 4-byte characters.
A sequence of bytes treated as four-byte words, read and written with peek and poke, would work.
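To make the four-byte-word idea concrete, here is a rough sketch in Python rather than Euphoria, so treat it only as an illustration. The helper names store_chars, read_char and write_char are made up for this post, and the little-endian byte order is an assumption. The read and write helpers play roughly the role that peek4u and poke4 would play on raw memory.

```python
import struct

WORD = 4  # bytes per character, assuming one 32-bit word per code point


def store_chars(text):
    """Pack each code point of text into a 4-byte little-endian word."""
    buf = bytearray()
    for ch in text:
        buf += struct.pack("<I", ord(ch))
    return buf


def read_char(buf, index):
    """Read the character held in word number index (0-based)."""
    (code_point,) = struct.unpack_from("<I", buf, index * WORD)
    return chr(code_point)


def write_char(buf, index, ch):
    """Overwrite word number index with the code point of ch."""
    struct.pack_into("<I", buf, index * WORD, ord(ch))
```

Because every character occupies exactly four bytes, character number n always starts at byte offset 4*n, so indexing needs no scanning. That is the main attraction of the fixed-width form over UTF-8.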
It would be easy for a C programmer to implement 4 distinct types of sequences, viz:
1. ASCII/ANSI, one byte per character. For God's sake, get away from this. More than 75% of the world's population use languages that cannot be written with it.
2. UTF-8. There are already developments within Euphoria in this area.
3. 16-bit characters, as described in the Unicode standard and used internally by Microsoft in Windows XP, 7 and 8 since about 2002. These are a very good step ahead and easy to implement in Euphoria.
4. 32-bit characters, as described in the extended Unicode standard, mainly to accommodate the full range of Chinese characters. This currently uses only 21 bits and should be easy to incorporate into Euphoria as a 4th type of character sequence.
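For anyone who wants to see how the four kinds of sequence differ in practice, here is a small Python sketch (Python only because its codecs are handy, not a proposal for Euphoria). The function name encoded_sizes is made up for this post, and Latin-1 stands in for the one-byte ASCII/ANSI case, which is an assumption on my part since "ANSI" covers several code pages.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def encoded_sizes(text):
    """Return the number of bytes text needs under each kind of encoding.

    A value of None means the text cannot be represented at all.
    """
    sizes = {}
    for name in ("latin-1", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None  # not representable in this encoding
    return sizes


# The highest code point fits in 21 bits, as mentioned in item 4.
bits_needed = MAX_CODE_POINT.bit_length()
```

Calling encoded_sizes("A") gives 1, 1, 2 and 4 bytes respectively. A common Chinese character such as U+4E2D cannot be stored in the one-byte form and takes 3, 2 and 4 bytes in the other three. A character beyond the 16-bit range, such as U+20000, takes 4 bytes in all three Unicode forms, because the 16-bit form has to fall back on a pair of 16-bit units.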
