Re: UTF-8
- Posted by Vinoba Mar 10, 2011
ArthurCrump said...
The current Unicode specification fits into 21 bits.
UTF-32 characters are in the range 0-#10FFFF
Thanks for jogging my memory. When I was looking for "personal space" (private use) for better collation of 6000-8000 characters in one of the Indic languages, I had decided upon using the E000 area. I vaguely remember that I was then also looking at the pages above #10FFFF to do the same, i.e. to find space for 12 times 6000 characters, but I stopped because I fell ill.
Yes, Euphoria's 31-bit integers can definitely cope with the "32 bit Unicode" and with my Indic OEM extensions after 21 bits. I hope I can keep good health to do this. I will look more closely at the Unicode branch, as suggested by Matt Lewis, and see if I can cope with it and/or improve upon it.
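
As a rough sanity check of the arithmetic, here is a small sketch (in Python rather than Euphoria, purely for illustration). The names EXT_BASE, BLOCKS, BLOCK_SIZE and ext_code are hypothetical, the 12 x 8000 layout is only an assumed upper bound taken from the figures above, and the integer limit assumes the 31-bit signed integer of 32-bit Euphoria. It is not a worked-out design.

```python
# Highest code point defined by Unicode (the #10FFFF quoted above).
UNICODE_MAX = 0x10FFFF

# Assumption: 32-bit Euphoria integers are 31-bit signed values,
# so the largest positive integer is 2**30 - 1.
EUPHORIA_INT_MAX = 0x3FFFFFFF

# The private-use block that starts at E000 in the Basic Multilingual Plane.
BMP_PUA = range(0xE000, 0xF8FF + 1)

# Hypothetical private extension placed directly above the Unicode range.
EXT_BASE = UNICODE_MAX + 1
BLOCKS = 12          # "12 times ..." from the post
BLOCK_SIZE = 8000    # upper end of the 6000-8000 estimate


def ext_code(block, index):
    """Map (block, index) to a private code value above #10FFFF."""
    if not (0 <= block < BLOCKS and 0 <= index < BLOCK_SIZE):
        raise ValueError("block or index out of range")
    return EXT_BASE + block * BLOCK_SIZE + index


# Largest value this layout would ever produce.
highest = ext_code(BLOCKS - 1, BLOCK_SIZE - 1)

if __name__ == "__main__":
    print("bits needed for #10FFFF:", UNICODE_MAX.bit_length())
    print("code points in the E000 block:", len(BMP_PUA))
    print("highest extension value: #%X" % highest)
    print("fits in a Euphoria integer:", highest <= EUPHORIA_INT_MAX)
```

Under these assumptions, the E000 block on its own is too small for even one set of 6000-8000 characters plus the twelvefold space, while values placed above #10FFFF stay far below the Euphoria integer limit.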