Re: Status of Edix
- Posted by jimcbrown (admin) Aug 27, 2019
- 1083 views
Each original Unicode character could be represented in exactly 2 bytes. Therefore, writing software to reach the nth character in a sequence was very simple, IF you stored the characters in 2-byte fields.
When UTF-8 is used to represent them, the position of the nth character cannot be computed or even guesstimated; you have to crawl along to find the nth character.
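To make the "crawl along" point concrete, here is a rough Python sketch (not anything from Edix or Phix; the function name is made up for illustration, and it assumes the input is already valid UTF-8). It walks the bytes and counts only those that start a character, since continuation bytes always have the bit pattern 10xxxxxx:

```python
def utf8_offset_of_nth(data, n):
    """Return the byte offset where the n-th (0-based) code point starts.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    count = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            if count == n:
                return offset
            count += 1
    raise IndexError("data holds fewer than n + 1 code points")


# 'a' takes 1 byte, U+00E9 takes 2, U+20AC takes 3, U+1F600 takes 4, 'z' takes 1.
sample = "a\u00e9\u20ac\U0001F600z".encode("utf-8")
offsets = [utf8_offset_of_nth(sample, i) for i in range(5)]
```

The cost is linear in the offset: there is no way to jump straight to character n without looking at the bytes before it, or keeping a separate index.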
Yep, but...
Therefore, UTF-16 was invented, and it was good for the original range (64K characters) but not enough for the extended characters.
You are thinking of UCS-2. UTF-16 is an extension of UCS-2 that can represent the full Unicode character set, including the extended characters. However, it does this by using four bytes (a surrogate pair of two 16-bit code units) for each extended character; the original set is still represented by two bytes. So UTF-16 has the same problem as UTF-8 when it comes to finding the nth character, although fewer characters are affected.
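A small Python sketch of the same thing for UTF-16, in case it helps (the helper names are mine, purely illustrative, and it assumes well-formed UTF-16 with no unpaired surrogates). A code unit in the range D800..DBFF is a high surrogate and means the character occupies two 16-bit units instead of one, so the index of the nth character still has to be found by scanning:

```python
def to_utf16_units(text):
    """Split text into a list of 16-bit UTF-16 code units (illustrative helper)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_index_of_nth(units, n):
    """Return the code-unit index where the n-th (0-based) character starts.

    Assumes well-formed UTF-16 (every high surrogate is followed by a low one).
    """
    count = 0
    i = 0
    while i < len(units):
        if count == n:
            return i
        # A high surrogate (D800..DBFF) starts a two-unit pair; anything else is one unit.
        i += 2 if 0xD800 <= units[i] <= 0xDBFF else 1
        count += 1
    raise IndexError("units hold fewer than n + 1 characters")


# 'a' and 'z' take one unit each; U+1F600 takes a surrogate pair (two units, four bytes).
units = to_utf16_units("a\U0001F600z")
starts = [utf16_index_of_nth(units, i) for i in range(3)]
```

If the text is known to contain only characters from the original 64K range, every character is one unit and plain indexing works, which is exactly the UCS-2 case.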