OpenEuphoria: Forum: Re: Euphoria and Unicode

Re: Euphoria and Unicode

Posted by DerekParnell (admin) Oct 24, 2008

HappyGene said...

Wayyyl...

I'm pedantic; sorry, but you'll just have to adjust any expectations you have of me ;-)

HappyGene said...

By "process and maintain w/o error" and preservation I mean:

- Assigning and transporting wide string char data retrieved from other in- and out-of-process data objects such as DDE if available, ODBC data sets and Euphoria keyboard input;

By "WIDE" I assume you mean UTF-16 encoding.

Assigning is just copying numeric values around. That's ok.
Transporting(?) I guess means moving text sequences to/from external storage. Not so straightforward. See below...

HappyGene said...

- manipulating those strings with standard functions like Trim/Mid/Replace/[=, <>, like];

Manipulating is fine, except when it's based on the values within the string. So trim() is a problem because it only trims whitespace, and so far it only knows about ASCII whitespace; Unicode whitespace is a superset of ASCII whitespace. There are rare times when subscripting UTF-16 will fail, but that is mainly when dealing with some Chinese ideographs, because these might take 32 bits (a surrogate pair of two 16-bit values) rather than 16 bits to encode.
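
To make both points concrete, here is a minimal sketch in Python (not Euphoria; the same logic carries over to a sequence of integers). The names trim_units, utf16_units and is_surrogate are made up for illustration. The whitespace set is the list of code points carrying the Unicode White_Space property, all of which fit in a single 16-bit value.

```python
# Code points with the Unicode White_Space property. All are below U+10000,
# so each one occupies exactly one UTF-16 code unit.
UNICODE_WHITESPACE = frozenset(
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680,
     0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
    + list(range(0x2000, 0x200B))  # U+2000 .. U+200A
)

def trim_units(units):
    """Strip leading and trailing Unicode whitespace from a list of 16-bit values."""
    start, end = 0, len(units)
    while start < end and units[start] in UNICODE_WHITESPACE:
        start += 1
    while end > start and units[end - 1] in UNICODE_WHITESPACE:
        end -= 1
    return units[start:end]

def utf16_units(text):
    """Return the UTF-16 code units of a Python string as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_surrogate(unit):
    """True if a 16-bit value is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF

# An ASCII-only trim would leave the ideographic space (U+3000) and the
# no-break space (U+00A0) in place; this one removes them.
trimmed = trim_units([0x3000, 0x41, 0x20, 0x42, 0xA0])

# U+20000 is a CJK ideograph outside the 16-bit range: one character, two units.
# Subscripting either unit alone yields half a character.
pair = utf16_units("\U00020000")
```

Any routine that slices or subscripts such a sequence would need to check is_surrogate() at the cut point to avoid splitting a pair.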

Comparisons between UTF-16 strings are not easy. Equality tests are okay, but anything based on collating order is a problem. Euphoria only does ASCII. It can't tell if A-Grave is lower or higher than A-Acute, for example, and this is usually language based anyway, aside from UTF-16 encoding. That is, different languages collate the same characters in different orders.
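
A small illustration in Python (not Euphoria) of what a plain numeric comparison gives you. The word list is invented for the example. Comparing element by element orders strings by code point value, which is rarely the order a dictionary would use.

```python
# Plain comparison orders strings by numeric code point value.
words = ["zebra", "Zulu", "apple", "\u00c1baco"]  # "\u00c1" is A-Acute
by_code_point = sorted(words)

# 'Z' (0x5A) < 'a' (0x61) < 'z' (0x7A) < A-Acute (0xC1), so the accented
# word lands after "zebra" and the capitalised word lands first.
a_grave, a_acute = "\u00c0", "\u00c1"
grave_sorts_first = a_grave < a_acute  # true only because 0xC0 < 0xC1
```

Getting a language-appropriate order needs collation rules for that language, for example a table-driven sort key; the numeric values alone do not carry that information.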

HappyGene said...

- and, if not by reference, passing/retrieving full width strings as parameters...

This is not so easy. There is no built-in way to convert a Euphoria text sequence (which is stored as an array of 30-bit values) to a RAM array of 16-bit values. The function to do this isn't difficult and someone can show you how.
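
For what it's worth, the conversion logic looks roughly like this. It is a sketch in Python rather than Euphoria, and the function names are made up. It assumes each element of the input is a Unicode code point, and that the receiving routine wants little-endian 16-bit values with a terminating zero (typical for wide-string APIs on Windows, but check what your callee expects).

```python
import struct

def to_utf16_units(code_points):
    """Convert a list of Unicode code points to a list of 16-bit UTF-16 values."""
    units = []
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a valid Unicode code point: %r" % (cp,))
        if cp < 0x10000:
            units.append(cp)  # fits in one 16-bit value
        else:
            cp -= 0x10000     # split the remaining 20 bits into a surrogate pair
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units

def pack_wide_string(code_points):
    """Return the bytes of a zero-terminated, little-endian 16-bit array.

    Assumption: little-endian layout and a trailing zero terminator.
    """
    units = to_utf16_units(code_points) + [0]
    return struct.pack("<%dH" % len(units), *units)

wide = pack_wide_string([0x48, 0x69])  # "Hi"
```

In Euphoria the equivalent would allocate a block of RAM and store each value two bytes at a time; the surrogate arithmetic is the same.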

HappyGene said...

...all without stripping any foreign language info from the tuple when placing/returning it to whatever data store I choose.

"tuple"? Do you mean sequence? What and how is "foreign language info" stored in the sequence? Euphoria reads and writes bytes. Each byte read in occupies a sequence element. If the file is stored in UTF-16 format (with or without BOM), you will have to have a special read/write routine to convert bytes read in to UTF-16 values. Likewise, to write UTF-16 values you will need to have a special routine to convert them to a byte stream.

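A rough sketch of such read/write routines, in Python rather than Euphoria, with made-up function names. The input is a list of byte values, as you would get by reading the file one byte per element. Treating a file without a BOM as little-endian is an assumption made here for the example; the caller can override it.

```python
def bytes_to_utf16_units(raw, default_little_endian=True):
    """Convert a list of byte values (0..255) to a list of 16-bit UTF-16 values.

    A leading BOM (FF FE or FE FF) selects the byte order and is dropped.
    Without a BOM, the byte order is an assumption supplied by the caller.
    """
    raw = list(raw)
    little = default_little_endian
    if raw[:2] == [0xFF, 0xFE]:
        little, raw = True, raw[2:]
    elif raw[:2] == [0xFE, 0xFF]:
        little, raw = False, raw[2:]
    if len(raw) % 2 != 0:
        raise ValueError("odd number of bytes; not valid UTF-16")
    units = []
    for i in range(0, len(raw), 2):
        if little:
            lo, hi = raw[i], raw[i + 1]
        else:
            hi, lo = raw[i], raw[i + 1]
        units.append(hi * 256 + lo)
    return units

def utf16_units_to_bytes(units, little_endian=True, write_bom=True):
    """Convert a list of 16-bit values back to a list of byte values."""
    out = []
    if write_bom:
        out += [0xFF, 0xFE] if little_endian else [0xFE, 0xFF]
    for u in units:
        lo, hi = u & 0xFF, (u >> 8) & 0xFF
        out += [lo, hi] if little_endian else [hi, lo]
    return out

units = bytes_to_utf16_units([0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00])  # "Hi" with BOM
```

Note that the values produced are UTF-16 code units; characters outside the 16-bit range still arrive as surrogate pairs and need the handling described earlier.
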
HappyGene said...

Does that clarify? I'm sure there are many other ways of getting this across and I'll be glad to re-phrase for anyone to understand. I'm good with words.

Another way is, "If I build/use an international app with Euphoria, will terrorists put a contract out on me because I didn't use the right 'n'?"

Yes, I believe they will.
