OpenEuphoria: Forum: Re: Euphoria vs The Other Guys --- and RTFM

Posted by DerekParnell (admin) May 03, 2014

_tom said...

In my first look at text data I was locked into the old idea that one byte was one character; that makes indexing a character in a string very easy. UTF-8 is a variable-length encoding, so indexing individual characters in Euphoria is no longer fun. In a Python 3 string, x = "▒∆ Hello", printing x[1] produces ∆, which is {226,136,134} in UTF-8. How is Euphoria going to evolve to get a similar convenience?

It's easy. A Unicode string in Euphoria is just a sequence with one element per code point, where each element is a 32-bit integer. In other words, Unicode strings in Euphoria are held in a sequence using UTF-32 encoding. We would have functions that convert to and from the other UTF encodings. So in your example above, the resulting sequence would be {9618, 8710, 32, 72, 101, 108, 108, 111}.
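
To make the one-element-per-code-point idea concrete, here is a minimal sketch in Python 3, the language used in the quoted example. It is only an illustration of the representation, not Euphoria code and not an existing Euphoria API; the variable names are made up for this example.

```python
# Illustration only: Python stands in for the proposed Euphoria behaviour.
# "\u2592\u2206 Hello" is the quoted example string, written with escapes
# so the source file itself stays plain ASCII.
text = "\u2592\u2206 Hello"

# One element per code point: the UTF-32-style view of the string.
code_points = [ord(ch) for ch in text]

# The same text as UTF-8 bytes; each non-ASCII character takes 3 bytes here.
utf8_bytes = list(text.encode("utf-8"))

# Indexing the code-point list is a plain element lookup, no decoding needed.
second = code_points[1]                              # 8710, i.e. U+2206
second_as_utf8 = list(chr(second).encode("utf-8"))   # [226, 136, 134]

# Round trip: UTF-8 bytes back to the same list of code points.
round_trip = [ord(ch) for ch in bytes(utf8_bytes).decode("utf-8")]

print(code_points)
print(utf8_bytes)
```

The code-point list has 8 elements while the UTF-8 form has 12 bytes, which is why indexing the former is simple and indexing the latter is not.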

Of course, I'm speaking about future functionality, as Euphoria currently doesn't support Unicode source text.
