Re: Euphoria 2.5 Unicode
- Posted by kbochert at copper.net Dec 05, 2003
On 5 Dec 2003 at 7:48, Mario Steele wrote:
>
> Hey Al,
>
> Thanks for your response. As for that Unicode thing, I do know of an
> easy way to make that work too, while still allowing for a single
> string type definition. All you would have to do is a quick byte scan
> through the input stream, to see whether each char is in a 0 to 255
> range, or a 0 to 255*2 range. An example of this would be:
>
> function need_2_bytes(object stream)
>     for x = 1 to length(stream) do
>         if stream[x] > 255 and stream[x] < 255*2 then
>             return 1
>         elsif stream[x] > 255*2 then
>             return -1
>         end if
>     end for
>     return 0
> end function
>
> Obviously, once we run into the first character that is Unicode, we
> don't need to check any further: we assume it's Unicode and allocate
> the string as such. If it returns -1, then we have a type_check
> error, which means that someone threw in something that's bigger
> than 255*2. And if we get 0 back, then the stream can be put into
> single-byte character holders. I'm sure there are faster algorithms
> out there that can scan byte-wise much faster than this. And the
> problem is, people don't want to deal with the memory routines
> themselves, unless it's explicitly needed by Windows. They'd rather
> use sequences, and their programs get bloated. That's why sequences
> are so popular in Euphoria, 'cause it gets away from the pointers
> that C/C++ depends on. But again, this is just a simple programmer
> writing his two cents in. LOL
>
> L8ers,
> EuMario

Keep in mind that 'Unicode' refers to a mapping of characters to
integers (code points), and not to how a string is represented in
memory. The common representations of Unicode are:

UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4LE, UCS-4BE, UTF-8,
UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE

Karl Bochert
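
To make the distinction concrete, here is a rough sketch of the same three code points stored in three different representations. The function names are made up for illustration and are not part of any Euphoria library. The UTF-16LE helper assumes the code point is below #10000 and ignores surrogate pairs, and none of the helpers validate their input.

```euphoria
-- Hypothetical helpers, for illustration only.

-- One code point (0..#10FFFF) as UTF-8 bytes.
function utf8_bytes(integer cp)
    if cp < #80 then
        return {cp}
    elsif cp < #800 then
        return {#C0 + floor(cp / #40),
                #80 + and_bits(cp, #3F)}
    elsif cp < #10000 then
        return {#E0 + floor(cp / #1000),
                #80 + and_bits(floor(cp / #40), #3F),
                #80 + and_bits(cp, #3F)}
    else
        return {#F0 + floor(cp / #40000),
                #80 + and_bits(floor(cp / #1000), #3F),
                #80 + and_bits(floor(cp / #40), #3F),
                #80 + and_bits(cp, #3F)}
    end if
end function

-- One code point as UTF-16LE bytes.
-- Assumption: cp < #10000, so no surrogate pair is needed.
function utf16le_bytes(integer cp)
    return {and_bits(cp, #FF), floor(cp / #100)}
end function

-- One code point as UTF-32BE bytes.
function utf32be_bytes(integer cp)
    return {0,
            floor(cp / #10000),
            and_bits(floor(cp / #100), #FF),
            and_bits(cp, #FF)}
end function

sequence text
text = {#48, #E9, #20AC}   -- code points for "H", e-acute, euro sign

for i = 1 to length(text) do
    ? utf8_bytes(text[i])     -- {72}, then {195,169}, then {226,130,172}
    ? utf16le_bytes(text[i])  -- {72,0}, then {233,0}, then {172,32}
    ? utf32be_bytes(text[i])  -- {0,0,0,72}, then {0,0,0,233}, then {0,0,32,172}
end for
```

The sequence `text` holds the same integers throughout. Only the byte layout changes, so a "does it fit in one byte" scan answers a question about one particular representation, not about Unicode itself.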