OpenEuphoria: Forum: Re: Euphoria 2.5 Unicode

Posted by kbochert at copper.net Dec 05, 2003

On 5 Dec 2003 at 7:48, Mario Steele wrote:

> 
> 
> Hey Al,
> 
> Thanks for your response. As for the Unicode issue, I do know of an
> easy way to make that work too, while still allowing a single string
> type definition. All you would have to do is a quick scan through the
> input stream to see whether each char is in the 0 to 255 range, or in
> the 0 to 65535 (two-byte) range. An example of this would be:
> 
> function need_2_bytes(sequence stream)
>     -- returns  0: every char fits in one byte (0..255)
>     -- returns  1: at least one char needs two bytes (256..65535)
>     -- returns -1: some char is too big for two bytes (type_check error)
>     integer result
>     result = 0
>     for x = 1 to length(stream) do
>         if stream[x] > 65535 then
>             return -1
>         elsif stream[x] > 255 then
>             result = 1
>         end if
>     end for
>     return result
> end function
> 
> Obviously, once the scan finds a character that needs two bytes, we
> know the string is Unicode and can allocate it as such; the scan only
> keeps going to make sure nothing later is out of range. If it returns
> -1, we have a type_check error, which means someone threw in something
> bigger than 65535. And if we get 0 back, the stream can be put into
> single-byte character holders. I'm sure there are faster algorithms
> out there that can scan bytewise much faster than this. The problem
> is, people don't want to deal with the memory routines themselves
> unless it's explicitly needed by Windows. They'd rather use sequences,
> and their programs get bloated. That's why sequences are so popular in
> Euphoria: they get away from the pointers that C/C++ depend on. But
> again, this is just a simple programmer writing his two cents in. LOL
> 
> L8ers,
> EuMario

Keep in mind that 'Unicode' refers to a mapping of characters to
integers (code points), and not to how the string is represented in memory.
The common representations of Unicode are:
UCS-2, UCS-2BE, UCS-2LE,
UCS-4, UCS-4LE, UCS-4BE,
UTF-8,
UTF-16, UTF-16BE, UTF-16LE,
UTF-32, UTF-32BE, UTF-32LE

Karl Bochert
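The distinction between a code point and its in-memory representation can be made concrete. Below is a minimal Euphoria sketch that encodes a single code point into three of the representations listed: UTF-8, UTF-16BE and UTF-32BE. The function names `utf8_bytes`, `utf16be_bytes` and `utf32be_bytes` are hypothetical, and the code assumes its input is a valid Unicode scalar value (0 to #10FFFF, not a surrogate) without checking it.

```euphoria
-- Sketch: one Unicode code point, three byte-level encodings.
-- Assumption: cp is a valid scalar value (0..#10FFFF, not a surrogate);
-- no validation is done. Function names are illustrative only.

function utf8_bytes(integer cp)
    if cp < #80 then          -- 1 byte: 0xxxxxxx
        return {cp}
    elsif cp < #800 then      -- 2 bytes: 110xxxxx 10xxxxxx
        return {#C0 + floor(cp / #40),
                #80 + and_bits(cp, #3F)}
    elsif cp < #10000 then    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return {#E0 + floor(cp / #1000),
                #80 + and_bits(floor(cp / #40), #3F),
                #80 + and_bits(cp, #3F)}
    else                      -- 4 bytes: 11110xxx + three continuation bytes
        return {#F0 + floor(cp / #40000),
                #80 + and_bits(floor(cp / #1000), #3F),
                #80 + and_bits(floor(cp / #40), #3F),
                #80 + and_bits(cp, #3F)}
    end if
end function

function utf16be_bytes(integer cp)
    integer v, hi, lo
    if cp < #10000 then       -- fits in one 16-bit unit
        return {floor(cp / #100), and_bits(cp, #FF)}
    end if
    v = cp - #10000           -- 20 bits split across a surrogate pair
    hi = #D800 + floor(v / #400)    -- high surrogate
    lo = #DC00 + and_bits(v, #3FF)  -- low surrogate
    return {floor(hi / #100), and_bits(hi, #FF),
            floor(lo / #100), and_bits(lo, #FF)}
end function

function utf32be_bytes(integer cp)
    -- always four bytes, most significant first
    return {floor(cp / #1000000), and_bits(floor(cp / #10000), #FF),
            and_bits(floor(cp / #100), #FF), and_bits(cp, #FF)}
end function

constant samples = {#41, #E9, #20AC, #1F600}
for i = 1 to length(samples) do
    printf(1, "U+%x: UTF-8 %d, UTF-16 %d, UTF-32 %d bytes\n",
           {samples[i], length(utf8_bytes(samples[i])),
            length(utf16be_bytes(samples[i])),
            length(utf32be_bytes(samples[i]))})
end for
```

The same integer #20AC (the euro sign) becomes three bytes in UTF-8, two in UTF-16 and four in UTF-32. A fixed two-bytes-per-character scheme corresponds to UCS-2 and cannot hold code points above #FFFF; UTF-16 handles those with a surrogate pair, so #1F600 takes four bytes there.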
