Re: String?

Rolf Schröder wrote:
> 
> 
> Hi 'string fans'!
> 
> As I know, a character is a byte that represents a human readable or
> printable symbol. A character string (synonymous: string) is a series of
> characters. i.e., a series of bytes representing human readable|printable
> symbols (words, sentences,...).

Well, that's one interpretation. Another is that a character is any value
in an encoding set, such as ASCII, EBCDIC, or Unicode. Each character in 
the set has a unique value and may have a glyph (displayable
representation).

Not all characters are displayable. Some characters have the same glyph.
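
To make that concrete, here is a small sketch in Python (not Euphoria, just
because its standard library exposes the Unicode tables). The sample
characters are my own picks, and whether the three letters really share a
glyph depends on the font, so treat that part as an assumption.

```python
import unicodedata

# A character value with no glyph: BEL is a control code, not a symbol.
bell = "\x07"
print(unicodedata.category(bell))  # 'Cc' is the Unicode "control" category

# Three distinct character values that many fonts draw with the same shape.
# (Assumption: the font in use renders them identically.)
lookalikes = ["\u0041", "\u0391", "\u0410"]
for ch in lookalikes:
    print(hex(ord(ch)), unicodedata.name(ch))
```

The values differ (0x41, 0x391, 0x410) even though the displayed shape may
not, which is why a character and its glyph have to be kept apart.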

> Is it important to differentiate between a general byte series (#00 to #FF)
> and a 'string'?

In some sets, not all character values can be contained in a single byte.
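
As a quick illustration (a Python sketch, with sample characters I chose
myself), this counts how many bytes each character needs when encoded as
UTF-8:

```python
# Hypothetical samples: A, e-acute, euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} needs {len(encoded)} byte(s) in UTF-8")

# Collect the byte counts so they can be inspected afterwards.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
```

Only the first of these fits in a single byte, so "a string is a series of
bytes" stops being true as soon as you leave plain ASCII.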
 
> 1) If there are 256 readable|printable symbols assigned to the
>    numbers #00 to #FF, then it's impossible to decide, if you have a
>    'string' or not!
> 
> 2) If you declare at least one byte not to be a readable|printable
>    symbol, then you may declare any byte series of this type as a 'string' in
>    comparison to a generally byte series, which may contain any byte between
>    #00 and #FF. In C, i.e., #00 is assumed to be such a byte, and therefore
>    a byte series ending with the byte #00 is declared as such a type of
>    string (Null terminated string). This makes sense only for specially
>    written 'string handling routines' (stringcmp(), printf(),...), nothing
>    else.
> 
> 3) For I know what I would like to read|write|print, Euphoria gives you the
>    opportunity to decide, what you would like to handle as a 'string' or not.
>    In practice I don't see any necessity to have a so called string type, it
>    makes no real sense. However, if you believe you need it, then use a
>    type function similar like that, what Nicholas Koceja has given as an
>    example.

Well, that's one way of looking at things, but it's not generic enough.
 
> Do you really think a string type makes sense in Euphoria? I don't! 

It all depends...

Everything depends on interpretation. An ATOM is just a set of bytes in
RAM that Euphoria has been instructed to interpret in a specific manner.
So are INTEGER and SEQUENCE types. These are also just sets of bytes that 
are interpreted by Euphoria in a specific and documented manner.

If Euphoria were to have a string type, it would be the same deal. It would
just be the coder telling Euphoria to interpret a set of bytes in a specific
manner. The difficulty is deciding what the "specific manner" would be.

For example, we might decide that a string is really a restricted form of
sequence - one that is only allowed to contain 32-bit unsigned integers
that are interpreted as UTF-32 Unicode characters. In reality, they would
still be a set of bytes in RAM, but now we would have a specific and
documented interpretation of them. Maybe we could choose to have UTF-8
encoding to save RAM, as a trade-off against extra processing time.
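
The RAM versus processing trade-off can be sketched roughly like this. It is
Python rather than Euphoria, the sample text and the two helper functions
are hypothetical, and it says nothing about how RDS would actually store a
string:

```python
# Hypothetical sample text: "John", a space, and a euro sign.
text = "John \u20ac"

utf32 = text.encode("utf-32-le")  # fixed 4 bytes per character, no BOM
utf8 = text.encode("utf-8")       # 1 to 4 bytes per character

print(len(text), len(utf32), len(utf8))


def nth_char_utf32(data, n):
    """Fetch the n-th code point (0-based) by direct offset arithmetic."""
    return int.from_bytes(data[4 * n:4 * n + 4], "little")


def nth_char_utf8(data, n):
    """Fetch the n-th code point; the bytes must be decoded from the start
    because UTF-8 characters do not sit at fixed offsets."""
    return ord(data.decode("utf-8")[n])
```

For this sample, UTF-32 takes 24 bytes and UTF-8 takes 8, but only the
UTF-32 form lets you jump straight to a given character.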

What would be the advantage of this? Well, it would mean that Euphoria
would be able to trap assignments of non-Unicode characters to string
elements (characters?). Sure, this can be done now with the 'type' system,
but a built-in method that is consistent, faster, and automatic is better
than the generic 'type' method.
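
To show what "trapping" might look like, here is a rough Python sketch. The
names is_code_point and CheckedString are made up for illustration, and I am
assuming "Unicode character" means an integer from 0 to #10FFFF that is not
a surrogate. It is the user-level approach, not the built-in one I am
arguing for.

```python
MAX_CODE_POINT = 0x10FFFF


def is_code_point(value):
    """True for integers in the Unicode range, excluding surrogates."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and 0 <= value <= MAX_CODE_POINT
        and not (0xD800 <= value <= 0xDFFF)
    )


class CheckedString:
    """Hypothetical restricted sequence: every element must be a code point."""

    def __init__(self, values=()):
        self._items = []
        for value in values:
            self.append(value)

    def _check(self, value):
        if not is_code_point(value):
            raise TypeError(f"not a Unicode code point: {value!r}")
        return value

    def append(self, value):
        self._items.append(self._check(value))

    def __setitem__(self, index, value):
        # Single-element assignment only; a bad value is rejected here.
        self._items[index] = self._check(value)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)
```

Every assignment pays for the check in interpreted code, which is the cost a
built-in string type could reduce.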

It would also mean that other built-in and library routines could perform
processing more relevant to the data, such as displaying the value in
string notation ("John") rather than as numbers. If we needed to see the
numbers, we could always assign a string to a sequence (like we can assign
an integer to an atom).
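
The difference in display is easy to see in a language that has both forms.
In this Python sketch (the variable names are mine), the same four values
are shown first as a plain sequence and then as a string, and converted
back again:

```python
# The character values for "John" held as a plain sequence of integers.
john = [74, 111, 104, 110]
print(john)            # shown as numbers

# Interpreting the same values as characters gives string notation.
as_text = "".join(chr(code) for code in john)
print(repr(as_text))   # shown as text

# Going back to numbers is always possible, like string-to-sequence.
back = [ord(ch) for ch in as_text]
```

Nothing about the data changes between the two forms, only the
interpretation that the display routine applies.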

It may also be argued that a string type might lead to fewer bugs in
some applications, less time involved in debugging ('cos it's easier to
read strings rather than numbers), and easier take-up for new Euphoria
coders.

What are the costs? Increased complexity in the Euphoria product, which
would mean more testing, potentially more bugs, and slower execution
times. The extent of these costs is not measurable at this stage and
probably won't be until strings are actually implemented.

So in the end, it really depends on whether RDS thinks the benefits are
worth risking the costs.

-- 
Derek Parnell
Melbourne, Australia
