1. String?
Hi 'string fans'!
As far as I know, a character is a byte that represents a human-readable or
printable symbol. A character string (synonym: string) is a series of
characters, i.e., a series of bytes representing human-readable|printable
symbols (words, sentences, ...).
Is it important to differentiate between a general byte series (#00 to #FF)
and a 'string'?
1) If there are 256 readable|printable symbols assigned to the
numbers #00 to #FF, then it's impossible to decide whether you have a
'string' or not!
2) If you declare at least one byte not to be a readable|printable
symbol, then you may declare any byte series of this type to be a 'string',
as opposed to a general byte series, which may contain any byte between
#00 and #FF. In C, e.g., #00 is assumed to be such a byte, and therefore
a byte series ending with the byte #00 is declared to be such a type of
string (null-terminated string). This makes sense only for specially
written 'string handling routines' (strcmp(), printf(), ...), nothing
else.
3) Since I know what I would like to read|write|print, Euphoria gives you the
opportunity to decide what you would like to handle as a 'string' and what not.
In practice I don't see any need for a so-called string type; it
makes no real sense. However, if you believe you need it, then use a
type function similar to the one Nicholas Koceja has given as an
example.
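A minimal sketch of what such a type function might look like. This is not Nicholas Koceja's actual code, which is not reproduced in this thread; the name `string` and the 0..255 range are assumptions made for illustration:

```euphoria
-- Hypothetical user-defined type: a sequence whose elements are all
-- integers in the byte range 0..255 (the range is an assumption).
type string(object x)
    if not sequence(x) then
        return 0
    end if
    for i = 1 to length(x) do
        if not integer(x[i]) then
            return 0
        end if
        if x[i] < 0 or x[i] > 255 then
            return 0
        end if
    end for
    return 1
end type

string name
name = "John"          -- accepted by the type check
-- name = {1.5, "x"}   -- would fail the type check at run time
```

Note that such a type only validates the element values; it says nothing about what the values mean.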
Do you really think a string type makes sense in Euphoria? I don't!
--
----------------------------------------------------
| Dr.Rolf Schröder | E B |
| Möörkenweg 37 | C |
| 21029 Hamburg | D |
| Deutschland | A |
| Earth |-------------------------------|
| Solar System | Earth Phone : +49-40-724-4650 |
| Milky Way | National Fax: 0721-151-577722 |
| Local Group | mailto:Rolf at RSchr.de |
| Known Universe | http://www.rschr.de |
----------------------------------------------------
2. Re: String?
Rolf wrote:
> Hi 'string fans'!
> As far as I know, a character is a byte that represents a human-readable or
> printable symbol. A character string (synonym: string) is a series of
> characters, i.e., a series of bytes representing human-readable|printable
> symbols (words, sentences, ...).
Again: I have never heard or read that the definition of "character" or
"string" depends on whether or not something is printable.
E.g. in BASIC, this is clearly *not* the case. You might also want to
look here:
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?characters
http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?string
Of course, under certain circumstances it is useful to know whether or
not a given string is printable, but that has nothing to do with its
*definition*.
> Is it important to differentiate between a general byte series (#00 to #FF)
> and a 'string'?
No, they are the same.
> 1) If there are 256 readable|printable symbols assigned to the
> numbers #00 to #FF, then it's impossible to decide whether you have a
> 'string' or not!
>
> 2) If you declare at least one byte not to be a readable|printable
> symbol, then you may declare any byte series of this type to be a 'string',
> as opposed to a general byte series, which may contain any byte between
> #00 and #FF. In C, e.g., #00 is assumed to be such a byte, and therefore
> a byte series ending with the byte #00 is declared to be such a type of
> string (null-terminated string). This makes sense only for specially
> written 'string handling routines' (strcmp(), printf(), ...), nothing
> else.
>
> 3) Since I know what I would like to read|write|print, Euphoria gives you the
> opportunity to decide what you would like to handle as a 'string' and what not.
> In practice I don't see any need for a so-called string type; it
> makes no real sense. However, if you believe you need it, then use a
> type function similar to the one Nicholas Koceja has given as an
> example.
Like him, you are missing the point.
Using such a user-defined string type doesn't solve the problem: if a
Euphoria program reads e.g. {74,111,104,110} from a file, there is no way
to find out whether this sequence means "John", or the weights of the
members of my family, or whatever.
Surely any string has to be a sequence of special integers (which can be
checked by such a user-defined type), but not every such sequence is a
string!
Regards,
Juergen
3. Re: String?
Rolf Schröder wrote:
> 3) Since I know what I would like to read|write|print, Euphoria gives you the
> opportunity to decide what you would like to handle as a 'string' and what not.
> In practice I don't see any need for a so-called string type; it
> makes no real sense. However, if you believe you need it, then use a
> type function similar to the one Nicholas Koceja has given as an
> example.
This is an 'opportunity' not unlike our recently-enjoyed 'opportunity'
to pay income taxes. It obligates me to do lots of extra work, costing me
time and money, and I seldom if ever see any benefits.
The way I see it, if in my program I declare
"This is a sequence of human-readable characters\n",
then clearly it was intended to be a sequence of human readable
characters, and Euphoria should be smart enough to *remember* that for
at least a few minutes, so later, when I want to display that sequence,
Euphoria will do so correctly.
If I had intended it to be {84,104,105,115,32,105,115,32,97,32,115...
(perhaps a list of ages or weights or something) then I would have
entered them as {84,104,105,115,32,105,115,32,97,32,115... wouldn't
I?
In the rare instance where someone might want to display the
ASCII equivalents, or do "math" on that sequence, then *that* is where
the programmer should have to go to extra lengths to coerce the
data into some other form. Not every single time he uses it.
> Do you really think a string type makes sense in Euphoria? I don't!
Absolutely.
When I first started programming, computers were primarily for crunching
numbers, and text was only a secondary concern. That day is long past.
Irv
4. Re: String?
Juergen Luethje wrote:
>
> Rolf wrote:
> ...
> > As far as I know, a character is a byte that represents a human-readable or
> > printable symbol. A character string (synonym: string) is a series of
> > characters, i.e., a series of bytes representing human-readable|printable
> > symbols (words, sentences, ...).
>
> Again: I have never heard or read that the definition of "character" or
> "string" depends on whether or not something is printable.
> E.g. in BASIC, this is clearly *not* the case. You might also want to
> look here:
> http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?characters
> http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?string
>
Jürgen,
I see stated there just what I said, maybe in different words.
I got this (computer-related) definition from the "Dictionary of Computer
Terms" (Webster's) and also from the "Computer & Internet Dictionary"
(Random House).
So what is a character to you (in computing terms)?
Later you wrote:
> Like him, you are missing the point.
> Using such a user-defined string type doesn't solve the problem: If a
> Euphoria program reads e.g. {74,111,104,110} from a file, there is no way
> to find out, whether this sequence means "John", or the weight of the
> members of my family, or whatever.
That's true, especially if a fifth byte were a zero!
Excuse me, but now I think YOU are missing the point: whether you
want to print it as an ASCII string or simply as numbers is decided
by the 'tool' YOU select: the format %s in printf() gives you the
text, and, e.g., the format %d,%d,%d,%d in printf() gives you the
plain numbers.
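A short sketch of the two choices, using the sequence from the earlier example (the variable name `s` is only for illustration):

```euphoria
sequence s
s = {74,111,104,110}

-- %s expects one argument that is itself a sequence, so wrap s in braces
printf(1, "%s\n", {s})            -- prints: John

-- four %d formats consume the four elements of s one by one
printf(1, "%d,%d,%d,%d\n", s)     -- prints: 74,111,104,110
```

The same data produces either output; only the format string chosen by the programmer differs.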
Sincerely, Rolf
5. Re: String?
On Mon, 31 May 2004 09:33:25 -0700, Rolf Schröder
<guest at RapidEuphoria.com> wrote:
>Excuse me, but now I think YOU are missing the point: whether you
>want to print it as an ASCII string or simply as numbers is decided
>by the 'tool' YOU select: the format %s in printf() gives you the
>text, and, e.g., the format %d,%d,%d,%d in printf() gives you the
>plain numbers.
>
Hi Rolf,
OK, I agree that in more than 90% of print cases, the programmer can
easily apply the correct format info, however:
Slightly restating the previous example:
sequence weights
weights={74,111,104,110}
sequence name
name="John"
There is additional meaning obvious to anyone reading the source,
which is lost in the assignment. Since equal(weights,name) will return
true, any attempt at an "IsString" function is doomed to get one of
them wrong. Sure you can do something like:
constant tInt=1, tFlt=2, tSeq=3, tStr=4
weights={tSeq,{74,111,104,110}}
name={tStr,"John"}
Which I think is about the easiest way to preserve the semantic
information. Not exactly nice though, is it?
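A minimal sketch of how an output routine could use such a tag. The routine `show()` and the index constants `TAG` and `VALUE` are hypothetical names introduced here; only tSeq and tStr come from the example above:

```euphoria
constant tSeq = 3, tStr = 4
constant TAG = 1, VALUE = 2   -- hypothetical index names

-- Print a tagged value according to its tag, not by guessing.
procedure show(sequence tagged)
    if tagged[TAG] = tStr then
        puts(1, tagged[VALUE])    -- as text
    else
        print(1, tagged[VALUE])   -- as numbers
    end if
    puts(1, '\n')
end procedure

show({tSeq, {74,111,104,110}})    -- prints: {74,111,104,110}
show({tStr, "John"})              -- prints: John
```

The cost is that every routine touching the data has to know about the wrapper, which is the "not exactly nice" part.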
There may not be a whole lot a string type will allow that you cannot
possibly do without. But to imply it has no merit is silly.
Adding strings might more than double the program size and probably
make everything 50% slower, so I could accept an argument against it
on technical grounds.
But being able to read values in the trace window, ex.err, and output
from ?weights and ?name, is an overwhelming argument in favour.
Of course you may actually be the second person on the planet that
actually likes to see name (and weights) in the trace window appear as
{74J,111o,104h,110n} ?
That definitely falls into the class of Necessary Evil, not the realm
of Good Ideas.
Regards,
Pete
6. Re: String?
Juergen Luethje wrote:
>
> Rolf wrote:
>
> > Hi 'string fans'!
>
>
>
> > As far as I know, a character is a byte that represents a human-readable or
> > printable symbol. A character string (synonym: string) is a series of
> > characters, i.e., a series of bytes representing human-readable|printable
> > symbols (words, sentences, ...).
>
> Again: I have never heard or read that the definition of "character" or
> "string" depends on whether or not something is printable.
> E.g. in BASIC, this is clearly *not* the case. You might also want to
> look here:
> http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?characters
> http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?string
>
> Of course, under certain circumstances it is useful to know whether or
> not a given string is printable, but that has nothing to do with its
> *definition*.
>
> > Is it important to differentiate between a general byte series (#00 to #FF)
> > and a 'string'?
>
> No, this is the same.
>
> > 1) If there are 256 readable|printable symbols assigned to the
> > numbers #00 to #FF, then it's impossible to decide whether you have a
> > 'string' or not!
> >
> > 2) If you declare at least one byte not to be a readable|printable
> > symbol, then you may declare any byte series of this type to be a 'string',
> > as opposed to a general byte series, which may contain any byte between
> > #00 and #FF. In C, e.g., #00 is assumed to be such a byte, and therefore
> > a byte series ending with the byte #00 is declared to be such a type of
> > string (null-terminated string). This makes sense only for specially
> > written 'string handling routines' (strcmp(), printf(), ...), nothing
> > else.
> >
> > 3) Since I know what I would like to read|write|print, Euphoria gives you the
> > opportunity to decide what you would like to handle as a 'string' and what not.
> > In practice I don't see any need for a so-called string type; it
> > makes no real sense. However, if you believe you need it, then use a
> > type function similar to the one Nicholas Koceja has given as an
> > example.
>
> Like him, you are missing the point.
> Using such a user-defined string type doesn't solve the problem: If a
> Euphoria program reads e.g. {74,111,104,110} from a file, there is no way
> to find out, whether this sequence means "John", or the weight of the
> members of my family, or whatever.
First off, why should the program differentiate between these two? Take this
example, the "key.ex" program:
-- find out what numeric key code is generated by any key on the keyboard
-- usage:
--     ex key
integer code
puts(1, "Press any key. I'll show you the key code. Press q to quit\n\n")
while 1 do
    code = get_key()
    if code != -1 then
        printf(1, "The key code is: %d\n", code)  -- the key statement
        if code = 'q' then
            exit
        end if
    end if
end while
The program treats code as an 'integer'. With a string type, you
could NOT declare code as integer; it would have to be an OBJECT, since a
character could be an integer or a string. The key statement is the printf()
call. If there were a string type, this statement would cause an error.
Furthermore, if you defined double-quoted values as a "string", then the '%d'
would cause the error, since a "string" cannot have a decimal value. If you
make it so that a "string" can have one, you would be destroying what you
tried to prove in the first place.
Take a look at this:
sequence string, seq
atom at

string = ""
seq = {}

-- report whether the current value of string equals a
procedure same(object a)
    if equal(string, a) then
        puts(1, "TRUE\n\n")
    else
        puts(1, "FALSE\n\n")
    end if
end procedure

puts(1, "\"\" = {}\n")
same(seq)

puts(1, "\"abcd\" = {\'a\',\'b\',\'c\',\'d\'}\n")
string = "abcd"
seq = {'a','b','c','d'}
same(seq)

puts(1, "\"a\" = \'a\'\n")
string = "a"
at = 'a'
same(at)

-- wait for a key press before exiting
while get_key() = -1 do
end while
This may make you think that there is no difference. However, the
program will output the following, then wait for a key to be pressed
before exiting:
"" = {}
TRUE
"abcd" = {'a','b','c','d'}
TRUE
"a" = 'a'
FALSE
An atom such as 'a' would not be possible if quoted values were strings.
To tell you the honest truth, a "string" type would be pointless, and would
make Euphoria a tad slower. If you were to include a string type, you would
have to change how Euphoria looks at sequences, and "strings", for that matter.
As you all know, Euphoria is a fast and easy-to-understand (not always easy to
learn) programming language. As it states in "ed.doc", Euphoria uses pure
numbers. The smallest unit is an "atom". Atoms and sequences are the only two
explicitly defined types. An integer is a type of atom, and an object can be a
sequence as well as an atom. What I think we need to find out is this
question:
"What exactly is a string?" If a string is just defined as in BASIC, "a string
of letters or numbers", then yes, we could benefit from it. However, if a
string is defined as a sequence of characters, then ANY sequence of positive
integers would fit the description. Although the character set only goes to
255, you can "print" ANY positive integer. Try it yourself. Therefore, the
question of whether a string type should be included or not all depends on
the answer to the question:
"What exactly defines a string?"
> Surely any string has to be a sequence of special integers (which can be
> checked by such a user-defined type), but not any such sequence is a
> string!
>
> Regards,
> Juergen
>
>
| Programs Incomplete: | 20-30 |
| Operating System: | Windows XP |
7. Re: String?
Rolf Schröder wrote:
>
>
> Hi 'string fans'!
>
> As far as I know, a character is a byte that represents a human-readable or
> printable symbol. A character string (synonym: string) is a series of
> characters, i.e., a series of bytes representing human-readable|printable
> symbols (words, sentences, ...).
Well, that's one interpretation. Another is that a character is any value
in an encoding set, such as ASCII, EBCDIC, or Unicode. Each character in
the set has a unique value and may have a glyph (displayable
representation).
Not all characters are displayable. Some characters have the same glyph.
> Is it important to differentiate between a general byte series (#00 to #FF)
> and a 'string'?
In some sets, not all character values can be contained in a single byte.
> 1) If there are 256 readable|printable symbols assigned to the
> numbers #00 to #FF, then it's impossible to decide whether you have a
> 'string' or not!
>
> 2) If you declare at least one byte not to be a readable|printable
> symbol, then you may declare any byte series of this type to be a 'string',
> as opposed to a general byte series, which may contain any byte between
> #00 and #FF. In C, e.g., #00 is assumed to be such a byte, and therefore
> a byte series ending with the byte #00 is declared to be such a type of
> string (null-terminated string). This makes sense only for specially
> written 'string handling routines' (strcmp(), printf(), ...), nothing
> else.
>
> 3) Since I know what I would like to read|write|print, Euphoria gives you the
> opportunity to decide what you would like to handle as a 'string' and what not.
> In practice I don't see any need for a so-called string type; it
> makes no real sense. However, if you believe you need it, then use a
> type function similar to the one Nicholas Koceja has given as an
> example.
Well, that's one way of looking at things, but it's not generic enough.
> Do you really think a string type makes sense in Euphoria? I don't!
It all depends...
Everything depends on interpretation. An ATOM is just a set of bytes in
RAM that Euphoria has been instructed to interpret in a specific manner.
So are INTEGER and SEQUENCE types. These are also just sets of bytes that
are interpreted by Euphoria in a specific and documented manner.
If Euphoria were to have a string type, it would be the same deal. It would
just be the coder telling Euphoria to interpret a set of bytes in a specific
manner. The difficulty is deciding what the "specific manner" would be.
For example, we might decide that a string is really a restricted form of
sequence - one that is only allowed to contain 32-bit unsigned integers
that are interpreted as UTF-32 Unicode characters. In reality, they would
still be a set of bytes in RAM, but now we would have a specific and
documented interpretation of them. Maybe we could choose UTF-8
encoding to save RAM, trading that for extra processing time.
What would be the advantage of this? Well, it would mean that Euphoria
would be able to trap assignments of non-Unicode characters to string
elements (characters?). Sure, this can be done now with the 'type' system,
but a built-in method that is consistent, faster, and automatic is better
than the generic 'type' method.
It would also mean that other built-in and library routines could perform
processing more relevant to the data, such as displaying the value in
string notation "John" rather than numbers. If we needed to see numbers
we could always assign a string to a sequence (like we can assign an
integer to an atom).
It may also be argued that a string type might lead to fewer bugs in
some applications, less time involved in debugging ('cos its easier to
read strings rather than numbers), and easy take-up for new Euphoria
coders.
What are the costs? Increased complexity in the Euphoria product, which
would mean more testing, potentially more bugs, and slower execution
times. The extent of these costs is not measurable at this stage and
probably won't be until strings are actually implemented.
So in the end, it really depends on whether RDS can risk the costs
for the benefits.
--
Derek Parnell
Melbourne, Australia
8. Re: String?
On Mon, 31 May 2004 14:27:12 -0700, Nicholas Koceja
<guest at RapidEuphoria.com> wrote:
>First off, why should the program differentiate between these two?
As per my previous post, not much apart from the loss of the obvious,
"common sense" meaning which is plain discarded, at a low level.
>Take this example, the "key.ex" program:
>-- find out what numeric key code is generated by any key on the keyboard
>-- usage:
>--     ex key
>
>integer code
>
>puts(1, "Press any key. I'll show you the key code. Press q to quit\n\n")
>while 1 do
>    code = get_key()
>    if code != -1 then
>        printf(1, "The key code is: %d\n", code)  -- the key statement
>        if code = 'q' then
>            exit
>        end if
>    end if
>end while
>The program treats code as an 'integer'. With a string type, you
>could NOT declare code as integer; it would have to be an OBJECT, since a
>character could be an integer or a string. The key statement is the printf()
>call. If there were a string type, this statement would cause an error.
You lost me completely.
<aside>I would fight tooth and nail against a "Character" type... can
we just agree that is not the issue here?</aside>
Having a string type won't cause an error; defining some variable as a
string when it gets assigned either a string or an integer would, but
no different to defining a sequence var and assigning an int to it.
Supplying a string param to printf when an integer was expected also
might cause an error, but why any more so than a sequence?
> Furthermore, if you
>defined double-quotes as a "String", then the '%d' would cause the error, since
>a "string" can not have a decimal value. If you make it so that a "string"
>can have one, you would be destroying what you tried to prove in the first
>place.
I fail to understand what you are trying to say here either ;-((
>Take a look at this:
>sequence string, seq
>atom at
>
>string = ""
>seq = {}
>procedure same(object a)
>    if equal(string, a) then
>        puts(1, "TRUE\n\n")
>    else
>        puts(1, "FALSE\n\n")
>    end if
>end procedure
>
>puts(1, "\"\" = {}\n")
>same(seq)
>puts(1, "\"abcd\" = {\'a\',\'b\',\'c\',\'d\'}\n")
>string = "abcd"
>seq = {'a','b','c','d'}
>same(seq)
>puts(1, "\"a\" = \'a\'\n")
>string = "a"
>at = 'a'
>same(at)
>while get_key() = -1 do
>end while
>This may make you think that there is no difference.
Not for a moment. A string and a character (which I would prefer to
keep as an integer) are indeed different. (I admit that I made that
mistake once (implementing the now shelved pleumage), never again)
>To tell you the honest truth, a "string" type would be pointless
It might not add much extra functionality, in truth I agree.
It would add a welcome degree of readability on the diagnostic side..
>, and would make Euphoria a tad slower.
By a fair margin, I reckon.
> If you were to include a string type, you would
>have to change how Euphoria looks at sequences, and "strings", for that matter.
>As you all know, Euphoria is a fast and easy-to-understand (not always easy to
>learn) programming language. As it states in "ed.doc", Euphoria uses pure
>numbers. The smallest unit is an "atom". Atoms and sequences are the only two
>explicitly defined types. An integer is a type of atom, and an object can be a
>sequence as well as an atom. What I think we need to find out is this
>question:
>"What exactly is a string?" If a string is just defined as in BASIC, "a string
>of letters or numbers", then yes, we could benefit from it. However, if a
>string is defined as a sequence of characters, then ANY sequence of positive
>integers would fit the description. Although the character set only goes to
>255, you can "print" ANY positive integer.
Well, yes, but that is another bit of a fudge and a clear indication
that Unicode support is a country mile away...
> Try it yourself. Therefore, the question of whether a string
>type should be included or not all depends on the answer to the question:
>"What exactly defines a string?"
The fundamental thing I would claim is that if it looks like a string
in the code you write, it is a string; if it looks like a sequence of
numbers, it is a sequence of numbers.
The point is to tally what the code reads like and how the trace
window and the ? primitive operate.
There should indeed be some (hopefully rarely used) functions to
convert between the two, just in case you ever need them..
Pete
9. Re: String?
Derek Parnell wrote:
<snip>
> What would be the advantage of this? Well it would mean that Euphoria
> would be able to trap assignments of non-Unicode characters to string
> elements (characters?). Sure this can be done now with the 'type' system
> but a built-in method that is consistent, faster, and automatic is better
> than the generic 'type' method.
Obviously, the user 'type' system does slow things down (significantly)
when doing something like checking a long string character-by-character.
So it would be better if it were built into Eu itself - like the integer
checking. Add to that the fact that user-written type checking does
nothing to simplify or eliminate errors when doing output, so it's a
half-solution at best.
Consider this, however:
Strings can reasonably be expected to be entered either in the source
code, where no one in their right mind would go to the trouble to do it
this way: name = {74,111,104,110}
or via keyboard, where they get entered character-by-character -
meaning that if you actually were to type '{74,111,104,110}' and
hit enter, you would not have anything resembling "John".
It seems to me that it shouldn't be too difficult to say that
a leading '{' could always tag the input as a sequence of objects,
while a leading '"' could tag it as a string.
Just like:
constant a = 'A'
constant b = "A"
Note that these produce different results, based solely on whether
a single or a double quote is used.
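Spelled out as a runnable snippet (the `?` lines are added here for illustration):

```euphoria
constant a = 'A'   -- an atom: the integer 65
constant b = "A"   -- a sequence of length 1: {65}

? a              -- prints 65
? b              -- prints {65}
? atom(a)        -- prints 1
? sequence(b)    -- prints 1
? equal(b, {a})  -- prints 1: "A" and {'A'} are the same value
```

The quote style decides between atom and sequence at parse time, but once parsed, "A" and {65} are indistinguishable.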
Makes me wonder: if that can be done, why is it so hard to
differentiate between a double quote and a curly bracket?
In neither of these cases would there be any significant slowdown,
the first would be done at parse time, and the second would be limited by
typing speed. Only in the (rare) instances where math is performed on
strings would there be any chance of slowness actually being a factor.
> It would also mean that other built-in and library routines could perform
> processing more relevant to the data. Such as displaying the value in
> string notation "John" rather than numbers.
And it would do away with the multiple-choice quiz every time we want
to output something. This is a major point of confusion for newcomers
to Euphoria, and just causes extra work for all of us.
> So in the end, it really depends on whether RDS can risk the costs
> for the benefits.
I think it depends more on whether the design of the language allows
for another type. There was discussion on this list years ago regarding
this, and I believe a bit of detective work indicated that it was not
possible.
Regards,
Irv
10. Re: String?
Irv Mullins wrote:
> Makes me wonder: if that can be done, why is it so hard to
> differentiate between a double quote and a curly bracket?
By ensuring that pointers fall on four byte boundaries, and dropping the
precision of integers, Euphoria frees up a couple bits in the C int datatype,
which it uses to flag the type stored inside, which is something like:
- positive integer
- negative integer
- pointer to atom
- pointer to sequence
- undefined
Since there are no remaining bits in the int that can be used to flag the
datatype, the only other "simple" option would be to add an extra field to
the sequence structure.
The addition of the field wouldn't be too expensive, but you'd then have to
perform an additional test on sequences to determine:
1. Is the sequence a string?
2. If it's a string, is it still a string after the last operation?
Some operations - concatenation and slicing - would be "free", since you
guarantee that the data in the sequence would still be a string. But for
other operations - bitwise, math and comparison - you'd have to scan the
string to ensure that it was still a valid string.
Since any sequence could possibly be a string (you don't know until you test
it), Euphoria would have to perform at least the first test on all sequences.
This is guaranteed to slow things down a bit, and I think that'll be a hard
thing to sell to Robert.
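The difference between the "free" and the "scan" operations can be illustrated at the Euphoria level. This is only a sketch, not the interpreter's internals: the function `still_string()` is hypothetical, and it assumes a string means integers in 0..255:

```euphoria
-- Hypothetical check: is every element an integer in 0..255?
function still_string(sequence s)
    for i = 1 to length(s) do
        if not integer(s[i]) then
            return 0
        end if
        if s[i] < 0 or s[i] > 255 then
            return 0
        end if
    end for
    return 1
end function

sequence s
s = "John"
? still_string(s & "ny")   -- 1: concatenating two strings cannot leave the range
? still_string(s[2..3])    -- 1: neither can slicing
? still_string(s * 4)      -- 0: 'J' * 4 = 296, and only a scan reveals that
```

For the first two results the answer is known without looking at the data; for the third, every element has to be inspected.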
-- David Cuny
11. Re: String?
David Cuny wrote:
> By ensuring that pointers fall on four byte boundaries, and dropping the
> precision of integers, Euphoria frees up a couple bits in the C int datatype,
> which it uses to flag the type stored inside, which is something like:
>
> - positive integer
> - negative integer
> - pointer to atom
> - pointer to sequence
> - undefined
>
> Since there are no remaining bits in the int that can be used to flag the
> datatype, the only other "simple" option would be to add an extra field to
> the sequence structure.
Thanks, that confirms my suspicion. So I guess we're out of luck with
regard to strings, structures, or similar things until everyone is
running 64-bit systems?
Then we'll either get higher precision integers, or 4294967296 new data types!
Perhaps a compromise would be in order.
Irv
12. Re: String?
Pete wrote:
> On Mon, 31 May 2004 09:33:25 -0700, Rolf Schroeder
> <guest at RapidEuphoria.com> wrote:
>
>> Excuse me, but now I think YOU are missing the point: whether you
>> want to print it as an ASCII string or simply as numbers is decided
>> by the 'tool' YOU select: the format %s in printf() gives you the
>> text, and, e.g., the format %d,%d,%d,%d in printf() gives you the
>> plain numbers.
I know, Rolf. That's exactly what I (and other people, too) don't like.
Mainly because in some situations, we don't have the possibility to
select anything: this applies to trace() and to output to the "ex.err"
file.
Also, RDS claims that Euphoria is simpler than BASIC. While this might
be true in general, in BASIC we can do this:
dim s as string, i as integer
s = "My age is"
i = 99
print s i
That's what I call simple.
In Euphoria, it's currently not possible to have a generic output
routine such as 'print' in BASIC, because sometimes only the programmer
(and not the program) knows what a given sequence means. Although I
like Euphoria's pretty_print(), and Pete's version IMHO does even
smarter guessing, any output routine sometimes can't do anything
but *guess* what it should do. This is not satisfactory, IMHO.
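For comparison, a sketch of what the same output takes in Euphoria today, where the programmer has to supply the interpretation through the format string:

```euphoria
sequence s
integer i
s = "My age is"
i = 99

-- the format string tells printf() which value is text and which is a number
printf(1, "%s %d\n", {s, i})   -- prints: My age is 99

-- the generic output routine cannot know that s is text
? s   -- prints: {77,121,32,97,103,101,32,105,115}
```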
> Hi Rolf,
>
> OK, I agree that in more than 90% of print cases, the programmer can
> easily apply the correct format info, however:
>
> Slightly restating the previous example:
>
> sequence weights
> weights={74,111,104,110}
> sequence name
> name="John"
>
> There is additional meaning obvious to anyone reading the source,
> which is lost in the assignment. Since equal(weights,name) will return
> true, any attempt at an "IsString" function is doomed to get one of
> them wrong.
Yes, and using a user-defined string type does *not* solve the problem.
> Sure you can do something like:
>
> constant tInt=1, tFlt=2, tSeq=3, tStr=4
>
> weights={tSeq,{74,111,104,110}}
> name={tStr,"John"}
>
> Which I think is about the easiest way to preserve the semantic
> information. Not exactly nice though, is it?
No, not too nice. And it also doesn't make the output of trace() and the
output to "ex.err" more readable.
But something like that is what I would like Euphoria to do *internally*
(if the cost is not too big).
> There may not be a whole lot a string type will allow that you cannot
> possibly do without. But to imply it has no merit is silly.
>
> Adding strings might more than double the program size and probably
> make everything 50% slower, so I could accept an argument against it
> on technical grounds.
Me too.
> But being able to read values in the trace window, ex.err, and output
> from ?weights and ?name, is an overwhelming argument in favour.
>
> Of course you may actually be the second person on the planet that
> actually likes to see name (and weights) in the trace window appear as
> {74J,111o,104h,110n} ?
>
> That definitely falls into the class of Necessary Evil, not the realm
> of Good Ideas.
Regards,
Juergen
13. Re: String?
I think that the string type would be very useful.
It would be good if it could be implemented much like this:
string myStr
sequence mySeq
myStr = "John"
myStr = {74,111,104,110}
myStr &= get_key()
if myStr = "John" then -- this could be invaluable, I hate having to type equal()
mySeq = "John" -- should still display as {74,111,104,110}
Because it is declared as a string, it doesn't matter how you assign it;
it will always be shown as a string. By the same token, you should
still be able to assign double quotes to a sequence, only it is
displayed as integers.
If you ever see the need to output the integer values of a string, or
vice versa, then you could just assign it to the relevant data type, or
have a VB-style CStr() or CSeq() function.
The major problem i have with sequences is the fact that you need to use
equal() for simple strings.
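For anyone following along, a small sketch of why equal() is needed today (seq1 and seq2 are just illustrative names):

```euphoria
sequence seq1, seq2
seq1 = "John"
seq2 = "Joan"

-- "=" on sequences is applied element by element,
-- so the result is itself a sequence: {1,1,0,1}
? seq1 = seq2

-- equal() compares the two objects as a whole
-- and returns a single truth value: 0
? equal(seq1, seq2)

if equal(seq1, seq2) then
    puts(1, "same\n")
else
    puts(1, "different\n")
end if
```

Writing "if seq1 = seq2 then" instead fails at run time, because the condition of an if must be an atom, and the two sequences must also have the same length for "=" to work at all.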
StewartML,
Scotland
14. Re: String?
StewartML wrote:
>
> The major problem i have with sequences is the fact that you need to use
> equal() for simple strings.
function e( sequence x, sequence y )
return equal(x,y)
end function
sequence seq1, seq2
if e(seq1, seq2) then end if
I just saved you 57% typing!!!
I don't know how bad a speed hit this takes, though.
Yes, you could save 75% from my method with
seq1 = seq2
:)
15. Re: String?
On Tue, 01 Jun 2004 20:00:15 +0000, StewartML <Stewart at isoclass.co.uk>
wrote:
<snip>
>if you ever see the need to output the integer values of a string,
That particular need could be easily met, eg:
sequence s
string t
t="hello"
s=repeat(0,length(t))
for i=1 to length(t) do
s[i]=t[i]
end for
As an application programmer, I would have no qualms with having to do
something like that to "rip apart" a string, and of course if you do
need such in several places, it is trivial to code as a function.
>or vice versa, then you could just assign it to the relevant data type,
I think you can guess how I think a sequence could easily be converted
into a proper string (with integer(), >0, and <256 checks, of course)
It may be tempting to think you can automate such conversions, but I
think that will (may) cause problems, and is not really needed.
(btw, thanks - that just cleared up a few things for me)
>
>The major problem i have with sequences is the fact that you need to use
>equal() for simple strings.
Tell me about it.
Apart from the way upper() and lower() are
currently implemented (which cannot in anyone's mind be the best), just
how often are =, !=, <, <=, >, >= actually used as sequence ops?
Regards,
Pete
16. Re: String?
Pete Lomax wrote:
>
> On Tue, 01 Jun 2004 20:00:15 +0000, StewartML <Stewart at isoclass.co.uk>
> wrote:
>
> <snip>
> >if you ever see the need to output the integer values of a string,
> That particular need could be easily met, eg:
> sequence s
> string t
> t="hello"
> s=repeat(0,length(t))
> for i=1 to length(t) do
> s[i]=t[i]
> end for
>
> As an application programmer, I would have no qualms with having to do
> something like that to "rip apart" a string, and of course if you do
> need such in several places, it is trivial to code as a function.
I would have thought that all one needed to do was ...
sequence s
string t
t="hello"
s=t
This is like what one does for integers and atoms.
atom x
integer y
y=1
x=y
As a string is really a subset of a sequence, converting it to a sequence
should be a trivial effort for the interpreter.
> >or vice versa, then you could just assign it to the relevant data type,
> I think you can guess how I think a sequence could easily be converted
> into a proper string (with integer(), >0, and <256 checks, of course)
Your 'proper string' is still only good for certain subsets of strings. It
wouldn't work for Unicode strings, nor for ASCII or EBCDIC strings, since you
exclude the NUL character.
The NUL is a valid character. For example, many printers and modems need
it in their control strings.
> It may be tempting to think you can automate such conversions, but I
> think that will (may) cause problems, and is not really needed.
>
> (btw, thanks - that just cleared up a few things for me)
>
> >
> >The major problem i have with sequences is the fact that you need to use
> >equal() for simple strings.
> Tell me about it
> Apart from the way upper() and lower() are
> currently implemented (which cannot in anyone's mind be the best), just
> how often are =, !=, <, <=, >, >= actually used as sequence ops?
I'm with you on this one though. The upper/lower functions are only good
for ASCII encoding. Try this on for size ...
? lower({74.5, 104.01})
? upper({74.5, 104.01})
RDS is concerned that some requested features for Euphoria might be
little used and thus not really worth the trouble of adding them... such
as using relational operators as if they were sequence operations.
--
Derek Parnell
Melbourne, Australia
17. Re: String?
On Tue, 01 Jun 2004 16:34:38 -0700, Derek Parnell
<guest at RapidEuphoria.com> wrote:
>Try this on for size ...
>
> ? lower({74.5, 104.01})
> ? upper({74.5, 104.01})
LOL
It makes a bit more sense if you write it like this:
?lower({'J'+0.5,'h'+0.1})
?{'j'+0.5,'h'+0.1}
?upper({'J'+0.5,'h'+0.1})
?{'J'+0.5,'H'+0.1}
18. Re: String?
On 2 Jun 2004 11:47:25 +0200, Christian Cuvier
<Christian.CUVIER at agriculture.gouv.fr> wrote:
>--find first pair of adjacent distinct values:
>p=find(0,(s&s[length(s)] = prepend(s,s[1]))-1 -- -1 if no such pair
There is a syntax error in that, and even if I fix it a simple for
loop is about twice as fast when s={1,1,1,1,1,1,1,1,1,2}
>--find first repeated value:
>p=find(1,(s&s[length(s)] = prepend(s,s[1]))-1 -- -1 if no such pair
Another syntax error, and that does not work at all. If I fix it so it
does, a for loop is still twice as fast when s={1,2,3,4,5,6,7,7,8,9}
>--find first mismatch:
>p=find(0,(s1 = s2))
That one, OK, I grant you, is slightly faster and somewhat easier to
type. You could still code it as p=find(0,eq(s1,s2)) though, provided
there was a new builtin eq() function to replace the sequence op.
>--is a sequence strictly increasing?
>p=find(1,(s&(s[length(s)]+1) >= prepend(s,s[1]-1)) --0 means yes
Again, a simple for loop is over twice as fast.
>
>
>Could go on for pages. These operators are quite useful in conjunction with
>find(). match() allows even niftier tricks.
Well, now that I have just tested them, I know they are not as fast as
people like to think they are, and they can easily be replaced with a
builtin function.
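For reference, the kind of plain for loop I mean looks roughly like this. It is a sketch only: the function names and the "return 0 when nothing is found" convention are my own choices for illustration, and I make no timing claims for these exact versions.

```euphoria
-- Hypothetical helper: index of the first position where s1 and s2
-- differ, or 0 if they agree over their common length.
function first_mismatch(sequence s1, sequence s2)
    integer n
    n = length(s1)
    if length(s2) < n then
        n = length(s2)
    end if
    for i = 1 to n do
        if not equal(s1[i], s2[i]) then
            return i
        end if
    end for
    return 0
end function

-- Hypothetical helper: 1 if every element is greater than the one
-- before it, otherwise 0.
function strictly_increasing(sequence s)
    for i = 2 to length(s) do
        if compare(s[i-1], s[i]) >= 0 then
            return 0
        end if
    end for
    return 1
end function

? first_mismatch("hello", "help")   -- 4
? strictly_increasing({1,2,3,5})    -- 1
? strictly_increasing({1,2,2,5})    -- 0
```

Unlike the find() one-liners, the loops stop at the first hit and never build a temporary sequence.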
Pete
19. Re: String?
On Tue, 01 Jun 2004 07:50:08 -0700, irv mullins
<guest at RapidEuphoria.com> wrote:
>Add to that the fact that user-written type checking does
>nothing to simplify or eliminate errors when doing output, so it's a
>half-solution at best.
I still don't get that.
20. Re: String?
Pete Lomax wrote:
>
> On Tue, 01 Jun 2004 07:50:08 -0700, irv mullins
> <guest at RapidEuphoria.com> wrote:
>
> >Add to that the fact that user-written type checking does
> >nothing to simplify or eliminate errors when doing output, so it's a
> >half-solution at best.
> I still don't get that.
It really isn't that complicated:
Consider the following -
a = 12
b = 23.6609
c = "Hello"
d = [1,2,6,"Hi"]
>>> print a, b, c, d
12 23.6609 Hello [1, 2, 6, 'Hi']
That's python. You don't have to come up with the "right" function
to properly print variables, python manages to keep track for itself
what is a string and what is an integer, a float, or a sequence.
Lua does much the same.
Both, of course also have a printf() type of func for when you actually
need special formatting, and like Euphoria's printf(), they are a bit slower
than print.
Now, without using printf, let's see you get Euphoria to display
the contents of variable d:
constant d = {1,2,"Hi"}
print doesn't work, it displays: {1,2,{72,105}} - where's the "Hi"?
? doesn't work either, it displays:
{
1,
2,
{72,105}
}
puts() won't even run:
test.exu:10
sequence found inside character string
--> see ex.err
So not only do you have to pick and choose the correct output function
for each variable (and each member of the variable) separately, but if
the contents of a variable change, or the nesting changes, you have to
go back and rewrite every line that outputs that variable.
Try changing {1,2,"Hi"} to {1,2,{"Hello","World"}} and see if it still
works. Not even printf() will help you here.
printf(1,"%d %d %s %s\n",{d[1],d[2],d[3][1],d[3][2]})
Now make it {1,2,3,{"Hello","World"}} and see what happens.
Does anyone think that meets the definition of "simple"?
Python, Lua, and several other languages handle this in a
straightforward manner, even though they do not have typed
variables. Surely if Euphoria is going to make us declare types,
it could make use of that information later. And user-written
type checking isn't going to help.
By the way, no one need bring up the "but that would make
Euphoria slower" argument. I have already benchmarked Euphoria and Lua on
output, and Lua wins handily.
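The best you can do in user code today is guess. Here is a rough sketch of such a routine; looks_like_text() and show() are names I made up for illustration, and the "printable ASCII means text" rule is purely an assumption.

```euphoria
-- Hypothetical heuristic: a non-empty flat sequence of printable
-- ASCII integers (32..126) is assumed to be text.
function looks_like_text(object x)
    if atom(x) then
        return 0
    end if
    if length(x) = 0 then
        return 0
    end if
    for i = 1 to length(x) do
        if not integer(x[i]) then
            return 0
        end if
        if x[i] < 32 or x[i] > 126 then
            return 0
        end if
    end for
    return 1
end function

-- Hypothetical routine: write any object to file number fn,
-- quoting whatever looks like text and recursing into the rest.
procedure show(integer fn, object x)
    if atom(x) then
        print(fn, x)
    elsif looks_like_text(x) then
        puts(fn, "\"" & x & "\"")
    else
        puts(fn, "{")
        for i = 1 to length(x) do
            if i > 1 then
                puts(fn, ",")
            end if
            show(fn, x[i])
        end for
        puts(fn, "}")
    end if
end procedure

show(1, {1,2,{"Hello","World"}})  -- displays {1,2,{"Hello","World"}}
puts(1, "\n")
```

It copes with changes in nesting, but it is still only a guess: show(1, {72,105}) displays "Hi" even if those two numbers were meant as weights, which is exactly the information the interpreter would need to keep for us.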
Irv
21. Re: String?
On Wed, 02 Jun 2004 15:43:14 -0700, irv mullins
<guest at RapidEuphoria.com> wrote:
>It really isn't that complicated:
Now I see.
>By the way, no one need bring up the "but that would make
>Euphoria slower" argument. I have already benchmarked Euphoria and Lua on
>output, and Lua wins handily.
ex.exe is about 50 times faster than exw.exe for console display...
Pete
22. Re: String?
Pete Lomax wrote:
>
> On Tue, 01 Jun 2004 07:50:08 -0700, irv mullins
> <guest at RapidEuphoria.com> wrote:
>
> >Add to that the fact that user-written type checking does
> >nothing to simplify or eliminate errors when doing output, so it's a
> >half-solution at best.
> I still don't get that.
Error detection is not the same as error prevention. The fact that one finds
an error does not stop the cause of the error.
--
Derek Parnell
Melbourne, Australia
23. Re: String?
Support for Euphoria from the creator of C++?
There are only two kinds of programming languages: those people always bitch
about and those nobody uses
--Bjarne Stroustrup
24. Re: String?
On Wed, 02 Jun 2004 17:32:46 -0700, Derek Parnell
<guest at RapidEuphoria.com> wrote:
>Pete Lomax wrote:
>>
>> On Tue, 01 Jun 2004 07:50:08 -0700, irv mullins
>> <guest at RapidEuphoria.com> wrote:
>>
>> >Add to that the fact that user-written type checking does
>> >nothing to simplify or eliminate errors when doing output, so it's a
>> >half-solution at best.
>> I still don't get that.
>
>Error detection is not the same as error prevention. The fact that one finds
>an error does not stop the cause of the error.
As I now understand it, irv was not talking about errors at all.
He was talking about natural expression.
Which I got already.
Regards,
Pete
25. Re: String?
On Wed, 02 Jun 2004 18:04:06 -0700, Evan Marshall
<guest at RapidEuphoria.com> wrote:
>There are only two kinds of programming languages: those people always bitch
>about and those nobody uses
>--Bjarne Stroustrup
Saw him at the Lakeside, when he beat the Crafty Cockney by four sets
I'll get me coat
Pete