1. Strings


I really want to stay out of the war over strings, but I thought I'd share
the string type I use--easier to read than the others I've seen in these
posts, but should be comparably efficient:

global type string(object theObject)
-- true only for a flat sequence whose elements are all byte values 0..255
object temp
if not sequence(theObject) then return 0 end if
for i=1 to length(theObject) do
    temp=theObject[i]
    if not integer(temp) or temp<0 or temp>255 then return 0 end if
end for
return 1
end type

--Mike Nelson
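A short usage sketch of Mike's type (the test values are illustrative, not from the post). It is repeated here with `then` after the first `if` condition, which Euphoria requires. A type can be called like a function to test a value, or used in a declaration so that every assignment is checked.

```euphoria
-- Mike Nelson's type, with "then" added; accepts flat sequences of byte values
type string(object theObject)
    object temp
    if not sequence(theObject) then return 0 end if
    for i = 1 to length(theObject) do
        temp = theObject[i]
        -- each element must be an integer in the range 0..255
        if not integer(temp) or temp < 0 or temp > 255 then return 0 end if
    end for
    return 1
end type

-- a type can be called like a function to test any value
? string("hello")       -- text literal: every element is a byte
? string({72, 105})     -- "Hi" written as character codes
? string(65)            -- an atom, not a sequence
? string({"ab", "cd"})  -- elements are sequences, not bytes
? string({300})         -- 300 is outside 0..255

-- declaring a variable with the type makes every assignment checked
string name
name = "Mike"
```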


2. Re: Strings

There is a war over strings?

----- Original Message -----
From: Michael Nelson <mike-nelson-ODAAT at WORLDNET.ATT.NET>
To: <EUPHORIA at LISTSERV.MUOHIO.EDU>
Sent: Friday, December 10, 1999 1:04 AM
Subject: Strings


I really want to stay out of the war over strings, but I thought I'd share
the string type I use--easier to read than the others I've seen in these
posts, but should be comparably efficient:

global type string(object theObject)
object temp
if not sequence(theObject) then return 0 end if
for i=1 to length(theObject) do
    temp=theObject[i]
    if not integer(temp) or temp<0 or temp>255 then return 0 end if
end for
return 1
end type

--Mike Nelson


3. Strings

Okay, so I'm going to propose two things with regard to strings, even though I
said that I wouldn't propose new stuff. Plus, I suppose it could be handled by
ESL (once we get around to it) if it isn't implemented internally (preferred).

A string built-in data type with:
Byte-size ASCII strings. For Kat, since she can't have goto.
Unicode UTF-8 strings.
One built-in type should be able to handle both.

Atom has integer as a subclass for efficiency. I think that sequence can have
string as a subclass as well, since strings are a "basic" type in most
programming projects. Strings can be up-cast to sequences, like integers can be
up-cast to atoms.
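Jason's analogy can be tried with what Euphoria already has: every value that passes integer() also passes atom(), and a user-defined byte-string type (modelled on Mike Nelson's type from the first message; `byte_string` is just an illustrative name) only accepts values that also pass sequence(). This sketch shows the subset relationship only; it does not change how the values are stored, which is what a built-in type would be for.

```euphoria
-- illustrative user-defined "string": a flat sequence of byte values
type byte_string(object x)
    if not sequence(x) then return 0 end if
    for i = 1 to length(x) do
        if not integer(x[i]) or x[i] < 0 or x[i] > 255 then return 0 end if
    end for
    return 1
end type

-- integer is a subset of atom: 42 passes both checks, 4.2 only atom()
? {integer(42), atom(42)}
? {integer(4.2), atom(4.2)}

-- the same relationship between byte_string and sequence
? {byte_string("abc"), sequence("abc")}
? {byte_string({1.5}), sequence({1.5})}

-- "up-casting": a byte_string value can be assigned to a sequence variable
byte_string s
sequence q
s = "text"
q = s
```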

One question: are string constants stored as byte-strings or as DWORD-strings in
the Euphoria interpreter?

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.


4. Re: Strings

Jason Gade wrote:
> 
> Okay, so I'm going to propose two things with regards to strings. Even though
> I said that I wouldn't propose new stuff. Plus, I suppose it could be handled
> by ESL (once we get around to it) if it isn't implemented internally
> (preferred).
> 
> A string built-in data type with:
> Byte-size ASCII strings. For Kat, since she can't have goto.
> Unicode UTF-8 strings.
> One built-in type should be able to handle both.
> 
> Atom has integer as a subclass for efficiency. I think that sequence can have
> string as a subclass as well, since strings are a "basic" type in most
> programming projects. Strings can be up-cast to sequences, like integers can
> be up-cast to atoms.

I would be very happy if this was implemented! smile Is there any reason to not
have built-in strings?

~Ryan W. Johnson

Fluid Application Environment
http://www.fluidae.com/

[cool quote here, if i ever think of one...]


5. Re: Strings

Ryan W. Johnson wrote:
> 
> I would be very happy if this was implemented! smile Is there any
> reason to not have built-in strings?

I admit I was tired and bored last night when I posted that. I've been thinking
about it all morning.

One reason for *not* having built-in strings is that sequences handle 99% of the
functionality of strings already.

This proposal would get more complicated when you want sequences of strings as
well. The string type would only be able to apply to a single-level sequence.
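To illustrate the problem with sequences of strings: a flat byte-string type rejects a list of strings outright, so a second type would be needed for that case. Both `byte_string` and `string_list` below are hypothetical names for this sketch.

```euphoria
-- hypothetical flat string type: a sequence of byte values only
type byte_string(object x)
    if not sequence(x) then return 0 end if
    for i = 1 to length(x) do
        if not integer(x[i]) or x[i] < 0 or x[i] > 255 then return 0 end if
    end for
    return 1
end type

-- hypothetical second type: a sequence whose every element is a byte_string
type string_list(object x)
    if not sequence(x) then return 0 end if
    for i = 1 to length(x) do
        if not byte_string(x[i]) then return 0 end if
    end for
    return 1
end type

sequence names
names = {"Jason", "Ryan", "Al"}
? byte_string(names)   -- rejected: the elements are sequences, not bytes
? string_list(names)   -- accepted by the second type
```

Note that {} passes both types, since an empty list is also an empty string; a built-in string type would have to decide how to treat that case.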

But a question that occurs to me is what percentage of sequences in any given
Euphoria application represent text strings?

I think it really only matters for efficiency when working with large amounts of
text data, because each sequence element takes 4 bytes.
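A rough illustration of the overhead, assuming 4 bytes per sequence element (as on the 32-bit interpreter) against 1 byte per character in a packed string, and ignoring the small fixed header every sequence also carries:

```euphoria
-- bytes used by the element storage alone (sequence header ignored)
function sequence_bytes(integer n)
    return n * 4   -- assumed: one 4-byte slot per element
end function

function packed_bytes(integer n)
    return n       -- one byte per character
end function

integer chars
chars = 1000000    -- about 1 MB of plain text, e.g. a large file in an editor
printf(1, "as a sequence: %d bytes\n", {sequence_bytes(chars)})
printf(1, "packed:        %d bytes\n", {packed_bytes(chars)})
```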

So, basically, I retract my proposal. smile

But it was helpful for reminding me of what features I want to see in an
Euphoria Standard Library string module.



6. Re: Strings

Jason Gade wrote:
> 
> I admit I was tired and bored last night when I posted that. I've been
> thinking about it all morning.
> 
> One reason for *not* having built-in strings is that sequences handle 99% of
> the functionality of strings already.
> 
> This proposal would get more complicated when you want sequences of strings
> as well. The string type would only be able to apply to a single-level
> sequence.
> 
> But a question that occurs to me is what percentage of sequences in any given
> Euphoria application represent text strings?
> 
> I think it really only matters for efficiency when working with large amounts
> of text data. Because sequence elements are 4-bytes each.
> 
> So, basically, I retract my proposal. smile
> 
> But it was helpful for reminding me of what features I want to see in an
> Euphoria Standard Library string module.


Hi again,


You're right in that the main advantage to having a string type
would be mostly in making a text editor, where there is so much
text the memory savings would be great, but yes, quite a few
apps won't benefit much with *that* kind of definition of 'string'.
There is, however, another definition of 'string' where it's actually
a memory element:

string s
s="My Window"

where internally s is a pointer to the string, so it can be passed
to a C function like:

x=CreateWindow(s,...)

without having the bother of s=allocate_string("My Window").
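For comparison, a minimal sketch of the current approach Al refers to: copy the text into memory with allocate_string(), pass the address to the C routine, and free() it afterwards. The CreateWindow call is left as a comment so the sketch runs anywhere, and the include line assumes Euphoria 4.0 (on 3.x it would be `include machine.e`).

```euphoria
include std/machine.e   -- assumed 4.0 location of allocate_string() and free()

sequence title, bytes
atom s

title = "My Window"
s = allocate_string(title)   -- copies the characters plus a terminating 0
if s = 0 then
    puts(2, "out of memory\n")
    abort(1)
end if

-- x = c_func(CreateWindow, {s, ...})  -- a C routine would receive the address

-- the memory holds the characters followed by the 0 terminator
bytes = peek({s, length(title) + 1})
? bytes

free(s)   -- the programmer must release it; nothing frees it automatically
```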

This would mean other functions such as 'printf(..)' would
have to also support this new kind of data type:

printf(1,"%s\n",{s})

where Euphoria would recognize 's' as a memory string object and
make the necessary call to print that type of object rather than
say a sequence string.



Take care,
Al

And, good luck with your Euphoria programming!

My bumper sticker: "I brake for LED's"


7. Re: Strings

Al Getz wrote:
> 
> Hi again,
> 
> 
> You're right in that the main advantage to having a string type
> would be mostly in making a text editor, where there is so much
> text the memory savings would be great, but yes, quite a few
> apps won't benefit much with *that* kind of definition of 'string'.
> There is, however, another definition of 'string' where it's actually
> a memory element:
> 
> string s
> s="My Window"
> 
> where internally s is a pointer to the string, so it can be passed
> to a C function like:
> 
> x=CreateWindow(s,...)
> 
> without having the bother of s=allocate_string("My Window").

Well, using allocate_string doesn't seem like *too* much work to me. But you
lose a lot of the dynamics of sequences with manual allocation.

> 
> This would mean other functions such as 'printf(..)' would
> have to also support this new kind of data type:
> 
> printf(1,"%s\n",{s})
> 
> where Euphoria would recognize 's' as a memory string object and
> make the necessary call to print that type of object rather than
> say a sequence string.
 
Well, you could always use C for stuff like that... But routines that convert
between static strings in memory and sequences would be useful. So if 's' was a
pointer to a string then you could do:

printf(1, "%s\n", {stringz(s)})
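`stringz` is Jason's hypothetical name, not an existing routine. A minimal sketch reads bytes with peek() until it reaches the 0 terminator; the include line again assumes Euphoria 4.0. When printing the result, printf() needs it wrapped in `{...}`: a bare string is taken as a list of separate arguments, so `%s` would print only its first character.

```euphoria
include std/machine.e   -- assumed 4.0 location of allocate_string() and free()

-- hypothetical helper: read a zero-terminated C string at addr into a sequence
function stringz(atom addr)
    sequence result
    integer c
    result = ""
    while 1 do
        c = peek(addr)     -- one byte
        if c = 0 then
            exit           -- reached the terminator
        end if
        result &= c
        addr += 1
    end while
    return result
end function

atom s
sequence text
s = allocate_string("My Window")
text = stringz(s)
free(s)
printf(1, "%s\n", {text})   -- note the braces around text
```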


This is just a mental exercise, but something else occurs to me. Euphoria uses
bit-flags to determine the type of data that it is working with -- a pointer to a
sequence or a double, or a 31-bit integer. See
http://www.listfilter.com/cgi-bin/esearch.exu?fromMonth=A&fromYear=A&toMonth=A&toYear=A&postedBy=Robert+Craig&keywords=%22bit+fiddling%22

Euphoria *could* use bit flags to say "pointer to string".
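One visible consequence of the tagging Jason describes: because some bits of each 32-bit value are reserved to mark its type, a Euphoria integer has only 31 bits, so integer() accepts -1073741824 through 1073741823 and anything outside that is stored as an atom (a double) instead. The bounds below assume a 32-bit interpreter.

```euphoria
-- bounds of the 31-bit integer on a 32-bit interpreter (assumed)
constant MAX_INT = 1073741823    -- 2^30 - 1
constant MIN_INT = -1073741824   -- -(2^30)

? integer(MAX_INT)       -- fits in the tagged integer representation
? integer(MAX_INT + 1)   -- too big: held as an atom (double) instead
? integer(MIN_INT)
? integer(MIN_INT - 1)
? atom(MAX_INT + 1)      -- still a valid atom
```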
 


8. Re: Strings

All of my libraries already support memory-based
"C"-type string handling, which is written in
Euphoria assembler.

Bernie

My files in archive:
WMOTOR, XMOTOR, W32ENGIN, MIXEDLIB, EU_ENGIN, WIN32ERU, WIN32API 

Can be downloaded here:
http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan


9. Re: Strings

Jason Gade wrote:
> This is just a mental exercise, but something else occurs to me. Euphoria uses
> bit-flags to determine the type of data that it is working with -- a pointer
> to a sequence or a double, or a 31-bit integer. See
> http://www.listfilter.com/cgi-bin/esearch.exu?fromMonth=A&fromYear=A&toMonth=A&toYear=A&postedBy=Robert+Craig&keywords=%22bit+fiddling%22
> 
> Euphoria *could* use bit flags to say "pointer to string".

Oh, and it looks like Euphoria *could* offer the programmer a way to check
whether a variable has been initialized.



10. Re: Strings

> Oh, and it looks like Euphoria *could* offer the programmer a way to check
> whether a variable has been initialized.

I proposed this a while ago. I believe object() should return true for
initialized variables and false for uninitialized ones, like this:

sequence s

    if not object( s ) then
        -- not initialized!
        s = {}
    end if
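Until something like Greg's proposal exists (reading a variable that was never assigned is a run-time error), one workaround is to assign a sentinel immediately and test for it. In this sketch the sentinel is an atom, so it can never be confused with a real sequence value; `NOT_SET` is an illustrative name. The trade-off is that the variable must be declared as `object`, so the `sequence` type check is lost.

```euphoria
-- illustrative sentinel: an atom, so it cannot be mistaken for a sequence value
constant NOT_SET = -1

object s      -- object, so it can hold the sentinel as well as a sequence
s = NOT_SET   -- assigned at once, so reading s is never an error

-- later: an atom here can only mean "not set yet"
if atom(s) then
    s = {}
end if
```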


~Greg


11. Strings

Kat wrote:
> 
> Eu doesn't have a string type, there is no string in Eu. They are sequences,
> where every 8bit character takes up 32bits. And you are adding a function that
> replaces 3 whole lines of code that do the same thing, a trivial thing, while
> important stuff doesn't get added.

This is something that seems like it should be trivial, but isn't, given the
design of euphoria.  Adding a new primitive would require changes all over
the place (hundreds of places, easily).  We'd probably have to redo some of
the test macros, and I suspect that we'd lose a lot of the current speed
of euphoria.

If someone can figure out an easy way to do this, I suspect it'd get into
the language pretty quickly.  Of course, that ignores the prospect of 
Unicode, which is a whole 'nother can of worms, and also something that
will require some drastic recoding in the back end, though due to the
way that sequences are implemented, probably none in the front end.

Matt


12. Re: Strings

Matt Lewis wrote:
> 
> If someone can figure out an easy way to do this, I suspect it'd get into
> the language pretty quickly.  Of course, that ignores the prospect of 
> Unicode, which is a whole 'nother can of worms, and also something that
> will require some drastic recoding in the back end, though due to the
> way that sequences are implemented, probably none in the front end.
> 
> Matt

It seems you can do Unicode well enough already:

sequence shawn
shawn = { 's', 'h', #0430, 'w','n' } -- #0430 is the Cyrillic letter a

You can use poke2 from words.e or poke2 from my own pokpeek2.e
and use it with a Unicode C routine. All manipulation works
like any other sequence.
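A sketch of Shawn's approach, assuming Euphoria 4.0, where poke2() and peek2u() are built in (on older versions a library such as the ones he mentions would supply them). Each element is stored as a 16-bit value, which covers characters like #0430 but not code points above #FFFF, and a 0 is appended as the terminator a wide-string C routine would typically expect.

```euphoria
include std/machine.e   -- assumed 4.0 location of allocate() and free()

sequence shawn, back
atom buf

shawn = {'s', 'h', #0430, 'w', 'n'}   -- #0430 is the Cyrillic letter a

-- 2 bytes per character plus a 2-byte terminator
buf = allocate(2 * (length(shawn) + 1))
poke2(buf, shawn)                    -- one 16-bit value per element
poke2(buf + 2 * length(shawn), 0)    -- terminator

-- buf could now be passed to a C routine expecting a wide string

-- read the 16-bit values back into an ordinary sequence
back = peek2u({buf, length(shawn)})
free(buf)
```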

Shawn Pringle


13. Re: Strings

Shawn Pringle wrote:
> 
> Matt Lewis wrote:
> > 
> > If someone can figure out an easy way to do this, I suspect it'd get into
> > the language pretty quickly.  Of course, that ignores the prospect of 
> > Unicode, which is a whole 'nother can of worms, and also something that
> > will require some drastic recoding in the back end, though due to the
> > way that sequences are implemented, probably none in the front end.
> > 
> > Matt
> It seems you can do Unicode well enough already:
> 
> sequence shawn
> shawn = { 's', 'h', #0430, 'w','n' } -- a is cyrillic a.

Yes, wxEuphoria does this currently.

> You can use poke2 from words.e or poke2 from my own pokpeek2.e
> and use it with a Unicode C routine. All manipulation works
> like any other sequence.

Or you can poke2 with the built-in in 4.0.  But then you'll also have to
wrap all of your I/O functions, too.  That's where all the work will be.

Matt


14. Strings

Yes, I'm going to wade into this again.
1st) Memory conservation is good, especially when there is a large amount of
waste.
2nd) A speed decrease is not nice, but acceptable,

because most importantly
***Type checking reduces errors***

It also reduces the need for manual error detection, which can get quite
complex, slow and time-consuming in Euphoria. The alternative is to let the
routine die from an obscure error and let the finger be pointed at the routine
it dies in, rather than the one that started the problem.

I will again say strings are good.

Consider standard indexing on a sequence: it must require quite complex
operations, whereas indexing on a string would require only a simple bounds
check, an offset calculation and a peek.
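Mathew's description of indexing a packed string (bounds check, offset calculation, peek) can be sketched in Euphoria itself over allocated memory. `byte_at` is a hypothetical helper; a built-in implementation would do the same steps in C. The include line assumes Euphoria 4.0.

```euphoria
include std/machine.e   -- assumed 4.0 location of allocate_string() and free()

-- hypothetical: i-th byte of a packed string of length len stored at addr
function byte_at(atom addr, integer len, integer i)
    -- 1. bounds check
    if i < 1 or i > len then
        puts(2, "subscript out of bounds\n")
        abort(1)
    end if
    -- 2. offset calculation (Euphoria indexes from 1)
    -- 3. a single one-byte read
    return peek(addr + i - 1)
end function

atom p
integer third
p = allocate_string("strings")
third = byte_at(p, 7, 3)
free(p)
printf(1, "third byte: %d\n", {third})
```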

As Intel chips align to 4 bytes, unless the string's length is a multiple of 4
there is a gap of up to 3 bytes at the end to append into, reducing the time
needed for appends.
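The slack Mathew mentions can be computed directly: if storage is rounded up to a multiple of 4 bytes, a string of length n has remainder(4 - remainder(n, 4), 4) spare bytes. The 4-byte granularity is his assumption about the allocator, not something Euphoria guarantees.

```euphoria
-- spare bytes left after rounding a length up to a multiple of 4 (assumed)
function slack(integer n)
    return remainder(4 - remainder(n, 4), 4)
end function

for n = 8 to 12 do
    printf(1, "length %d: %d spare byte(s)\n", {n, slack(n)})
end for
```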

There are many other benefits, but I'm going to dinner.
-------------------------
Sincerely,
Mathew Hounsell

mat.hounsell at excite.com
