Re: Interesting Experiment With String/Sequence Slicing


Hi Robert,

thank you for your reply. it is very informative
and has helped me a lot in understanding more
about how euphoria works. i also looked at your
suggested solution and have been taking that approach,
iterating through the string or buffer as it is read in
rather than using the match and slice method.  however,
there are still a lot of places where slicing occurs
for various reasons, and when running a time profile
all the big clock ticks are happening where the slicing
occurs.  i was even more surprised when
i found large clock ticks in places where i am just trying to
reference a slice of a sequence.  as you have pointed
out, you no longer use a direct reference to the
sequence when slicing, but would it not be more
efficient to allow a direct reference in the
obvious situations where the slice is only being
viewed, e.g.
        if equal(sLine[i..i+4],"</TD>")
but i guess that would mean we would just end up with
messy pointers to this and that, like they have in C.
i have never written an interpreter before and certainly
will not any time in the future, as my technical skills are very
inadequate, but i have always had a keen interest
in how interpreters work, in particular euphoria,
as it is very well designed, although it has been
a surprise for me to discover the string slicing issue,
which will become a slight concern when
undertaking projects that may involve quite
a lot of string slashing and cutting.
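
for the view-only comparison above, a minimal sketch of one way to test for a tag without building a slice at all. the helper name tag_at and the sample line are hypothetical, and it assumes the elements are plain characters. it relies on a sequence being passed to a routine without being copied, which fits Rob's remark that a plain assignment is free but is still an assumption here.

```euphoria
-- hypothetical helper: returns 1 if tag occurs in s starting at index i.
-- it compares one element at a time, so no slice of s is ever built.
function tag_at(sequence s, integer i, sequence tag)
    if i < 1 or i + length(tag) - 1 > length(s) then
        return 0    -- tag would run off either end of s
    end if
    for k = 1 to length(tag) do
        if not equal(s[i + k - 1], tag[k]) then
            return 0
        end if
    end for
    return 1
end function

-- sample data only, taken from the tag format discussed in this thread
sequence sLine
sLine = "<TD> ** NAME ** </TD>"

for i = 1 to length(sLine) do
    if tag_at(sLine, i, "</TD>") then
        printf(1, "end tag found at position %d\n", i)
    end if
end for
```

the bounds check also removes the risk of a subscript error near the end of the line, which a bare sLine[i..i+4] would have.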

regards,
sam lie
Down Under, Australia



----- Original Message ----- 
From: "Robert Craig" <rds at RapidEuphoria.com>
To: "EUforum" <EUforum at topica.com>
Sent: Wednesday, August 22, 2001 2:01 PM
Subject: Re: Interesting Experiment With String/Sequence Slicing


> 
> Sam Lie writes:
> > the string that i am searching for
> > are tag types eg.  <TD> ** NAME ** </TD>
> > and i run into problem when </TD> or parts of
> > it occur on the next line.  so i will still have to
> > read all the text into one big sequence and then
> > do the string slicing and chopping. 
> 
> The best (fastest) algorithm for this problem would be to read
> a character at a time with getc(), copying input characters to
> the output file with puts(), and whenever you see '<', check to see if 
> it's followed by 'T' 'D' '>' .
> When you find the start tag, set a variable to suspend
> writing characters to the output file,
> and start looking for '<' followed by '/' 'T' 'D' '>'.
> (allow for blanks, tabs and new-lines where they are allowed).
> This will take more code but it will be extremely fast.
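
a minimal sketch of the character-at-a-time approach described above, not Rob's actual code. the file names are made up, it assumes the tags themselves are dropped along with the text between them, and it leaves out the allowance for blanks, tabs and new-lines inside a tag.

```euphoria
-- sketch only: the file names and the choice to drop the tags themselves
-- are assumptions, and whitespace inside a tag is not handled
constant START_TAG = "<TD>", END_TAG = "</TD>"

integer fin, fout, c, matched, copying
sequence target

fin = open("input.htm", "r")
fout = open("output.htm", "w")
if fin = -1 or fout = -1 then
    puts(2, "cannot open files\n")
    abort(1)
end if

copying = 1          -- 1 = characters are being written to the output file
target = START_TAG   -- the tag currently being looked for
matched = 0          -- how many characters of target have matched so far

c = getc(fin)
while c != -1 do
    if c = target[matched + 1] then
        matched = matched + 1
        if matched = length(target) then
            -- whole tag seen: flip the state and look for the other tag
            copying = not copying
            if copying then
                target = START_TAG
            else
                target = END_TAG
            end if
            matched = 0
        end if
    else
        -- mismatch: release the held partial match, then re-test this character
        if copying then
            puts(fout, target[1..matched])
        end if
        if c = target[1] then
            matched = 1
        else
            matched = 0
            if copying then
                puts(fout, c)
            end if
        end if
    end if
    c = getc(fin)
end while

-- a partial match left over at end of file is ordinary text
if copying then
    puts(fout, target[1..matched])
end if
close(fin)
close(fout)
```

the only slice taken is target[1..matched], which is at most four characters long, so nothing proportional to the size of the file is ever copied. the simple restart after a mismatch works because '<' appears only as the first character of each tag.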
> 
> What you are doing now with match() and slicing
> requires that you copy an average of 25,000 characters 
> every time you match a start or end tag.
> 
> > i suppose i can
> > very much live with slicing but i am just wondering
> > could there be a function similar to slicing but only
> > works on strings and is much faster by moving blocks
> > of memory rather than copy one sequence element
> > to another one at a time.  
> 
> No. Euphoria does not distinguish strings from
> sequences of integers internally. As I mentioned,
> I used to cleverly point to slices within a sequence
> rather than copying them, but this led to inefficiencies
> elsewhere, and only helps when you never modify the
> sliced data or the original sequence,
> so I dropped that idea a few years ago.
> 
> > however maybe that's not possible
> > because the sequence is not represented as a
> > continuous block of memory.
> 
> A sequence *is* represented as a contiguous block
> of memory.
> 
> > today i also came across something very interesting.
> >
> > example 1
> > for i=1 to 1000 do
> >     sLine2 = sLine
> >     sLine2 = sLine2[2..length(sLine2)]
> > end for
> 
> The above would be lightning fast, except that
>            sLine2 = sLine
> creates 2 references to the same sequence, so I can't
> simply adjust the internal pointers. I have to copy n-1
> elements to a brand new sLine2 sequence (otherwise sLine
> would be mistakenly altered).
> 
> > example 2
> > for i=1 to 1000 do
> >     sLine2 = sLine
> >     sLine3 = sLine2[length(sLine2)-10..length(sLine2)]
> > end for
> > example 2 was faster than example 1 by a factor of 100
> 
> Of course.
> You are only copying 11 elements, not thousands.
> 
> > so my thought was that example 1 was slower because it
> > had to copy more items when doing the slicing.  but then again i 
> > thought why is it copying items, as Derek had suggested 
> > in the case of example 1 we could just blank out/remove  the first 
> > sequence element and return the sLine2 as is.  
> 
> This optimization does happen when there is only one 
> reference to a sequence. 
> 
> In your example 1 you get
>       sLine2 = sLine
> for free, but you have to pay when you try to modify the 
> (shared) sequence.
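
a rough timing sketch for comparing the shared case with the single-reference case described above. the 50000-element length is an assumption, and the numbers printed will depend on the machine and the interpreter version, so no result is claimed here.

```euphoria
-- rough timing sketch; the sequence length is an assumption
sequence sLine, sLine2
atom t

sLine = repeat('x', 50000)

-- case 1: sLine2 shares its data with sLine, so (as described above)
-- each slice has to copy n-1 elements into a new sequence
t = time()
for i = 1 to 1000 do
    sLine2 = sLine
    sLine2 = sLine2[2..length(sLine2)]
end for
printf(1, "shared sequence:  %.2f seconds\n", time() - t)

-- case 2: sLine2 is the only reference to its data, which is the
-- situation where the optimization is said to apply
sLine2 = repeat('x', 50000)
t = time()
for i = 1 to 1000 do
    sLine2 = sLine2[2..length(sLine2)]
end for
printf(1, "single reference: %.2f seconds\n", time() - t)
```

note that in case 2 the sequence shrinks by one element on every pass, so the two loops do not do identical work. it is meant as an indication only.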
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 
> 
> 
> 
> 
> 



