1. Interesting Experiment With String/Sequence Slicing

hi there all,

i  have also just became a registered user of euphoria too
and i must say this is the first time i have found something that
is heaps better than VisualBasic.  i am really interested in how
far one can push euphoria in terms of speed.  one thing
that i noticed was the time it took to slice strings
specially a very large string read from a text file say 5,000 lines,
it seems to be the bottle neck to the global search and replace
function (after originally suspecting it to be the match function
which was quite fine).   i have set up the test below and found
some interesting results :

        1. by allocating the string first then slice appeared to be faster
(anyone can explain?)
        2. measuring string allocation and string slicing on their own
        seems to be really really fast but when combined together
        it appeared very slow.  (which led myself to suspect that
       allocation and slicing only creates references in the intepreter
        and only start executing when the references are actually
        used as variables)

my conclusion from this test is that the either the slicing or the
allocation
method in the interpreter is 'iterative' as opposed to just cutting and
moving
large chunks of string around in the memory, which explains the poor
perfomance?
but there is a good case for iterative because it caters for sequences of
all data
types/structures  rather than just the string type and if this be the case
then
maybe it could provide a cause for us to petition Robert to create
allocation/slicing functions specifically just for the string type sequence
ie utilising the chunk memory cut and move method.  overall my conclusion
could be immoderatly incorrect but i would love hear comments from yous.
thanks

sam lie


----------------------------------------------------------------------
-- standard sequence slicing method

t = time()

for i=1 to 2000 do
 sLine2 = sLine[2..length(sLine)]
end for

t = time() - t
printf(1, "%6.2f seconds\n", t)

----------------------------------------------------------------------
-- faster slicing method by allocating sLine2=sLine  first

t = time()

for i=1 to 2000 do
 sLine2 = sLine
 sLine2 = sLine2[2..length(sLine2)]
end for

t = time() - t
printf(1, "%6.2f seconds\n", t)

----------------------------------------------------------------------
-- time to allocate string (really fast hardly measurable)

t = time()

for i=1 to 2000 do
 sLine2 = sLine
end for

t = time() - t
printf(1, "%6.2f seconds\n", t)

----------------------------------------------------------------------
-- time to slice string without allocating it to another variable (really
fast hardly measurable)

t = time()

for i=1 to 2000 do
 sLine = sLine[2..length(sLine)]
end for

t = time() - t
printf(1, "%6.2f seconds\n", t)




*********************************************************************************
This email and any files transmitted with it may be legally privileged 
and confidential.  If you are not the intended recipient of this email,
you must not disclose or use the information contained in it.  If you 
have received this email in error, please notify us by return email and 
permanently delete the document.
*********************************************************************************

new topic     » topic index » view message » categorize

2. Re: Interesting Experiment With String/Sequence Slicing

slie writes:
<several examples of slicing>

Without going through your examples in detail,
I'll just give you a few hints:

1. Whenever you see:

         x = y
    
     nothing happens except for a pointer being copied.
     and maybe a reference count being incremented.
     If y is a sequence, x and y will now point at a single
     copy of the same sequence.

2. When you see:

     y = x[a..b]

     Then the data from a to b is copied and a new sequence is
     created. (Back in v1.4 and earlier I used to just have y point at the data
     in x that it needed, but I later changed this).

3. When you see:

     x = x[a..b]

    and there are no other references to the sequence,
    then I simply adjust the pointers to the beginning and end of x,
    without copying anything. If I am left with a huge amount of
    unused space, then I might decide to free it up.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

3. Re: Interesting Experiment With String/Sequence Slicing

hi Robert,

sorry to keep bring this issue up as the program i am writting
seems to be severely affected by the slowness in the string slicing
function in euphoria.  the program i'm working on is basically doing a lot
of string
string and replace (for large text files) which relies much on the string
slicing.
i am just curious as to ways the program can be optimised and i have
run out of avenues except looking how i can get around the slicing issue.
in my last email i wrote :

>>my conclusion from this test is that the either the slicing or the
>>allocation method in the interpreter is 'iterative' as opposed to just
cutting and
>>moving large chunks of string around in the memory, which explains the
poor perfomance?

i am just wild guessing here, but could it be true that a function can be
specifically written to deal with string type sequences and so speeding up
the slicing function?

regards,
sam lie
Down Under, Australia



----- Original Message -----
From: "Robert Craig" <rds at RapidEuphoria.com>
To: "EUforum" <EUforum at topica.com>
Sent: Friday, August 17, 2001 2:23 AM
Subject: Re: Interesting Experiment With String/Sequence Slicing


>
> slie writes:
> <several examples of slicing>
>
> Without going through your examples in detail,
> I'll just give you a few hints:
>
> 1. Whenever you see:
>
>          x = y
>
>      nothing happens except for a pointer being copied.
>      and maybe a reference count being incremented.
>      If y is a sequence, x and y will now point at a single
>      copy of the same sequence.
>
> 2. When you see:
>
>      y = x[a..b]
>
>      Then the data from a to b is copied and a new sequence is
>      created. (Back in v1.4 and earlier I used to just have y point at the
data
>      in x that it needed, but I later changed this).
>
> 3. When you see:
>
>      x = x[a..b]
>
>     and there are no other references to the sequence,
>     then I simply adjust the pointers to the beginning and end of x,
>     without copying anything. If I am left with a huge amount of
>     unused space, then I might decide to free it up.
>
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
>
>
>
>
>
>



*********************************************************************************
This email and any files transmitted with it may be legally privileged 
and confidential.  If you are not the intended recipient of this email,
you must not disclose or use the information contained in it.  If you 
have received this email in error, please notify us by return email and 
permanently delete the document.
*********************************************************************************

new topic     » goto parent     » topic index » view message » categorize

4. Re: Interesting Experiment With String/Sequence Slicing

hi Mike

> (insertIndex() & removeIndex()). I found an up-to 4-fold or more 
> increase in speed. I don't have time to explain the method right now but 

i have looked at insertIndex() & removeIndex() but 
am still not sure how it works in my situation.  i am working
with just a simple sequence of chars (string) but i believe
 insertIndex() & removeIndex() applies more to lists/sequences
of objects.  but insertIndex() & removeIndex() seems to be very
interesting, maybe i missed something subtle there.

regards,
sam lie
Down Under, Australia




----- Original Message ----- 
From: "Mike" <vulcan at win.co.nz>
To: "EUforum" <EUforum at topica.com>
Sent: Tuesday, August 21, 2001 1:23 PM
Subject: RE: Interesting Experiment With String/Sequence Slicing


> 
> Hi Sam, I have found a way of increasing the poor performance of slicing 
> really large sequences and have used this in my windows editor. This 
> method is also in the file tk_misc.e included with win32lib.ew 
> (insertIndex() & removeIndex()). I found an up-to 4-fold or more 
> increase in speed. I don't have time to explain the method right now but 
> if you want more help AFTER studying these routines then you could write 
> to me direct. vulcan at win.co.nz
> 
> 
> slie at theage.fairfax.com.au wrote:
> > hi Robert,
> > 
> > sorry to keep bring this issue up as the program i am writting
> > seems to be severely affected by the slowness in the string slicing
> > function in euphoria.  the program i'm working on is basically doing a 
> > lot
> > of string
> > string and replace (for large text files) which relies much on the 
> > string
> > slicing.
> > i am just curious as to ways the program can be optimised and i have
> > run out of avenues except looking how i can get around the slicing 
> > issue.
> 
> 
> 
> 
> 
> 



*********************************************************************************
This email and any files transmitted with it may be legally privileged 
and confidential.  If you are not the intended recipient of this email,
you must not disclose or use the information contained in it.  If you 
have received this email in error, please notify us by return email and 
permanently delete the document.
*********************************************************************************

new topic     » goto parent     » topic index » view message » categorize

5. Re: Interesting Experiment With String/Sequence Slicing

> From: slie at theage.fairfax.com.au
> To: EUforum <EUforum at topica.com>
> Reply-To: EUforum at topica.com
> Subject: Re: Interesting Experiment With 
> String/Sequence Slicing
> Date: 21/08/2001 12:58:45 PM
> 
> 
> i am just wild guessing here, but could it 
> be true that a function can be
> specifically written to deal with string 
> type sequences and so speeding up
> the slicing function?
> 

Hi Sam,
we have discovered that the fastest method of inserting, deleting, and
replacing text in sequences is to minimize the number and size of
concatenations.

Consider the general case where you have a text of <aaa><bbb><ccc> and a
replacement text of <ddd> which is going to replace <bbb>.

If the length of <ddd> is longer than <bbb> then create a new sequence (
using the repeat() function) with the length of <aaa> + <ddd> + <ccc> then
copy these sub-strings into the new sequence and return the new sequence. 

Where <ddd> is shorter than <bbb>, you can avoid the overhead of creating a
new sequence and reuse the space in the exiting string. The trick here is to
move either <aaa> closer to <ccc> or <ccc> closer to <aaa>, depending on
which is smaller, and leaving enough room for <ddd> to slide in between
them. Then you just return the truncated sequence. If you didn't move <aaa>
then that is S[1 .. length(<aaa>)+length(<ddd>)+length(<ccc>)], but if you
did move <aaa> it is S[length(S) -
(length(<aaa>)+length(<ddd>)+length(<ccc>)) .. length(S)].

There are further optimisations if <aaa>, <ccc>, or <ddd> are empty
sequences.

I know this looks complex but it is almost always faster than creating
temporary sequences and concatenating them, depending on the length of the
sequences involved.
----
Derek


confidential information intended solely for the use of the individual or
entity to whom they are addressed. If you are not the intended recipient of
this message you are hereby notified that any use, dissemination,
distribution or reproduction of this message is prohibited. If you have
received this message in error please notify the sender immediately. Any
views expressed in this message are those of the individual sender and may
not necessarily reflect the views of Global Technology Australasia Limited.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Interesting Experiment With String/Sequence Slicing

Sam Lie writes:
> i am just curious as to ways the program can be 
> optimised and i have run out of avenues except 
> looking how i can get around the slicing issue.

Since you are a registered user, you can say:
     "with profile"
at the top of your main file, to get a count of 
statements executed, or even better with ex.exe:
     "with profile_time"
to get the percentage of time consumed by each statement.

These reports go to a file called "ex.pro".
It will show you the statements that are consuming 
most of the time. If you post the relevant chunk(s)
of code to this list, someone, maybe me, can 
probably suggest a faster way to do the same thing.

The first time you profile a program you often
discover something surprising.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

7. Re: Interesting Experiment With String/Sequence Slicing

Hi Robert,

i have ran the profiler and the bottle neck still
shows up when doing string slicing.  in the
example below sLine contains a text string (50k)
read from a html file, the example below took
about 4 seconds to execute, i just somehow believe
4 seconds is way too long just to do string slicing...
is there a remedy?

regards,
sam

       |for i=1 to 1000 do
 62.67 | sLine2 = sLine[2..length(sLine)]
       |end for
      |
       |for i=1 to 1000 do
  8.38 | sLine2 = sLine
 28.34 | sLine2 = sLine2[2..length(sLine2)]
       |end for
       |


ps "thanks" to Derek for the work around suggestion
i will definitely be implement his technique when doing 
concatenation and slicing.



----- Original Message ----- 
From: "Robert Craig" <rds at RapidEuphoria.com>
To: "EUforum" <EUforum at topica.com>
Sent: Tuesday, August 21, 2001 2:10 PM
Subject: Re: Interesting Experiment With String/Sequence Slicing


> 
> Sam Lie writes:
> > i am just curious as to ways the program can be 
> > optimised and i have run out of avenues except 
> > looking how i can get around the slicing issue.
> 
> Since you are a registered user, you can say:
>      "with profile"
> at the top of your main file, to get a count of 
> statements executed, or even better with ex.exe:
>      "with profile_time"
> to get the percentage of time consumed by each statement.
> 
> These reports go to a file called "ex.pro".
> It will show you the statements that are consuming 
> most of the time. If you post the relevant chunk(s)
> of code to this list, someone, maybe me, can 
> probably suggest a faster way to do the same thing.
> 
> The first time you profile a program you often
> discover something surprising.
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 
> 
> 
> 
> 
> 



*********************************************************************************
This email and any files transmitted with it may be legally privileged 
and confidential.  If you are not the intended recipient of this email,
you must not disclose or use the information contained in it.  If you 
have received this email in error, please notify us by return email and 
permanently delete the document.
*********************************************************************************

new topic     » goto parent     » topic index » view message » categorize

8. Re: Interesting Experiment With String/Sequence Slicing

On 21 Aug 2001, at 14:06, Derek Parnell wrote:

> 
> > From: slie at theage.fairfax.com.au
> > To: EUforum <EUforum at topica.com>
> > Reply-To: EUforum at topica.com
> > Subject: Re: Interesting Experiment With 
> > String/Sequence Slicing
> > Date: 21/08/2001 12:58:45 PM
> > 
> > 
> > i am just wild guessing here, but could it 
> > be true that a function can be
> > specifically written to deal with string 
> > type sequences and so speeding up
> > the slicing function?
> > 
> 
> Hi Sam,
> we have discovered that the fastest method of inserting, deleting, and
> replacing text in sequences is to minimize the number and size of
> concatenations.
> 
> Consider the general case where you have a text of <aaa><bbb><ccc> and a
> replacement text of <ddd> which is going to replace <bbb>.
> 
> If the length of <ddd> is longer than <bbb> then create a new sequence (
> using the repeat() function) with the length of <aaa> + <ddd> + <ccc> then
> copy these sub-strings into the new sequence and return the new sequence. 
> 
> Where <ddd> is shorter than <bbb>, you can avoid the overhead of creating a
> new
> sequence and reuse the space in the exiting string. The trick here is to move
> either <aaa> closer to <ccc> or <ccc> closer to <aaa>, depending on which is
> smaller, and leaving enough room for <ddd> to slide in between them. Then you
> just return the truncated sequence. If you didn't move <aaa> then that is S[1
> ..
> length(<aaa>)+length(<ddd>)+length(<ccc>)], but if you did move <aaa> it is
> S[length(S) - (length(<aaa>)+length(<ddd>)+length(<ccc>)) .. length(S)].
> 
> There are further optimisations if <aaa>, <ccc>, or <ddd> are empty
> sequences.
> 
> I know this looks complex but it is almost always faster than creating
> temporary sequences and concatenating them, depending on the length of the
> sequences involved.

This is something the interpreter should be doing.

Kat

new topic     » goto parent     » topic index » view message » categorize

9. Re: Interesting Experiment With String/Sequence Slicing

Sam Lie writes:
> i have ran the profiler and the bottle neck still
> shows up when doing string slicing. 

I gather from your earlier private e-mail that you are reading huge
strings from a file and running match() on them.
Or perhaps the string *is* the whole file?

Since match() starts at the beginning of a sequence,
and stops when it encounters a match, I gather
you are slicing the 50K strings to look for further matches.

Try reading and searching one line at a time using gets().
You won't be copying a huge string when you have a match,
and you'll make better use of the Pentium on-chip cache,
e.g.
      object line
      integer fn

      fn = open("myfile.html", "r")
      while 1 do
            line = gets(fn)
            if atom(line) then
                exit
            end if
            if match("foobar", line) then
                .....
            end if
      end while
      close(fn)

After doing the above, if you are still convinced that 
slicing is the culprit, you could write your own match() 
that locates multiple occurrences without slicing. 
(There may be an example in the mailing list archives.)

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

10. Re: Interesting Experiment With String/Sequence Slicing

Robert,
is the *ANY* chance (pretty-lease) that you might create an in-built version
of match and find that allows a starting offset index, similar to BASIC's
instr() function. Maybe somthing like ...

  findfrom(offset, look_for, look_in)
  matchfrom(offset, look_for, look_in)

I find myself frequently cutting and dicing slices, or doing ssllooww
searches because this functionality is missing. It would be so much faster
if these were in-built rather than Euphoria-written functions.

----- Original Message -----
From: "Robert Craig" <rds at RapidEuphoria.com>
To: "EUforum" <EUforum at topica.com>
Sent: Wednesday, August 22, 2001 2:00 AM
Subject: Re: Interesting Experiment With String/Sequence Slicing


> >
> Sam Lie writes:
> > i have ran the profiler and the bottle neck still
> > shows up when doing string slicing.
>
> I gather from your earlier private e-mail that you are reading huge
> strings from a file and running match() on them.
> Or perhaps the string *is* the whole file?
>
> Since match() starts at the beginning of a sequence,
> and stops when it encounters a match, I gather
> you are slicing the 50K strings to look for further matches.
>
> Try reading and searching one line at a time using gets().
> You won't be copying a huge string when you have a match,
> and you'll make better use of the Pentium on-chip cache,
> e.g.
>       object line
>       integer fn
>
>       fn = open("myfile.html", "r")
>       while 1 do
>             line = gets(fn)
>             if atom(line) then
>                 exit
>             end if
>             if match("foobar", line) then
>                 .....
>             end if
>       end while
>       close(fn)
>
> After doing the above, if you are still convinced that
> slicing is the culprit, you could write your own match()
> that locates multiple occurrences without slicing.
> (There may be an example in the mailing list archives.)
>
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
>
> >
> >
>
>

new topic     » goto parent     » topic index » view message » categorize

11. Re: Interesting Experiment With String/Sequence Slicing

I agree with this. I've had to write my own in EU... and I'm sure it is slow
compared to a built in function.

...george

----- Original Message -----
From: "Derek Parnell" <ddparnell at bigpond.com>
To: "EUforum" <EUforum at topica.com>
Subject: Re: Interesting Experiment With String/Sequence Slicing


>
> Robert,
> is the *ANY* chance (pretty-lease) that you might create an in-built
version
> of match and find that allows a starting offset index, similar to BASIC's
> instr() function. Maybe somthing like ...
>
>   findfrom(offset, look_for, look_in)
>   matchfrom(offset, look_for, look_in)
>
> I find myself frequently cutting and dicing slices, or doing ssllooww
> searches because this functionality is missing. It would be so much faster
> if these were in-built rather than Euphoria-written functions.
>
> ----- Original Message -----
> From: "Robert Craig" <rds at RapidEuphoria.com>
> To: "EUforum" <EUforum at topica.com>
> Sent: Wednesday, August 22, 2001 2:00 AM
> Subject: Re: Interesting Experiment With String/Sequence Slicing
>
>
> > >
> > Sam Lie writes:
> > > i have ran the profiler and the bottle neck still
> > > shows up when doing string slicing.
> >
> > I gather from your earlier private e-mail that you are reading huge
> > strings from a file and running match() on them.
> > Or perhaps the string *is* the whole file?
> >
> > Since match() starts at the beginning of a sequence,
> > and stops when it encounters a match, I gather
> > you are slicing the 50K strings to look for further matches.
> >
> > Try reading and searching one line at a time using gets().
> > You won't be copying a huge string when you have a match,
> > and you'll make better use of the Pentium on-chip cache,
> > e.g.
> >       object line
> >       integer fn
> >
> >       fn = open("myfile.html", "r")
> >       while 1 do
> >             line = gets(fn)
> >             if atom(line) then
> >                 exit
> >             end if
> >             if match("foobar", line) then
> >                 .....
> >             end if
> >       end while
> >       close(fn)
> >
> > After doing the above, if you are still convinced that
> > slicing is the culprit, you could write your own match()
> > that locates multiple occurrences without slicing.
> > (There may be an example in the mailing list archives.)
> >
> > Regards,
> >    Rob Craig
> >    Rapid Deployment Software
> >    http://www.RapidEuphoria.com
> >
> > >
> > >
> >
> >
>
>
>
>
>

new topic     » goto parent     » topic index » view message » categorize

12. Re: Interesting Experiment With String/Sequence Slicing

On 22 Aug 2001, at 6:41, Derek Parnell wrote:

> > 
> Robert,
> is the *ANY* chance (pretty-lease) that you might create an in-built version
> of
> match and find that allows a starting offset index, similar to BASIC's instr()
> function. Maybe somthing like ...
> 
>   findfrom(offset, look_for, look_in)
>   matchfrom(offset, look_for, look_in)

I agree.

Kat

new topic     » goto parent     » topic index » view message » categorize

13. Re: Interesting Experiment With String/Sequence Slicing

hi Robert,

i have tried as you have suggested but have came
across more problems that i havent anticipated
because the string that i am searching for
are tag types eg.  <TD> ** NAME ** </TD>
and i run into problem when </TD> or parts of
it occur on the next line.  so i will still have to
read all the text into one big sequence and then
do the string slicing and chopping. i suppose i can
very much live with slicing but i am just wondering
could there be a function similar to slicing but only
works on strings and is much faster by moving blocks
of memory rather than copy one sequence element
to another one at a time.  however maybe thats not possible
because the sequence is not represented as a set of
continuous memory block.   

today i also came across something very interestings.

example 1
for i=1 to 1000 do
     sLine2 = sLine
     sLine2 = sLine2[2..length(sLine2)]
end for

example 2
for i=1 to 1000 do
     sLine2 = sLine
     sLine3 = sLine2[length(sLine2)-10..length(sLine2)]
end for

example 2 was faster than example 1 by a factor of 100
so my thought was that example 1 was slower because it
had to copy more items when doing the slicing.  but then again i 
thought why is it copying items, as Derek had suggested 
in the case of example 1 we could just blank out/remove  the first 
sequence element and return the sLine2 as is.  If this case is true
then the term 'slicing' could not be taken literally because it is
more like a copy operation.  flame me if i am wrong.

regards,
sam lie
Down Under, Australia



t = time()
for i=1 to 1000 do
 sLine2 = sLine
 sLine3 = sLine2[length(sLine2)-10..length(sLine2)]


----- Original Message ----- 
From: "Robert Craig" <rds at RapidEuphoria.com>
To: "EUforum" <EUforum at topica.com>
Sent: Wednesday, August 22, 2001 2:00 AM
Subject: Re: Interesting Experiment With String/Sequence Slicing


> 
> Sam Lie writes:
> > i have ran the profiler and the bottle neck still
> > shows up when doing string slicing. 
> 
> I gather from your earlier private e-mail that you are reading huge
> strings from a file and running match() on them.
> Or perhaps the string *is* the whole file?
> 
> Since match() starts at the beginning of a sequence,
> and stops when it encounters a match, I gather
> you are slicing the 50K strings to look for further matches.
> 
> Try reading and searching one line at a time using gets().
> You won't be copying a huge string when you have a match,
> and you'll make better use of the Pentium on-chip cache,
> e.g.
>       object line
>       integer fn
> 
>       fn = open("myfile.html", "r")
>       while 1 do
>             line = gets(fn)
>             if atom(line) then
>                 exit
>             end if
>             if match("foobar", line) then
>                 .....
>             end if
>       end while
>       close(fn)
> 
> After doing the above, if you are still convinced that 
> slicing is the culprit, you could write your own match() 
> that locates multiple occurrences without slicing. 
> (There may be an example in the mailing list archives.)
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 
> 
> 
> 
> 
> 



*********************************************************************************
This email and any files transmitted with it may be legally privileged 
and confidential.  If you are not the intended recipient of this email,
you must not disclose or use the information contained in it.  If you 
have received this email in error, please notify us by return email and 
permanently delete the document.
*********************************************************************************

new topic     » goto parent     » topic index » view message » categorize

14. Re: Interesting Experiment With String/Sequence Slicing

Derek Parnell writes:
> is the *ANY* chance (pretty-lease) that you might create 
> an in-built version of match and find that allows a starting 
> offset index, similar to BASIC's instr() function. 
> Maybe somthing like ...
>
>  findfrom(offset, look_for, look_in)
>  matchfrom(offset, look_for, look_in)

Thanks for the suggestion.
I'll add it to my list. 
I'll look into it after 2.3.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

15. Re: Interesting Experiment With String/Sequence Slicing

Sam Lie writes:
> the string that i am searching for
> are tag types eg.  <TD> ** NAME ** </TD>
> and i run into problem when </TD> or parts of
> it occur on the next line.  so i will still have to
> read all the text into one big sequence and then
> do the string slicing and chopping. 

The best (fastest) algorithm for this problem would be to read
a character at a time with getc(), copying input characters to
the output file with puts(), and whenever you see '<', check to see if 
it's followed by 'T' 'D' '>' .
When you find the start tag, set a variable to suspend
writing characters to the output file,
and start looking for '<' followed by '/' 'T' 'D' '>'.
(allow for blanks, tabs and new-lines where they are allowed).
This will take more code but it will be extremely fast.

What you are doing now with match() and slicing,
requires that you copy an average of 25,000 characters 
every time you match a start or end tag.

> i suppose i can
> very much live with slicing but i am just wondering
> could there be a function similar to slicing but only
> works on strings and is much faster by moving blocks
> of memory rather than copy one sequence element
> to another one at a time.  

No. Euphoria does not distinguish strings from
sequences of integers internally. As I mentioned,
I used to cleverly point to slices within a sequence
rather than copying them, but this led to inefficiencies
elsewhere, and only helps when you never modify the
sliced data or the original sequence,
so I dropped that idea a few years ago.

> however maybe thats not possible
> because the sequence is not represented as a set of
> continuous memory block.   

A sequence *is* represented as a contiguous block
of memory.

> today i also came across something very interestings.
>
> example 1
> for i=1 to 1000 do
>     sLine2 = sLine
>     sLine2 = sLine2[2..length(sLine2)]
> end for

The above would be lightning fast, except that
           sLine2 = sLine
creates 2 references to the same sequence, so I can't
simply adjust the internal pointers. I have to copy n-1
elements to a brand new sLine2 sequence (otherwise sLine
would be mistakenly altered).

> example 2
> for i=1 to 1000 do
>     sLine2 = sLine
>     sLine3 = sLine2[length(sLine2)-10..length(sLine2)]
> end for
> example 2 was faster than example 1 by a factor of 100

Of course.
You are only copying 11 elements, not thousands.

> so my thought was that example 1 was slower because it
> had to copy more items when doing the slicing.  but then again i 
> thought why is it copying items, as Derek had suggested 
> in the case of example 1 we could just blank out/remove  the first 
> sequence element and return the sLine2 as is.  

This optimization does happen when there is only one 
reference on a sequence. 

In your example 1 you get
      Sline2 = Sline
for free, but you have to pay when you try to modify the 
(shared) sequence.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

16. Re: Interesting Experiment With String/Sequence Slicing

Hi Robert,

thank you for your reply. it is very informative
and has helped me a lot in understanding more
in how euphoria works. i also looked at your
suggested solution and have been taking that approach
by iteratively going through the string or buffer read in
rather than the match and slice method.  however,
there is still a lot of places where slicing occurs
for various reasons and when running time profile
all the big clock ticks are happening where slicing
are occuring.  i was even more surprised when
i found large clock ticks in places where i am just trying to
reference to a slice of a sequence.  as you have pointed
out that you 'no' longer use direct reference to the
sequence when doing slicing but would it not be more
optimal to allow direct reference when there is
obvious situations when the sequence is only for viewing
only eg.
        if equal(sLine[i..i+5],"</TD>")
but i guess that would mean we will just end up in
messy pointers to this and that mess like they have in C.
i have never wrote an interpreter before and certainly
not any time in the future as my technical skills are very
inadequate but i always had keen interest
in how interpreters work in particularly euphoria
as it is very well designed althought it has been 
a surprise for me to discover the string slicing issue
which will become a slight concern when 
undertaking projects that may involve quite
a lot string slashing and cutting.

regards,
sam lie
Down Under, Australia



----- Original Message ----- 
From: "Robert Craig" <rds at RapidEuphoria.com>
To: "EUforum" <EUforum at topica.com>
Sent: Wednesday, August 22, 2001 2:01 PM
Subject: Re: Interesting Experiment With String/Sequence Slicing


> 
> Sam Lie writes:
> > the string that i am searching for
> > are tag types eg.  <TD> ** NAME ** </TD>
> > and i run into problem when </TD> or parts of
> > it occur on the next line.  so i will still have to
> > read all the text into one big sequence and then
> > do the string slicing and chopping. 
> 
> The best (fastest) algorithm for this problem would be to read
> a character at a time with getc(), copying input characters to
> the output file with puts(), and whenever you see '<', check to see if 
> it's followed by 'T' 'D' '>' .
> When you find the start tag, set a variable to suspend
> writing characters to the output file,
> and start looking for '<' followed by '/' 'T' 'D' '>'.
> (allow for blanks, tabs and new-lines where they are allowed).
> This will take more code but it will be extremely fast.
> 
> What you are doing now with match() and slicing,
> requires that you copy an average of 25,000 characters 
> every time you match a start or end tag.
> 
> > i suppose i can
> > very much live with slicing but i am just wondering
> > could there be a function similar to slicing but only
> > works on strings and is much faster by moving blocks
> > of memory rather than copy one sequence element
> > to another one at a time.  
> 
> No. Euphoria does not distinguish strings from
> sequences of integers internally. As I mentioned,
> I used to cleverly point to slices within a sequence
> rather than copying them, but this led to inefficiencies
> elsewhere, and only helps when you never modify the
> sliced data or the original sequence,
> so I dropped that idea a few years ago.
> 
> > however maybe thats not possible
> > because the sequence is not represented as a set of
> > continuous memory block.   
> 
> A sequence *is* represented as a contiguous block
> of memory.
> 
> > today i also came across something very interestings.
> >
> > example 1
> > for i=1 to 1000 do
> >     sLine2 = sLine
> >     sLine2 = sLine2[2..length(sLine2)]
> > end for
> 
> The above would be lightning fast, except that
>            sLine2 = sLine
> creates 2 references to the same sequence, so I can't
> simply adjust the internal pointers. I have to copy n-1
> elements to a brand new sLine2 sequence (otherwise sLine
> would be mistakenly altered).
> 
> > example 2
> > for i=1 to 1000 do
> >     sLine2 = sLine
> >     sLine3 = sLine2[length(sLine2)-10..length(sLine2)]
> > end for
> > example 2 was faster than example 1 by a factor of 100
> 
> Of course.
> You are only copying 11 elements, not thousands.
> 
> > so my thought was that example 1 was slower because it
> > had to copy more items when doing the slicing.  but then again i 
> > thought why is it copying items, as Derek had suggested 
> > in the case of example 1 we could just blank out/remove  the first 
> > sequence element and return the sLine2 as is.  
> 
> This optimization does happen when there is only one 
> reference on a sequence. 
> 
> In your example 1 you get
>       Sline2 = Sline
> for free, but you have to pay when you try to modify the 
> (shared) sequence.
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 
> 
> 
> 
> 
> 



*********************************************************************************
This email and any files transmitted with it may be legally privileged 
and confidential.  If you are not the intended recipient of this email,
you must not disclose or use the information contained in it.  If you 
have received this email in error, please notify us by return email and 
permanently delete the document.
*********************************************************************************

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu