Re: Interesting Experiment With String/Sequence Slicing

new topic     » goto parent     » topic index » view thread      » older message » newer message

Sam Lie writes:
> the string that i am searching for
> are tag types eg.  <TD> ** NAME ** </TD>
> and i run into problem when </TD> or parts of
> it occur on the next line.  so i will still have to
> read all the text into one big sequence and then
> do the string slicing and chopping. 

The best (fastest) algorithm for this problem would be to read
a character at a time with getc(), copying input characters to
the output file with puts(), and whenever you see '<', check to see if 
it's followed by 'T' 'D' '>' .
When you find the start tag, set a variable to suspend
writing characters to the output file,
and start looking for '<' followed by '/' 'T' 'D' '>'.
(allow for blanks, tabs and new-lines where they are allowed).
This will take more code but it will be extremely fast.

What you are doing now with match() and slicing,
requires that you copy an average of 25,000 characters 
every time you match a start or end tag.

> i suppose i can
> very much live with slicing but i am just wondering
> could there be a function similar to slicing but only
> works on strings and is much faster by moving blocks
> of memory rather than copy one sequence element
> to another one at a time.  

No. Euphoria does not distinguish strings from
sequences of integers internally. As I mentioned,
I used to cleverly point to slices within a sequence
rather than copying them, but this led to inefficiencies
elsewhere, and only helps when you never modify the
sliced data or the original sequence,
so I dropped that idea a few years ago.

> however maybe thats not possible
> because the sequence is not represented as a set of
> continuous memory block.   

A sequence *is* represented as a contiguous block
of memory.

> today i also came across something very interestings.
>
> example 1
> for i=1 to 1000 do
>     sLine2 = sLine
>     sLine2 = sLine2[2..length(sLine2)]
> end for

The above would be lightning fast, except that
           sLine2 = sLine
creates 2 references to the same sequence, so I can't
simply adjust the internal pointers. I have to copy n-1
elements to a brand new sLine2 sequence (otherwise sLine
would be mistakenly altered).

> example 2
> for i=1 to 1000 do
>     sLine2 = sLine
>     sLine3 = sLine2[length(sLine2)-10..length(sLine2)]
> end for
> example 2 was faster than example 1 by a factor of 100

Of course.
You are only copying 11 elements, not thousands.

> so my thought was that example 1 was slower because it
> had to copy more items when doing the slicing.  but then again i 
> thought why is it copying items, as Derek had suggested 
> in the case of example 1 we could just blank out/remove  the first 
> sequence element and return the sLine2 as is.  

This optimization does happen when there is only one 
reference on a sequence. 

In your example 1 you get
      Sline2 = Sline
for free, but you have to pay when you try to modify the 
(shared) sequence.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu