Re: Interesting Experiment With String/Sequence Slicing
- Posted by Robert Craig <rds at RapidEuphoria.com> Aug 21, 2001
- 461 views
Sam Lie writes: > the string that i am searching for > are tag types eg. <TD> ** NAME ** </TD> > and i run into problem when </TD> or parts of > it occur on the next line. so i will still have to > read all the text into one big sequence and then > do the string slicing and chopping. The best (fastest) algorithm for this problem would be to read a character at a time with getc(), copying input characters to the output file with puts(), and whenever you see '<', check to see if it's followed by 'T' 'D' '>' . When you find the start tag, set a variable to suspend writing characters to the output file, and start looking for '<' followed by '/' 'T' 'D' '>'. (allow for blanks, tabs and new-lines where they are allowed). This will take more code but it will be extremely fast. What you are doing now with match() and slicing, requires that you copy an average of 25,000 characters every time you match a start or end tag. > i suppose i can > very much live with slicing but i am just wondering > could there be a function similar to slicing but only > works on strings and is much faster by moving blocks > of memory rather than copy one sequence element > to another one at a time. No. Euphoria does not distinguish strings from sequences of integers internally. As I mentioned, I used to cleverly point to slices within a sequence rather than copying them, but this led to inefficiencies elsewhere, and only helps when you never modify the sliced data or the original sequence, so I dropped that idea a few years ago. > however maybe thats not possible > because the sequence is not represented as a set of > continuous memory block. A sequence *is* represented as a contiguous block of memory. > today i also came across something very interestings. > > example 1 > for i=1 to 1000 do > sLine2 = sLine > sLine2 = sLine2[2..length(sLine2)] > end for The above would be lightning fast, except that sLine2 = sLine creates 2 references to the same sequence, so I can't simply adjust the internal pointers. I have to copy n-1 elements to a brand new sLine2 sequence (otherwise sLine would be mistakenly altered). > example 2 > for i=1 to 1000 do > sLine2 = sLine > sLine3 = sLine2[length(sLine2)-10..length(sLine2)] > end for > example 2 was faster than example 1 by a factor of 100 Of course. You are only copying 11 elements, not thousands. > so my thought was that example 1 was slower because it > had to copy more items when doing the slicing. but then again i > thought why is it copying items, as Derek had suggested > in the case of example 1 we could just blank out/remove the first > sequence element and return the sLine2 as is. This optimization does happen when there is only one reference on a sequence. In your example 1 you get Sline2 = Sline for free, but you have to pay when you try to modify the (shared) sequence. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com