Re: Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 21, 2001
- 436 views
Hi Robert, thank you for your reply. it is very informative and has helped me a lot in understanding more in how euphoria works. i also looked at your suggested solution and have been taking that approach by iteratively going through the string or buffer read in rather than the match and slice method. however, there is still a lot of places where slicing occurs for various reasons and when running time profile all the big clock ticks are happening where slicing are occuring. i was even more surprised when i found large clock ticks in places where i am just trying to reference to a slice of a sequence. as you have pointed out that you 'no' longer use direct reference to the sequence when doing slicing but would it not be more optimal to allow direct reference when there is obvious situations when the sequence is only for viewing only eg. if equal(sLine[i..i+5],"</TD>") but i guess that would mean we will just end up in messy pointers to this and that mess like they have in C. i have never wrote an interpreter before and certainly not any time in the future as my technical skills are very inadequate but i always had keen interest in how interpreters work in particularly euphoria as it is very well designed althought it has been a surprise for me to discover the string slicing issue which will become a slight concern when undertaking projects that may involve quite a lot string slashing and cutting. regards, sam lie Down Under, Australia ----- Original Message ----- From: "Robert Craig" <rds at RapidEuphoria.com> To: "EUforum" <EUforum at topica.com> Sent: Wednesday, August 22, 2001 2:01 PM Subject: Re: Interesting Experiment With String/Sequence Slicing > > Sam Lie writes: > > the string that i am searching for > > are tag types eg. <TD> ** NAME ** </TD> > > and i run into problem when </TD> or parts of > > it occur on the next line. so i will still have to > > read all the text into one big sequence and then > > do the string slicing and chopping. > > The best (fastest) algorithm for this problem would be to read > a character at a time with getc(), copying input characters to > the output file with puts(), and whenever you see '<', check to see if > it's followed by 'T' 'D' '>' . > When you find the start tag, set a variable to suspend > writing characters to the output file, > and start looking for '<' followed by '/' 'T' 'D' '>'. > (allow for blanks, tabs and new-lines where they are allowed). > This will take more code but it will be extremely fast. > > What you are doing now with match() and slicing, > requires that you copy an average of 25,000 characters > every time you match a start or end tag. > > > i suppose i can > > very much live with slicing but i am just wondering > > could there be a function similar to slicing but only > > works on strings and is much faster by moving blocks > > of memory rather than copy one sequence element > > to another one at a time. > > No. Euphoria does not distinguish strings from > sequences of integers internally. As I mentioned, > I used to cleverly point to slices within a sequence > rather than copying them, but this led to inefficiencies > elsewhere, and only helps when you never modify the > sliced data or the original sequence, > so I dropped that idea a few years ago. > > > however maybe thats not possible > > because the sequence is not represented as a set of > > continuous memory block. > > A sequence *is* represented as a contiguous block > of memory. > > > today i also came across something very interestings. > > > > example 1 > > for i=1 to 1000 do > > sLine2 = sLine > > sLine2 = sLine2[2..length(sLine2)] > > end for > > The above would be lightning fast, except that > sLine2 = sLine > creates 2 references to the same sequence, so I can't > simply adjust the internal pointers. I have to copy n-1 > elements to a brand new sLine2 sequence (otherwise sLine > would be mistakenly altered). > > > example 2 > > for i=1 to 1000 do > > sLine2 = sLine > > sLine3 = sLine2[length(sLine2)-10..length(sLine2)] > > end for > > example 2 was faster than example 1 by a factor of 100 > > Of course. > You are only copying 11 elements, not thousands. > > > so my thought was that example 1 was slower because it > > had to copy more items when doing the slicing. but then again i > > thought why is it copying items, as Derek had suggested > > in the case of example 1 we could just blank out/remove the first > > sequence element and return the sLine2 as is. > > This optimization does happen when there is only one > reference on a sequence. > > In your example 1 you get > Sline2 = Sline > for free, but you have to pay when you try to modify the > (shared) sequence. > > Regards, > Rob Craig > Rapid Deployment Software > http://www.RapidEuphoria.com > > > > > > ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************