1. Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 15, 2001
- 472 views
hi there all, i have also just became a registered user of euphoria too and i must say this is the first time i have found something that is heaps better than VisualBasic. i am really interested in how far one can push euphoria in terms of speed. one thing that i noticed was the time it took to slice strings specially a very large string read from a text file say 5,000 lines, it seems to be the bottle neck to the global search and replace function (after originally suspecting it to be the match function which was quite fine). i have set up the test below and found some interesting results : 1. by allocating the string first then slice appeared to be faster (anyone can explain?) 2. measuring string allocation and string slicing on their own seems to be really really fast but when combined together it appeared very slow. (which led myself to suspect that allocation and slicing only creates references in the intepreter and only start executing when the references are actually used as variables) my conclusion from this test is that the either the slicing or the allocation method in the interpreter is 'iterative' as opposed to just cutting and moving large chunks of string around in the memory, which explains the poor perfomance? but there is a good case for iterative because it caters for sequences of all data types/structures rather than just the string type and if this be the case then maybe it could provide a cause for us to petition Robert to create allocation/slicing functions specifically just for the string type sequence ie utilising the chunk memory cut and move method. overall my conclusion could be immoderatly incorrect but i would love hear comments from yous. thanks sam lie ---------------------------------------------------------------------- -- standard sequence slicing method t = time() for i=1 to 2000 do sLine2 = sLine[2..length(sLine)] end for t = time() - t printf(1, "%6.2f seconds\n", t) ---------------------------------------------------------------------- -- faster slicing method by allocating sLine2=sLine first t = time() for i=1 to 2000 do sLine2 = sLine sLine2 = sLine2[2..length(sLine2)] end for t = time() - t printf(1, "%6.2f seconds\n", t) ---------------------------------------------------------------------- -- time to allocate string (really fast hardly measurable) t = time() for i=1 to 2000 do sLine2 = sLine end for t = time() - t printf(1, "%6.2f seconds\n", t) ---------------------------------------------------------------------- -- time to slice string without allocating it to another variable (really fast hardly measurable) t = time() for i=1 to 2000 do sLine = sLine[2..length(sLine)] end for t = time() - t printf(1, "%6.2f seconds\n", t) ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************
2. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Robert Craig <rds at RapidEuphoria.com> Aug 16, 2001
- 450 views
slie writes: <several examples of slicing> Without going through your examples in detail, I'll just give you a few hints: 1. Whenever you see: x = y nothing happens except for a pointer being copied. and maybe a reference count being incremented. If y is a sequence, x and y will now point at a single copy of the same sequence. 2. When you see: y = x[a..b] Then the data from a to b is copied and a new sequence is created. (Back in v1.4 and earlier I used to just have y point at the data in x that it needed, but I later changed this). 3. When you see: x = x[a..b] and there are no other references to the sequence, then I simply adjust the pointers to the beginning and end of x, without copying anything. If I am left with a huge amount of unused space, then I might decide to free it up. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
3. Re: Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 20, 2001
- 433 views
hi Robert, sorry to keep bring this issue up as the program i am writting seems to be severely affected by the slowness in the string slicing function in euphoria. the program i'm working on is basically doing a lot of string string and replace (for large text files) which relies much on the string slicing. i am just curious as to ways the program can be optimised and i have run out of avenues except looking how i can get around the slicing issue. in my last email i wrote : >>my conclusion from this test is that the either the slicing or the >>allocation method in the interpreter is 'iterative' as opposed to just cutting and >>moving large chunks of string around in the memory, which explains the poor perfomance? i am just wild guessing here, but could it be true that a function can be specifically written to deal with string type sequences and so speeding up the slicing function? regards, sam lie Down Under, Australia ----- Original Message ----- From: "Robert Craig" <rds at RapidEuphoria.com> To: "EUforum" <EUforum at topica.com> Sent: Friday, August 17, 2001 2:23 AM Subject: Re: Interesting Experiment With String/Sequence Slicing > > slie writes: > <several examples of slicing> > > Without going through your examples in detail, > I'll just give you a few hints: > > 1. Whenever you see: > > x = y > > nothing happens except for a pointer being copied. > and maybe a reference count being incremented. > If y is a sequence, x and y will now point at a single > copy of the same sequence. > > 2. When you see: > > y = x[a..b] > > Then the data from a to b is copied and a new sequence is > created. (Back in v1.4 and earlier I used to just have y point at the data > in x that it needed, but I later changed this). > > 3. When you see: > > x = x[a..b] > > and there are no other references to the sequence, > then I simply adjust the pointers to the beginning and end of x, > without copying anything. If I am left with a huge amount of > unused space, then I might decide to free it up. > > Regards, > Rob Craig > Rapid Deployment Software > http://www.RapidEuphoria.com > > > > > > ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************
4. Re: Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 20, 2001
- 424 views
hi Mike > (insertIndex() & removeIndex()). I found an up-to 4-fold or more > increase in speed. I don't have time to explain the method right now but i have looked at insertIndex() & removeIndex() but am still not sure how it works in my situation. i am working with just a simple sequence of chars (string) but i believe insertIndex() & removeIndex() applies more to lists/sequences of objects. but insertIndex() & removeIndex() seems to be very interesting, maybe i missed something subtle there. regards, sam lie Down Under, Australia ----- Original Message ----- From: "Mike" <vulcan at win.co.nz> To: "EUforum" <EUforum at topica.com> Sent: Tuesday, August 21, 2001 1:23 PM Subject: RE: Interesting Experiment With String/Sequence Slicing > > Hi Sam, I have found a way of increasing the poor performance of slicing > really large sequences and have used this in my windows editor. This > method is also in the file tk_misc.e included with win32lib.ew > (insertIndex() & removeIndex()). I found an up-to 4-fold or more > increase in speed. I don't have time to explain the method right now but > if you want more help AFTER studying these routines then you could write > to me direct. vulcan at win.co.nz > > > slie at theage.fairfax.com.au wrote: > > hi Robert, > > > > sorry to keep bring this issue up as the program i am writting > > seems to be severely affected by the slowness in the string slicing > > function in euphoria. the program i'm working on is basically doing a > > lot > > of string > > string and replace (for large text files) which relies much on the > > string > > slicing. > > i am just curious as to ways the program can be optimised and i have > > run out of avenues except looking how i can get around the slicing > > issue. > > > > > > ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************
5. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Derek Parnell <ddparnell at bigpond.com> Aug 20, 2001
- 431 views
> From: slie at theage.fairfax.com.au > To: EUforum <EUforum at topica.com> > Reply-To: EUforum at topica.com > Subject: Re: Interesting Experiment With > String/Sequence Slicing > Date: 21/08/2001 12:58:45 PM > > > i am just wild guessing here, but could it > be true that a function can be > specifically written to deal with string > type sequences and so speeding up > the slicing function? > Hi Sam, we have discovered that the fastest method of inserting, deleting, and replacing text in sequences is to minimize the number and size of concatenations. Consider the general case where you have a text of <aaa><bbb><ccc> and a replacement text of <ddd> which is going to replace <bbb>. If the length of <ddd> is longer than <bbb> then create a new sequence ( using the repeat() function) with the length of <aaa> + <ddd> + <ccc> then copy these sub-strings into the new sequence and return the new sequence. Where <ddd> is shorter than <bbb>, you can avoid the overhead of creating a new sequence and reuse the space in the exiting string. The trick here is to move either <aaa> closer to <ccc> or <ccc> closer to <aaa>, depending on which is smaller, and leaving enough room for <ddd> to slide in between them. Then you just return the truncated sequence. If you didn't move <aaa> then that is S[1 .. length(<aaa>)+length(<ddd>)+length(<ccc>)], but if you did move <aaa> it is S[length(S) - (length(<aaa>)+length(<ddd>)+length(<ccc>)) .. length(S)]. There are further optimisations if <aaa>, <ccc>, or <ddd> are empty sequences. I know this looks complex but it is almost always faster than creating temporary sequences and concatenating them, depending on the length of the sequences involved. ---- Derek confidential information intended solely for the use of the individual or entity to whom they are addressed. If you are not the intended recipient of this message you are hereby notified that any use, dissemination, distribution or reproduction of this message is prohibited. If you have received this message in error please notify the sender immediately. Any views expressed in this message are those of the individual sender and may not necessarily reflect the views of Global Technology Australasia Limited.
6. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Robert Craig <rds at RapidEuphoria.com> Aug 20, 2001
- 429 views
Sam Lie writes: > i am just curious as to ways the program can be > optimised and i have run out of avenues except > looking how i can get around the slicing issue. Since you are a registered user, you can say: "with profile" at the top of your main file, to get a count of statements executed, or even better with ex.exe: "with profile_time" to get the percentage of time consumed by each statement. These reports go to a file called "ex.pro". It will show you the statements that are consuming most of the time. If you post the relevant chunk(s) of code to this list, someone, maybe me, can probably suggest a faster way to do the same thing. The first time you profile a program you often discover something surprising. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
7. Re: Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 20, 2001
- 415 views
Hi Robert, i have ran the profiler and the bottle neck still shows up when doing string slicing. in the example below sLine contains a text string (50k) read from a html file, the example below took about 4 seconds to execute, i just somehow believe 4 seconds is way too long just to do string slicing... is there a remedy? regards, sam |for i=1 to 1000 do 62.67 | sLine2 = sLine[2..length(sLine)] |end for | |for i=1 to 1000 do 8.38 | sLine2 = sLine 28.34 | sLine2 = sLine2[2..length(sLine2)] |end for | ps "thanks" to Derek for the work around suggestion i will definitely be implement his technique when doing concatenation and slicing. ----- Original Message ----- From: "Robert Craig" <rds at RapidEuphoria.com> To: "EUforum" <EUforum at topica.com> Sent: Tuesday, August 21, 2001 2:10 PM Subject: Re: Interesting Experiment With String/Sequence Slicing > > Sam Lie writes: > > i am just curious as to ways the program can be > > optimised and i have run out of avenues except > > looking how i can get around the slicing issue. > > Since you are a registered user, you can say: > "with profile" > at the top of your main file, to get a count of > statements executed, or even better with ex.exe: > "with profile_time" > to get the percentage of time consumed by each statement. > > These reports go to a file called "ex.pro". > It will show you the statements that are consuming > most of the time. If you post the relevant chunk(s) > of code to this list, someone, maybe me, can > probably suggest a faster way to do the same thing. > > The first time you profile a program you often > discover something surprising. > > Regards, > Rob Craig > Rapid Deployment Software > http://www.RapidEuphoria.com > > > > > > ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************
8. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Kat <gertie at PELL.NET> Aug 20, 2001
- 428 views
On 21 Aug 2001, at 14:06, Derek Parnell wrote: > > > From: slie at theage.fairfax.com.au > > To: EUforum <EUforum at topica.com> > > Reply-To: EUforum at topica.com > > Subject: Re: Interesting Experiment With > > String/Sequence Slicing > > Date: 21/08/2001 12:58:45 PM > > > > > > i am just wild guessing here, but could it > > be true that a function can be > > specifically written to deal with string > > type sequences and so speeding up > > the slicing function? > > > > Hi Sam, > we have discovered that the fastest method of inserting, deleting, and > replacing text in sequences is to minimize the number and size of > concatenations. > > Consider the general case where you have a text of <aaa><bbb><ccc> and a > replacement text of <ddd> which is going to replace <bbb>. > > If the length of <ddd> is longer than <bbb> then create a new sequence ( > using the repeat() function) with the length of <aaa> + <ddd> + <ccc> then > copy these sub-strings into the new sequence and return the new sequence. > > Where <ddd> is shorter than <bbb>, you can avoid the overhead of creating a > new > sequence and reuse the space in the exiting string. The trick here is to move > either <aaa> closer to <ccc> or <ccc> closer to <aaa>, depending on which is > smaller, and leaving enough room for <ddd> to slide in between them. Then you > just return the truncated sequence. If you didn't move <aaa> then that is S[1 > .. > length(<aaa>)+length(<ddd>)+length(<ccc>)], but if you did move <aaa> it is > S[length(S) - (length(<aaa>)+length(<ddd>)+length(<ccc>)) .. length(S)]. > > There are further optimisations if <aaa>, <ccc>, or <ddd> are empty > sequences. > > I know this looks complex but it is almost always faster than creating > temporary sequences and concatenating them, depending on the length of the > sequences involved. This is something the interpreter should be doing. Kat
9. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Robert Craig <rds at RapidEuphoria.com> Aug 21, 2001
- 424 views
Sam Lie writes: > i have ran the profiler and the bottle neck still > shows up when doing string slicing. I gather from your earlier private e-mail that you are reading huge strings from a file and running match() on them. Or perhaps the string *is* the whole file? Since match() starts at the beginning of a sequence, and stops when it encounters a match, I gather you are slicing the 50K strings to look for further matches. Try reading and searching one line at a time using gets(). You won't be copying a huge string when you have a match, and you'll make better use of the Pentium on-chip cache, e.g. object line integer fn fn = open("myfile.html", "r") while 1 do line = gets(fn) if atom(line) then exit end if if match("foobar", line) then ..... end if end while close(fn) After doing the above, if you are still convinced that slicing is the culprit, you could write your own match() that locates multiple occurrences without slicing. (There may be an example in the mailing list archives.) Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
10. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Derek Parnell <ddparnell at bigpond.com> Aug 21, 2001
- 428 views
Robert, is the *ANY* chance (pretty-lease) that you might create an in-built version of match and find that allows a starting offset index, similar to BASIC's instr() function. Maybe somthing like ... findfrom(offset, look_for, look_in) matchfrom(offset, look_for, look_in) I find myself frequently cutting and dicing slices, or doing ssllooww searches because this functionality is missing. It would be so much faster if these were in-built rather than Euphoria-written functions. ----- Original Message ----- From: "Robert Craig" <rds at RapidEuphoria.com> To: "EUforum" <EUforum at topica.com> Sent: Wednesday, August 22, 2001 2:00 AM Subject: Re: Interesting Experiment With String/Sequence Slicing > > > Sam Lie writes: > > i have ran the profiler and the bottle neck still > > shows up when doing string slicing. > > I gather from your earlier private e-mail that you are reading huge > strings from a file and running match() on them. > Or perhaps the string *is* the whole file? > > Since match() starts at the beginning of a sequence, > and stops when it encounters a match, I gather > you are slicing the 50K strings to look for further matches. > > Try reading and searching one line at a time using gets(). > You won't be copying a huge string when you have a match, > and you'll make better use of the Pentium on-chip cache, > e.g. > object line > integer fn > > fn = open("myfile.html", "r") > while 1 do > line = gets(fn) > if atom(line) then > exit > end if > if match("foobar", line) then > ..... > end if > end while > close(fn) > > After doing the above, if you are still convinced that > slicing is the culprit, you could write your own match() > that locates multiple occurrences without slicing. > (There may be an example in the mailing list archives.) > > Regards, > Rob Craig > Rapid Deployment Software > http://www.RapidEuphoria.com > > > > > > >
11. Re: Interesting Experiment With String/Sequence Slicing
- Posted by George Walters <gwalters at sc.rr.com> Aug 21, 2001
- 434 views
I agree with this. I've had to write my own in EU... and I'm sure it is slow compared to a built in function. ...george ----- Original Message ----- From: "Derek Parnell" <ddparnell at bigpond.com> To: "EUforum" <EUforum at topica.com> Subject: Re: Interesting Experiment With String/Sequence Slicing > > Robert, > is the *ANY* chance (pretty-lease) that you might create an in-built version > of match and find that allows a starting offset index, similar to BASIC's > instr() function. Maybe somthing like ... > > findfrom(offset, look_for, look_in) > matchfrom(offset, look_for, look_in) > > I find myself frequently cutting and dicing slices, or doing ssllooww > searches because this functionality is missing. It would be so much faster > if these were in-built rather than Euphoria-written functions. > > ----- Original Message ----- > From: "Robert Craig" <rds at RapidEuphoria.com> > To: "EUforum" <EUforum at topica.com> > Sent: Wednesday, August 22, 2001 2:00 AM > Subject: Re: Interesting Experiment With String/Sequence Slicing > > > > > > > Sam Lie writes: > > > i have ran the profiler and the bottle neck still > > > shows up when doing string slicing. > > > > I gather from your earlier private e-mail that you are reading huge > > strings from a file and running match() on them. > > Or perhaps the string *is* the whole file? > > > > Since match() starts at the beginning of a sequence, > > and stops when it encounters a match, I gather > > you are slicing the 50K strings to look for further matches. > > > > Try reading and searching one line at a time using gets(). > > You won't be copying a huge string when you have a match, > > and you'll make better use of the Pentium on-chip cache, > > e.g. > > object line > > integer fn > > > > fn = open("myfile.html", "r") > > while 1 do > > line = gets(fn) > > if atom(line) then > > exit > > end if > > if match("foobar", line) then > > ..... > > end if > > end while > > close(fn) > > > > After doing the above, if you are still convinced that > > slicing is the culprit, you could write your own match() > > that locates multiple occurrences without slicing. > > (There may be an example in the mailing list archives.) > > > > Regards, > > Rob Craig > > Rapid Deployment Software > > http://www.RapidEuphoria.com > > > > > > > > > > > > > > > > >
12. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Kat <gertie at PELL.NET> Aug 21, 2001
- 443 views
On 22 Aug 2001, at 6:41, Derek Parnell wrote: > > > Robert, > is the *ANY* chance (pretty-lease) that you might create an in-built version > of > match and find that allows a starting offset index, similar to BASIC's instr() > function. Maybe somthing like ... > > findfrom(offset, look_for, look_in) > matchfrom(offset, look_for, look_in) I agree. Kat
13. Re: Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 21, 2001
- 434 views
hi Robert, i have tried as you have suggested but have came across more problems that i havent anticipated because the string that i am searching for are tag types eg. <TD> ** NAME ** </TD> and i run into problem when </TD> or parts of it occur on the next line. so i will still have to read all the text into one big sequence and then do the string slicing and chopping. i suppose i can very much live with slicing but i am just wondering could there be a function similar to slicing but only works on strings and is much faster by moving blocks of memory rather than copy one sequence element to another one at a time. however maybe thats not possible because the sequence is not represented as a set of continuous memory block. today i also came across something very interestings. example 1 for i=1 to 1000 do sLine2 = sLine sLine2 = sLine2[2..length(sLine2)] end for example 2 for i=1 to 1000 do sLine2 = sLine sLine3 = sLine2[length(sLine2)-10..length(sLine2)] end for example 2 was faster than example 1 by a factor of 100 so my thought was that example 1 was slower because it had to copy more items when doing the slicing. but then again i thought why is it copying items, as Derek had suggested in the case of example 1 we could just blank out/remove the first sequence element and return the sLine2 as is. If this case is true then the term 'slicing' could not be taken literally because it is more like a copy operation. flame me if i am wrong. regards, sam lie Down Under, Australia t = time() for i=1 to 1000 do sLine2 = sLine sLine3 = sLine2[length(sLine2)-10..length(sLine2)] ----- Original Message ----- From: "Robert Craig" <rds at RapidEuphoria.com> To: "EUforum" <EUforum at topica.com> Sent: Wednesday, August 22, 2001 2:00 AM Subject: Re: Interesting Experiment With String/Sequence Slicing > > Sam Lie writes: > > i have ran the profiler and the bottle neck still > > shows up when doing string slicing. > > I gather from your earlier private e-mail that you are reading huge > strings from a file and running match() on them. > Or perhaps the string *is* the whole file? > > Since match() starts at the beginning of a sequence, > and stops when it encounters a match, I gather > you are slicing the 50K strings to look for further matches. > > Try reading and searching one line at a time using gets(). > You won't be copying a huge string when you have a match, > and you'll make better use of the Pentium on-chip cache, > e.g. > object line > integer fn > > fn = open("myfile.html", "r") > while 1 do > line = gets(fn) > if atom(line) then > exit > end if > if match("foobar", line) then > ..... > end if > end while > close(fn) > > After doing the above, if you are still convinced that > slicing is the culprit, you could write your own match() > that locates multiple occurrences without slicing. > (There may be an example in the mailing list archives.) > > Regards, > Rob Craig > Rapid Deployment Software > http://www.RapidEuphoria.com > > > > > > ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************
14. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Robert Craig <rds at RapidEuphoria.com> Aug 21, 2001
- 439 views
Derek Parnell writes: > is the *ANY* chance (pretty-lease) that you might create > an in-built version of match and find that allows a starting > offset index, similar to BASIC's instr() function. > Maybe somthing like ... > > findfrom(offset, look_for, look_in) > matchfrom(offset, look_for, look_in) Thanks for the suggestion. I'll add it to my list. I'll look into it after 2.3. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
15. Re: Interesting Experiment With String/Sequence Slicing
- Posted by Robert Craig <rds at RapidEuphoria.com> Aug 21, 2001
- 460 views
Sam Lie writes: > the string that i am searching for > are tag types eg. <TD> ** NAME ** </TD> > and i run into problem when </TD> or parts of > it occur on the next line. so i will still have to > read all the text into one big sequence and then > do the string slicing and chopping. The best (fastest) algorithm for this problem would be to read a character at a time with getc(), copying input characters to the output file with puts(), and whenever you see '<', check to see if it's followed by 'T' 'D' '>' . When you find the start tag, set a variable to suspend writing characters to the output file, and start looking for '<' followed by '/' 'T' 'D' '>'. (allow for blanks, tabs and new-lines where they are allowed). This will take more code but it will be extremely fast. What you are doing now with match() and slicing, requires that you copy an average of 25,000 characters every time you match a start or end tag. > i suppose i can > very much live with slicing but i am just wondering > could there be a function similar to slicing but only > works on strings and is much faster by moving blocks > of memory rather than copy one sequence element > to another one at a time. No. Euphoria does not distinguish strings from sequences of integers internally. As I mentioned, I used to cleverly point to slices within a sequence rather than copying them, but this led to inefficiencies elsewhere, and only helps when you never modify the sliced data or the original sequence, so I dropped that idea a few years ago. > however maybe thats not possible > because the sequence is not represented as a set of > continuous memory block. A sequence *is* represented as a contiguous block of memory. > today i also came across something very interestings. > > example 1 > for i=1 to 1000 do > sLine2 = sLine > sLine2 = sLine2[2..length(sLine2)] > end for The above would be lightning fast, except that sLine2 = sLine creates 2 references to the same sequence, so I can't simply adjust the internal pointers. I have to copy n-1 elements to a brand new sLine2 sequence (otherwise sLine would be mistakenly altered). > example 2 > for i=1 to 1000 do > sLine2 = sLine > sLine3 = sLine2[length(sLine2)-10..length(sLine2)] > end for > example 2 was faster than example 1 by a factor of 100 Of course. You are only copying 11 elements, not thousands. > so my thought was that example 1 was slower because it > had to copy more items when doing the slicing. but then again i > thought why is it copying items, as Derek had suggested > in the case of example 1 we could just blank out/remove the first > sequence element and return the sLine2 as is. This optimization does happen when there is only one reference on a sequence. In your example 1 you get Sline2 = Sline for free, but you have to pay when you try to modify the (shared) sequence. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
16. Re: Interesting Experiment With String/Sequence Slicing
- Posted by slie at theage.fairfax.com.au Aug 21, 2001
- 436 views
Hi Robert, thank you for your reply. it is very informative and has helped me a lot in understanding more in how euphoria works. i also looked at your suggested solution and have been taking that approach by iteratively going through the string or buffer read in rather than the match and slice method. however, there is still a lot of places where slicing occurs for various reasons and when running time profile all the big clock ticks are happening where slicing are occuring. i was even more surprised when i found large clock ticks in places where i am just trying to reference to a slice of a sequence. as you have pointed out that you 'no' longer use direct reference to the sequence when doing slicing but would it not be more optimal to allow direct reference when there is obvious situations when the sequence is only for viewing only eg. if equal(sLine[i..i+5],"</TD>") but i guess that would mean we will just end up in messy pointers to this and that mess like they have in C. i have never wrote an interpreter before and certainly not any time in the future as my technical skills are very inadequate but i always had keen interest in how interpreters work in particularly euphoria as it is very well designed althought it has been a surprise for me to discover the string slicing issue which will become a slight concern when undertaking projects that may involve quite a lot string slashing and cutting. regards, sam lie Down Under, Australia ----- Original Message ----- From: "Robert Craig" <rds at RapidEuphoria.com> To: "EUforum" <EUforum at topica.com> Sent: Wednesday, August 22, 2001 2:01 PM Subject: Re: Interesting Experiment With String/Sequence Slicing > > Sam Lie writes: > > the string that i am searching for > > are tag types eg. <TD> ** NAME ** </TD> > > and i run into problem when </TD> or parts of > > it occur on the next line. so i will still have to > > read all the text into one big sequence and then > > do the string slicing and chopping. > > The best (fastest) algorithm for this problem would be to read > a character at a time with getc(), copying input characters to > the output file with puts(), and whenever you see '<', check to see if > it's followed by 'T' 'D' '>' . > When you find the start tag, set a variable to suspend > writing characters to the output file, > and start looking for '<' followed by '/' 'T' 'D' '>'. > (allow for blanks, tabs and new-lines where they are allowed). > This will take more code but it will be extremely fast. > > What you are doing now with match() and slicing, > requires that you copy an average of 25,000 characters > every time you match a start or end tag. > > > i suppose i can > > very much live with slicing but i am just wondering > > could there be a function similar to slicing but only > > works on strings and is much faster by moving blocks > > of memory rather than copy one sequence element > > to another one at a time. > > No. Euphoria does not distinguish strings from > sequences of integers internally. As I mentioned, > I used to cleverly point to slices within a sequence > rather than copying them, but this led to inefficiencies > elsewhere, and only helps when you never modify the > sliced data or the original sequence, > so I dropped that idea a few years ago. > > > however maybe thats not possible > > because the sequence is not represented as a set of > > continuous memory block. > > A sequence *is* represented as a contiguous block > of memory. > > > today i also came across something very interestings. > > > > example 1 > > for i=1 to 1000 do > > sLine2 = sLine > > sLine2 = sLine2[2..length(sLine2)] > > end for > > The above would be lightning fast, except that > sLine2 = sLine > creates 2 references to the same sequence, so I can't > simply adjust the internal pointers. I have to copy n-1 > elements to a brand new sLine2 sequence (otherwise sLine > would be mistakenly altered). > > > example 2 > > for i=1 to 1000 do > > sLine2 = sLine > > sLine3 = sLine2[length(sLine2)-10..length(sLine2)] > > end for > > example 2 was faster than example 1 by a factor of 100 > > Of course. > You are only copying 11 elements, not thousands. > > > so my thought was that example 1 was slower because it > > had to copy more items when doing the slicing. but then again i > > thought why is it copying items, as Derek had suggested > > in the case of example 1 we could just blank out/remove the first > > sequence element and return the sLine2 as is. > > This optimization does happen when there is only one > reference on a sequence. > > In your example 1 you get > Sline2 = Sline > for free, but you have to pay when you try to modify the > (shared) sequence. > > Regards, > Rob Craig > Rapid Deployment Software > http://www.RapidEuphoria.com > > > > > > ********************************************************************************* This email and any files transmitted with it may be legally privileged and confidential. If you are not the intended recipient of this email, you must not disclose or use the information contained in it. If you have received this email in error, please notify us by return email and permanently delete the document. *********************************************************************************