Re: Interesting Experiment With String/Sequence Slicing
- Posted by Kat <gertie at PELL.NET> Aug 20, 2001
- 428 views
On 21 Aug 2001, at 14:06, Derek Parnell wrote: > > > From: slie at theage.fairfax.com.au > > To: EUforum <EUforum at topica.com> > > Reply-To: EUforum at topica.com > > Subject: Re: Interesting Experiment With > > String/Sequence Slicing > > Date: 21/08/2001 12:58:45 PM > > > > > > i am just wild guessing here, but could it > > be true that a function can be > > specifically written to deal with string > > type sequences and so speeding up > > the slicing function? > > > > Hi Sam, > we have discovered that the fastest method of inserting, deleting, and > replacing text in sequences is to minimize the number and size of > concatenations. > > Consider the general case where you have a text of <aaa><bbb><ccc> and a > replacement text of <ddd> which is going to replace <bbb>. > > If the length of <ddd> is longer than <bbb> then create a new sequence ( > using the repeat() function) with the length of <aaa> + <ddd> + <ccc> then > copy these sub-strings into the new sequence and return the new sequence. > > Where <ddd> is shorter than <bbb>, you can avoid the overhead of creating a > new > sequence and reuse the space in the exiting string. The trick here is to move > either <aaa> closer to <ccc> or <ccc> closer to <aaa>, depending on which is > smaller, and leaving enough room for <ddd> to slide in between them. Then you > just return the truncated sequence. If you didn't move <aaa> then that is S[1 > .. > length(<aaa>)+length(<ddd>)+length(<ccc>)], but if you did move <aaa> it is > S[length(S) - (length(<aaa>)+length(<ddd>)+length(<ccc>)) .. length(S)]. > > There are further optimisations if <aaa>, <ccc>, or <ddd> are empty > sequences. > > I know this looks complex but it is almost always faster than creating > temporary sequences and concatenating them, depending on the length of the > sequences involved. This is something the interpreter should be doing. Kat