1. RE: Replacing characters (Matt: bug)
- Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Sep 16, 2002
- 491 views
> -----Original Message-----
> From: Dan Moyer [mailto:DANIELMOYER at prodigy.net]
>
> I've been using your useful replace_all function, & found that if I happen
> to try to replace the last character on a line it errors.

Sure does. Guess I never tried to do that. Add the if statement around the call to match() in the loop:

    function replace_all( sequence text, object a, object b )
        integer ix, jx

        if atom(a) then
            a = {a}
        end if

        if atom(b) then
            b = {b}
        end if

        ix = 0
        jx = match( a, text )
        while jx do
            ix += jx
            text = text[1..ix-1] & b & text[ix+length(a)..length(text)]
            ix += length(b)

            if ix < length(text) then
                jx = match( a, text[ix+length(a)..length(text)] )
            else
                jx = 0
            end if
        end while

        return text
    end function
2. RE: Replacing characters (Matt: bug)
- Posted by Henri Goffin <H.Goffin at skynet.be> Sep 16, 2002
- 445 views
Hi!

Here's the function I commonly use for that purpose. The core instruction is (IMO) a beautiful piece of sequence arithmetic. It cleverly takes advantage of the fact that Euphoria sees a string of characters as a sequence of atoms. The idea is not from me but was posted here a few months ago by Mike "vulcan" from New Zealand (hi Mike!). That's why I indulge myself with this laudatory praise. Hope you will enjoy it too.

Henri Goffin

------< snip >-----
global function replace_in_string(sequence s, object o, object n)
-- Replaces (transliterates) chars in a string according to a translation "table".
-- The table is either 2 atoms, in which case all occurrences of atom o in s are replaced by atom n,
-- or 2 flat sequences of atoms of equal length, where each occurrence in s
-- of each atom in o is replaced by the atom in n having the same position.
-- Example: replace_in_string("This is an example", "aeiou", "AEIOU") will replace all lower case
-- vowels with upper case. The returned string is: "ThIs Is An ExAmplE".
-- WARNING: this function does not give the expected result if o and n contain identical atom(s) in
-- different positions. In particular the function cannot "swap" a pair of atoms in the sequence.
-- Example: replace_in_string("binary code is made of 1s and 0s", "01", "10")
-- is not equal to: "binary code is made of 0s and 1s"
-- To obtain this last result, you have to make 2 calls with an intermediate replacement:
-- S=replace_in_string("binary code is made of 1s and 0s", "01", "#0")
-- replace_in_string(S, "#", "1")
    sequence so

    if atom(o) then
        if sequence(n) then
            return -1
        else
            so = repeat(o, length(s))  -- here is the trick
            s += (s = so) * (n - o)    -- ... and here
            return s
        end if
    elsif atom(n) then
        return -1
    elsif length(o) != length(n) then
        return -1
    else
        for i = 1 to length(o) do
            s = replace_in_string(s, o[i], n[i])
        end for
        return s
    end if
end function
-----< snip >-----

-----Original Message-----
From: Dan Moyer [SMTP:DANIELMOYER at prodigy.net]
Sent: Monday, September 16, 2002 4:34 PM
To: EUforum
Subject: Re: Replacing characters (Matt: bug)

Thanks Matt!

Dan
3. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 16, 2002
- 456 views
Henri Goffin wrote:
> global function replace_in_string(sequence s, object o, object n)

The "so" sequence is not necessary. You can compare the sequence directly to the atom "o":

    s += (s = o) * (n - o)
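For readers new to the trick, here is a minimal sketch that unrolls the one-liner into its three steps. The sample string and the variable name mask are only for illustration and are not from the thread; it assumes nothing beyond the element-wise sequence operators the posts already rely on.

```euphoria
-- Sketch: the sequence-arithmetic replacement, one step at a time.
sequence s, mask

s = "banana"

-- Comparing a sequence with an atom gives a sequence of 1s and 0s.
mask = (s = 'a')            -- {0,1,0,1,0,1}

-- Scale the mask by the distance between the new and old character codes.
mask = mask * ('o' - 'a')   -- {0,14,0,14,0,14}

-- Adding it back changes only the elements that matched.
s += mask

puts(1, s & '\n')           -- prints: bonono
```

Because each step builds a full-length temporary sequence, this may help explain why a plain loop can end up faster, as the later timings in the thread suggest.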
4. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 16, 2002
- 472 views
Andy Serpa wrote:
> Henri Goffin wrote:
> > global function replace_in_string(sequence s, object o, object n)
>
> The "so" sequence is not necessary. You can compare the sequence
> directly to the atom "o":
>
>     s += (s = o) * (n - o)

Of course this, while devoid of cleverness, is at least 10x faster (and does swaps too):

    function replace_in_string(sequence s, object o, object n)
        integer ni

        if atom(o) and atom(n) then
            for i = 1 to length(s) do
                if s[i] = o then
                    s[i] = n
                end if
            end for
            return s
        elsif sequence(o) and sequence(n) and length(o) = length(n) then
            for i = 1 to length(s) do
                ni = find(s[i],o)
                if ni then
                    s[i] = n[ni]
                end if
            end for
            return s
        else
            return -1
        end if
    end function
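The "does swaps too" remark can be checked with a small sketch. The sample text and the variable names below are made up for illustration. The first half applies the arithmetic replacement one pair at a time, as the recursive version does, and the second half uses the single find() pass.

```euphoria
-- Sketch: why pair-by-pair replacement cannot swap, but one pass can.
sequence s, t, o, n
integer ni

s = "1s and 0s"
o = "01"
n = "10"

-- Pair by pair: the second replacement also hits the results of the first.
t = s
t += (t = '0') * ('1' - '0')   -- every '0' becomes '1'
t += (t = '1') * ('0' - '1')   -- every '1', old or new, becomes '0'
puts(1, t & '\n')              -- prints: 0s and 0s

-- Single pass: each element is looked up once in the original table.
t = s
for i = 1 to length(t) do
    ni = find(t[i], o)
    if ni then
        t[i] = n[ni]
    end if
end for
puts(1, t & '\n')              -- prints: 0s and 1s
```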
5. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
- 469 views
Dan Moyer wrote:
> Some tests (not exhaustive) of the 3 "replace characters" routines recently
> shown here:
>
> Matt's now doesn't fail on replace last character, but doesn't actually
> replace the last character, & is *very* much slower than either of the other
> two
> (one test: Henri: 3.9, Andy: 2.75 , Matt: 24.06) ;
>
> Henri/Mike "vulcan" routine is faster than Matt's but won't replace one char
> with 2;
>
> Andy's is fastest. How much faster seems to vary. On a "long" sequence to
> peruse, it seemed twice as fast as Henri/Mike "vulcan", but on many
> repetitions of replacing in smaller sequence, it seems only 1.4 times faster
> (see test results above).
>
> I've attached the tests I ran in case I did something dumb in them.

Now that I think about it, I don't know why I put "10x faster". I did only one test with a short sequence and short replacements, and found mine about 4.5x faster than Henri's (with the "so" sequence removed -- maybe that actually speeds it up?).

My unposted "most elegant" version turned out to be slower than both, so I just admired it for a while then erased it...
6. RE: Replacing characters (Matt: bug)
- Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Sep 17, 2002
- 455 views
> From: Dan Moyer [mailto:DANIELMOYER at prodigy.net]

> Matt's now doesn't fail on replace last character, but doesn't actually
> replace the last character, & is *very* much slower than either of the other
> two
> (one test: Henri: 3.9, Andy: 2.75 , Matt: 24.06) ;

It seems to replace the last char when I test it. I believe that the slowness is due to the use of subscripting within the call to match(). There was some discussion about this recently. I typically run fairly short strings through this routine (maybe 2 or 3 sentence lengths) at a time, so I've never needed any more speed.

> Henri/Mike "vulcan" routine is faster than Matt's but won't replace one char
> with 2;
>
> Andy's is fastest. How much faster seems to vary. On a "long" sequence to
> peruse, it seemed twice as fast as Henri/Mike "vulcan", but on many
> repetitions of replacing in smaller sequence, it seems only 1.4 times faster
> (see test results above).

Actually, neither one will handle replacements of different lengths. This was a definite requirement for what I needed, where, for instance you might want to replace "." with "..". Also, neither routine will handle:

    R/replace_in_string("abc cba", "ab", "12" )

Both return "12c c21", which is fine if you've got a cipher (did anyone use this method in the contest? :), but not if you're trying to replace words with other words, in which case you end up with garbage. Henri talks about this in the docs for the routine.

Of course, if the searched for object isn't there, or there aren't many of them, my routine will speed up. So if you're doing it on a large file, you might try calling replace_all each line or every few lines. You'd probably have to test to see where the overhead of the call cancelled out the quicker return. And even then, it probably would depend upon how often your match came up.

Here is a tweaked version that doesn't use a subscripted match. It seems to be about 15-20% faster:

    global function replace_all2( sequence text, object a, object b )
        integer ix, jx, m, buf, lent, lena, lenb, dlen

        if atom(a) then
            a = {a}
        end if

        if atom(b) then
            b = {b}
        end if

        ix = match( a, text )
        if not ix then
            return text
        end if

        lena = length(a)
        lenb = length(b)
        dlen = lenb - lena
        lent = length(text)
        jx = lena + 1

        while jx > 1 do
            text = text[1..ix-1] & b & text[ix+lena..lent]
            ix += lenb
            jx = 1
            lent += dlen

            while ix <= lent and jx <= lena do
                if text[ix] = a[jx] then
                    jx += 1
                else
                    jx = 1
                end if
                ix += 1
            end while

            if ix > lent then
                ix -= lena
            end if
        end while

        return text
    end function

Matt Lewis
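To make the distinction between character translation and substring replacement concrete, here is a minimal sketch of the latter. The function name replace_substring is hypothetical and this is not any poster's routine. It assumes the search sequence a is non-empty, and it aims for clarity rather than speed.

```euphoria
-- Sketch: whole-substring replacement that allows different lengths.
-- replace_substring is a made-up name; a must be a non-empty sequence.
function replace_substring(sequence text, sequence a, sequence b)
    sequence result
    integer p

    result = ""
    p = match(a, text)
    while p do
        -- keep everything before the match, then the replacement
        result &= text[1..p-1] & b
        -- continue searching in what follows the match
        text = text[p+length(a)..length(text)]
        p = match(a, text)
    end while
    return result & text
end function

puts(1, replace_substring("abc cba", "ab", "12") & '\n')  -- prints: 12c cba
puts(1, replace_substring("a.b.", ".", "..") & '\n')      -- prints: a..b..
```

The first call leaves "cba" alone because only the whole substring "ab" is matched, whereas the per-character routines return "12c c21". The second call shows a replacement that is longer than what it replaces, including at the last character.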
7. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
- 456 views
Matthew Lewis wrote:
> Actually, neither one will handle replacements of different lengths. This
> was a definite requirement for what I needed, where, for instance you might
> want to replace "." with "..". Also, neither routine will handle:
>
> R/replace_in_string("abc cba", "ab", "12" )
>
> Both return "12c c21", which is fine if you've got a cipher

Well, that's what it is supposed to do -- replace character for character. Personally, I don't think I'd ever use it -- I have a very fast substring replace routine that chooses from two different search methods based on the size of the subject & a class replace routine (which will replace any single character or group of characters in the "class" it is looking for) that pretty much take care of everything I need.

I was thinking of putting together a string library with some of this stuff. Maybe I'll post the routines one at a time, and we can look for improvements to them and build up the library. The string stuff in the archives has left me wanting...
8. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
- 453 views
Derek,

Yours fails if "a" is longer than "b"...

-- Andy

Derek Parnell wrote:
> Hi Matt,
> unfortunately the algorithm doesn't work. Try this test ...
>
> x = replace_all2("derekderekderekderek", "rek", "1234567")
>
> this returns "de1234567derek1234567ekde1234567"
>
> rather than "de1234567de1234567de1234567de1234567"
>
> Anyhow, here is another method of doing it...
>
>     function replace_elem(sequence s, object a, object b)
>         sequence t
>         integer n,m
>
>         if atom(a) then
>             a = {a}
>         end if
>
>         if atom(b) then
>             b = {b}
>         end if
>
>         -- Create a buffer big enough to cater for worst case replacement.
>         t = repeat(0, (floor(length(s) / length(a)) + 1) * length(b))
>
>         m = 1
>         while 1 do
>             n = match(a, s)
>             if n then
>                 t[m..m+n-2] = s[1..n-1]
>                 m += (n-1)
>                 t[m..m+length(b)-1] = b
>                 m += (length(b))
>                 s = s[n+length(a) .. length(s)]
>             else
>                 t[m..m+length(s)-1] = s
>                 m += (length(s))
>                 exit
>             end if
>         end while
>
>         return t[1..m-1]
>     end function
>
> ----------------
> cheers,
> Derek Parnell
>
> <snip>
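One plausible reading of the failure is that the buffer in the quoted routine is sized from the number of possible matches times length(b), so it can be shorter than the input itself when "a" is longer than "b". The sketch below compares that formula with a more conservative bound. The helper name worst_case_length is hypothetical and the bound is my own assumption, not something posted in the thread.

```euphoria
-- Sketch: a conservative buffer size for the copy-into-buffer approach.
-- worst_case_length is a made-up helper; lena is assumed to be at least 1.
function worst_case_length(integer lens, integer lena, integer lenb)
    if lenb <= lena then
        -- replacements never make the text longer
        return lens
    else
        -- at most floor(lens/lena) matches, each adding (lenb - lena)
        return lens + floor(lens / lena) * (lenb - lena)
    end if
end function

-- 11-element text, 3-element "a", 1-element "b":
? (floor(11 / 3) + 1) * 1        -- 4, the quoted formula, too small for 11 elements
? worst_case_length(11, 3, 1)    -- 11

-- the "derek" test: 20 elements, "rek" replaced by "1234567"
? worst_case_length(20, 3, 7)    -- 44
```

With the smaller figure, the final copy of the unmatched remainder would run past the end of the buffer, which would be consistent with the report above.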
9. RE: Replacing characters (Matt: bug)
- Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Sep 17, 2002
- 470 views
> From: Derek Parnell [mailto:ddparnell at bigpond.com]
>
> Hi Matt,
> unfortunately the algorithm doesn't work. Try this test ...

Yes, the "if ix > lent " test shouldn't be there.

> Anyhow, here is another method of doing it...

Ah, here's the real reason for slowness. It was using the same sequence, and doing the slow replacing. I always wondered what sort of speedup I might be able to get, but was always too lazy to do it. :)

Matt Lewis
10. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
- 470 views
> > Actually, neither one will handle replacements of different lengths.
>
> The test I ran (& zipped previously) *did* show Andy's replacing one with
> two. That same instance of doubled parentheses mentioned above ("))") was
> replaced by "ZZZZ" in my test.

The replace_in_string function? It shouldn't have replaced two-for-one. I don't even see how it is possible -- are you sure? That function should really be called "translate" or something as it is not meant as a substring search-and-replace.
11. RE: Replacing characters (Matt: bug)
- Posted by Andy Serpa <renegade at earthling.net> Sep 18, 2002
- 467 views
Dan Moyer wrote:
> Andy,
>
> Well, on the one hand it's your routine, so you should know, but on the
> other hand, my test results show it does replace one with two.

I should, but I'm not very sharp.

> Maybe there's something wrong with my test?? I'll send you another copy of the
> zip of the test I applied (previously sent twice to the forum), but here's
> the pertinent variables, & a copy of one line of the results (the fact that
> it says "Replace_in_string" instead of "replace_in_string" is just to
> distinguish two different routines with the same name in the test, the one
> with "R" is yours):

Sorry -- I read on the web interface, so the attachments don't come through.

> <code snippet>
> integer tr -- test repetitions
> sequence r, rb -- replace, replace by
> text = {}
> new = {}
> times = {}
>
> r = {')'} -- replace
> rb = {"ZZ"} --replace by

There's the thing. I was thinking that the arguments would always be one-dimensional strings, with the replacement always being a single element of that, i.e. one character. You've got

    rb = {"ZZ"}

instead of just

    rb = "ZZ"

so in your test "ZZ" is a single element instead of just 'Z'. Your search string (r) however uses single quotes and is the same as r = ")". It is all clear now.
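The difference between the two spellings can be seen with a few one-line checks. This is a small sketch using the r and rb values quoted above; the sample string "f(x)" is made up for illustration.

```euphoria
-- Sketch: {"ZZ"} is a one-element table whose element is itself a string.
sequence r, rb, s

r  = {')'}     -- one atom inside braces: the same thing as ")"
rb = {"ZZ"}    -- one element, and that element is the sequence "ZZ"

? equal(r, ")")     -- 1
? length(rb)        -- 1
? length(rb[1])     -- 2

-- Element-for-element replacement therefore stores a nested sequence.
s = "f(x)"
s[4] = rb[1]
? length(s)         -- 4, still four elements
? sequence(s[4])    -- 1, the last element is now the sequence "ZZ"
```

So the routine still replaces one element with one element. The replacement element simply happens to be a two-character string, which would look like a one-for-two replacement once the result is flattened or displayed.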