OpenEuphoria: Forum: RE: Replacing characters (Matt: bug)

1. RE: Replacing characters (Matt: bug)

Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Sep 16, 2002
491 views

> -----Original Message-----
> From: Dan Moyer [mailto:DANIELMOYER at prodigy.net]

> I've been using your useful replace_all function, & found that if I happen
> to try to replace the last character on a line it errors.

Sure does.  Guess I never tried to do that.  Add the if statement around the
call to match() in the loop:

function replace_all( sequence text, object a, object b )
    integer ix, jx

    if atom(a) then
        a = {a}
    end if

    if atom(b) then
        b = {b}
    end if

    ix = 0
    jx = match( a, text )
    while jx do
        ix += jx
        text = text[1..ix-1] & b & text[ix+length(a)..length(text)]
        ix += length(b)

        if ix < length(text) then
            jx = match( a, text[ix+length(a)..length(text)] )
        else
            jx = 0
        end if
    end while

    return text
end function


2. RE: Replacing characters (Matt: bug)

Posted by Henri Goffin <H.Goffin at skynet.be> Sep 16, 2002
445 views

Hi!

Here's the function I commonly use for that purpose. The core instruction 
is (IMO) a beautiful piece of sequence arithmetic. It cleverly takes 
advantage of the fact that Euphoria sees a string of characters as a 
sequence of atoms. The idea is not from me but was posted here a few months 
ago by Mike "vulcan" from New Zealand (hi Mike!). That's why I indulge 
myself with this laudatory praise. Hope you will enjoy it too.

Henri Goffin

------< snip >-----
global function replace_in_string(sequence s, object o, object n)
-- Replaces (transliterates) chars in a string according to a translation "table".
-- The table is either 2 atoms, in which case all occurrences of atom o in s
-- are replaced by atom n, or 2 flat sequences of atoms of equal length, where
-- each occurrence in s of each atom in o is replaced by the atom in n having
-- the same position.
-- Example: replace_in_string("This is an example", "aeiou", "AEIOU") will
-- replace all lower case vowels with upper case. The returned string is:
-- "ThIs Is An ExAmplE".

-- WARNING: this function does not give the expected result if o and n
-- contain identical atom(s) in different positions. In particular the
-- function cannot "swap" a pair of atoms in the sequence.
-- Example: replace_in_string("binary code is made of 1s and 0s", "01", "10")
-- is not equal to:			"binary code is made of 0s and 1s"
-- To obtain this last result, you have to make 2 calls with an
-- intermediate replacement:
-- S=replace_in_string("binary code is made of 1s and 0s", "01", "#0")
-- replace_in_string(S, "#", "1")


sequence so

if atom(o) then
	if sequence(n) then
		return -1
	else
		so = repeat(o, length(s))		-- here  is the trick
		s += (s = so) * (n - o)		-- ... and here
		return s
	end if
elsif atom(n) then
	return -1
elsif length(o) != length(n) then
	return -1
else
	for i = 1 to length(o) do
		s = replace_in_string(s, o[i], n[i])
	end for
	return s
end if
end function
-----< snip >-----
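
The swap limitation in the WARNING above follows from the table being applied one pair at a time. A minimal sketch, assuming only the core arithmetic line from the routine (the variable name and test string are illustrative), shows both passes:

```euphoria
sequence s
s = "1001"

-- pass 1: every '0' becomes '1', so the original '1's can no longer be told apart
s += (s = '0') * ('1' - '0')
puts(1, s & '\n')    -- prints 1111

-- pass 2: every '1' (old and new) becomes '0'
s += (s = '1') * ('0' - '1')
puts(1, s & '\n')    -- prints 0000
```

Routing one of the characters through a placeholder such as '#', as the comment suggests, keeps the two groups distinct between passes.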

-----Original Message-----
From:	Dan Moyer [SMTP:DANIELMOYER at prodigy.net]
Sent:	Monday, September 16, 2002 4:34 PM
To:	EUforum
Subject:	Re: Replacing characters (Matt: bug)


Thanks Matt!

Dan


3. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 16, 2002
456 views

Henri Goffin wrote:
>> global function replace_in_string(sequence s, object o, object n)
> 

The "so" sequence is not necessary.  You can compare the sequence 
directly to the atom "o":

	s += (s = o) * (n - o)
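
A small worked example of that line may help, since it relies on Euphoria applying relational and arithmetic operators element by element. The string and characters below are chosen only for illustration:

```euphoria
sequence s
s = "banana"

-- (s = 'a') compares each element with 'a' and yields {0,1,0,1,0,1}.
-- Multiplying by ('A' - 'a'), which is -32, yields {0,-32,0,-32,0,-32}.
-- Adding that to s shifts only the matching elements from 'a' to 'A'.
s += (s = 'a') * ('A' - 'a')

puts(1, s & '\n')    -- prints bAnAnA
```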


4. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 16, 2002
472 views

Andy Serpa wrote:
> 
> Henri Goffin wrote:
> >> global function replace_in_string(sequence s, object o, object n)
> > 
> 
> The "so" sequence is not necessary.  You can compare the sequence 
> directly to the atom "o":
> 
> 	s += (s = o) * (n - o)
> 
> 

Of course this, while devoid of cleverness, is at least 10x faster (and 
does swaps too):

function replace_in_string(sequence s, object o, object n)
integer ni

	if atom(o) and atom(n) then
		for i = 1 to length(s) do
			if s[i] = o then
				s[i] = n
			end if
		end for
		return s
	
	elsif sequence(o) and sequence(n) and length(o) = length(n) then

		for i = 1 to length(s) do
			ni = find(s[i],o)
			if ni then
				s[i] = n[ni]
			end if
		end for
		return s
	else
		return -1
	end if
	
end function


5. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
469 views

Dan Moyer wrote:
> Some tests (not exhaustive) of the 3 "replace characters" routines recently
> shown here:
> 
> Matt's now doesn't fail on replace last character, but doesn't actually
> replace the last character, & is *very* much slower than either of the
> other two
>  (one test: Henri: 3.9, Andy:  2.75 , Matt:  24.06) ;
> 
> Henri/Mike "vulcan" routine is faster than Matt's but won't replace one
> char with 2;
> 
> Andy's is fastest. How much faster seems to vary.  On a "long" sequence to
> peruse, it seemed twice as fast as Henri/Mike "vulcan", but on many
> repetitions of replacing in smaller sequence, it seems only 1.4 times
> faster (see test results above).
> 
> I've attached the tests I ran in case I did something dumb in them.
> 
>

Now that I think about it, I don't know why I put "10x faster".  I did 
only one test with a short sequence and short replacements, and found 
mine about 4.5x faster than Henri's (with the "so" sequence removed -- 
maybe that actually speeds it up?).  My unposted "most elegant" version 
turned out to be slower than both, so I just admired it for a while then 
erased it...


6. RE: Replacing characters (Matt: bug)

Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Sep 17, 2002
455 views

> From: Dan Moyer [mailto:DANIELMOYER at prodigy.net]

> Matt's now doesn't fail on replace last character, but doesn't actually
> replace the last character, & is *very* much slower than either of the
> other two
>  (one test: Henri: 3.9, Andy:  2.75 , Matt:  24.06) ;

It seems to replace the last char when I test it.  I believe that the
slowness is due to the use of subscripting within the call to match().
There was some discussion about this recently.  I typically run fairly short
strings through this routine (maybe 2 or 3 sentence lengths) at a time, so
I've never needed any more speed.
 
> Henri/Mike "vulcan" routine is faster than Matt's but won't replace one
> char with 2;
> 
> Andy's is fastest. How much faster seems to vary.  On a "long" sequence to
> peruse, it seemed twice as fast as Henri/Mike "vulcan", but on many
> repetitions of replacing in smaller sequence, it seems only 1.4 times
> faster (see test results above).

Actually, neither one will handle replacements of different lengths.  This
was a definite requirement for what I needed, where, for instance you might
want to replace "." with "..".  Also, neither routine will handle:

R/replace_in_string("abc cba", "ab", "12" ) 

Both return "12c c21", which is fine if you've got a cipher (did anyone use
this method in the contest? :), but not if you're trying to replace words
with other words, in which case you end up with garbage.  Henri talks about
this in the docs for the routine.  Of course, if the searched for object
isn't there, or there aren't many of them, my routine will speed up.  So if
you're doing it on a large file, you might try calling replace_all each line
or every few lines.  You'd probably have to test to see where the overhead
of the call cancelled out the quicker return.  And even then, it probably
would depend upon how often your match came up.

Here is a tweaked version that doesn't use a subscripted match.  It seems to
be about 15-20% faster:

global function replace_all2( sequence text, object a, object b )
  integer ix, jx, m, buf, lent, lena, lenb, dlen

  if atom(a) then
    a = {a}
  end if

  if atom(b) then
   b = {b}
  end if

  ix = match( a, text )
  if not ix then
    return text
  end if

  lena = length(a)
  lenb = length(b)
  dlen = lenb - lena
  lent = length(text)
  jx = lena + 1

  while jx > 1 do
    text = text[1..ix-1] & b & text[ix+lena..lent]
    ix += lenb
    jx = 1
    lent += dlen

    while ix <= lent and jx <= lena do
      if text[ix] = a[jx] then
        jx += 1
      else
        jx = 1
      end if
      ix += 1
    end while

    if ix > lent then
       ix -= lena
    end if
  end while

  return text
end function

Matt Lewis


7. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
456 views

Matthew Lewis wrote:
> Actually, neither one will handle replacements of different lengths.  This
> was a definite requirement for what I needed, where, for instance you might
> want to replace "." with "..".  Also, neither routine will handle:
> 
> R/replace_in_string("abc cba", "ab", "12" ) 
> 
> Both return "12c c21", which is fine if you've got a cipher 

Well, that's what it is supposed to do -- replace character for 
character.  Personally, I don't think I'd ever use it -- I have a very 
fast substring replace routine that chooses from two different search 
methods based on the size of the subject & a class replace routine 
(which will replace any single character or group of characters in the 
"class" it is looking for) that pretty much take care of everything I 
need.  I was thinking of putting together a string library with some of 
this stuff.

Maybe I'll post the routines one at a time, and we can look for 
improvements to them and build up the library.  The string stuff in the 
archives has left me wanting...


8. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
453 views

Derek,

Yours fails if "a" is longer than "b"...

-- Andy


Derek Parnell wrote:
> Hi Matt,
> unfortunately the algorithm doesn't work.  Try this test ...
> 
>   x = replace_all2("derekderekderekderek", "rek", "1234567")
> 
> this returns "de1234567derek1234567ekde1234567"
> 
> rather than "de1234567de1234567de1234567de1234567"
> 
> 
> Anyhow, here is another method of doing it...
> 
> function replace_elem(sequence s, object a, object b)
>     sequence t
>     integer n,m
> 
>     if atom(a) then
>         a = {a}
>     end if
> 
>     if atom(b) then
>         b = {b}
>     end if
> 
>     -- Create a buffer big enough to cater for worst case replacement.
>     t = repeat(0, (floor(length(s) / length(a)) + 1) * length(b))
> 
>     m = 1
>     while 1 do
>         n = match(a, s)
>         if n then
>             t[m..m+n-2] = s[1..n-1]
>             m += (n-1)
>             t[m..m+length(b)-1] = b
>             m += (length(b))
>             s = s[n+length(a) .. length(s)]
>         else
>             t[m..m+length(s)-1] = s
>             m += (length(s))
>             exit
>         end if
>     end while
> 
>     return t[1..m-1]
> end function
> 
> 
> ----------------
> cheers,
> Derek Parnell

9. RE: Replacing characters (Matt: bug)

Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Sep 17, 2002
470 views

> From: Derek Parnell [mailto:ddparnell at bigpond.com]

> 
> Hi Matt,
> unfortunately the algorithm doesn't work.  Try this test ...

Yes, the "if ix > lent " test shouldn't be there.

> Anyhow, here is another method of doing it...

Ah, here's the real reason for slowness.  It was using the same sequence,
and doing the slow replacing.  I always wondered what sort of speedup I
might be able to get, but was always too lazy to do it. :)

Matt Lewis
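
For reference, here is a minimal sketch of a substring replace that avoids the index bookkeeping discussed above. It is not any poster's routine: the name replace_all3 is hypothetical, and it trades speed for simplicity by re-slicing the remaining text after each match and appending to a separate result sequence. It assumes a is never the empty sequence in normal use and simply returns the text unchanged in that case.

```euphoria
-- Hypothetical sketch: replace every occurrence of a in text with b.
function replace_all3(sequence text, object a, object b)
    sequence result
    integer pos

    if atom(a) then
        a = {a}
    end if
    if atom(b) then
        b = {b}
    end if
    if length(a) = 0 then
        return text          -- match() requires a non-empty search sequence
    end if

    result = {}
    pos = match(a, text)
    while pos do
        -- copy everything before the match, then the replacement
        result &= text[1..pos-1] & b
        -- continue searching in what follows the match
        text = text[pos+length(a)..length(text)]
        pos = match(a, text)
    end while

    return result & text
end function

puts(1, replace_all3("derekderekderekderek", "rek", "1234567") & '\n')
-- prints de1234567de1234567de1234567de1234567
puts(1, replace_all3("aa", 'a', 'b') & '\n')
-- prints bb
```

The first call is Derek's test case; the second covers a match on the last character, directly after another match.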


10. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 17, 2002
470 views

> > Actually, neither one will handle replacements of different lengths.
> 
> The test I ran (& zipped previously) *did* show Andy's replacing one with
> two. That same instance of doubled parentheses mentioned above ("))") was
> replaced by "ZZZZ" in my test.
> 
> 

The replace_in_string function?  It shouldn't have replaced two-for-one.
I don't even see how it is possible -- are you sure?

That function should really be called "translate" or something, as it is 
not meant as a substring search-and-replace.


11. RE: Replacing characters (Matt: bug)

Posted by Andy Serpa <renegade at earthling.net> Sep 18, 2002
467 views

Dan Moyer wrote:
> Andy,
> 
> Well, on the one hand it's your routine, so you should know, but on the
> other hand, my test results show it does replace one with two.  

I should, but I'm not very sharp.

> Maybe
> there's something wrong with my test??  I'll send you another copy of the
> zip of the test I applied (previously sent twice to the forum), but here's
> the pertinent variables, & a copy of one line of the results (the fact that
> it says "Replace_in_string" instead of "replace_in_string" is just to
> distinguish two different routines with the same name in the test, the one
> with "R" is yours):
> 
>

Sorry -- I read on the web interface, so the attachments don't come 
through.

> <code snippet>
> integer tr  -- test repetitions
> sequence r, rb  -- replace, replace by
> text = {}
> new = {}
> times = {}
> 
> r = {')'}         -- replace
> rb = {"ZZ"}   --replace by
> 

There's the thing.  I was thinking that the arguments would always be 
one-dimensional strings, with the replacement always being a single 
element of that, i.e. one character.  You've got rb = {"ZZ"} instead of 
just rb = "ZZ", so in your test "ZZ" is a single element instead of just 
'Z'. Your search string (r), however, uses single quotes and is the same 
as r = ")".

It is all clear now.
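
The difference can be checked directly. The sketch below uses the r and rb values from the quoted test; the sample string s and the index are made up for illustration. It also shows what the element-wise routine stores when it is handed the nested form:

```euphoria
sequence r, rb, s

r  = {')'}     -- one element, the atom ')': identical to the string ")"
rb = {"ZZ"}    -- one element, which is itself the sequence "ZZ"

? equal(r, ")")      -- prints 1
? equal(rb, "ZZ")    -- prints 0

-- Assigning rb[1] to a single position nests "ZZ" inside s
-- instead of lengthening it.
s = "f()"
s[3] = rb[1]
? length(s)          -- prints 3
? s                  -- prints {102,40,{90,90}}
```

So the length of s never changes; the "two characters" live in one nested element, which only looks like a two-for-one replacement if the result is flattened or printed element by element.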
