Re: Looking for the best "Parse"...
- Posted by Derek Parnell <ddparnell at bigpond.com> Oct 26, 2001
----- Original Message -----
From: <euman at bellsouth.net>
To: "EUforum" <EUforum at topica.com>
Subject: Re: Looking for the best "Parse"...

> I dont think these multi-run test are any good......
>
> What Im saying is that often times the second pass will be 5 times faster than the first.
>
> Anyone want to comment on this?

I have noticed this also. It has to do with garbage collection processes. To compensate for this phenomenon, I have styled the tests using fake calls to routines, which seem to trick the garbage collection. To demonstrate, I swapped the order of the routines around and still get the same sort of results.

However, I have been reminded never to write code while having breakfast, and while getting the kids ready for club tennis competitions - there is a major mistake in the routine I sent out. It doesn't return any text after the last match! Ooops.

Below is the corrected routine.

--------------
-- Assumption: All sequences contain ASCII text.
-- That is, each element is an integer from 0 to 255.
include wildcard.e  -- provides lower()

global function dparse(sequence text, sequence esc, sequence replace)
    sequence t, ltext
    integer pos,   -- posn of esc in text
            epos,  -- end of text in t
            spos,  -- pos in t to copy to
            ppos   -- previous 'pos' value

    -- An empty escape sequence can never match, so return the original text
    if length(esc) = 0 then
        return text
    end if

    -- Do a case-insensitive compare
    esc = lower(esc)
    ltext = lower(text)

    -- Find the first match
    pos = match(esc, ltext)

    -- If no matches, return the original text
    if pos = 0 then
        return text
    end if

    -- Create a sequence long enough to hold the result
    ppos = length(text) + floor(1 + (length(replace)*length(text)/length(esc)))
    t = repeat(-1, ppos)

    -- Initialise the position holders (Eu sequences are indexed from 1)
    epos = 0
    spos = 1
    ppos = 1

    -- Keep making replacements until no more matches
    while pos do
        -- Copy all the text up to the current match
        -- (nothing to copy when the match is at position 1).
        -- Copy from 'text', not 'ltext', so the original case is kept.
        if pos != 1 then
            epos = spos + pos - ppos - 1
            t[spos .. epos] = text[ppos .. pos-1]
        end if

        -- Hide the current match by changing its value to a non-text value,
        -- this way the next iteration won't find it again.
        ltext[pos] = -1

        -- Skip over the escape text
        ppos = pos + length(esc)

        -- Copy the replacement text into the result
        spos = epos + 1
        epos = epos + length(replace)
        t[spos .. epos] = replace

        -- Skip over the replacement text in the result area
        spos = epos + 1

        -- Look for the next match
        pos = match(esc, ltext)
    end while

    -- Copy any left over text
    if ppos <= length(ltext) then
        pos = length(ltext)
        epos = spos + pos - ppos
        t[spos .. epos] = text[ppos .. pos]
    end if

    -- Return only the copied text, as there could be some residual
    -- junk in the result area.
    return t[1..epos]
end function
--------------

Some more observations, if I may.

The recursive solutions presented so far have a problem when the replacement text contains the escape text: they go into a never-ending loop. They choke on this sort of thing...

    res = parse("one$Atwo", "$A", "three$Afour")

Secondly, these sorts of routines would go a lot faster and be coded a lot more simply if RDS provided a variation of match() and find() that allowed one to specify the starting offset. Such as ...

    pos = match_from(",", "abc,def,ghi,klm", 5)

would start at the fifth element and return 8.

Finally, it would be nice if RDS could treat zero-length slices as a NOP, regardless of what the actual start and end pos values are.

---
Derek.
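
For anyone who wants to experiment with the starting-offset idea before anything like it exists in the language, here is a rough sketch in plain Euphoria. The names my_match_from() and replace_all() are made up for this illustration and are not RDS routines. The helper just calls match() on a tail slice, so it copies data on every call and will not be as fast as a true built-in would be. Unlike dparse() the comparison here is case-sensitive, to keep the sketch short.

```euphoria
-- Hypothetical helper, NOT an RDS built-in: look for 'pat' in 's',
-- starting the search at element 'start'.
-- Returns the position within 's', or 0 if there is no match.
function my_match_from(sequence pat, sequence s, integer start)
    integer p
    if length(pat) = 0 or start < 1 or start > length(s) then
        return 0
    end if
    -- match() always searches a whole sequence, so search the tail slice
    p = match(pat, s[start..length(s)])
    if p = 0 then
        return 0
    end if
    -- convert the offset within the slice back to a position in 's'
    return p + start - 1
end function

-- Hypothetical replace routine built on the helper above.
-- It only ever searches the unprocessed part of the source text, so a
-- replacement that contains the escape text is never scanned again.
function replace_all(sequence text, sequence esc, sequence rep)
    sequence result
    integer pos, start
    if length(esc) = 0 then
        return text
    end if
    result = ""
    start = 1
    pos = my_match_from(esc, text, start)
    while pos do
        -- copy the text before the match, then the replacement
        result = result & text[start..pos-1] & rep
        -- carry on after the matched text in the source
        start = pos + length(esc)
        pos = my_match_from(esc, text, start)
    end while
    -- copy whatever follows the last match
    return result & text[start..length(text)]
end function

? my_match_from(",", "abc,def,ghi,klm", 5)                 -- prints 8
puts(1, replace_all("one$Atwo", "$A", "three$Afour") & '\n') -- prints onethree$Afourtwo
```

Because the search always resumes at start in the source text rather than in the result, the "three$Afour" case that sends the recursive versions into a loop terminates after one replacement here.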