1. faster deparse() for string library strtok.e

hi all,
i use strtok.e to do parsing of downloaded mail. But i found out the 
library was terribly slow. So i decided to add
 Chris Benslers find_all() which was 20% faster but that didn't do the 
trick. I also added Andy Serpas turbo
parse(). that too didn't do the trick. I decided to do time rofiling and 
found out what was chewing up most time
and memory. I found out it was deparse(). G.boehme made a programming 
error that made the library take
a lot of time doing nothing. Many won't notice this, cause they have 
fast computers. but i'm sure KAT will
appreciate my modification.

--****************************************************
--Test for deparse()
include get.e

--original derparse()
global function deparse_old(sequence list, integer c)
object t
sequence s
atom len
t= time()
len = length(list)
   if len then
      s = list[1]
      for i = 2 to len by 1 do
	 s =s&c&list[i]                     ---///area of error
         end for
printf(1,"%.32f deparse G.Boehme\n",time()-t)
      return s
   end if
   return ""
end function
------------------------
--my version
global function deparse_new(sequence list, integer c)
object t
sequence s
 t= time()
   if length(list) then
      s = list[1]
      for i = 2 to length(list) by 1 do
	 s =s&c
      s=s&list[i]
      end for
printf(1,"%.32f my version of deparse\n",time()-t)
      return s
   end if

   return ""
end function
-----------------------
--Andy Serpas Turbo version 3X faster
global function parse(sequence s, integer c)
integer slen, spt, flag
sequence parsed

	parsed = {}
	slen = length(s)

	spt = 1
	flag = 0
	for i = 1 to slen do
		if s[i] = c then
			if flag = 1 then
				parsed = append(parsed,s[spt..i-1])
				flag = 0
				spt = i+1
			else
				spt += 1
			end if
		else
			flag = 1
		end if
	end for
	if flag = 1 then
		parsed = append(parsed,s[spt..slen])
	end if
	return parsed
end function
--------------------------------
global procedure wait_abort(sequence msg)
    integer wait
    printf(1, "%s\n", {msg} )
    wait = wait_key()
    abort(1)
end procedure
--------------------------------
object data,buffer,test,test1,data1
buffer = {}
-------------------Create a large test sequence
data1={}
puts(1," Starting test:\n")
puts(1,"\nCreating test sequence\n")
data = repeat("a ",1000)  --create a large sequence
for i = 1 to length(data) do
data1 = data1 & data[1]     --
--                            }double the sequence
data1 = data1 & data[i]     --
end for
----------------
puts(1,"\nParsing test sequence\n")
buffer = parse(data1,32)
puts(1,"\nDeparsing (G.Boehme) test sequence\n")
test = deparse_old(buffer,32)
puts(1,"\nDeparsing (Jordah Ferguson) test sequence\n")
test1 = deparse_new(buffer,32)
-------------------
wait_abort(sprintf("Length of parsed sequence used for test: 
%d",length(buffer)))


jordah Ferguson
jorfergie03 at yahoo.com

--Processed and saved using: Cirrus Mail ver 0.2a

new topic     » topic index » view message » categorize

2. Re: faster deparse() for string library strtok.e

On Tue,  2 Apr 2002 13:42:43 +0000, jordah ferguson
<jorfergie03 at yahoo.com> wrote:

That's a huge, and exponential difference.
At 800, its about 100 times slower. At 4,000 it is over 1,000 times
slower.

Changing
>	 s =s&c&list[i]                     ---///area of error
to
>	 s =s&c
>      s=s&list[i]
should not cause such a huge difference, surely.
Given that the latter finishes in 0.007 seconds, the same number of
iterations for the former should definitely not take 11.6 seconds.
To process a sequence 5 times as long, the latter takes as expected
about 5 times as long; the former however about 47 times as long

Any thoughts Rob?

Pete

new topic     » goto parent     » topic index » view message » categorize

3. Re: faster deparse() for string library strtok.e

On Tue, 2 Apr 2002 12:47:17 -0500 , Matthew Lewis
<matthewwalkerlewis at YAHOO.COM> wrote:

Oh Matt, the perils & embarrassment of posting untested code...

Your routine when passed {"I","did","not","test","this"} returns 
"Ididnottestthis" rather than "I did not test this"

Anyway blink , works fine with:
>    ix += len
>  end for
>  return s[1..ix-1]
replaced by:
    ix += len+1
  end for
  return s[1..ix-2]

I suspect there's a nasty slice error waiting there though [0..-1]; it
still needs an if len/else return "" clause, I think.

Also, rather than pick slen=256 I would suggest guessing the average
length of a word & pre-allocating that; extending by length of this
word+av.length*remaining words left if necessary.

Pete

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu