1. faster deparse() for string library strtok.e
hi all,
i use strtok.e to do parsing of downloaded mail. But i found out the
library was terribly slow. So i decided to add
Chris Benslers find_all() which was 20% faster but that didn't do the
trick. I also added Andy Serpas turbo
parse(). that too didn't do the trick. I decided to do time rofiling and
found out what was chewing up most time
and memory. I found out it was deparse(). G.boehme made a programming
error that made the library take
a lot of time doing nothing. Many won't notice this, cause they have
fast computers. but i'm sure KAT will
appreciate my modification.
--****************************************************
--Test for deparse()
include get.e
--original derparse()
global function deparse_old(sequence list, integer c)
object t
sequence s
atom len
t= time()
len = length(list)
if len then
s = list[1]
for i = 2 to len by 1 do
s =s&c&list[i] ---///area of error
end for
printf(1,"%.32f deparse G.Boehme\n",time()-t)
return s
end if
return ""
end function
------------------------
--my version
global function deparse_new(sequence list, integer c)
object t
sequence s
t= time()
if length(list) then
s = list[1]
for i = 2 to length(list) by 1 do
s =s&c
s=s&list[i]
end for
printf(1,"%.32f my version of deparse\n",time()-t)
return s
end if
return ""
end function
-----------------------
--Andy Serpas Turbo version 3X faster
global function parse(sequence s, integer c)
integer slen, spt, flag
sequence parsed
parsed = {}
slen = length(s)
spt = 1
flag = 0
for i = 1 to slen do
if s[i] = c then
if flag = 1 then
parsed = append(parsed,s[spt..i-1])
flag = 0
spt = i+1
else
spt += 1
end if
else
flag = 1
end if
end for
if flag = 1 then
parsed = append(parsed,s[spt..slen])
end if
return parsed
end function
--------------------------------
global procedure wait_abort(sequence msg)
integer wait
printf(1, "%s\n", {msg} )
wait = wait_key()
abort(1)
end procedure
--------------------------------
object data,buffer,test,test1,data1
buffer = {}
-------------------Create a large test sequence
data1={}
puts(1," Starting test:\n")
puts(1,"\nCreating test sequence\n")
data = repeat("a ",1000) --create a large sequence
for i = 1 to length(data) do
data1 = data1 & data[1] --
-- }double the sequence
data1 = data1 & data[i] --
end for
----------------
puts(1,"\nParsing test sequence\n")
buffer = parse(data1,32)
puts(1,"\nDeparsing (G.Boehme) test sequence\n")
test = deparse_old(buffer,32)
puts(1,"\nDeparsing (Jordah Ferguson) test sequence\n")
test1 = deparse_new(buffer,32)
-------------------
wait_abort(sprintf("Length of parsed sequence used for test:
%d",length(buffer)))
jordah Ferguson
jorfergie03 at yahoo.com
--Processed and saved using: Cirrus Mail ver 0.2a
2. Re: faster deparse() for string library strtok.e
On Tue, 2 Apr 2002 13:42:43 +0000, jordah ferguson
<jorfergie03 at yahoo.com> wrote:
That's a huge, and exponential difference.
At 800, its about 100 times slower. At 4,000 it is over 1,000 times
slower.
Changing
> s =s&c&list[i] ---///area of error
to
> s =s&c
> s=s&list[i]
should not cause such a huge difference, surely.
Given that the latter finishes in 0.007 seconds, the same number of
iterations for the former should definitely not take 11.6 seconds.
To process a sequence 5 times as long, the latter takes as expected
about 5 times as long; the former however about 47 times as long
Any thoughts Rob?
Pete
3. Re: faster deparse() for string library strtok.e
On Tue, 2 Apr 2002 12:47:17 -0500 , Matthew Lewis
<matthewwalkerlewis at YAHOO.COM> wrote:
Oh Matt, the perils & embarrassment of posting untested code...
Your routine when passed {"I","did","not","test","this"} returns
"Ididnottestthis" rather than "I did not test this"
Anyway
, works fine with:
> ix += len
> end for
> return s[1..ix-1]
replaced by:
ix += len+1
end for
return s[1..ix-2]
I suspect there's a nasty slice error waiting there though [0..-1]; it
still needs an if len/else return "" clause, I think.
Also, rather than pick slen=256 I would suggest guessing the average
length of a word & pre-allocating that; extending by length of this
word+av.length*remaining words left if necessary.
Pete