1. Slow memory allocation
Surprisingly, my old Python code was faster than Euphoria. I was reading a file
of 40,000 lines like this:
a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
mark 120 B 405 425 404.83 425.86 71.0 51.0 false -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
into memory. The code is below. It took 2 minutes just to read the file into
memory, during which ntvdm.exe ran at full speed and memory allocation rose
slowly to around 55 megabytes. If I skipped the line table = append(table,line),
it took 2 seconds.
include kparse.e

sequence line, table
integer file
object o_line
constant TRUE = 1
constant FALSE = 0
constant TAB = 9

file = open("big_file.txt", "r")
table = {}
while TRUE do
    o_line = gets(file)
    if atom(o_line) then
        exit
    end if
    line = Kparse(o_line, TAB)
    table = append(table, line)
end while
2. Re: Slow memory allocation
Haflidi Asgrimsson wrote:
>
> Suprisingly my old Python code was faster than Euphoria. I was reading a file
> of 40.000
> lines like this:
> a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
> mark 120 B 405 425 404.83
> 425.86 71.0 51.0 false -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
> into memory. The code is below. It took 2 minutes just reading the file into
> memory
> during which the ntvdm.exe was running at full speed and memory allocation
> rose slowly
> to around 55 megabytes. If I skipped the line: table = append(table,line) this
> took
> 2 seconds.
>
If you know how big table will eventually be, you should initialize it like:
table = repeat( "", 40000 )
Even if you don't know how big it will be, you'll get better performance
if you grow it in chunks. Chunk size is, of course, up to you, but here's
an example:
include kparse.e

sequence line, table
integer file, table_size, table_index
object o_line
constant TRUE = 1
constant FALSE = 0
constant TAB = 9
constant TABLE_CHUNK = 1024

table_size = TABLE_CHUNK
table_index = 0
file = open("big_file.txt", "r")
table = repeat( 0, table_size )
while TRUE do
    o_line = gets(file)
    if atom(o_line) then
        exit
    end if
    line = Kparse(o_line, TAB)
    table_index += 1
    if table_index > table_size then
        table_size += TABLE_CHUNK
        table &= repeat( 0, TABLE_CHUNK )
    end if
    table[table_index] = line
end while
table = table[1..table_index]  -- trim the unused slots
You can play around with that and change the size of TABLE_CHUNK, or grow
TABLE_CHUNK dynamically, to allocate more memory each time. Basically,
whenever you create a sequence, Euphoria will allocate enough space for
it, plus a little extra whenever you start to grow it. Once you go beyond
that size, it allocates a new chunk, and moves the memory. So you want
to minimize the number of times this happens.
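The chunked-growth pattern translates to other languages too; here is a sketch in Python for comparison (Python lists already over-allocate on append, so there it only illustrates the pattern rather than buying real speed, and the function name is made up):

```python
def read_rows_chunked(lines, chunk=1024):
    """Grow the output table in fixed-size chunks instead of
    reallocating on every single append."""
    size = chunk
    table = [None] * size              # preallocate one chunk up front
    count = 0
    for raw in lines:
        count += 1
        if count > size:               # out of room: add another chunk
            size += chunk
            table.extend([None] * chunk)
        table[count - 1] = raw.split("\t")
    return table[:count]               # trim the unused slots
```

The chunk size is a tunable trade-off between wasted slack space and the number of reallocations, just as with TABLE_CHUNK above.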
Matt Lewis
3. Re: Slow memory allocation
Haflidi Asgrimsson wrote:
>
> Suprisingly my old Python code was faster than Euphoria. I was reading a file
> of 40.000
> lines like this:
> a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
> mark 120 B 405 425 404.83
> 425.86 71.0 51.0 false -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
> into memory. The code is below. It took 2 minutes just reading the file into
> memory
> during which the ntvdm.exe was running at full speed and memory allocation
> rose slowly
> to around 55 megabytes. If I skipped the line: table = append(table,line) this
> took
> 2 seconds.
>
> include kparse.e
>
> sequence line, table
> integer file
> object o_line
> constant TRUE = 1
> constant FALSE = 0
> constant TAB = 9
>
> file = open("big_file.txt", "r")
> table = {}
> while TRUE do
> o_line = gets(file)
> if atom(o_line) then
> exit
> end if
> line = {}
> line = Kparse(o_line, TAB)
> table = append(table,line)
> end while
>
Hi there,
That's interesting, because I do the same thing with my personal editor,
and it opens a 56,000-line (4 MB) file in about a second.
How much physical RAM do you have installed in your computer?
Take care,
Al
And, good luck with your Euphoria programming!
My bumper sticker: "I brake for LED's"
4. Re: Slow memory allocation
- Posted by Mario Steele <eumario at trilake.net>
Jun 11, 2005
-
Last edited Jun 12, 2005
Al Getz wrote:
>
> Haflidi Asgrimsson wrote:
> >
> > Suprisingly my old Python code was faster than Euphoria. I was reading a
> > file of 40.000
> > lines like this:
> > a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
> > mark 120 B 405 425 404.83
> > 425.86 71.0 51.0 false -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
> > into memory. The code is below. It took 2 minutes just reading the file into
> > memory
> > during which the ntvdm.exe was running at full speed and memory allocation
> > rose slowly
> > to around 55 megabytes. If I skipped the line: table = append(table,line)
> > this took
> > 2 seconds.
> >
> > include kparse.e
> >
> > sequence line, table
> > integer file
> > object o_line
> > constant TRUE = 1
> > constant FALSE = 0
> > constant TAB = 9
> >
> > file = open("big_file.txt", "r")
> > table = {}
> > while TRUE do
> > o_line = gets(file)
> > if atom(o_line) then
> > exit
> > end if
> > line = {}
> > line = Kparse(o_line, TAB)
> > table = append(table,line)
> > end while
> >
>
> Hi there,
>
> That's interesting, because i do the same thing with my personal editor
> and it opens a file with 56,000 lines (4 Megabytes) in about a second.
>
> How much physical RAM do you have installed in your computer?
>
>
> Take care,
> Al
>
> And, good luck with your Euphoria programming!
>
> My bumper sticker: "I brake for LED's"
Actually, this isn't a problem with slow memory allocation; it's a problem
with append(). That function has long been known to be slow among most of
the "old timers" here. It is much more advisable to use the &= operator to
concatenate data; it has proven much faster than append() on many occasions.
Here's an example:
sequence a
a = "This is part of a string"
a &= ", and this is the rest"
a = {a}
a &= { "This is a new string." }
-- a is now: {"This is part of a string, and this is the rest",
--            "This is a new string."}
Remember, though: when you concatenate two sequences and want each to keep
its own place, rather than being merged into one sequence, put { } brackets
around the data, even if it is already a sequence. That tells the Euphoria
interpreter to keep the two sequences separate.
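The same distinction exists in Python lists, for comparison (a sketch of the analogous behavior, not Euphoria semantics): concatenating a bare string splices it in element by element, while wrapping it in brackets first keeps it as a single item.

```python
a = "This is part of a string"
a += ", and this is the rest"     # plain string concatenation

a = [a]                           # make the string the list's first item
a += ["This is a new string."]    # wrapped: appended as ONE item
# a is now a 2-item list

b = ["abc"]
b += "de"                         # unwrapped: spliced character by character
# b is now ["abc", "d", "e"]
```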
Mario Steele
http://enchantedblade.trilake.net
Attaining World Domination, one byte at a time...
5. Re: Slow memory allocation
Mario Steele wrote:
>
> Al Getz wrote:
> >
> > Haflidi Asgrimsson wrote:
> > >
> > > Suprisingly my old Python code was faster than Euphoria. I was reading a
> > > file of 40.000
> > > lines like this:
> > > a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
> > > mark 120 B 405 425 404.83
> > > 425.86 71.0 51.0 false -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
> > > into memory. The code is below. It took 2 minutes just reading the file
> > > into memory
> > > during which the ntvdm.exe was running at full speed and memory allocation
> > > rose slowly
> > > to around 55 megabytes. If I skipped the line: table = append(table,line)
> > > this took
> > > 2 seconds.
> > >
> > > include kparse.e
> > >
> > > sequence line, table
> > > integer file
> > > object o_line
> > > constant TRUE = 1
> > > constant FALSE = 0
> > > constant TAB = 9
> > >
> > > file = open("big_file.txt", "r")
> > > table = {}
> > > while TRUE do
> > > o_line = gets(file)
> > > if atom(o_line) then
> > > exit
> > > end if
> > > line = {}
> > > line = Kparse(o_line, TAB)
> > > table = append(table,line)
> > > end while
> > >
> >
> > Hi there,
> >
> > That's interesting, because i do the same thing with my personal editor
> > and it opens a file with 56,000 lines (4 Megabytes) in about a second.
> >
> > How much physical RAM do you have installed in your computer?
> >
> >
> > Take care,
> > Al
> >
> > And, good luck with your Euphoria programming!
> >
> > My bumper sticker: "I brake for LED's"
>
> Actually, this isn't a problem with slow memory allocation. This is a problem
> with append().
> This function has always been known to be slow, amongst most of the "old
> timers" here.
> It is much more suggestable that you use the &= oper sign to concat data
> together.
> It has been proven to be much faster then append() on many occasions.
>
> Here's an Example:
>
> sequence a
> a = "This is part of a string"
> a &= ", and this is the rest"
> a = {a}
> a &= { "This is a new string." }
> -- a now looks like: {"This is part of a string, and this is the rest", "This
> is a new string"}
>
> Always remember though, when you want to concat together two sequences, and
> want each
> to have it's own place, and not concated together into 1 sequence, use the { }
> brackets
> around the data, even if it is a sequence, it shows the Euphoria Interpreter,
> that
> you want to seperate the two sequences from each other.
>
>
> Mario Steele
> http://enchantedblade.trilake.net
> Attaining World Dominiation, one byte at a time...
>
I tried this on two systems, both running Windows XP Pro; the faster one is
2 GHz with 1.5 GB of memory, and there it took 25 seconds. What I found most
interesting is that append(), &= and plain assignment all gave the same
result: 25 seconds. Of that, table[i] = line takes around 24 seconds.
include kparse.e

sequence line, table
integer file, n_line
object o_line
constant TRUE = 1
constant FALSE = 0
constant TAB = 9

file = open("big_file.txt", "r")
n_line = 0
while TRUE do
    o_line = gets(file)
    if atom(o_line) then
        exit
    end if
    n_line += 1
end while
if seek(file, 0) then
    puts(1, "Seek failed\n")
end if
table = repeat( {}, n_line )
for i = 1 to n_line do
    o_line = gets(file)
    if atom(o_line) then
        exit
    end if
    line = Kparse(o_line, TAB)
    table[i] = line
end for
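The same two-pass idea (size the table first, then fill preallocated slots) can be sketched in Python; `load_table` and the demo file are made up for illustration:

```python
import os
import tempfile

def load_table(path):
    with open(path) as f:
        n = sum(1 for _ in f)          # pass 1: count the lines
    table = [None] * n                 # allocate the whole table once
    with open(path) as f:              # pass 2: fill each slot in place
        for i, line in enumerate(f):
            table[i] = line.rstrip("\n").split("\t")
    return table

# tiny demo file standing in for big_file.txt
fd, demo = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("a\t1\nb\t2\n")
table = load_table(demo)
os.remove(demo)
```

The cost is reading the file twice, which is usually cheap once the OS has cached it.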
6. Re: Slow memory allocation
Haflidi Asgrimsson wrote:
>
> Mario Steele wrote:
> >
> > Al Getz wrote:
> > >
> > > Haflidi Asgrimsson wrote:
> > > >
> > > > Suprisingly my old Python code was faster than Euphoria. I was reading a
> > > > file of 40.000
> > > > lines like this:
> > > > a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
> > > > mark 120 B 405 425 404.83
> > > > 425.86 71.0 51.0 false -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
> > > > into memory. The code is below. It took 2 minutes just reading the file
> > > > into memory
> > > > during which the ntvdm.exe was running at full speed and memory
> > > > allocation rose slowly
> > > > to around 55 megabytes. If I skipped the line: table =
> > > > append(table,line) this took
> > > > 2 seconds.
> > > >
> > > > include kparse.e
> > > >
> > > > sequence line, table
> > > > integer file
> > > > object o_line
> > > > constant TRUE = 1
> > > > constant FALSE = 0
> > > > constant TAB = 9
> > > >
> > > > file = open("big_file.txt", "r")
> > > > table = {}
> > > > while TRUE do
> > > > o_line = gets(file)
> > > > if atom(o_line) then
> > > > exit
> > > > end if
> > > > line = {}
> > > > line = Kparse(o_line, TAB)
> > > > table = append(table,line)
> > > > end while
> > > >
> > >
> > > Hi there,
> > >
> > > That's interesting, because i do the same thing with my personal editor
> > > and it opens a file with 56,000 lines (4 Megabytes) in about a second.
> > >
> > > How much physical RAM do you have installed in your computer?
> > >
> > >
> > > Take care,
> > > Al
> > >
> > > And, good luck with your Euphoria programming!
> > >
> > > My bumper sticker: "I brake for LED's"
> >
> > Actually, this isn't a problem with slow memory allocation. This is a
> > problem with append().
> > This function has always been known to be slow, amongst most of the "old
> > timers" here.
> > It is much more suggestable that you use the &= oper sign to concat data
> > together.
> > It has been proven to be much faster then append() on many occasions.
> >
> > Here's an Example:
> >
> > sequence a
> > a = "This is part of a string"
> > a &= ", and this is the rest"
> > a = {a}
> > a &= { "This is a new string." }
> > -- a now looks like: {"This is part of a string, and this is the rest",
> > "This is a new string"}
> >
> > Always remember though, when you want to concat together two sequences, and
> > want each
> > to have it's own place, and not concated together into 1 sequence, use the {
> > } brackets
> > around the data, even if it is a sequence, it shows the Euphoria
> > Interpreter, that
> > you want to seperate the two sequences from each other.
> >
> >
> > Mario Steele
> > http://enchantedblade.trilake.net
> > Attaining World Dominiation, one byte at a time...
> >
> I tried this on two systems both running Windows XP Pro, the latter is 2GHz
> with 1.5 Gb memory, there it took 25 seconds. What I found most interesting
> is that append(), &= and assignment gave the same result, 25 seconds.
> table[i] = line is taking around 24 seconds of those.
> integer file, n_line
> object o_line
> constant TRUE = 1
> constant FALSE = 0
> constant TAB = 9
>
> file = open("big_file.txt", "r")
> n_line = 0
> while TRUE do
> n_line += 1
> o_line = gets(file)
> if atom(o_line) then
> exit
> end if
> end while
> if seek(file,0) then
> puts(1,"Seek failed\n")
> end if
> table = repeat( {}, n_line )
> for i = 1 to n_line do
> o_line = gets(file)
> if atom(o_line) then
> exit
> end if
> line = {}
> line = Kparse(o_line, TAB)
> table[i] = line
> end for
>
Hello again,
My only question now is what version of Euphoria are you using?
I'm asking these questions because I do this:
object line
sequence buff
atom fn

buff = {}  -- must be initialized before the first append
fn = open("c:\\myfile.txt", "r")
while 1 do
    line = gets(fn)
    if atom(line) then
        exit
    end if
    buff = append(buff, line)
end while
The above code reads a 2,839,910-byte text file in about 1 second
using Euphoria v2.4 with a bindw'd exe.
Did you try loading the file completely first, then parsing after?
If so, does it speed it up any?
I know you have tried leaving out the 'append' line, but what
happens when you leave out the parse line without leaving out
the append line?
Take care,
Al
And, good luck with your Euphoria programming!
My bumper sticker: "I brake for LED's"
7. Re: Slow memory allocation
Hello again,
I'm not sure if I replied to the correct post, so here's the post I meant
to reply to:
>I tried this on two systems both running Windows XP Pro,
> the latter is 2GHz with 1.5 Gb memory, there it took 25 seconds.
> What I found most interesting is that append(),
>&= and assignment gave the same result, 25 seconds.
>table[i] = line is taking around 24 seconds of those.
Your system is faster than mine and has more memory, so I would have
expected it to read the file faster than mine does, and mine reads
a 2,900,000+ byte file in about a second.
I tried a non-bindw'd .exw file and it was the same, and I tried
the .exw file with version 2.5 of Euphoria (PD beta version) and
it was the same: about one second to read the whole file into
the sequence.
What I would do is try that exact code fragment (in the previous post)
without the parse line and see if it runs faster. If not, I would
wonder whether you have active antivirus software, or whether the page
file was moved by a non-Windows disk manager.
If it does in fact speed up, then perhaps you should do your parsing
AFTER the whole file is read into the sequence.
I've been using Euphoria for several years now, and I've had my
editor up and running for most of them; it's always been fast,
even though I'm using 'append' to store the lines in the sequence.
Pete Lomax recently started a new editor which uses basically the
same technique, and it is about the same speed (fast). This makes
me think something else is wrong.
Take care,
Al
And, good luck with your Euphoria programming!
My bumper sticker: "I brake for LED's"
8. Re: Slow memory allocation
Al Getz wrote:
>
> Hello again,
>
> Im not sure if i replied to the correct post, so here's the post i meant
> to reply to:
>
> >I tried this on two systems both running Windows XP Pro,
> > the latter is 2GHz with 1.5 Gb memory, there it took 25 seconds.
> > What I found most interesting is that append(),
> >&= and assignment gave the same result, 25 seconds.
> >table[i] = line is taking around 24 seconds of those.
>
> Your system is faster than mine and has more memory so i would have
> expected it to read the file faster than on mine, and mine reads
> a 2,900,000+ byte file in about a second.
> I tried a non-bindw'd .exw file and it was the same, and i tried
> the .exw file with Version 2.5 of Euphoria (PD Beta version) and
> it was the same, about one second to read the whole file into
> the sequence.
> What i would do is try that exact code fragment (in the previous post)
> without the parse line and see if it works faster. If not, i would
> wonder if you have any active virus software or the page file was
> moved by a non-Windows disk manager.
> If it does in fact speed up, then perhaps you should do your parsing
> AFTER the whole file is read into the sequence.
>
> I've been using Euphoria for several years now and i've had my
> editor up and running for most of them, and it's always been fast
> even though im using 'append' to store the lines in the sequence.
> Pete Lomax recently started a new editor which uses basically the
> same technique and that is about the same speed (fast). This makes
> me think something else is wrong.
>
>
> Take care,
> Al
>
> And, good luck with your Euphoria programming!
>
> My bumper sticker: "I brake for LED's"
>
Our getting different results led me to try your code; it ran in less than
a second. The culprit line is:
line = Kparse(o_line, TAB)
although it only slows the code when append(), &= or assignment follows.
9. Re: Slow memory allocation
Haflidi Asgrimsson wrote:
>
> Us getting different results led to me trying you code, ran in less than
> a second. The culprit line is:
> line = Kparse(o_line, TAB)
> Although it only slows the code with append(), &= or assignment following.
>
That would have been my next question. What does Kparse() do? Could you
tell us what the results are when you:
1) Comment out the Kparse() call
2) Comment out the assignment to table
3) Comment out the Kparse() call and the assignment (just read and ignore)
Matt Lewis
10. Re: Slow memory allocation
Matt Lewis wrote:
>
> Haflidi Asgrimsson wrote:
> >
> > Us getting different results led to me trying you code, ran in less than
> > a second. The culprit line is:
> > line = Kparse(o_line, TAB)
> > Although it only slows the code with append(), &= or assignment following.
> >
>
> That would have been my next question. What does Kparse() do? Could you
> tell us what the results are when you:
>
> 1) Comment out the Kparse() call
> 2) Comment out the assignment to table
> 3) Comment out the Kparse() call and the assignment (just read and ignore)
>
> Matt Lewis
>
Kparse is from kparse.e:
--KPARSE.E (parse with keep)
--(c) 05/01/104 Michael J Raley (thinkways at yahoo.com)
--Turns a string of delimited text into a list of items,
--while retaining the position of empty elements.
The function is below.
So a string like "1\t2\t\t3" is converted into the list {"1","2","","3"}.
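For comparison, Python's `str.split` with an explicit delimiter behaves the same way, keeping the empty fields:

```python
fields = "1\t2\t\t3".split("\t")
# the empty field between the two consecutive tabs is preserved
```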
In all three cases I get a nearly instant response; only when both lines
line = Kparse(o_line, TAB)
table = append(table, line)
are present does the CPU run for 25 seconds.
This must be some kind of late type checking, because if TAB is replaced by
a sequence, it takes only about 2 seconds:
line = Kparse(o_line, "9")
--------------------------------------------------------
global function Kparse(object s, object o)
    sequence clipbook, parsed_list
    atom ls, lc, clip

    if atom(s) then return s end if

    clipbook = {}
    parsed_list = {}
    ls = length(s)

    -- convert an atom delimiter into a 1-element sequence
    if atom(o) then o = {o} end if

    -- bookmark the position of all delimiters in 1 pass
    for a = 1 to ls do
        if match({s[a]}, o) then
            clipbook &= a
        end if
    end for

    lc = length(clipbook)
    if lc = 0 then return {} end if

    -- find the text between the recorded delimiter positions to create a list.
    -- First check whether the first bookmarked delimiter starts sequence s.
    clip = clipbook[1]
    if clip = 1 then
        parsed_list = {{}}            -- yes: create the first element empty
    else
        parsed_list = {s[1..clip-1]}  -- no: build it from s up to the bookmark
    end if

    -- now we can process the rest of the sequence
    for ic = 2 to lc do
        if clip+1 = clipbook[ic] then
            parsed_list = append(parsed_list, {})
        else
            parsed_list = append(parsed_list, s[clip+1..clipbook[ic]-1])
        end if
        clip = clipbook[ic]
    end for

    -- test whether the end of s is past the last delimiter
    if ls > clipbook[lc] then
        parsed_list = append(parsed_list, s[clipbook[lc]+1..ls])
    end if

    return parsed_list
end function
11. Re: Slow memory allocation
Haflidi Asgrimsson wrote:
>
> Matt Lewis wrote:
> >
> > Haflidi Asgrimsson wrote:
> > >
> > > Us getting different results led to me trying you code, ran in less than
> > > a second. The culprit line is:
> > > line = Kparse(o_line, TAB)
> > > Although it only slows the code with append(), &= or assignment following.
> > >
> >
> > That would have been my next question. What does Kparse() do? Could you
> > tell us what the results are when you:
> >
> > 1) Comment out the Kparse() call
> > 2) Comment out the assignment to table
> > 3) Comment out the Kparse() call and the assignment (just read and ignore)
> >
> > Matt Lewis
> >
> Kparse is from kparse.e
> --KPARSE.E (parse with keep)
> --(c) 05/01/104 Michael J Raley (thinkways at yahoo.com)
> --Turns a string of delimited text into a list of items,
> --while retaining the position of empty elements.
> The function is below
> So string like "1\t2\t\t3" is converted into list: {"1","2","","3"}
>
> In all three cases I get nearly instant response, only when both lines:
> line = Kparse(o_line, TAB)
> table = append(table, line)
> then the CPU runs for 25 seconds.
> This must be some kind of late typechecking because if TAB is replaced by
> a sequence. Then this takes only about 2 seconds.
> line = Kparse(o_line, "9")
>
>
> --------------------------------------------------------
> global function Kparse(object s, object o)
> sequence clipbook, parsed_list
> atom ls, lc, clip
>
> if atom(s) then return s end if
>
> clipbook = {}
> parsed_list = {}
> ls = length(s)
>
> --convert atom delimiter into a 1n sequence
> if atom(o) then o = {o} end if
>
> --bookmark the position of all delimiters in 1 pass
> for a = 1 to ls do
> if match({s[a]},o) then clipbook &= a
> end if
>
> end for
>
> lc = length(clipbook)
> if lc = 0 then return {} end if
>
> -- find the text between the recorded delimeter positions to create a list.
> -- First check to see if the first bookmarked delimiter starts sequence s
>
> clip = clipbook[1]
> if clip = 1 then -- Yes. Create the first element empty
> parsed_list = {{}}
> else
> parsed_list = {s[1..clip-1]} -- No. Build the first element from s up to
> our bookmark
> end if
>
> -- now we can process the rest of the sequence
> for ic = 2 to lc do
> if clip+1 = clipbook[ic] then
> parsed_list = append(parsed_list,{})
> else
> parsed_list = append(parsed_list,s[clip+1..clipbook[ic]-1])
> end if
> clip = clipbook[ic]
> end for
> --test if end of s is past last delimeter
> if ls > clipbook[lc] then
> parsed_list = append(parsed_list,s[clipbook[lc]+1..ls])
> end if
>
> return parsed_list
> end function
>
Look at all the appends here, with sequence subscripting and slicing operations.
With append() and the &= short-hand operator, sequences grow dynamically
(allocating more memory) whenever more data is pushed into them. With
repeat() you can specify exactly how big you want the sequence to be (without
dynamic growth and reallocation), and push data into the already allocated
sequence elements using a loop. Dynamic allocation is very useful in Euphoria,
but maybe not in this case, where performance is the biggest issue. See if you
can modify this code to use some repeats. That could help the Kparse routine
perform more quickly and efficiently.
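One way to apply that advice inside the parser itself, sketched in Python (`parse_keep` is a hypothetical name; the Euphoria version would use `repeat()` where this uses list multiplication, and note it returns the whole string as one field for delimiter-free input, where Kparse returns an empty list):

```python
def parse_keep(s, delims):
    # pass 1: bookmark every delimiter so the result can be sized up front
    marks = [i for i, ch in enumerate(s) if ch in delims]
    fields = [None] * (len(marks) + 1)   # allocated once, never grown
    start = 0
    for slot, m in enumerate(marks):
        fields[slot] = s[start:m]        # text between delimiters
        start = m + 1
    fields[-1] = s[start:]               # tail after the last delimiter
    return fields
```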
Regards,
Vincent
--
Without walls and fences, there is no need for Windows and Gates.
12. Re: Slow memory allocation
Haflidi Asgrimsson wrote:
>
> I tried this on two systems both running Windows XP Pro, the
> latter is 2GHz with 1.5 Gb memory, there it took 25 seconds. What
> I found most interesting is that append(), &= and assignment gave
> the same result, 25 seconds
> table[i] = line is taking around 24 seconds of those.
This seems really odd. I just made a text file of 64,886 lines, each with
10 random 11-character numbers, tab delimited. I can't make the
time go up much beyond 1.5 seconds, and this is on a 2.4GHz Celeron with
512MB RAM running WinXP Home. Total memory usage is about 55MB, which seems
correct to me (the file's about 7MB).
Is there something odd about the file? For one thing, you're adding extra
refs and derefs to sequences by using the line variable. Cut that out
and see if it makes any difference (it didn't on my machine). Also, I'd
advise passing a sequence to kparse, since it just turns an atom delimiter
into a sequence anyway, so you're wasting some cycles right there.
Can you post the source file (or one just like it, but with the data
changed, if that's an issue)? Maybe there's something strange about
the format of the data that's causing issues. Anyway, here's my code
that runs in about 1.5s on my machine. Replace "bigrand.txt" with your
file, and let me know what happens. If you want, I can email you my
file (it's about 3MB zipped).
include get.e

global function kparse(object s, object o)
    sequence clipbook, parsed_list
    atom ls, lc, clip

    if atom(s) then return s end if

    clipbook = {}
    parsed_list = {}
    ls = length(s)

    -- convert an atom delimiter into a 1-element sequence
    if atom(o) then o = {o} end if

    -- bookmark the position of all delimiters in 1 pass
    for a = 1 to ls do
        if match({s[a]}, o) then
            clipbook &= a
        end if
    end for

    lc = length(clipbook)
    if lc = 0 then return {} end if

    -- find the text between the recorded delimiter positions to create a list.
    -- First check whether the first bookmarked delimiter starts sequence s.
    clip = clipbook[1]
    if clip = 1 then
        parsed_list = {{}}            -- yes: create the first element empty
    else
        parsed_list = {s[1..clip-1]}  -- no: build it from s up to the bookmark
    end if

    -- now we can process the rest of the sequence
    for ic = 2 to lc do
        if clip+1 = clipbook[ic] then
            parsed_list = append(parsed_list, {})
        else
            parsed_list = append(parsed_list, s[clip+1..clipbook[ic]-1])
        end if
        clip = clipbook[ic]
    end for

    -- test whether the end of s is past the last delimiter
    if ls > clipbook[lc] then
        parsed_list = append(parsed_list, s[clipbook[lc]+1..ls])
    end if

    return parsed_list
end function

procedure main()
    atom t
    integer fn
    object in
    sequence table

    fn = open( "bigrand.txt", "r" )
    table = {}
    t = time()
    in = gets( fn )
    while sequence(in) do
        table = append( table, kparse( in, "\t" ) )
        in = gets( fn )
    end while
    printf( 1, "%gsec\n", time() - t )
    if wait_key() then
    end if
end procedure

main()
13. Re: Slow memory allocation
- Posted by Al Getz <Xaxo at aol.com>
Jun 12, 2005
-
Last edited Jun 13, 2005
Haflidi Asgrimsson wrote:
>
> Matt Lewis wrote:
> >
> > Haflidi Asgrimsson wrote:
> > >
> > > Us getting different results led to me trying you code, ran in less than
> > > a second. The culprit line is:
> > > line = Kparse(o_line, TAB)
> > > Although it only slows the code with append(), &= or assignment following.
> > >
> >
> > That would have been my next question. What does Kparse() do? Could you
> > tell us what the results are when you:
> >
> > 1) Comment out the Kparse() call
> > 2) Comment out the assignment to table
> > 3) Comment out the Kparse() call and the assignment (just read and ignore)
> >
> > Matt Lewis
> >
> Kparse is from kparse.e
> --KPARSE.E (parse with keep)
> --(c) 05/01/104 Michael J Raley (thinkways at yahoo.com)
> --Turns a string of delimited text into a list of items,
> --while retaining the position of empty elements.
> The function is below
> So string like "1\t2\t\t3" is converted into list: {"1","2","","3"}
>
> In all three cases I get nearly instant response, only when both lines:
> line = Kparse(o_line, TAB)
> table = append(table, line)
> then the CPU runs for 25 seconds.
> This must be some kind of late typechecking because if TAB is replaced by
> a sequence. Then this takes only about 2 seconds.
> line = Kparse(o_line, "9")
>
>
> --------------------------------------------------------
> global function Kparse(object s, object o)
> sequence clipbook, parsed_list
> atom ls, lc, clip
>
> if atom(s) then return s end if
>
> clipbook = {}
> parsed_list = {}
> ls = length(s)
>
> --convert atom delimiter into a 1n sequence
> if atom(o) then o = {o} end if
>
> --bookmark the position of all delimiters in 1 pass
> for a = 1 to ls do
> if match({s[a]},o) then clipbook &= a
> end if
>
> end for
>
> lc = length(clipbook)
> if lc = 0 then return {} end if
>
> -- find the text between the recorded delimeter positions to create a list.
> -- First check to see if the first bookmarked delimiter starts sequence s
>
> clip = clipbook[1]
> if clip = 1 then -- Yes. Create the first element empty
> parsed_list = {{}}
> else
> parsed_list = {s[1..clip-1]} -- No. Build the first element from s up to
> our bookmark
> end if
>
> -- now we can process the rest of the sequence
> for ic = 2 to lc do
> if clip+1 = clipbook[ic] then
> parsed_list = append(parsed_list,{})
> else
> parsed_list = append(parsed_list,s[clip+1..clipbook[ic]-1])
> end if
> clip = clipbook[ic]
> end for
> --test if end of s is past last delimeter
> if ls > clipbook[lc] then
> parsed_list = append(parsed_list,s[clipbook[lc]+1..ls])
> end if
>
> return parsed_list
> end function
>
Hi again,
It's usually better to load the entire file first and then parse
later.
Take care,
Al
And, good luck with your Euphoria programming!
My bumper sticker: "I brake for LED's"
14. Re: Slow memory allocation
- Posted by Haflidi Asgrimsson <haflidi at prokaria.com>
Jun 12, 2005
-
Last edited Jun 13, 2005
As with all interpreters, the more information you give it, the more
efficiently it runs the code.
I think I've learned a valuable lesson here:
constant TAB = 9 is bad
constant TAB = '9' is OK
constant TAB = "9" is OK
15. Re: Slow memory allocation
- Posted by Matt Lewis <matthewwalkerlewis at gmail.com>
Jun 12, 2005
-
Last edited Jun 13, 2005
Haflidi Asgrimsson wrote:
>
> As with all other interpreters the more information you give the more
> efficiently it runs the code.
> I think I've learned a valuable lesson here:
> constant TAB = 9 is bad
> constant TAB = '9' is OK
> constant TAB = "9" is OK
>
This is misleading. '9' != '\t' and "9" != "\t". The reason it's much
faster is that you're getting many fewer delimiters (only when the character
9 is encountered).
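The character codes make the point concrete (shown in Python here):

```python
tab_code = ord('\t')    # the TAB character is code 9
digit_code = ord('9')   # the digit '9' is code 57, a different character
```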
Do you have other things running? How much *free* memory do you have? Maybe
your memory is swapping out or something? There's something else going on
here, because we're all getting very different results on relatively similar
hardware.
Matt Lewis
16. Re: Slow memory allocation
>From: Haflidi Asgrimsson <guest at RapidEuphoria.com>
>Reply-To: EUforum at topica.com
>To: EUforum at topica.com
>Subject: Re: Slow memory allocation
>Date: Sun, 12 Jun 2005 14:17:17 -0700
>
>posted by: Haflidi Asgrimsson <haflidi at prokaria.com>
>
>As with all other interpreters the more information you give the more
>efficiently it runs the code.
>I think I've learned a valuable lesson here:
>constant TAB = 9 is bad
>constant TAB = '9' is OK
>constant TAB = "9" is OK
>
Those last two are not correct. They just mean the number 9, not a TAB
character. If you're splitting by 9's and not TAB's, then maybe you aren't
splitting anything at all, thereby making it seem faster. The constants
should be like this:
constant TAB = '\t'
or
constant TAB = "\t"
~[ WingZone ]~
http://wingzone.tripod.com/
17. Re: Slow memory allocation
- Posted by Haflidi Asgrimsson <haflidi at prokaria.com>
Jun 12, 2005
-
Last edited Jun 13, 2005
Except that Kparse stops parsing and returns an empty list.
So I wrote my own function and it parses my file in 3 seconds:
function mySplit(sequence s_input, sequence s_char)
    sequence l_return, s_return
    integer n_start, n_stop
    atom a_char
    -- pick a placeholder character that cannot equal the delimiter itself
    if equal(s_char, "#") then
        a_char = '!'
    else
        a_char = '#'
    end if
    l_return = {}
    n_start = 1
    while TRUE do
        -- match() always searches from the start of s_input, so each
        -- found delimiter is overwritten with the placeholder to keep
        -- it from being matched again on the next pass
        n_stop = match(s_char, s_input)
        if n_stop then
            s_input[n_stop] = a_char
            s_return = s_input[n_start..n_stop-1]
            l_return = append(l_return, s_return)
            n_start = n_stop+1
        else
            -- no more delimiters: keep the tail and stop
            l_return = append(l_return, s_input[n_start..$])
            exit
        end if
    end while
    return l_return
end function
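[Editorial note: for comparison, the same single-delimiter split sketched in Python; the names are mine, not the poster's. Because Python's `str.find` accepts a start offset, the placeholder-character trick in `mySplit` is unnecessary, and in practice the built-in `str.split(delim)` does all of this directly.]

```python
def my_split(s_input, delim):
    """Split s_input on every occurrence of delim, like mySplit above."""
    parts = []
    start = 0
    while True:
        # search forward from the last delimiter; no overwriting needed
        stop = s_input.find(delim, start)
        if stop != -1:
            parts.append(s_input[start:stop])
            start = stop + len(delim)
        else:
            # no more delimiters: keep the tail and stop
            parts.append(s_input[start:])
            return parts

# my_split("a\tb\tc", "\t") gives the same result as "a\tb\tc".split("\t")
```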
Thank you all for trying to help.
18. Re: Slow memory allocation
- Posted by Haflidi Asgrimsson <haflidi at prokaria.com>
Jun 12, 2005
-
Last edited Jun 13, 2005
I made the bigfile in Excel filling down the following line, 40000 lines
a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
mark 120 B 405 425 404.83 425.86 71.0 51.0 FALSE -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
The "\t" did not change anything but of corse is this right. But 9 works too.
I wrote my own function that works so I blame allt this on Kparse function.
Thank you
19. Re: Slow memory allocation
On Sun, 12 Jun 2005 15:59:17 -0700, Haflidi Asgrimsson
<guest at RapidEuphoria.com> wrote:
>I made the bigfile in Excel filling down the following line, 40000 lines
While I don't want to re-open that can of worms, I am reminded of a
previous thread:
http://www.listfilter.com/cgi-bin/esearch.exu?thread=1&fromMonth=A&fromYear=9&toMonth=C&toYear=9&keywords=%22Dramatic+slowdown+-ping+Rob%22
>a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
>mark 120 B 405 425 404.83 425.86 71.0 51.0 FALSE -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
..and in this unusual test case you are apparently allocating 40,000
strings of length 43. The program might well have run fine on real
data, it could just have been that particular test set.
>
>I wrote my own function that works so I blame all this on Kparse function.
>
Perhaps not the kindest words ever written, but I'm glad you resolved
it. I imagine it is a programmer's lot to occasionally run into such
problems, and they probably occur whatever language we code in.
Regards,
Pete
20. Re: Slow memory allocation
Pete Lomax wrote:
>
> On Sun, 12 Jun 2005 15:59:17 -0700, Haflidi Asgrimsson
> <guest at RapidEuphoria.com> wrote:
>
> >I made the bigfile in Excel filling down the following line, 40000 lines
> While I don't want to re-open that can of worms, I am reminded of a
> previous thread:
> http://www.listfilter.com/cgi-bin/esearch.exu?thread=1&fromMonth=A&fromYear=9&toMonth=C&toYear=9&keywords=%22Dramatic+slowdown+-ping+Rob%22
>
>
> >a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04 Clown Fish 17
> >mark 120 B 405 425 404.83 425.86 71.0
> 51.0 FALSE -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.0
>
> ..and in this unusual test case you are apparently allocating 40,000
> strings of length 43. The program might well have run fine on real
> data, it could just have been that particular test set.
> >
> >I wrote my own function that works so I blame all this on Kparse function.
> >
> Perhaps not the kindest words ever written, but I'm glad you resolved
> it. I imagine it is a programmer's lot to occasionally run into such
> problems, and they probably occur whatever language we code in.
>
> Regards,
> Pete
>
>
Sorry, it was meant as a joke at my own expense.
When one is stuck in one's own code, the last resort is to blame someone else's.
Actually my solution wasn't good enough, so I'm using the Kparse function
without reading the whole file into memory, just one line at a time, and it works fine.
I'm lacking insight into the Euphoria interpreter so I found this case
interesting.
And I got a lot of hints from you all, thank you!
And I repeat, I'm very sorry if I sounded rude!