1. Slow memory allocation

Surprisingly, my old Python code was faster than Euphoria. I was reading a file of
40,000 lines like this:
a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
mark	120	B	405	425	404.83	425.86	71.0	51.0	false	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
into memory. The code is below. Just reading the file into memory took 2 minutes,
during which ntvdm.exe ran at full speed and memory usage rose slowly to around
55 megabytes. If I skipped the line table = append(table,line), it took 2 seconds.

include kparse.e

sequence line, table
integer file
object o_line
constant TRUE = 1
constant FALSE = 0
constant TAB = 9

file = open("big_file.txt", "r")
table = {}
while TRUE do
	o_line = gets(file)
	if atom(o_line) then
		exit
	end if
	line = {}
	line = Kparse(o_line, TAB)
	table = append(table,line)
end while


2. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> Surprisingly, my old Python code was faster than Euphoria. I was reading a file
> of 40.000
> lines like this:
> a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
> mark	120	B	405	425	404.83
> 425.86	71.0	51.0	false	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
> into memory. The code is below. It took 2 minutes just reading the file into
> memory
> during which the ntvdm.exe was running at full speed and memory allocation
> rose slowly
> to around 55 megabytes. If I skipped the line: table = append(table,line) this
> took
> 2 seconds.
> 

If you know how big table will eventually be, you should initialize it like:
table = repeat( "", 40000 )

Even if you don't know how big it will be, you'll get better performance
if you grow it in chunks.  Chunk size is, of course, up to you, but here's
an example:
include kparse.e

sequence line, table
integer file, table_size, table_index
object o_line
constant TRUE = 1
constant FALSE = 0
constant TAB = 9
constant TABLE_CHUNK = 1024
table_size = TABLE_CHUNK
table_index = 0
file = open("big_file.txt", "r")
table = repeat( 0, table_size )
while TRUE do
	o_line = gets(file)
	if atom(o_line) then
		exit
	end if
	line = {}
	line = Kparse(o_line, TAB)
	table_index += 1
	if table_index = table_size then
		table_size += TABLE_CHUNK
		table &= repeat( 0, TABLE_CHUNK )
	end if
	table[table_index] = line
end while
table = table[1..table_index]

You can play around with that and change the size of TABLE_CHUNK, or grow
TABLE_CHUNK dynamically, to allocate more memory each time.  Basically,
whenever you create a sequence, Euphoria will allocate enough space for
it, plus a little extra whenever you start to grow it.  Once you go beyond
that size, it allocates a new chunk, and moves the memory.  So you want
to minimize the number of times this happens.
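
Taking that one step further (my sketch, not from the code above; the names and
the doubling factor are my own choices), the chunk can be grown geometrically so
the number of reallocations stays logarithmic in the final table size:

```euphoria
-- sketch: grow the table by doubling it whenever it fills
constant INITIAL_SIZE = 1024

sequence table
integer table_size, table_index

table_size = INITIAL_SIZE
table_index = 0
table = repeat( 0, table_size )

procedure store( object line )
    table_index += 1
    if table_index > table_size then
        table &= repeat( 0, table_size )  -- double the allocation
        table_size += table_size
    end if
    table[table_index] = line
end procedure
```

With doubling, a 40,000-line file needs only a handful of reallocations instead
of one per fixed-size chunk.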

Matt Lewis


3. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> Surprisingly, my old Python code was faster than Euphoria. I was reading a file
> of 40.000
> lines like this:
> a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
> mark	120	B	405	425	404.83
> 425.86	71.0	51.0	false	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
> into memory. The code is below. It took 2 minutes just reading the file into
> memory
> during which the ntvdm.exe was running at full speed and memory allocation
> rose slowly
> to around 55 megabytes. If I skipped the line: table = append(table,line) this
> took
> 2 seconds.
> 
> include kparse.e
> 
> sequence line, table
> integer file
> object o_line
> constant TRUE = 1
> constant FALSE = 0
> constant TAB = 9
> 
> file = open("big_file.txt", "r")
> table = {}
> 	while TRUE do
> 	o_line = gets(file)
> 	if atom(o_line) then
> 		exit
> 	end if
> 	line = {}
> 	line = Kparse(o_line, TAB)
> 	table = append(table,line)
> end while
> 

Hi there,

That's interesting, because i do the same thing with my personal editor
and it opens a file with 56,000 lines (4 Megabytes) in about a second.

How much physical RAM do you have installed in your computer?


Take care,
Al

And, good luck with your Euphoria programming!

My bumper sticker: "I brake for LED's"


4. Re: Slow memory allocation

Al Getz wrote:
> 
> Haflidi Asgrimsson wrote:
> > 
> > Surprisingly, my old Python code was faster than Euphoria. I was reading a
> > file of 40.000
> > lines like this:
> > a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
> > mark	120	B	405	425	404.83
> > 425.86	71.0	51.0	false	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
> > into memory. The code is below. It took 2 minutes just reading the file into
> > memory
> > during which the ntvdm.exe was running at full speed and memory allocation
> > rose slowly
> > to around 55 megabytes. If I skipped the line: table = append(table,line)
> > this took
> > 2 seconds.
> > 
> > include kparse.e
> > 
> > sequence line, table
> > integer file
> > object o_line
> > constant TRUE = 1
> > constant FALSE = 0
> > constant TAB = 9
> > 
> > file = open("big_file.txt", "r")
> > table = {}
> > 	while TRUE do
> > 	o_line = gets(file)
> > 	if atom(o_line) then
> > 		exit
> > 	end if
> > 	line = {}
> > 	line = Kparse(o_line, TAB)
> > 	table = append(table,line)
> > end while
> > 
> 
> Hi there,
> 
> That's interesting, because i do the same thing with my personal editor
> and it opens a file with 56,000 lines (4 Megabytes) in about a second.
> 
> How much physical RAM do you have installed in your computer?
> 
> 
> Take care,
> Al
> 
> And, good luck with your Euphoria programming!
> 
> My bumper sticker: "I brake for LED's"

Actually, this isn't a problem with slow memory allocation.  This is a problem
with append().  This function has long been known to be slow amongst most of
the "old timers" here.  It is much more advisable to use the &= operator to
concatenate data together.  It has proven to be much faster than append() on
many occasions.

Here's an Example:

sequence a
a = "This is part of a string"
a &= ", and this is the rest"
a = {a}
a &= { "This is a new string." }
-- a now looks like:
-- {"This is part of a string, and this is the rest", "This is a new string."}


Always remember, though: when you want to concatenate two sequences and want
each to keep its own place, rather than being merged into one sequence, put { }
brackets around the data, even if it is already a sequence. That tells the
Euphoria interpreter that you want to keep the two sequences separate.
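
The difference shows up in a small sketch (a hedged example; the variable name
is mine):

```euphoria
sequence s

s = {"one"}
s &= "two"             -- flattens: s is {"one", 't', 'w', 'o'}

s = {"one"}
s &= {"two"}           -- keeps its own place: s is {"one", "two"}

s = append(s, "three") -- append always adds one element: {"one", "two", "three"}
```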


Mario Steele
http://enchantedblade.trilake.net
Attaining World Domination, one byte at a time...


5. Re: Slow memory allocation

Mario Steele wrote:
> 
> Al Getz wrote:
> > 
> > Haflidi Asgrimsson wrote:
> > > 
> > > Surprisingly, my old Python code was faster than Euphoria. I was reading a
> > > file of 40.000
> > > lines like this:
> > > a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
> > > mark	120	B	405	425	404.83
> > > 425.86	71.0	51.0	false	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
> > > into memory. The code is below. It took 2 minutes just reading the file
> > > into memory
> > > during which the ntvdm.exe was running at full speed and memory allocation
> > > rose slowly
> > > to around 55 megabytes. If I skipped the line: table = append(table,line)
> > > this took
> > > 2 seconds.
> > > 
> > > include kparse.e
> > > 
> > > sequence line, table
> > > integer file
> > > object o_line
> > > constant TRUE = 1
> > > constant FALSE = 0
> > > constant TAB = 9
> > > 
> > > file = open("big_file.txt", "r")
> > > table = {}
> > > 	while TRUE do
> > > 	o_line = gets(file)
> > > 	if atom(o_line) then
> > > 		exit
> > > 	end if
> > > 	line = {}
> > > 	line = Kparse(o_line, TAB)
> > > 	table = append(table,line)
> > > end while
> > > 
> > 
> > Hi there,
> > 
> > That's interesting, because i do the same thing with my personal editor
> > and it opens a file with 56,000 lines (4 Megabytes) in about a second.
> > 
> > How much physical RAM do you have installed in your computer?
> > 
> > 
> > Take care,
> > Al
> > 
> > And, good luck with your Euphoria programming!
> > 
> > My bumper sticker: "I brake for LED's"
> 
> Actually, this isn't a problem with slow memory allocation.  This is a problem
> with append().
>  This function has always been known to be slow, amongst most of the "old
>  timers" here.
>  It is much more suggestable that you use the &= oper sign to concat data
>  together.
>  It has been proven to be much faster then append() on many occasions.
> 
> Here's an Example:
> 
> sequence a
> a = "This is part of a string"
> a &= ", and this is the rest"
> a = {a}
> a &= { "This is a new string." }
> -- a now looks like: {"This is part of a string, and this is the rest", "This
> is a new string"}
> 
> Always remember though, when you want to concat together two sequences, and
> want each
> to have it's own place, and not concated together into 1 sequence, use the { }
> brackets
> around the data, even if it is a sequence, it shows the Euphoria Interpreter,
> that
> you want to seperate the two sequences from each other.
> 
> 
> Mario Steele
> <a
> href="http://enchantedblade.trilake.net">http://enchantedblade.trilake.net</a>
> Attaining World Dominiation, one byte at a time...
> 
I tried this on two systems, both running Windows XP Pro; the latter is 2 GHz with
1.5 GB memory, and there it took 25 seconds. What I found most interesting is that
append(), &= and plain assignment all gave the same result, 25 seconds.
table[i] = line is taking around 24 seconds of those.

include file.e    -- for seek()
include kparse.e

sequence line, table
integer file, n_line
object o_line
constant TRUE = 1
constant FALSE = 0
constant TAB = 9

file = open("big_file.txt", "r")
n_line = 0
while TRUE do
	o_line = gets(file)
	if atom(o_line) then
		exit
	end if
	n_line += 1
end while
if seek(file,0) then
	puts(1,"Seek failed\n")
end if
table = repeat( {}, n_line )
for i = 1 to n_line do
	o_line = gets(file)
	if atom(o_line) then
		exit
	end if
	line = {}
	line = Kparse(o_line, TAB)
	table[i] = line
end for



6. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> Mario Steele wrote:
> > 
> > Al Getz wrote:
> > > 
> > > Haflidi Asgrimsson wrote:
> > > > 
> > > > Surprisingly, my old Python code was faster than Euphoria. I was reading a
> > > > file of 40.000
> > > > lines like this:
> > > > a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
> > > > mark	120	B	405	425	404.83
> > > > 425.86	71.0	51.0	false	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
> > > > into memory. The code is below. It took 2 minutes just reading the file
> > > > into memory
> > > > during which the ntvdm.exe was running at full speed and memory
> > > > allocation rose slowly
> > > > to around 55 megabytes. If I skipped the line: table =
> > > > append(table,line) this took
> > > > 2 seconds.
> > > > 
> > > > include kparse.e
> > > > 
> > > > sequence line, table
> > > > integer file
> > > > object o_line
> > > > constant TRUE = 1
> > > > constant FALSE = 0
> > > > constant TAB = 9
> > > > 
> > > > file = open("big_file.txt", "r")
> > > > table = {}
> > > > 	while TRUE do
> > > > 	o_line = gets(file)
> > > > 	if atom(o_line) then
> > > > 		exit
> > > > 	end if
> > > > 	line = {}
> > > > 	line = Kparse(o_line, TAB)
> > > > 	table = append(table,line)
> > > > end while
> > > > 
> > > 
> > > Hi there,
> > > 
> > > That's interesting, because i do the same thing with my personal editor
> > > and it opens a file with 56,000 lines (4 Megabytes) in about a second.
> > > 
> > > How much physical RAM do you have installed in your computer?
> > > 
> > > 
> > > Take care,
> > > Al
> > > 
> > > And, good luck with your Euphoria programming!
> > > 
> > > My bumper sticker: "I brake for LED's"
> > 
> > Actually, this isn't a problem with slow memory allocation.  This is a
> > problem with append().
> >  This function has always been known to be slow, amongst most of the "old
> >  timers" here.
> >  It is much more suggestable that you use the &= oper sign to concat data
> >  together.
> >  It has been proven to be much faster then append() on many occasions.
> > 
> > Here's an Example:
> > 
> > sequence a
> > a = "This is part of a string"
> > a &= ", and this is the rest"
> > a = {a}
> > a &= { "This is a new string." }
> > -- a now looks like: {"This is part of a string, and this is the rest",
> > "This is a new string"}
> > 
> > Always remember though, when you want to concat together two sequences, and
> > want each
> > to have it's own place, and not concated together into 1 sequence, use the {
> > } brackets
> > around the data, even if it is a sequence, it shows the Euphoria
> > Interpreter, that
> > you want to seperate the two sequences from each other.
> > 
> > 
> > Mario Steele
> > <a
> > href="http://enchantedblade.trilake.net">http://enchantedblade.trilake.net</a>
> > Attaining World Dominiation, one byte at a time...
> > 
> I tried this on two systems both running Windows XP Pro, the latter is 2GHz
> with 1.5 Gb memory, there it took 25 seconds. What I found most interesting
> is that append(), &= and assignment gave the same result, 25 seconds.
> table[i] = line is taking around 24 seconds of those.
> integer file, n_line
> object o_line
> constant TRUE = 1
> constant FALSE = 0
> constant TAB = 9
> 
> file = open("big_file.txt", "r")
> n_line = 0
> while TRUE do
> 	n_line += 1
> 	o_line = gets(file)
> 	if atom(o_line) then
> 		exit
> 	end if
> end while
> if seek(file,0) then
> 	puts(1,"Seek failed\n")
> end if
> table = repeat( {}, n_line )
> for i = 1 to n_line do
> 	o_line = gets(file)
> 	if atom(o_line) then
> 	exit
> 	end if
> 	line = {}
> 	line = Kparse(o_line, TAB)
> 	table[i] = line
> end for
> 


Hello again,


My only question now is what version of Euphoria are you using?

I'm asking because I do this:


object line
sequence buff
integer fn

fn = open("c:\\myfile.txt", "r")
buff = {}

while 1 do
  line = gets(fn)
  if atom(line) then
    exit
  end if
  buff = append(buff, line)
end while

The above code opens a 2,839,910-byte text file in about 1 second
using Euphoria v2.4 with a bindw'd exe.

Did you try loading the file completely first, then parsing afterwards?
If so, does it speed things up?
I know you have tried leaving out the 'append' line, but what
happens when you leave out the parse line while keeping the
append line?


Take care,
Al

And, good luck with your Euphoria programming!

My bumper sticker: "I brake for LED's"


7. Re: Slow memory allocation

Hello again,

I'm not sure if I replied to the correct post, so here's the post I meant
to reply to:

>I tried this on two systems both running Windows XP Pro,
> the latter is 2GHz with 1.5 Gb memory, there it took 25 seconds.
> What I found most interesting is that append(),
>&= 	and assignment gave the same result, 25 seconds.
>table[i] = line is taking around 24 seconds of those.

Your system is faster than mine and has more memory, so I would have
expected it to read the file faster than mine does, and mine reads
a 2,900,000+ byte file in about a second.
I tried a non-bindw'd .exw file and it was the same, and I tried
the .exw file with version 2.5 of Euphoria (PD beta version) and
it was the same: about one second to read the whole file into
the sequence.
What I would do is try that exact code fragment (in the previous post)
without the parse line and see if it runs faster.  If not, I would
wonder whether you have any active antivirus software, or whether the
page file was moved by a non-Windows disk manager.
If it does in fact speed up, then perhaps you should do your parsing
AFTER the whole file is read into the sequence.

I've been using Euphoria for several years now, and I've had my
editor up and running for most of them; it's always been fast,
even though I'm using 'append' to store the lines in the sequence.
Pete Lomax recently started a new editor which uses basically the
same technique, and it is about the same speed (fast).  This makes
me think something else is wrong.


Take care,
Al

And, good luck with your Euphoria programming!

My bumper sticker: "I brake for LED's"


8. Re: Slow memory allocation

Al Getz wrote:
> 
> Hello again,
> 
> Im not sure if i replied to the correct post, so here's the post i meant
> to reply to:
> 
> >I tried this on two systems both running Windows XP Pro,
> > the latter is 2GHz with 1.5 Gb memory, there it took 25 seconds.
> > What I found most interesting is that append(),
> >&= 	and assignment gave the same result, 25 seconds.
> >table[i] = line is taking around 24 seconds of those.
> 
> Your system is faster than mine and has more memory so i would have
> expected it to read the file faster than on mine, and mine reads
> a 2,900,000+ byte file in about a second.
> I tried a non-bindw'd .exw file and it was the same, and i tried
> the .exw file with Version 2.5 of Euphoria (PD Beta version) and
> it was the same, about one second to read the whole file into
> the sequence.
> What i would do is try that exact code fragment (in the previous post)
> without the parse line and see if it works faster.  If not, i would
> wonder if you have any active virus software or the page file was
> moved by a non-Windows disk manager.
> If it does in fact speed up, then perhaps you should do your parsing
> AFTER the whole file is read into the sequence.
> 
> I've been using Euphoria for several years now and i've had my
> editor up and running for most of them, and it's always been fast
> even though im using 'append' to store the lines in the sequence.
> Pete Lomax recently started a new editor which uses basically the
> same technique and that is about the same speed (fast).  This makes
> me think something else is wrong.
> 
> 
> Take care,
> Al
> 
> And, good luck with your Euphoria programming!
> 
> My bumper sticker: "I brake for LED's"
> 
Our getting different results led me to try your code; it ran in less than
a second. The culprit line is:
	line = Kparse(o_line, TAB)
though it only slows the code when append(), &= or assignment follows.


9. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> Us getting different results led to me trying you code, ran in less than
> a second. The culprit line is:
> 	line = Kparse(o_line, TAB)
> Although it only slows the code with append(), &= or assignment following.
> 

That would have been my next question.  What does Kparse() do?  Could you
tell us what the results are when you:

  1) Comment out the Kparse() call
  2) Comment out the assignment to table
  3) Comment out the Kparse() call and the assignment (just read and ignore)

Matt Lewis


10. Re: Slow memory allocation

Matt Lewis wrote:
> 
> Haflidi Asgrimsson wrote:
> > 
> > Us getting different results led to me trying you code, ran in less than
> > a second. The culprit line is:
> > 	line = Kparse(o_line, TAB)
> > Although it only slows the code with append(), &= or assignment following.
> > 
> 
> That would have been my next question.  What does Kparse() do?  Could you
> tell us what the results are when you:
> 
>   1) Comment out the Kparse() call
>   2) Comment out the assignment to table
>   3) Comment out the Kparse() call and the assignment (just read and ignore)
> 
> Matt Lewis
> 
Kparse is from kparse.e:

--KPARSE.E (parse with keep)
--(c) 05/01/104 Michael J Raley (thinkways at yahoo.com)
--Turns a string of delimited text into a list of items,
--while retaining the position of empty elements.

The function is below.  A string like "1\t2\t\t3" is converted into the
list {"1","2","","3"}.

In all three cases I get a nearly instant response; only when both lines

	line = Kparse(o_line, TAB)
	table = append(table, line)

are present does the CPU run for 25 seconds.
This must be some kind of late type checking, because if TAB is replaced by
a sequence, this takes only about 2 seconds:
	line = Kparse(o_line, "9")



--------------------------------------------------------
global function Kparse(object s, object o) 
 sequence clipbook, parsed_list
 atom ls, lc, clip

 if atom(s) then return s end if 

 clipbook = {}
  parsed_list = {} 
  ls = length(s)

--convert atom delimiter into a 1-element sequence 
 if atom(o) then o = {o} end if 

--bookmark the position of all delimiters in 1 pass
  for a = 1 to ls do  
    if match({s[a]},o) then clipbook &= a
    end if  
   
  end for

  lc   = length(clipbook)
  if lc = 0 then return {} end if 

-- find the text between the recorded delimiter positions to create a list.
-- First check to see if the first bookmarked delimiter starts sequence s

  clip = clipbook[1]
  if clip = 1 then               -- Yes. Create the first element empty
    parsed_list = {{}}
  else
    parsed_list = {s[1..clip-1]} -- No. Build the first element from s up to our bookmark
  end if

-- now we can process the rest of the sequence 
  for ic = 2 to lc do
     if clip+1 = clipbook[ic] then
       parsed_list = append(parsed_list,{})
     else 
       parsed_list = append(parsed_list,s[clip+1..clipbook[ic]-1])
     end if 
     clip = clipbook[ic]
   end for 
--test if end of s is past last delimiter   
   if ls > clipbook[lc] then 
     parsed_list = append(parsed_list,s[clipbook[lc]+1..ls])     
   end if 
  
   return parsed_list
end function



11. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> Matt Lewis wrote:
> > 
> > Haflidi Asgrimsson wrote:
> > > 
> > > Us getting different results led to me trying you code, ran in less than
> > > a second. The culprit line is:
> > > 	line = Kparse(o_line, TAB)
> > > Although it only slows the code with append(), &= or assignment following.
> > > 
> > 
> > That would have been my next question.  What does Kparse() do?  Could you
> > tell us what the results are when you:
> > 
> >   1) Comment out the Kparse() call
> >   2) Comment out the assignment to table
> >   3) Comment out the Kparse() call and the assignment (just read and ignore)
> > 
> > Matt Lewis
> > 
> Kparse is from kparse.e
> --KPARSE.E (parse with keep)
> --(c) 05/01/104 Michael J Raley (thinkways at yahoo.com)
> --Turns a string of delimited text into a list of items,
> --while retaining the position of empty elements. 
> The function is below
> So string like "1\t2\t\t3" is converted into list: {"1","2","","3"}
> 
> In all three cases I get nearly instant response, only when both lines:
> 	line = Kparse(o_line, TAB)
> 	table = append(table, line)
> then the CPU runs for 25 seconds.
> This must be some kind of late typechecking because if TAB is replaced by 
> a sequence. Then this takes only about 2 seconds.
> 	line = Kparse(o_line, "9")
> 
> 
> --------------------------------------------------------
> global function Kparse(object s, object o) 
>  sequence clipbook, parsed_list
>  atom ls, lc, clip
> 
>  if atom(s) then return s end if 
> 
>  clipbook = {}
>   parsed_list = {} 
>   ls = length(s)
> 
> --convert atom delimiter into a 1n sequence 
>  if atom(o) then o = {o} end if 
> 
> --bookmark the position of all delimiters in 1 pass
>   for a = 1 to ls do  
>     if match({s[a]},o) then clipbook &= a
>     end if  
>    
>   end for
> 
>   lc   = length(clipbook)
>   if lc = 0 then return {} end if 
> 
> -- find the text between the recorded delimeter positions to create a list.
> -- First check to see if the first bookmarked delimiter starts sequence s
> 
>   clip = clipbook[1]
>    if clip = 1 then              -- Yes. Create the first element empty
>     parsed_list = {{}}         
>    else 
>     parsed_list = {s[1..clip-1]} -- No. Build the first element from s up to our bookmark
>    end if 
> 
> -- now we can process the rest of the sequence 
>   for ic = 2 to lc do
>      if clip+1 = clipbook[ic] then
>        parsed_list = append(parsed_list,{})
>      else 
>        parsed_list = append(parsed_list,s[clip+1..clipbook[ic]-1])
>      end if 
>      clip = clipbook[ic]
>    end for 
> --test if end of s is past last delimeter   
>    if ls > clipbook[lc] then 
>      parsed_list = append(parsed_list,s[clipbook[lc]+1..ls])     
>    end if 
>   
>    return parsed_list
> end function
> 

Look at all the appends here, with sequence subscripting and slicing operations.
With append() and the &= shorthand operator, sequences grow dynamically
(allocating more memory) whenever more data is pushed into them. With
repeat() you can specify exactly how big you want the sequence to be (no
dynamic growth or reallocation) and push data into the already-allocated
elements using a loop. Dynamic allocation is very useful in Euphoria, but maybe
not in this case, where performance is the biggest issue. See if you can modify
this code to use some repeats. That could help the kparse routine perform more
quickly and efficiently.
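
As an illustration only (not a drop-in replacement for the posted routine), the
field-building pass could preallocate from the delimiter count, since after the
bookmark pass the number of fields is already known:

```euphoria
-- split s at the positions recorded in clipbook, preallocating the result;
-- a sketch under the assumption that clipbook holds the delimiter indices
function split_prealloc( sequence s, sequence clipbook )
    sequence parsed_list
    integer lc, prev
    lc = length( clipbook )
    parsed_list = repeat( {}, lc + 1 )   -- lc delimiters give lc+1 fields
    prev = 0
    for i = 1 to lc do
        parsed_list[i] = s[prev+1..clipbook[i]-1]
        prev = clipbook[i]
    end for
    parsed_list[lc+1] = s[prev+1..length(s)]
    return parsed_list
end function
```

For "1\t2\t\t3" with clipbook = {2,4,5} this builds {"1","2","","3"} without a
single append.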


Regards,
Vincent

--
Without walls and fences, there is no need for Windows and Gates.


12. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> I tried this on two systems both running Windows XP Pro, the 
> latter is 2GHz with 1.5 Gb memory, there it took 25 seconds. What 
> I found most interesting is that append(), &= and assignment gave 
> the same result, 25 seconds
> table[i] = line is taking around 24 seconds of those.

This seems really odd.  I just made a text file that's 64,886 lines of
10 random numbers of 11 characters each, tab delimited.  I can't make the 
time go up much beyond 1.5 seconds, and this is on a 2.4GHz Celeron with
512MB RAM running WinXP Home.  Total memory usage is about 55 megs, which seems
correct to me (the file's about 7 megs).

Is there something odd about the file?  For one thing, you're adding extra
refs and derefs to sequences by using the line variable.  Cut that out
and see if it makes any difference (it didn't on my machine).  Also, I'd
advise passing a sequence to kparse, since it just turns an integer delimiter
into a sequence anyway, so you're wasting some cycles right there.

Can you post the source file (or one just like it, but with the data
changed, if that's an issue)?  Maybe there's something strange about 
the format of the data that's causing issues.  Anyway, here's my code
that runs in about 1.5s on my machine.  Replace "bigrand.txt" with your
file, and let me know what happens.  If you want, I can email you my
file (it's about 3MB zipped).

include get.e

global function kparse(object s, object o) 
 sequence clipbook, parsed_list
 atom ls, lc, clip

 if atom(s) then return s end if 

 clipbook = {}
  parsed_list = {} 
  ls = length(s)

--convert atom delimiter into a 1-element sequence 
 if atom(o) then o = {o} end if 

--bookmark the position of all delimiters in 1 pass
  for a = 1 to ls do  
    if match({s[a]},o) then clipbook &= a
    end if  
   
  end for

  lc   = length(clipbook)
  if lc = 0 then return {} end if 

-- find the text between the recorded delimiter positions to create a list.
-- First check to see if the first bookmarked delimiter starts sequence s

  clip = clipbook[1]
  if clip = 1 then               -- Yes. Create the first element empty
    parsed_list = {{}}
  else
    parsed_list = {s[1..clip-1]} -- No. Build the first element from s up to our bookmark
  end if

-- now we can process the rest of the sequence 
  for ic = 2 to lc do
     if clip+1 = clipbook[ic] then
       parsed_list = append(parsed_list,{})
     else 
       parsed_list = append(parsed_list,s[clip+1..clipbook[ic]-1])
     end if 
     clip = clipbook[ic]
   end for 
--test if end of s is past last delimiter   
   if ls > clipbook[lc] then 
     parsed_list = append(parsed_list,s[clipbook[lc]+1..ls])     
   end if 
  
   return parsed_list
end function

procedure main()
	atom t
	integer fn
	object in
	sequence table

	fn = open( "bigrand.txt", "r" )
	table = {}
	t = time()
	in = gets( fn )

	while sequence(in) do
		table = append( table, kparse( in, "\t" ) )
		in = gets( fn )
	end while
	printf( 1, "%gsec\n", time() - t)
	if wait_key() then
		
	end if
end procedure
main()



13. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> Matt Lewis wrote:
> > 
> > Haflidi Asgrimsson wrote:
> > > 
> > > Us getting different results led to me trying you code, ran in less than
> > > a second. The culprit line is:
> > > 	line = Kparse(o_line, TAB)
> > > Although it only slows the code with append(), &= or assignment following.
> > > 
> > 
> > That would have been my next question.  What does Kparse() do?  Could you
> > tell us what the results are when you:
> > 
> >   1) Comment out the Kparse() call
> >   2) Comment out the assignment to table
> >   3) Comment out the Kparse() call and the assignment (just read and ignore)
> > 
> > Matt Lewis
> > 
> Kparse is from kparse.e
> --KPARSE.E (parse with keep)
> --(c) 05/01/104 Michael J Raley (thinkways at yahoo.com)
> --Turns a string of delimited text into a list of items,
> --while retaining the position of empty elements. 
> The function is below
> So string like "1\t2\t\t3" is converted into list: {"1","2","","3"}
> 
> In all three cases I get nearly instant response, only when both lines:
> 	line = Kparse(o_line, TAB)
> 	table = append(table, line)
> then the CPU runs for 25 seconds.
> This must be some kind of late typechecking because if TAB is replaced by 
> a sequence. Then this takes only about 2 seconds.
> 	line = Kparse(o_line, "9")
> 
> 
> --------------------------------------------------------
> global function Kparse(object s, object o) 
>  sequence clipbook, parsed_list
>  atom ls, lc, clip
> 
>  if atom(s) then return s end if 
> 
>  clipbook = {}
>   parsed_list = {} 
>   ls = length(s)
> 
> --convert atom delimiter into a 1n sequence 
>  if atom(o) then o = {o} end if 
> 
> --bookmark the position of all delimiters in 1 pass
>   for a = 1 to ls do  
>     if match({s[a]},o) then clipbook &= a
>     end if  
>    
>   end for
> 
>   lc   = length(clipbook)
>   if lc = 0 then return {} end if 
> 
> -- find the text between the recorded delimeter positions to create a list.
> -- First check to see if the first bookmarked delimiter starts sequence s
> 
>   clip = clipbook[1]
>    if clip = 1 then              -- Yes. Create the first element empty
>     parsed_list = {{}}         
>    else 
>     parsed_list = {s[1..clip-1]} -- No. Build the first element from s up to our bookmark
>    end if 
> 
> -- now we can process the rest of the sequence 
>   for ic = 2 to lc do
>      if clip+1 = clipbook[ic] then
>        parsed_list = append(parsed_list,{})
>      else 
>        parsed_list = append(parsed_list,s[clip+1..clipbook[ic]-1])
>      end if 
>      clip = clipbook[ic]
>    end for 
> --test if end of s is past last delimeter   
>    if ls > clipbook[lc] then 
>      parsed_list = append(parsed_list,s[clipbook[lc]+1..ls])     
>    end if 
>   
>    return parsed_list
> end function
> 

Hi again,


It's usually better to load the entire file first and then parse
later.


Take care,
Al

And, good luck with your Euphoria programming!

My bumper sticker: "I brake for LED's"


14. Re: Slow memory allocation

As with all other interpreters, the more information you give it, the more
efficiently it runs the code.
I think I've learned a valuable lesson here:
constant TAB = 9 is bad 
constant TAB = '9' is OK
constant TAB = "9" is OK


15. Re: Slow memory allocation

Haflidi Asgrimsson wrote:
> 
> As with all other interpreters the more information you give the more
> efficiently it runs the code.
> I think I've learned a valuable lesson here:
> constant TAB = 9 is bad 
> constant TAB = '9' is OK
> constant TAB = "9" is OK
> 


This is misleading.  '9' != '\t' and "9" != "\t".  The reason it's much
faster is that you're getting many fewer delimiters (only when the character
9 is encountered).
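Matt's point is easy to verify: the tab character *is* the number 9, while the digit character '9' is code 57, so splitting on the digit almost never splits the sample data at all. A quick illustration in Python (variable names are mine, not from the thread):

```python
# TAB really is the number 9, but '9' is a different character entirely (57).
line = "mark\t120\tB\t405"

assert ord("\t") == 9
assert ord("9") == 57

by_tab = line.split("\t")    # splits into four fields
by_digit = line.split("9")   # no literal '9' in the line, so nothing splits
```

Getting no delimiter hits means Kparse returns almost immediately, which is why TAB = '9' only *seemed* faster.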

Do you have other things running?  How much *free* memory do you have?  Maybe
your memory is swapping out or something?  There's something else going on
here, because we're all getting very different results on relatively similar
hardware.

Matt Lewis


16. Re: Slow memory allocation

>From: Haflidi Asgrimsson <guest at RapidEuphoria.com>
>Reply-To: EUforum at topica.com
>To: EUforum at topica.com
>Subject: Re: Slow memory allocation
>Date: Sun, 12 Jun 2005 14:17:17 -0700
>
>posted by: Haflidi Asgrimsson <haflidi at prokaria.com>
>
>As with all other interpreters the more information you give the more
>efficiently it runs the code.
>I think I've learned a valuable lesson here:
>constant TAB = 9 is bad
>constant TAB = '9' is OK
>constant TAB = "9" is OK
>

   Those last two are not correct. They mean the digit character '9'
(code 57), not a TAB character (code 9). If you're splitting by 9's and not
TAB's, then maybe you aren't splitting anything at all, thereby making it
seem faster. The constants should be like this:
constant TAB = '\t'
or
constant TAB = "\t"

~[ WingZone ]~
http://wingzone.tripod.com/


17. Re: Slow memory allocation

Except Kparse stops parsing and returns an empty list.

So I wrote my own function and it parses my file in 3 seconds:
function mySplit(sequence s_input, sequence s_char)
	sequence l_return, s_return
	integer n_start, n_stop
	atom a_char
	
	if equal(s_char, "#") then
		a_char = '!'
	else
		a_char = '#'
	end if
	l_return = {}
	n_start = 1
	while TRUE do
		n_stop = match(s_char, s_input)
		if n_stop then
			s_input[n_stop] = a_char
			s_return = s_input[n_start..n_stop-1]
			l_return = append(l_return, s_return)
			n_start = n_stop+1
		else
			l_return = append(l_return, s_input[n_start..$])
			exit
		end if
	end while
	return l_return
end function
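The same semantics, empty fields preserved, can be had without the sentinel-character trick mySplit uses (overwriting each found delimiter with '#' or '!' so match() won't find it again). Searching from the previous stop position avoids both the mutation and the repeated scan from the start of the string. A rough Python equivalent (`my_split` is a hypothetical name):

```python
def my_split(text, sep):
    """Split text on sep, keeping empty fields, like "1\t2\t\t3" -> ["1","2","","3"]."""
    parts, start = [], 0
    while True:
        stop = text.find(sep, start)   # resume search at 'start' instead of
        if stop == -1:                 # mutating the input like mySplit does
            parts.append(text[start:])
            return parts
        parts.append(text[start:stop])
        start = stop + len(sep)
```

In Python this is just `text.split(sep)`, but the loop shows the one-pass structure that mySplit approximates.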

Thank you all for trying to help


18. Re: Slow memory allocation

I made the bigfile in Excel filling down the following line, 40000 lines
a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
mark	120	B	405	425	404.83	425.86	71.0	51.0	FALSE	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0

The "\t" did not change anything, but of course it is right. And 9 works too.

I wrote my own function that works, so I blame all this on the Kparse function.

Thank you


19. Re: Slow memory allocation

On Sun, 12 Jun 2005 15:59:17 -0700, Haflidi Asgrimsson
<guest at RapidEuphoria.com> wrote:

>I made the bigfile in Excel filling down the following line, 40000 lines
While I don't want to re-open that can of worms, I am reminded of a
previous thread:
http://www.listfilter.com/cgi-bin/esearch.exu?thread=1&fromMonth=A&fromYear=9&toMonth=C&toYear=9&keywords=%22Dramatic+slowdown+-ping+Rob%22


>a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
>mark	120	B	405	425	404.83	425.86	71.0	51.0	FALSE	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0

..and in this unusual test case you are apparently allocating 40,000
strings of length 43. The program might well have run fine on real
data, it could just have been that particular test set.
>
>I wrote my own function that works so I blame all this on Kparse function.
>
Perhaps not the kindest words ever written, but I'm glad you resolved
it. I imagine it is a programmers lot to occasionally run into such
problems, and they probably occur whatever language we code in.

Regards,
Pete


20. Re: Slow memory allocation

Pete Lomax wrote:
> 
> On Sun, 12 Jun 2005 15:59:17 -0700, Haflidi Asgrimsson
> <guest at RapidEuphoria.com> wrote:
> 
> >I made the bigfile in Excel filling down the following line, 40000 lines
> While I don't want to re-open that can of worms, I am reminded of a
> previous thread:
> http://www.listfilter.com/cgi-bin/esearch.exu?thread=1&fromMonth=A&fromYear=9&toMonth=C&toYear=9&keywords=%22Dramatic+slowdown+-ping+Rob%22
> 
> 
> >a10_Clo10_BL26_ClFish_O_C251A_0_0_0_2003X04	Clown Fish 17
> >mark	120	B	405	425	404.83	425.86	71.0
> >51.0	FALSE	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	1.0
> 
> ..and in this unusual test case you are apparently allocating 40,000
> strings of length 43. The program might well have run fine on real
> data, it could just have been that particular test set.
> >
> >I wrote my own function that works so I blame all this on Kparse function.
> >
> Perhaps not the kindest words ever written, but I'm glad you resolved
> it. I imagine it is a programmers lot to occasionally run into such
> problems, and they probably occur whatever language we code in.
> 
> Regards,
> Pete
> 
> 

Sorry, it was meant as a joke at my expense.
When one is stuck in one's own code, the last resort is to blame someone else's.
Actually my solution wasn't good enough, so I'm using the Kparse function, not
reading the whole file into memory, just one line at a time, and it works fine.
I'm lacking insight into the Euphoria interpreter, so I found this case
interesting.
And I got a lot of hints from you all, thank you!
And I repeat, I'm very sorry if I sounded rude!

