1. De-allocating memory - time issue

I have a *HUGE* object of some 10 million entries. (I now know the 
exact size 'cos I coded a status window to let me know) 
[please read this all the way thru before responding] 
 
Given that I have 48MB of ram, that 10 million bears resemblance to me 
running out of physical memory (4 bytes per integer, yep), and the ole 
PC begins to disk thrash. 
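(For scale: 10,000,000 entries at 4 bytes each is roughly 40MB of sequence 
data before any interpreter overhead, uncomfortably close to 48MB of 
physical RAM.) 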
 
So, I code a "Cancel" button, press it and the program waddles on 
longer than expected. 
 
So I trace it to the line 
 
	 stack={} 
 
That line alone takes 3 minutes to execute. 
 
Now, I kind of understand that difficulties abound in the general 
case of deallocating ten million entries, most of which are copies of 
copies of copies (however many it takes to get to ten million). 
 
stack is an array of approx repeat({},55000) of which only the middle 
3200 have been updated, however each such that stack[i] is set to 
stack[i+/-1]&{x,y}. 
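
As a rough sketch of the kind of construction being described (the actual 
loop and the x,y values are not given in this post, so the numbers below are 
placeholders): 

    sequence stack
    stack = repeat({}, 55000)
    for i = 25901 to 29100 do              -- ~3200 entries near the middle
        stack[i] = stack[i-1] & {i, i+1}   -- placeholder x,y values
    end for
    -- each stack[i] is a fresh copy two elements longer than its neighbour,
    -- so the lengths of the 3200 entries sum to roughly 10 million elements
    stack = {}  -- the "Cancel" path: one statement, millions of deallocations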
 
So I guess reference counts are off the scale. 
 
The annoying point here, then, assuming it *is* the problem, is that 
the ref count handling is, in this unusual case, wholly irrelevant. 
 
I definitely doubt there is any way to improve this, it really is not 
any kind of major problem in any sense, but on the off-chance, I am 
just asking if anyone has any suggestions. 
 
Thanks 
Pete


2. Re: De-allocating memory - time issue

On 19 Sep 2002, at 1:17, petelomax at blueyonder.co.uk wrote:

> 
> I have a *HUGE* object of some 10 million entries. (I now know the
> exact size 'cos I coded a status window to let me know)
> [please read this all the way thru before responding]
> 
> Given that I have 48MB of ram, that 10 million bears resemblance to me
> running out of physical memory (4 bytes per integer, yep), and the ole
> PC begins to disk thrash.
> 
> So, I code a "Cancel" button, press it and the program waddles on
> longer than expected.
> 
> So I trace it to the line
> 
>   stack={}
> 
> That line alone takes 3 minutes to execute.
> 
> Now, I kind of understand that difficulties abound in the general
> case of deallocating ten million entries, most of which are copies of
> copies of copies (however many it takes to get to ten million).
> 
> stack is an array of approx repeat({},55000) of which only the middle
> 3200 have been updated, however each such that stack[i] is set to
> stack[i+/-1]&{x,y}.
> 
> So I guess reference counts are off the scale.
> 
> The annoying point here, then, assuming it *is* the problem, is that
> the ref count handling is, in this unusual case, wholly irrelevant.
> 
> I definitely doubt there is any way to improve this, it really is not
> any kind of major problem in any sense, but on the off-chance, I am
> just asking if anyone has any suggestions.

Once you run out of RAM, you are at the mercy of the OS and disk speeds. 
I'd point you to the Bladed Beowulf site, but the DOE took it down about 2 
days before 9-11. Not that I can afford that either, assuming it ever makes it 
into production, but I ran across it while looking into a problem similar to 
yours: the problem doesn't fit into the computer, and no single CPU/RAM 
combination is going to be fast enough. 

Just last night, one program ate 304 megabytes, and stopped itself. I still 
haven't figured out how it used so much RAM.

Kat


3. Re: De-allocating memory - time issue

And in case it might be helpful, I did Ricardo's tests, but with only 100 in
the inner loop, to reduce the length of time required, and it took 355 times
longer to do the multiplication than to simply put a value in the sequence;
that suggests it would have taken almost 2.5 hours to complete the
multiplication test if he'd let it run.
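
(The extrapolation: multiply the ~21-23 seconds the plain-assignment version
takes by the observed 355x slowdown, which lands in the region of two to two
and a half hours.)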

Dan Moyer

----- Original Message -----
From: <rforno at tutopia.com>
To: "EUforum" <EUforum at topica.com>
Subject: RE: De-allocating memory - time issue


>
> Andy:
> I tried it as follows, and it lasted 23 seconds:
>
> sequence t
> t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)
> for i = 1 to 50 do
>     for j = 1 to 100 do
>         for k = 1 to 3000 do
>             t[i][j][k] = 1.31764903217
>         end for
>     end for
> end for
>
> But if I perform a floating point calculation like:
> t[i][j][k] = (i * j * k) * 1.31764903217
> it lasts a long while (I stopped it before getting the timing because I was
> tired of waiting...)
> Maybe Rob can explain the difference. Of course, there should be a
> difference, but not so big, I think.
> Regards.
>
> ----- Original Message -----
> From: Andy Serpa <renegade at earthling.net>
> To: EUforum <EUforum at topica.com>
> Sent: Saturday, September 21, 2002 10:03 AM
> Subject: RE: De-allocating memory - time issue
>
>
> > Try it with floating point values.
> >
> >
> > rforno at tutopia.com wrote:
> > > Andy:
> > > I tried your example using the following program. For me, it lasted 21
> > > seconds. I own an AMD 500 with 256 Mb RAM.
> > > So, I suspect your timings are a consequence of using virtual RAM. Another
> > > cause of slowness would be changing the size of the sequence's elements,
> > > say initializing them with 0 and afterwards replacing some data with
> > > non-Euphoria-integer data.
> > > Regards.
> > >
> > > sequence t
> > > t = repeat(repeat(repeat(0, 3000), 100), 50)
> > > for i = 1 to 50 do
> > >     for j = 1 to 100 do
> > >         for k = 1 to 3000 do
> > >             t[i][j][k] = i * j * k
> > >         end for
> > >     end for
> > > end for
> > >
> > > ----- Original Message -----
> > > From: Andy Serpa <renegade at earthling.net>
> > > To: EUforum <EUforum at topica.com>
> > > Sent: Friday, September 20, 2002 3:43 AM
> > > Subject: RE: De-allocating memory - time issue
> > >
> > >
> > > > Well, if you ain't got the RAM, you ain't got the RAM and nothing is
> > > > going to help too much, but I recently wrote some code to simulate
> > > > multi-dimensional sequences in direct memory.
> > > >
> > > > I had a 3-dimensional sequence with dimensions something like 50 x 100 x
> > > > 3000, giving me room to store 15 million atoms. (Which I had enough RAM
> > > > for, so disk-swapping was not the issue.) Just trying to fill it took 10
> > > > minutes or so, even if I pre-allocated it.  And trying to grab or update
> > > > values was incredibly slow.
> > > >
> > > > Seeing as it was of fixed size once created (although the elements
> > > > needed to be updated all the time) I figured I could just poke & peek
> > > > the values directly into memory.  But I needed the ability to reference
> > > > an index in 3 dimensions (e.g. {25,2,1500}) so I wrote some functions
> > > > that calculated the correct index and poked or peeked it for me.
> > > >
> > > > Doing it this way is obviously slower than using a normal sequence of
> > > > normal size, and the emulated sequence needs to be of fixed size, but
> > > > for giant multi-dimensional sequences there is no comparison, as my code
> > > > runs the same speed no matter how big the sequence you're emulating (as
> > > > long as you've got the memory).
> > > >
> > > > If anyone is interested, I can prepare the code for public consumption
> > > > and upload it...
> > > >
> > > > -- Andy
> > > >
> > > >
> > > > petelomax at blueyonder.co.uk wrote:
> > > > > I have a *HUGE* object of some 10 million entries. (I now know the
> > > > > exact size 'cos I coded a status window to let me know)
> > > > > [please read this all the way thru before responding]
> > > > >
> > > > > Given that I have 48MB of ram, that 10 million bears resemblance to me
> > > > > running out of physical memory (4 bytes per integer, yep), and the ole
> > > > > PC begins to disk thrash.
> > > > >
> > > > > So, I code a "Cancel" button, press it and the program waddles on
> > > > > longer than expected.
> > > > >
> > > > > So I trace it to the line
> > > > >
> > > > > stack={}
> > > > >
> > > > > That line alone takes 3 minutes to execute.
> > > > >
> > > > > Now, I kind of understand that difficulties abound in the general
> > > > > case of deallocating ten million entries, most of which are copies of
> > > > > copies of copies (however many it takes to get to ten million).
> > > > >
> > > > > stack is an array of approx repeat({},55000) of which only the middle
> > > > > 3200 have been updated, however each such that stack[i] is set to
> > > > > stack[i+/-1]&{x,y}.
> > > > >
> > > > > So I guess reference counts are off the scale.
> > > > >
> > > > > The annoying point here, then, assuming it *is* the problem, is that
> > > > > the ref count handling is, in this unusual case, wholly irrelevant.
> > > > >
> > > > > I definitely doubt there is any way to improve this, it really is not
> > > > > any kind of major problem in any sense, but on the off-chance, I am
> > > > > just asking if anyone has any suggestions.
> > > > >
> > > > > Thanks
> > > > > Pete
> > > > >
> > > > >
> > <snip>
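
For readers curious what Andy's scheme might look like, here is a minimal
sketch of index-flattened peek/poke storage for a fixed-size 3-D block of
floats. The routine names and layout are assumptions, not his actual code:

    include machine.e  -- allocate(), free(), atom_to_float64(), float64_to_atom()

    constant DIM1 = 50, DIM2 = 100, DIM3 = 3000  -- 50*100*3000 doubles = ~120MB
    constant ELEM_SIZE = 8                       -- one 64-bit float per element

    atom base
    base = allocate(DIM1 * DIM2 * DIM3 * ELEM_SIZE)  -- shrink the constants
                                                     -- on a low-memory machine

    function addr(integer i, integer j, integer k)
        -- flatten a 1-based {i,j,k} index into a byte address
        return base + (((i-1) * DIM2 + (j-1)) * DIM3 + (k-1)) * ELEM_SIZE
    end function

    procedure put3(integer i, integer j, integer k, atom value)
        poke(addr(i, j, k), atom_to_float64(value))
    end procedure

    function get3(integer i, integer j, integer k)
        return float64_to_atom(peek({addr(i, j, k), ELEM_SIZE}))
    end function

    put3(25, 2, 1500, 1.31764903217)
    ? get3(25, 2, 1500)   -- should print back the value we stored
    free(base)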


4. Re: De-allocating memory - time issue

rforno writes:
> t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)

After executing this (type of) statement, in almost 
any language but Euphoria, you would have 3000x100x50 = 15 million 
(8-byte or more) copies of 1.13218966051 stored in memory. 
In Euphoria, you will have 3000 4-byte pointers to a *single* 
copy of 1.13218966051. You will also have 100 4-byte pointers 
to a *single* copy of the 3000-element sequence. You will also 
have 50 4-byte pointers to a *single* copy of  the 100-element sequence.
Instead of using 15 million x 8 bytes = 120,000,000 bytes of storage,
Euphoria will use only 3000x4+100x4+50x4 = 12,600 bytes 
plus the storage for *one* copy of 1.13218966051.

> for i = 1 to 50 do
>     for j = 1 to 100 do
>         for k = 1 to 3000 do
>             t[i][j][k] = 1.31764903217
>         end for
>     end for
> end for

During execution of the above nested loops,
copies will be made of all the sequences at each level.
There will now be 15 million 4-byte pointers 
to a *single* copy of 1.31764903217.
So you are still using much less memory than other languages.
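(That is roughly 15,000,000 x 4 bytes = 60MB for the innermost sequences,
versus the 120MB that a flat 8-bytes-per-element layout would need.)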

> But if I perform a floating point calculation like:
> t[i][j][k] = (i * j * k) * 1.31764903217
> it lasts a long while 

Now it's no longer possible to have lots of pointers
to a single floating-point value. You are creating a
different f.p. value to store in each element. So you 
have created 15 million new floating-point values.
Each floating-point value needs 8 bytes plus 4 bytes
for a reference count, plus 4 bytes for malloc overhead.
So you suddenly need 15 million x 16 = 240,000,000 bytes
of *extra* storage in this case. Unless you have a lot of RAM
on your machine, you will be faced with some very serious
paging activity that will destroy your performance.
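
As a minimal way to reproduce the contrast Rob describes (this sketch is not
from any of the original posts), the two assignments can be timed back to
back; on a machine without a few hundred MB of free RAM, expect the second
loop to page heavily:

    atom t0
    sequence t
    t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)

    t0 = time()
    for i = 1 to 50 do
        for j = 1 to 100 do
            for k = 1 to 3000 do
                t[i][j][k] = 1.31764903217       -- 15 million references, one float
            end for
        end for
    end for
    printf(1, "shared constant: %f seconds\n", time() - t0)

    t0 = time()
    for i = 1 to 50 do
        for j = 1 to 100 do
            for k = 1 to 3000 do
                t[i][j][k] = (i * j * k) * 1.31764903217  -- 15 million distinct floats
            end for
        end for
    end for
    printf(1, "distinct floats: %f seconds\n", time() - t0)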

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


5. Re: De-allocating memory - time issue

rforno writes:
> I have some additional questions:
>
> 1) Why does the timing for each main iteration increase linearly as the
> iteration number grows, in the following program?
>
> constant COLUMNS = 1000
> constant ROWS = 100
> constant PLANES = 50
> atom t0, t1
> sequence t
> t0 = time()
> t = repeat(repeat(repeat(1.13218966051, COLUMNS ), ROWS), PLANES)
> printf(1, "initialization time = %f\n", time() - t0)
> for i = 1 to PLANES do
>     t1 = time()
>     for j = 1 to ROWS do
>         for k = 1 to COLUMNS do
>             t[i][j][k] = (i * j * k) * 1.31764903217
>         end for
>     end for
>     printf(1, "i = %d, time = %f\n", {i, time() - t1})
> end for
> printf(1, "total time = %f\n", time() - t0)

I confirmed that over 99% of the time is spent inside
Watcom C's (DOS) malloc() routine. It appears as
if Watcom's malloc() is searching through all of the
millions of allocated blocks (mostly floating-point numbers),
so the time increases linearly as more blocks are allocated.
I know Watcom's free() routine simply marks a block as "free"
by setting a bit. That's all it does. So maybe Watcom does not
maintain a linked list of free blocks like most other mallocs,
but rather, searches *all* the blocks looking for one that's free.
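(Put another way: if each malloc() has to scan every block allocated so far,
the cost of allocation number n grows in proportion to n, so each main
iteration takes a little longer than the one before, and the whole run is
quadratic in the number of blocks.)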

When I run this with Euphoria for DOS as compiled by *DJGPP*,
the program runs really fast, with no linear increase in time.

When I run it with exw, it runs really fast, but there's still some
small linear increase.

> 2) I attempted to improve the performance by the following means, without
> succeeding. Its timing also increases linearly with the iteration count.
> Please explain why it doesn't work:

I assume malloc is the culprit here too.

> 3) Finally, I succeeded in radically improving the performance with the
> following approach (on my PC it runs in 92 seconds, against hours with the
> other methods):

You slowly initialize everything, and then you start
replacing values, one by one. The replacements are fast because
for each item that you allocate, another one of the same size
is freed. I have a "storage cache" that ensures that small blocks
will be allocated and freed very quickly in this case. malloc
probably isn't even called.

Every implementation of malloc is different.
Everyone approaches the problem thinking he
can do it better than everyone else.
I have no intention of rewriting Watcom's malloc,
even though the source code is out there somewhere.

> 4) But that is not all! If you substitute constant COLUMNS = 999
> in place of constant COLUMNS = 1000, instead of decreasing,
> the execution time grows enormously!

I couldn't duplicate this one, so I can't explain it.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com
