1. RE: De-allocating memory - time issue

Well, if you ain't got the RAM, you ain't got the RAM and nothing is 
going to help too much, but I recently wrote some code to simulate 
multi-dimensional sequences in direct memory.

I had a 3-dimensional sequence with dimensions something like 50 x 100 x 
3000, giving me room to store 15 million atoms. (I had enough RAM for it, 
so disk-swapping was not the issue.) Just trying to fill it took 10 
minutes or so, even when I pre-allocated it, and trying to grab or update 
values was incredibly slow.

Seeing as it was of fixed size once created (although the elements 
needed to be updated all the time), I figured I could just poke & peek 
the values directly in memory.  But I needed the ability to reference 
an index in 3 dimensions (e.g. {25,2,1500}), so I wrote some functions 
that calculate the correct flat offset and poke or peek it for me.

Doing it this way is obviously slower than using a normal sequence of 
normal size, and the emulated sequence needs to be of fixed size, but 
for giant multi-dimensional sequences there is no comparison: my code 
runs at the same speed no matter how big the sequence you're emulating 
is (as long as you've got the memory).
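
[A minimal sketch of the idea, for illustration only; this is not Andy's 
actual library. It assumes 4-byte integer elements and uses allocate(), 
poke4() and peek4s() from machine.e; the names are made up:]

include machine.e

constant DIM1 = 50, DIM2 = 100, DIM3 = 3000
atom base
base = allocate(DIM1 * DIM2 * DIM3 * 4)   -- one flat block, 4 bytes per element

function addr_of(sequence ix)
    -- convert a 3-D index like {25,2,1500} into a flat byte address
    return base + (((ix[1]-1) * DIM2 + ix[2]-1) * DIM3 + ix[3]-1) * 4
end function

procedure big_put(sequence ix, integer x)
    poke4(addr_of(ix), x)
end procedure

function big_get(sequence ix)
    return peek4s(addr_of(ix))
end function

big_put({25,2,1500}, 42)
? big_get({25,2,1500})   -- prints 42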

If anyone is interested, I can prepare the code for public consumption 
and upload it...

-- Andy


petelomax at blueyonder.co.uk wrote:
> I have a *HUGE* object of some 10 million entries. (I now know the
> exact size 'cos I coded a status window to let me know)
> [please read this all the way thru before responding]
> 
> Given that I have 48MB of RAM, that 10 million means I'm running out of
> physical memory (4 bytes per integer, yep), and the ole PC begins to
> disk thrash.
> 
> So, I code a "Cancel" button, press it and the program waddles on
> longer than expected.
> 
> So I trace it to the line
> 
> 	 stack={}
> 
> That line alone takes 3 minutes to execute.
> 
> Now, I kind of understand that difficulties abound in the general case
> of deallocating ten million entries, most of which are copies of copies
> of copies (however many it takes to get to ten million).
> 
> stack is an array of approx repeat({},55000), of which only the middle
> 3200 have been updated, each such that stack[i] is set to
> stack[i+/-1]&{x,y}.
> 
> So I guess reference counts are off the scale.
> 
> The annoying point here, then, assuming it *is* the problem, is that
> the ref count handling is, in this unusual case, wholly irrelevant.
> 
> I definitely doubt there is any way to improve this, and it really is
> not any kind of major problem in any sense, but on the off-chance, I am
> just asking if anyone has any suggestions.
> 
> Thanks
> Pete
> 
>


2. RE: De-allocating memory - time issue

Hi Pete/Andy,

Not sure if this is relevant or helpful but I came across a good article 
on memory-mapped files the other day at:

http://freespace.virgin.net/james.brown7/tuts/bigmem01.htm

This was in the context of writing a hex editor for very large 
(gigabyte+) files under MS Windows.  Portions of the file are mapped 
into a reserved memory range (in 64k chunks), and it is all set up so 
that changes applied to this area of memory are automatically written to 
the file for you. The author describes how you can write code to make it 
look, from the application's point of view, as if the whole file is held 
in memory.

Whether or not this is specifically useful, it is a very interesting site 
for Windows programming info.

Regards,

Phil Russell
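
[For anyone who wants to poke at this from Euphoria, the mapping API is 
reachable through dll.e. The following is only a rough, untested sketch of 
my own, not code from the article: it maps a 64K block backed by the system 
page file rather than a real file (mapping an actual file also needs 
CreateFileA and CloseHandle), and the flag values PAGE_READWRITE = 4 and 
FILE_MAP_WRITE = 2 are assumptions taken from the Win32 headers, so 
double-check them before relying on this:]

include dll.e

constant PAGE_READWRITE = 4, FILE_MAP_WRITE = 2
atom kernel32, xCreateFileMapping, xMapViewOfFile, hMap, pView

kernel32 = open_dll("kernel32.dll")
xCreateFileMapping = define_c_func(kernel32, "CreateFileMappingA",
    {C_POINTER, C_POINTER, C_ULONG, C_ULONG, C_ULONG, C_POINTER}, C_POINTER)
xMapViewOfFile = define_c_func(kernel32, "MapViewOfFile",
    {C_POINTER, C_ULONG, C_ULONG, C_ULONG, C_ULONG}, C_POINTER)

-- #FFFFFFFF (INVALID_HANDLE_VALUE) asks for a page-file-backed mapping
hMap = c_func(xCreateFileMapping, {#FFFFFFFF, 0, PAGE_READWRITE, 0, 65536, 0})
pView = c_func(xMapViewOfFile, {hMap, FILE_MAP_WRITE, 0, 0, 0})

poke4(pView, 12345)    -- writes go straight into the mapped region
? peek4u(pView)        -- 12345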


3. RE: De-allocating memory - time issue

Andy:
I tried your example using the following program. For me, it took 21
seconds. I own an AMD 500 with 256 MB of RAM.
So, I suspect your timings are a consequence of using virtual memory. Another
cause of slowness would be changing the size of the sequence's elements, say
initializing them with 0 and afterwards replacing some data with
non-Euphoria-integer data.
Regards.

sequence t
t = repeat(repeat(repeat(0, 3000), 100), 50)
for i = 1 to 50 do
    for j = 1 to 100 do
        for k = 1 to 3000 do
            t[i][j][k] = i * j * k
        end for
    end for
end for

----- Original Message -----
From: Andy Serpa <renegade at earthling.net>
To: EUforum <EUforum at topica.com>
Sent: Friday, September 20, 2002 3:43 AM
Subject: RE: De-allocating memory - time issue


>
> Well, if you ain't got the RAM, you ain't got the RAM and nothing is
> going to help too much, but I recently wrote some code to simulate
> multi-dimensional sequences in direct memory.
> <snip>


4. RE: De-allocating memory - time issue

Try it with floating point values.


rforno at tutopia.com wrote:
> Andy:
> I tried your example using the following program. For me, it took 21
> seconds.
> <snip>


5. RE: De-allocating memory - time issue

Andy:
I tried it as follows, and it took 23 seconds:

sequence t
t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)
for i = 1 to 50 do
    for j = 1 to 100 do
        for k = 1 to 3000 do
            t[i][j][k] = 1.31764903217
        end for
    end for
end for

But if I perform a floating point calculation like:
t[i][j][k] = (i * j * k) * 1.31764903217
it takes a very long while (I stopped it before getting the timing because I
was tired of waiting...).
Maybe Rob can explain the difference. Of course, there should be a
difference, but not such a big one, I think.
Regards.

----- Original Message -----
From: Andy Serpa <renegade at earthling.net>
To: EUforum <EUforum at topica.com>
Sent: Saturday, September 21, 2002 10:03 AM
Subject: RE: De-allocating memory - time issue


>
> Try it with floating point values.
>
> <snip>


6. RE: De-allocating memory - time issue

Rob:
Many thanks for your explanation. Now I understand a lot more about Euphoria
internals.
Maybe it would be useful to incorporate this kind of performance information
into the manuals. It is possible that I overlooked it, but I don't remember
having seen it in the manuals, nor in the forum, at least at this level of
detail.
I have some additional questions:

1) Why does the timing for each main iteration increase linearly as the
iteration number grows in the following program?

constant COLUMNS = 1000
constant ROWS = 100
constant PLANES = 50
atom t0, t1
sequence t
t0 = time()
t = repeat(repeat(repeat(1.13218966051, COLUMNS ), ROWS), PLANES)
printf(1, "initialization time = %f\n", time() - t0)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        for k = 1 to COLUMNS do
            t[i][j][k] = (i * j * k) * 1.31764903217
        end for
    end for
    printf(1, "i = %d, time = %f\n", {i, time() - t1})
end for
printf(1, "total time = %f\n", time() - t0)

2) I attempted to improve the performance by the following means, without
success. Its timing also increases linearly with the iteration count.
Please explain why it doesn't help:

constant COLUMNS = 1000
constant ROWS = 100
constant PLANES = 50
atom t0, t1
sequence t, s1, s2
t0 = time()
s1 = {}
for k = 1 to COLUMNS do
    s1 &= k + 1.3217097
end for
printf(1, "initialization time 1 = %f\n", time() - t0)
s2 = {}
for j = 1 to ROWS do
    s2 = append(s2, s1 + j)
end for
printf(1, "initialization time 1 + 2 = %f\n", time() - t0)
t = {}
for i = 1 to PLANES do
    t1 = time()
    t = append(t, s2 + i)
    printf(1, "i = %d, time init 3 = %f\n", {i, time() - t1})
end for
printf(1, "initialization time 1 + 2 + 3 = %f\n", time() - t0)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        for k = 1 to COLUMNS do
            t[i][j][k] = (i * j * k) * 1.31764903217
        end for
    end for
    printf(1, "i = %d, time = %f\n", {i, time() - t1})
end for
printf(1, "total time = %f\n", time() - t0)

3) Finally, I succeeded in radically improving the performance with the
following approach (on my PC it runs in 92 seconds, against hours with the
other methods):

constant COLUMNS = 1000
constant ROWS = 100
constant PLANES = 50
atom t0, t1
sequence t
t0 = time()
t = repeat(repeat(0, ROWS), PLANES)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        t[i][j] = rand(repeat(10000, COLUMNS)) / rand(10000)
    end for
    printf(1, "i = %d, time init = %f\n", {i, time() - t1})
end for
printf(1, "initialization time = %f\n", time() - t0)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        for k = 1 to COLUMNS do
            t[i][j][k] = (i * j * k) * 1.31764903217
        end for
    end for
    printf(1, "i = %d, time = %f\n", {i, time() - t1})
end for
printf(1, "total time = %f\n", time() - t0)

4) But that is not all! If you substitute constant COLUMNS = 999 in place of
constant COLUMNS = 1000, the execution time, instead of decreasing, grows
enormously! Substitute 1001 instead and you get nearly no change. This goes
to the point where COLUMNS = 100 is slower than COLUMNS = 1000. With some
other modifications, at a certain point the program enters what seems to be a
dormant state.

I think this is an interesting way to solve this kind of performance
problem, but maybe it could be better addressed by a modification to the
Euphoria interpreter: an option to allocate a sequence such that every atom
in it occupies storage capable of holding a floating point value. Just my 2
cents... :)
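
[One way to get something close to that today is Andy's poke/peek trick with
8-byte slots, using atom_to_float64()/float64_to_atom() from machine.e. A
hedged sketch of my own; the names are made up and the block is fixed-size:]

include machine.e

constant COLUMNS = 1000, ROWS = 100, PLANES = 50
atom fbase
fbase = allocate(PLANES * ROWS * COLUMNS * 8)   -- 8 bytes per element, ~40 MB

function faddr(integer i, integer j, integer k)
    -- flat byte address of element {i,j,k}
    return fbase + (((i-1) * ROWS + j-1) * COLUMNS + k-1) * 8
end function

procedure fput(integer i, integer j, integer k, atom x)
    poke(faddr(i, j, k), atom_to_float64(x))
end procedure

function fget(integer i, integer j, integer k)
    return float64_to_atom(peek({faddr(i, j, k), 8}))
end function

fput(3, 7, 500, 1.31764903217)
? fget(3, 7, 500)   -- prints the stored value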

Regards.

----- Original Message -----
From: Robert Craig <rds at RapidEuphoria.com>
To: EUforum <EUforum at topica.com>
Sent: Sunday, September 22, 2002 2:46 AM
Subject: Re: De-allocating memory - time issue


>
> rforno writes:
> > t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)
>
> After executing this (type of) statement, in almost
> any language but Euphoria, you would have 3000x100x50 = 15 million
> (8-byte or more) copies of 1.13218966051 stored in memory.
> In Euphoria, you will have 3000 4-byte pointers to a *single*
> copy of 1.13218966051. You will also have 100 4-byte pointers
> to a *single* copy of the 3000-element sequence. You will also
> have 50 4-byte pointers to a *single* copy of  the 100-element sequence.
> Instead of using 15 million x 8 bytes = 120,000,000 bytes of storage,
> Euphoria will use only 3000x4+100x4+50x4 = 12,600 bytes
> plus the storage for *one* copy of 1.13218966051.
>
> > for i = 1 to 50 do
> >     for j = 1 to 100 do
> >         for k = 1 to 3000 do
> >             t[i][j][k] = 1.31764903217
> >         end for
> >     end for
> > end for
>
> During execution of the above nested loops,
> copies will be made of all the sequences at each level.
> There will now be 15 million 4-byte pointers
> to a *single* copy of 1.31764903217.
> So you are still using much less memory than other languages.
>
> > But if I perform a floating point calculation like:
> > t[i][j][k] = (i * j * k) * 1.31764903217
> > it lasts a long while
>
> Now it's no longer possible to have lots of pointers
> to a single floating-point value. You are creating a
> different f.p. value to store in each element. So you
> have created 15 million new floating-point values.
> Each floating-point value needs 8 bytes plus 4 bytes
> for a reference count, plus 4 bytes for malloc overhead.
> So you suddenly need 15 million x 16 = 240,000,000 bytes
> of *extra* storage in this case. Unless you have a lot of RAM
> on your machine, you will be faced with some very serious
> paging activity that will destroy your performance.
>
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
>
>
>
>
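
[Rob's point is easy to see on a smaller scale: storing the same float
everywhere only stores references to one shared copy, while storing freshly
computed floats forces a new allocation per element. A small sketch of my
own along the lines of the programs above; exact timings will of course vary
by machine:]

sequence t
atom t0
t = repeat(repeat(0, 1000), 1000)   -- one million elements

t0 = time()
for i = 1 to 1000 do
    for j = 1 to 1000 do
        t[i][j] = 1.31764903217      -- every slot references one shared copy
    end for
end for
printf(1, "shared constant float: %f seconds\n", time() - t0)

t0 = time()
for i = 1 to 1000 do
    for j = 1 to 1000 do
        t[i][j] = i * 1.31764903217  -- a brand-new float for every slot
    end for
end for
printf(1, "freshly computed floats: %f seconds\n", time() - t0)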
