1. RE: De-allocating memory - time issue
- Posted by Andy Serpa <renegade at earthling.net> Sep 19, 2002
- 577 views
Well, if you ain't got the RAM, you ain't got the RAM and nothing is going to help much, but I recently wrote some code to simulate multi-dimensional sequences in direct memory.

I had a 3-dimensional sequence with dimensions something like 50 x 100 x 3000, giving me room to store 15 million atoms. (Which I had enough RAM for, so disk-swapping was not the issue.) Just trying to fill it took 10 minutes or so, even if I pre-allocated it, and trying to grab or update values was incredibly slow.

Seeing as it was of fixed size once created (although the elements needed to be updated all the time), I figured I could just poke & peek the values directly into memory. But I needed the ability to reference an index in 3 dimensions (e.g. {25,2,1500}), so I wrote some functions that calculated the correct offset and poked or peeked it for me.

Doing it this way is obviously slower than using a normal sequence of normal size, and the emulated sequence needs to be of fixed size, but for giant multi-dimensional sequences there is no comparison, as my code runs at the same speed no matter how big the sequence you're emulating (as long as you've got the memory).

If anyone is interested, I can prepare the code for public consumption and upload it...

-- Andy

petelomax at blueyonder.co.uk wrote:
> I have a *HUGE* object of some 10 million entries. (I now know the
> exact size 'cos I coded a status window to let me know.)
> [Please read this all the way through before responding.]
>
> Given that I have 48MB of RAM, that 10 million means I am running out
> of physical memory (4 bytes per integer, yep), and the ole PC begins
> to disk thrash.
>
> So, I code a "Cancel" button, press it, and the program waddles on
> longer than expected.
>
> So I trace it to the line
>
>     stack = {}
>
> That line alone takes 3 minutes to execute.
>
> Now, I kind of understand that there are difficulties in the general
> case of deallocating ten million entries, most of which are copies of
> copies of copies (however many it takes to get to ten million).
>
> stack is an array of approx repeat({},55000) of which only the middle
> 3200 have been updated, but each such that stack[i] is set to
> stack[i+/-1]&{x,y}.
>
> So I guess reference counts are off the scale.
>
> The annoying point here, then, assuming it *is* the problem, is that
> the ref count handling is, in this unusual case, wholly irrelevant.
>
> I definitely doubt there is any way to improve this; it really is not
> any kind of major problem in any sense, but on the off-chance, I am
> just asking if anyone has any suggestions.
>
> Thanks
> Pete
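[For the curious, here is a minimal sketch of the idea described above. It is my own reconstruction, not Andy's actual library: a fixed-size 50 x 100 x 3000 block of 4-byte integers held in memory from allocate(), with the flat offset computed from a {plane, row, column} index. It assumes an Euphoria version with poke4()/peek4s(). Storing 8-byte floats would work the same way, just with an 8-byte stride; see the later sketch in this thread.]

include machine.e   -- allocate(), free()

constant PLANES = 50, ROWS = 100, COLUMNS = 3000

-- one flat block: 50 x 100 x 3000 x 4 bytes = 60 MB, so it must fit in RAM
-- (allocate() returns 0 if the memory is not available)
constant base = allocate(PLANES * ROWS * COLUMNS * 4)

function addr3(integer i, integer j, integer k)
    -- turn a 1-based {i,j,k} index into a byte address within the block
    return base + (((i-1) * ROWS + (j-1)) * COLUMNS + (k-1)) * 4
end function

procedure put3(integer i, integer j, integer k, atom x)
    poke4(addr3(i, j, k), x)          -- store a 4-byte value
end procedure

function get3(integer i, integer j, integer k)
    return peek4s(addr3(i, j, k))     -- use peek4u for unsigned values
end function

put3(25, 2, 1500, -42)
? get3(25, 2, 1500)   -- prints -42
free(base)

[Access is slower than normal sequence indexing, as Andy says, but the cost stays flat no matter how large the emulated sequence is, and deallocation is a single free().]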
2. RE: De-allocating memory - time issue
- Posted by Phil Russell <pg_russell at lineone.net> Sep 20, 2002
- 557 views
Hi Pete/Andy,

Not sure if this is relevant or helpful, but I came across a good article on memory-mapped files the other day at:

http://freespace.virgin.net/james.brown7/tuts/bigmem01.htm

It was written in the context of a hex editor for very large (gigabyte+) files under MS Windows. Portions of the file are mapped into a reserved memory range (in 64K chunks), and it is all set up so that changes applied to that area of memory are automatically written back to the file for you. The author describes how to write code that makes it look, from the application's point of view, as if the whole file is held in memory.

Whether or not this is specifically useful here, it is a very interesting site for Windows programming info.

Regards,

Phil Russell
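[For anyone who wants to try the idea from Euphoria, here is a rough sketch of mapping the first 64K of an existing file through the Win32 API. This is my own illustration, not code from the article; it assumes a 32-bit Windows Euphoria with dll.e/machine.e and a file big.dat that already exists and is at least 64K long.]

include dll.e
include machine.e

constant kernel32 = open_dll("kernel32.dll")
constant GENERIC_READ = #80000000, GENERIC_WRITE = #40000000,
         OPEN_EXISTING = 3, PAGE_READWRITE = 4, FILE_MAP_WRITE = 2

constant xCreateFileA = define_c_func(kernel32, "CreateFileA",
        {C_POINTER, C_UINT, C_UINT, C_POINTER, C_UINT, C_UINT, C_UINT}, C_UINT)
constant xCreateFileMappingA = define_c_func(kernel32, "CreateFileMappingA",
        {C_UINT, C_POINTER, C_UINT, C_UINT, C_UINT, C_POINTER}, C_UINT)
constant xMapViewOfFile = define_c_func(kernel32, "MapViewOfFile",
        {C_UINT, C_UINT, C_UINT, C_UINT, C_UINT}, C_POINTER)
constant xUnmapViewOfFile = define_c_func(kernel32, "UnmapViewOfFile",
        {C_POINTER}, C_UINT)

atom fname, hFile, hMap, base
fname = allocate_string("big.dat")
hFile = c_func(xCreateFileA, {fname, or_bits(GENERIC_READ, GENERIC_WRITE),
                              0, 0, OPEN_EXISTING, 0, 0})
hMap  = c_func(xCreateFileMappingA, {hFile, 0, PAGE_READWRITE, 0, 0, 0})
base  = c_func(xMapViewOfFile, {hMap, FILE_MAP_WRITE, 0, 0, 65536})

-- reads and writes through 'base' now go to the file itself
poke4(base, 12345)       -- changes bytes 1..4 of big.dat in place
? peek4u(base)
? c_func(xUnmapViewOfFile, {base})

[Mapping a different 64K window is just a matter of passing another file offset to MapViewOfFile, which is essentially what the article's hex editor automates.]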
3. RE: De-allocating memory - time issue
- Posted by rforno at tutopia.com Sep 20, 2002
- 561 views
Andy:

I tried your example using the following program. For me, it lasted 21 seconds. I own an AMD 500 with 256 MB of RAM. So I suspect your timings are a consequence of using virtual RAM. Another cause of slowness would be changing the size of the sequence's elements, say initializing them with 0 and afterwards replacing some data with non-Euphoria-integer data.

Regards.

sequence t
t = repeat(repeat(repeat(0, 3000), 100), 50)
for i = 1 to 50 do
    for j = 1 to 100 do
        for k = 1 to 3000 do
            t[i][j][k] = i * j * k
        end for
    end for
end for

----- Original Message -----
From: Andy Serpa <renegade at earthling.net>
To: EUforum <EUforum at topica.com>
Sent: Friday, September 20, 2002 3:43 AM
Subject: RE: De-allocating memory - time issue

> Well, if you ain't got the RAM, you ain't got the RAM and nothing is
> going to help too much, but I recently wrote some code to simulate
> multi-dimensional sequences in direct memory.
>
> I had a 3-dimensional sequence with dimensions something like
> 50 x 100 x 3000 giving me room to store 15 million atoms. (Which I had
> enough RAM for, so disk-swapping was not the issue.) Just trying to
> fill it took 10 minutes or so, even if I pre-allocated it. And trying
> to grab or update values was incredibly slow.
>
> <snip>
4. RE: De-allocating memory - time issue
- Posted by Andy Serpa <renegade at earthling.net> Sep 21, 2002
- 566 views
Try it with floating point values.

rforno at tutopia.com wrote:
> Andy:
> I tried your example using the following program. For me, it lasted
> 21 seconds. I own an AMD 500 with 256 MB of RAM.
> So I suspect your timings are a consequence of using virtual RAM.
> Another cause of slowness would be changing the size of the sequence's
> elements, say initializing them with 0 and afterwards replacing some
> data with non-Euphoria-integer data.
> Regards.
>
> sequence t
> t = repeat(repeat(repeat(0, 3000), 100), 50)
> for i = 1 to 50 do
>     for j = 1 to 100 do
>         for k = 1 to 3000 do
>             t[i][j][k] = i * j * k
>         end for
>     end for
> end for
>
> <snip>
5. RE: De-allocating memory - time issue
- Posted by rforno at tutopia.com Sep 21, 2002
- 563 views
Andy:

I tried it as follows, and it lasted 23 seconds:

sequence t
t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)
for i = 1 to 50 do
    for j = 1 to 100 do
        for k = 1 to 3000 do
            t[i][j][k] = 1.31764903217
        end for
    end for
end for

But if I perform a floating point calculation like:

    t[i][j][k] = (i * j * k) * 1.31764903217

it lasts a long while (I stopped it before getting the timing because I was tired of waiting...). Maybe Rob can explain the difference. Of course, there should be a difference, but not so big, I think.

Regards.

----- Original Message -----
From: Andy Serpa <renegade at earthling.net>
To: EUforum <EUforum at topica.com>
Sent: Saturday, September 21, 2002 10:03 AM
Subject: RE: De-allocating memory - time issue

> Try it with floating point values.
>
> <snip>
6. RE: De-allocating memory - time issue
- Posted by rforno at tutopia.com Sep 22, 2002
- 557 views
Rob:

Many thanks for your explanation. Now I understand a lot more about Euphoria internals. Maybe it would be useful to incorporate this kind of performance information into the manuals. It is possible that I overlooked it, but I don't remember having seen it in the manuals, nor in the forum, at least at this level of detail.

I have some additional questions:

1) Why does the timing for each main iteration increase linearly as the iteration number grows in the following program?

constant COLUMNS = 1000
constant ROWS = 100
constant PLANES = 50
atom t0, t1
sequence t
t0 = time()
t = repeat(repeat(repeat(1.13218966051, COLUMNS), ROWS), PLANES)
printf(1, "initialization time = %f\n", time() - t0)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        for k = 1 to COLUMNS do
            t[i][j][k] = (i * j * k) * 1.31764903217
        end for
    end for
    printf(1, "i = %d, time = %f\n", {i, time() - t1})
end for
printf(1, "total time = %f\n", time() - t0)

2) I attempted to improve the performance by the following means, without succeeding. Its timing also increases linearly with the iteration count. Please explain why it doesn't work:

constant COLUMNS = 1000
constant ROWS = 100
constant PLANES = 50
atom t0, t1
sequence t, s1, s2
t0 = time()
s1 = {}
for k = 1 to COLUMNS do
    s1 &= k + 1.3217097
end for
printf(1, "initialization time 1 = %f\n", time() - t0)
s2 = {}
for j = 1 to ROWS do
    s2 = append(s2, s1 + j)
end for
printf(1, "initialization time 1 + 2 = %f\n", time() - t0)
t = {}
for i = 1 to PLANES do
    t1 = time()
    t = append(t, s2 + i)
    printf(1, "i = %d, time init 3 = %f\n", {i, time() - t1})
end for
printf(1, "initialization time 1 + 2 + 3 = %f\n", time() - t0)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        for k = 1 to COLUMNS do
            t[i][j][k] = (i * j * k) * 1.31764903217
        end for
    end for
    printf(1, "i = %d, time = %f\n", {i, time() - t1})
end for
printf(1, "total time = %f\n", time() - t0)

3) Finally, I succeeded in radically improving the performance with the following approach (on my PC it runs in 92 seconds, against hours with the other methods):

constant COLUMNS = 1000
constant ROWS = 100
constant PLANES = 50
atom t0, t1
sequence t
t0 = time()
t = repeat(repeat(0, ROWS), PLANES)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        t[i][j] = rand(repeat(10000, COLUMNS)) / rand(10000)
    end for
    printf(1, "i = %d, time init = %f\n", {i, time() - t1})
end for
printf(1, "initialization time = %f\n", time() - t0)
for i = 1 to PLANES do
    t1 = time()
    for j = 1 to ROWS do
        for k = 1 to COLUMNS do
            t[i][j][k] = (i * j * k) * 1.31764903217
        end for
    end for
    printf(1, "i = %d, time = %f\n", {i, time() - t1})
end for
printf(1, "total time = %f\n", time() - t0)

4) But that is not all! If you substitute COLUMNS = 999 in place of COLUMNS = 1000, the execution time, instead of decreasing, grows enormously! Substitute 1001 instead, and you get nearly no change. It goes to the point where COLUMNS = 100 is slower than COLUMNS = 1000. With some other modifications, at a certain point the program enters what seems to be a dormant state.

I think this is an interesting way to solve this kind of performance problem, but maybe it can be better addressed by a modification to the Euphoria interpreter: an option to allocate a sequence such that every atom in it occupies storage capable of holding a floating-point item. Just my 2 cents... :)

Regards.
----- Original Message -----
From: Robert Craig <rds at RapidEuphoria.com>
To: EUforum <EUforum at topica.com>
Sent: Sunday, September 22, 2002 2:46 AM
Subject: Re: De-allocating memory - time issue

> rforno writes:
> > t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)
>
> After executing this (type of) statement, in almost
> any language but Euphoria, you would have 3000x100x50 = 15 million
> (8-byte or more) copies of 1.13218966051 stored in memory.
> In Euphoria, you will have 3000 4-byte pointers to a *single*
> copy of 1.13218966051. You will also have 100 4-byte pointers
> to a *single* copy of the 3000-element sequence. You will also
> have 50 4-byte pointers to a *single* copy of the 100-element sequence.
> Instead of using 15 million x 8 bytes = 120,000,000 bytes of storage,
> Euphoria will use only 3000x4 + 100x4 + 50x4 = 12,600 bytes
> plus the storage for *one* copy of 1.13218966051.
>
> > for i = 1 to 50 do
> >     for j = 1 to 100 do
> >         for k = 1 to 3000 do
> >             t[i][j][k] = 1.31764903217
> >         end for
> >     end for
> > end for
>
> During execution of the above nested loops,
> copies will be made of all the sequences at each level.
> There will now be 15 million 4-byte pointers
> to a *single* copy of 1.31764903217.
> So you are still using much less memory than other languages.
>
> > But if I perform a floating point calculation like:
> >     t[i][j][k] = (i * j * k) * 1.31764903217
> > it lasts a long while
>
> Now it's no longer possible to have lots of pointers
> to a single floating-point value. You are creating a
> different f.p. value to store in each element. So you
> have created 15 million new floating-point values.
> Each floating-point value needs 8 bytes plus 4 bytes
> for a reference count, plus 4 bytes for malloc overhead.
> So you suddenly need 15 million x 16 = 240,000,000 bytes
> of *extra* storage in this case. Unless you have a lot of RAM
> on your machine, you will be faced with some very serious
> paging activity that will destroy your performance.
>
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
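[Rob's figures suggest one way around the blow-up in a case like this: keep the 15 million doubles outside of sequences altogether, in a flat block of memory, at 8 bytes each with no reference count or malloc overhead. A rough sketch along the lines of Andy's approach, my own illustration, assuming machine.e's atom_to_float64()/float64_to_atom(); the 120 MB block still has to fit in physical RAM:]

include machine.e   -- allocate(), atom_to_float64(), float64_to_atom()

constant N = 15000000                 -- 15 million doubles = 120,000,000 bytes
constant fbase = allocate(N * 8)      -- returns 0 if the memory isn't available

procedure put_f64(integer n, atom x)
    -- store element n (1-based) as a raw 8-byte IEEE double
    poke(fbase + (n-1) * 8, atom_to_float64(x))
end procedure

function get_f64(integer n)
    return float64_to_atom(peek({fbase + (n-1) * 8, 8}))
end function

put_f64(1, 1.31764903217)
? get_f64(1)

[The conversion to and from the 8-byte form costs something on every access, but memory use stays at 8 bytes per value instead of the 16 bytes of extra per-value overhead described above, and releasing the whole thing is a single free(fbase).]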