1. De-allocating memory - time issue
- Posted by petelomax at blueyonder.co.uk Sep 18, 2002
- 514 views
I have a *HUGE* object of some 10 million entries. (I now know the exact size 'cos I coded a status window to let me know.) [Please read this all the way through before responding.]

Given that I have 48MB of RAM, those 10 million entries (4 bytes per integer, yep) mean I am running out of physical memory, and the ole PC begins to disk thrash.

So, I code a "Cancel" button, press it, and the program waddles on longer than expected. I trace it to the line

stack = {}

That line alone takes 3 minutes to execute.

Now, I kind of understand that difficulties abound in the general case of deallocating ten million entries, most of which are copies of copies of copies (however many it takes to get to ten million). stack is an array of approximately repeat({},55000), of which only the middle 3200 have been updated, each such that stack[i] is set to stack[i+/-1]&{x,y}. So I guess reference counts are off the scale.

The annoying point here, then, assuming it *is* the problem, is that the ref-count handling is, in this unusual case, wholly irrelevant.

I definitely doubt there is any way to improve this, and it really is not any kind of major problem in any sense, but on the off-chance, I am just asking if anyone has any suggestions.

Thanks
Pete
2. Re: De-allocating memory - time issue
- Posted by Kat <kat at kogeijin.com> Sep 18, 2002
- 513 views
On 19 Sep 2002, at 1:17, petelomax at blueyonder.co.uk wrote:

> I have a *HUGE* object of some 10 million entries. (I now know the
> exact size 'cos I coded a status window to let me know)
<snip>
> I am just asking if anyone has any suggestions.

Once you run out of RAM, you are at the mercy of the OS and disk speeds. I'd point you to the Bladed Beowulf site, but the DOE took it down about 2 days before 9-11. Not that I can afford that either, assuming it ever makes it into production, but I ran across it while looking into a similar problem to yours: the problem doesn't fit into the computer, and no single CPU/RAM combination is going to be fast enough. Just last night, one program ate 304 megabytes and stopped itself. I still haven't figured out how it used so much RAM.

Kat
3. Re: De-allocating memory - time issue
- Posted by Dan Moyer <DANIELMOYER at prodigy.net> Sep 21, 2002
- 512 views
And in case it might be helpful, I did Ricardo's tests, but with only 100 in the inner loop to reduce the length of time required, and it took 355 times longer to do the multiplication than to simply put the value in the sequence; that suggests it would have taken almost 2.5 hours to complete the multiplication test if he'd let it go on.

Dan Moyer

----- Original Message -----
From: <rforno at tutopia.com>
To: "EUforum" <EUforum at topica.com>
Subject: RE: De-allocating memory - time issue

> Andy:
> I tried it as follows, and it lasted 23 seconds:
>
> sequence t
> t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)
> for i = 1 to 50 do
> for j = 1 to 100 do
> for k = 1 to 3000 do
> t[i][j][k] = 1.31764903217
> end for
> end for
> end for
>
> But if I perform a floating point calculation like:
> t[i][j][k] = (i * j * k) * 1.31764903217
> it lasts a long while (I stopped it before getting the timing because I was
> tired of waiting...)
> Maybe Rob can explain the difference. Of course, there should be a
> difference, but not so big, I think.
> Regards.
>
> ----- Original Message -----
> From: Andy Serpa <renegade at earthling.net>
> To: EUforum <EUforum at topica.com>
> Sent: Saturday, September 21, 2002 10:03 AM
> Subject: RE: De-allocating memory - time issue
>
> > Try it with floating point values.
> >
> > rforno at tutopia.com wrote:
> > > Andy:
> > > I tried your example using the following program. For me, it lasted 21
> > > seconds. I own an AMD 500 with 256 Mb RAM.
> > > So, I suspect your timings are a consequence of using virtual RAM. Another
> > > cause of slowness would be changing the size of the sequence's elements,
> > > say initializing them with 0 and afterwards replacing some data with
> > > non-Euphoria-integer data.
> > > Regards.
> > > sequence t
> > > t = repeat(repeat(repeat(0, 3000), 100), 50)
> > > for i = 1 to 50 do
> > > for j = 1 to 100 do
> > > for k = 1 to 3000 do
> > > t[i][j][k] = i * j * k
> > > end for
> > > end for
> > > end for
> > >
> > > ----- Original Message -----
> > > From: Andy Serpa <renegade at earthling.net>
> > > To: EUforum <EUforum at topica.com>
> > > Sent: Friday, September 20, 2002 3:43 AM
> > > Subject: RE: De-allocating memory - time issue
> > >
> > > > Well, if you ain't got the RAM, you ain't got the RAM and nothing is
> > > > going to help too much, but I recently wrote some code to simulate
> > > > multi-dimensional sequences in direct memory.
> > > >
> > > > I had a 3-dimensional sequence with dimensions something like
> > > > 50 x 100 x 3000, giving me room to store 15 million atoms. (Which I
> > > > had enough RAM for, so disk-swapping was not the issue.) Just trying
> > > > to fill it took 10 minutes or so, even if I pre-allocated it. And
> > > > trying to grab or update values was incredibly slow.
> > > >
> > > > Seeing as it was of fixed size once created (although the elements
> > > > needed to be updated all the time), I figured I could just poke & peek
> > > > the values directly into memory. But I needed the ability to reference
> > > > an index in 3 dimensions (e.g. {25,2,1500}), so I wrote some functions
> > > > that calculated the correct index and poked or peeked the value for me.
> > > >
> > > > Doing it this way is obviously slower than using a normal sequence of
> > > > normal size, and the emulated sequence needs to be of fixed size, but
> > > > for giant multi-dimensional sequences there is no comparison, as my
> > > > code runs the same speed no matter how big the sequence you're
> > > > emulating (as long as you've got the memory).
> > > >
> > > > If anyone is interested, I can prepare the code for public consumption
> > > > and upload it...
> > > > -- Andy
> > > >
> > > > petelomax at blueyonder.co.uk wrote:
> > > > > I have a *HUGE* object of some 10 million entries. (I now know the
> > > > > exact size 'cos I coded a status window to let me know)
<snip>
4. Re: De-allocating memory - time issue
- Posted by Robert Craig <rds at RapidEuphoria.com> Sep 21, 2002
- 522 views
rforno writes:

> t = repeat(repeat(repeat(1.13218966051, 3000), 100), 50)

After executing this (type of) statement, in almost any language but Euphoria you would have 3000 x 100 x 50 = 15 million (8-byte or more) copies of 1.13218966051 stored in memory. In Euphoria, you will have 3000 4-byte pointers to a *single* copy of 1.13218966051. You will also have 100 4-byte pointers to a *single* copy of the 3000-element sequence, and 50 4-byte pointers to a *single* copy of the 100-element sequence. Instead of using 15 million x 8 bytes = 120,000,000 bytes of storage, Euphoria will use only 3000x4 + 100x4 + 50x4 = 12,600 bytes, plus the storage for *one* copy of 1.13218966051.

> for i = 1 to 50 do
> for j = 1 to 100 do
> for k = 1 to 3000 do
> t[i][j][k] = 1.31764903217
> end for
> end for
> end for

During execution of the above nested loops, copies will be made of all the sequences at each level. There will then be 15 million 4-byte pointers to a *single* copy of 1.31764903217, so you are still using much less memory than other languages.

> But if I perform a floating point calculation like:
> t[i][j][k] = (i * j * k) * 1.31764903217
> it lasts a long while

Now it's no longer possible to have lots of pointers to a single floating-point value. You are creating a different f.p. value to store in each element, so you have created 15 million new floating-point values. Each floating-point value needs 8 bytes, plus 4 bytes for a reference count, plus 4 bytes for malloc overhead, so you suddenly need 15 million x 16 = 240,000,000 bytes of *extra* storage in this case. Unless you have a lot of RAM on your machine, you will be faced with some very serious paging activity that will destroy your performance.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
5. Re: De-allocating memory - time issue
- Posted by Robert Craig <rds at RapidEuphoria.com> Sep 23, 2002
- 538 views
rforno writes:

> I have some additional questions:
>
> 1) Why does the timing for each main iteration increase linearly as the
> iteration number grows, in the following program?
>
> constant COLUMNS = 1000
> constant ROWS = 100
> constant PLANES = 50
> atom t0, t1
> sequence t
> t0 = time()
> t = repeat(repeat(repeat(1.13218966051, COLUMNS), ROWS), PLANES)
> printf(1, "initialization time = %f\n", time() - t0)
> for i = 1 to PLANES do
> t1 = time()
> for j = 1 to ROWS do
> for k = 1 to COLUMNS do
> t[i][j][k] = (i * j * k) * 1.31764903217
> end for
> end for
> printf(1, "i = %d, time = %f\n", {i, time() - t1})
> end for
> printf(1, "total time = %f\n", time() - t0)

I confirmed that over 99% of the time is spent inside Watcom C's (DOS) malloc() routine. It appears as if Watcom's malloc() is searching through all of the millions of allocated blocks (mostly floating-point numbers), so the time increases linearly as more blocks are allocated. I know Watcom's free() routine simply marks a block as "free" by setting a bit; that's all it does. So maybe Watcom does not maintain a linked list of free blocks like most other mallocs, but rather searches *all* the blocks looking for one that's free.

When I run this with Euphoria for DOS as compiled by *DJGPP*, the program runs really fast, with no linear increase in time. When I run it with exw, it runs really fast, but there's still some small linear increase.

> 2) I attempted to improve the performance by the following means, without
> succeeding. Its timing also increases linearly with the iteration count.
> Please explain why it doesn't work:

I assume malloc is the culprit here too.

> 3) Finally, I succeeded in improving radically the performance with the
> following approach (in my PC it runs in 92 seconds, against hours with the
> other methods):

You slowly initialize everything, and then you start replacing values, one by one. The replacements are fast because for each item that you allocate, another one of the same size is freed. I have a "storage cache" that ensures that small blocks will be allocated and freed very quickly in this case; malloc probably isn't even called.

Every implementation of malloc is different. Everyone approaches the problem thinking he can do it better than everyone else. I have no intention of rewriting Watcom's malloc, even though the source code is out there somewhere.

> 4) But that is not all! If you substitute constant COLUMNS = 999
> in place of constant COLUMNS = 1000, instead of decreasing,
> the execution time grows enormously!

I couldn't duplicate this one, so I can't explain it.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com