1. Dramatic slowdown -ping Rob
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Nov 13, 2004
- 586 views
I've narrowed down the unexpected performance hit I mentioned earlier today (Re: Contest Update (12-Nov-2004)) If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes 3.19 seconds, over tenfold performance drop. Why? Does anyone else get these results? You may need to massage the figures a little to account for memory/cpu differences. Rob, if you manage to duplicate this performance hit on 2.4, do you get the same on 2.5?
constant START = time() constant INNER=28 sequence table table=repeat(repeat(0,INNER),1024) integer used used=1 for i=1 to 45000 do used+=1 if used>length(table) then table&=repeat(repeat(0,INNER),1024) end if table[used][5]=used end for printf(1, "Elapsed time: %f\n", time() - START)
Regards, Pete PS the performance hit disappears if I initialise the table to 40,000 entries
2. Re: Dramatic slowdown -ping Rob
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 13, 2004
- 562 views
Pete Lomax wrote: > > I've narrowed down the unexpected performance hit I mentioned earlier > today (Re: Contest Update (12-Nov-2004)) > > If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes > 3.19 seconds, over tenfold performance drop. Why? > > Does anyone else get these results? You may need to massage the > figures a little to account for memory/cpu differences. Yes, I get the same effect. -- Derek Parnell Melbourne, Australia
3. Re: Dramatic slowdown -ping Rob
- Posted by Robert Craig <rds at RapidEuphoria.com> Nov 13, 2004
- 554 views
Pete Lomax wrote: > > I've narrowed down the unexpected performance hit I mentioned earlier > today (Re: Contest Update (12-Nov-2004)) > > If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes > 3.19 seconds, over tenfold performance drop. Why? > > Does anyone else get these results? You may need to massage the > figures a little to account for memory/cpu differences. > > Rob, if you manage to duplicate this performance hit on 2.4, do you > get the same on 2.5? > > }}} <eucode> > constant START = time() > constant INNER=28 > sequence table > table=repeat(repeat(0,INNER),1024) > integer used > used=1 > > for i=1 to 45000 do > used+=1 > if used>length(table) then table&=repeat(repeat(0,INNER),1024) end > if > table[used][5]=used > end for > > printf(1, "Elapsed time: %f\n", time() - START) > </eucode> {{{ > > Regards, > Pete > PS the performance hit disappears if I initialise the table to 40,000 > entries On my 1.8 GHz Pentium-4 machine with 256 Mb RAM, running ex.exe: Euphoria 2.4 27 0.94 28 1.70 Euphoria 2.5 27 0.05 28 0.05 I couldn't believe this, so I dumped "table" into a file and did a diff. There was no difference. The times seemed reasonable. With 2.5 the prompt came back instantly. With 2.4 it took a second or so. I can't really explain this, other than to point out that sometimes weird performance situations can develop with the malloc() heap. 2.5 frees up a lot of front-end data before running the back-end, but that couldn't explain everything. Maybe I put in an optimization that I've forgotten about. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
4. Re: Dramatic slowdown -ping Rob
- Posted by "Juergen Luethje" <j.lue at gmx.de> Nov 13, 2004
- 561 views
Pete Lomax wrote: > I've narrowed down the unexpected performance hit I mentioned earlier > today (Re: Contest Update (12-Nov-2004)) > > If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes > 3.19 seconds, over tenfold performance drop. Why? > > Does anyone else get these results? You may need to massage the > figures a little to account for memory/cpu differences. <snip> I tested INNER in the range from 10 to 40. The following was the only suspect effect that I saw here (Pentium II, 400 MHz, 256 MB RAM, Windows 98, Eu 2.4). Using the 'ex.exe' interpreter (in a DOS box): - INNER = 27: 1.10 - INNER = 28: 1.70 The other steps in the tested range result in smaller time differences. I didn't get such an effect at all when using the 'exw.exe' interpreter, and I wasn't able to produce a dramatic effect such as yours. Regards, Juergen
5. Re: Dramatic slowdown -ping Rob
- Posted by CoJaBo <cojabo at suscom.net> Nov 13, 2004
- 559 views
These are my results on an Athlon 64 2.8Ghz with 2GB RAM: Intepreter INNER time exw 2.4 27 0.08 exw 2.4 28 0.08 ex 2.4 27 0.00 ex 2.4 28 9.30 Yeah... I'd call that dramatic... (CRASH!!) I have to go now, one of the cats knocked something over, and it sounded like it broke... Pete Lomax wrote: > > I've narrowed down the unexpected performance hit I mentioned earlier > today (Re: Contest Update (12-Nov-2004)) > > If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes > 3.19 seconds, over tenfold performance drop. Why? > > Does anyone else get these results? You may need to massage the > figures a little to account for memory/cpu differences. > > Rob, if you manage to duplicate this performance hit on 2.4, do you > get the same on 2.5? > > }}} <eucode> > constant START = time() > constant INNER=28 > sequence table > table=repeat(repeat(0,INNER),1024) > integer used > used=1 > > for i=1 to 45000 do > used+=1 > if used>length(table) then table&=repeat(repeat(0,INNER),1024) end > if > table[used][5]=used > end for > > printf(1, "Elapsed time: %f\n", time() - START) > </eucode> {{{ > > Regards, > Pete > PS the performance hit disappears if I initialise the table to 40,000 > entries > >
6. Re: Dramatic slowdown -ping Rob
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 13, 2004
- 547 views
- Last edited Nov 14, 2004
CoJaBo wrote: > > These are my results on an Athlon 64 2.8Ghz with 2GB RAM: > > Intepreter INNER time > exw 2.4 27 0.08 > exw 2.4 28 0.08 > ex 2.4 27 0.00 > ex 2.4 28 9.30 > Just a reminder, in case its relavent, I'm using exw for the contest. -- Derek Parnell Melbourne, Australia
7. Re: Dramatic slowdown -ping Rob
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Nov 14, 2004
- 562 views
On Fri, 12 Nov 2004 22:11:07 -0800, Robert Craig <guest at RapidEuphoria.com> wrote: >Pete Lomax wrote: >> >> If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes >> 3.19 seconds, over tenfold performance drop. Why? > >On my 1.8 GHz Pentium-4 machine with 256 Mb RAM, >running ex.exe: > >Euphoria 2.4 27 0.94 > 28 1.70 > >Euphoria 2.5 27 0.05 > 28 0.05 > >I couldn't believe this Absolutely, I'd need to see it before I believed it ) >, so I dumped "table" into a file >and did a diff. There was no difference. The times >seemed reasonable. With 2.5 the prompt came back instantly. >With 2.4 it took a second or so. > >I can't really explain this, other than to point out that >sometimes weird performance situations can develop with >the malloc() heap. 2.5 frees up a lot of front-end data >before running the back-end, but that couldn't explain everything. Maybe I triggered a heapCompact, somehow, but I'm not sure why since I haven't (deliberately) released thousands of unused heap slots >Maybe I put in an optimization that I've forgotten about. Wishful thinking, perhaps. OK, you need to concentrate on 2.5, and you don't need to waste time on this issue right now (indeed there is zero time pressure on me over this), but I'd hate to think you drop this as some great "unsolveable mystery" and never think of it again. If there is a genuine problem and it has moved (from 27/28), it'll be a _long_ while before anyone finds it again, and it might be much more serious/much more urgent. Is there some kind of "API spy" program that can give further stats? Regards, Pete
8. Re: Dramatic slowdown -ping Rob
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Nov 14, 2004
- 545 views
On Sat, 13 Nov 2004 13:22:35 -0800, Derek Parnell <guest at RapidEuphoria.com> wrote: >Just a reminder, in case its relavent, I'm using exw for the contest. I know you live other side of planet, but did you just hear a /loud/ DOH? So far, I've been doing *all* my timings on ex.exe! (My mistake, of course) Pete
9. Re: Dramatic slowdown -ping Rob
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Nov 14, 2004
- 540 views
On Sat, 13 Nov 2004 08:18:29 -0800, CoJaBo <guest at RapidEuphoria.com> wrote: >ex 2.4 27 0.00 >ex 2.4 28 9.30 > >Yeah... I'd call that dramatic... Wow, I moaned at 2x slower, that's like 1000-fold. Given you have some 40 times as much memory as me (and 40 squared is 1600), is it looking more and more like some subtle heap corruption? Regards, Pete
10. Re: Dramatic slowdown -ping Rob
- Posted by CoJaBo <cojabo at suscom.net> Nov 14, 2004
- 552 views
Pete Lomax wrote: > > On Sat, 13 Nov 2004 13:22:35 -0800, Derek Parnell > <guest at RapidEuphoria.com> wrote: > > >Just a reminder, in case its relavent, I'm using exw for the contest. > I know you live other side of planet, but did you just hear a /loud/ > DOH? Or a loud ARRGH!... I found what the cats knocked over: a box with a large number of electronic components! It made a considerable mess, but not even close to when they got on top of the piano(which is only used as a table) and sent everthing crashing th the inaccessible floor behind it... Another time one of them even sent a fax! > > So far, I've been doing *all* my timings on ex.exe! > (My mistake, of course) > Pete > >
11. Re: Dramatic slowdown -ping Rob
- Posted by CoJaBo <cojabo at suscom.net> Nov 14, 2004
- 550 views
Pete Lomax wrote: > > On Sat, 13 Nov 2004 08:18:29 -0800, CoJaBo <guest at RapidEuphoria.com> > wrote: > > >ex 2.4 27 0.00 > >ex 2.4 28 9.30 > > > >Yeah... I'd call that dramatic... > Wow, I moaned at 2x slower, that's like 1000-fold. On exw there is no noticeable slowdown, I wonder why? But ex shows a difference of about 9.3, 27 being so fast is showed 0.00 and 28 being so slow I thought it crashed... > > Given you have some 40 times as much memory as me (and 40 squared is > 1600), is it looking more and more like some subtle heap corruption? I'd try it on the other computers, but none of them have Eu installed, and the binder keeps saying somthing about cho isn't a command... > > Regards, > Pete > >
12. Re: Dramatic slowdown -ping Rob
- Posted by CoJaBo <cojabo at suscom.net> Nov 14, 2004
- 563 views
CoJaBo wrote: > > Pete Lomax wrote: > > > > On Sat, 13 Nov 2004 08:18:29 -0800, CoJaBo <guest at RapidEuphoria.com> > > wrote: > > > > >ex 2.4 27 0.00 > > >ex 2.4 28 9.30 > > > > > >Yeah... I'd call that dramatic... > > Wow, I moaned at 2x slower, that's like 1000-fold. > On exw there is no noticeable slowdown, I wonder why? > But ex shows a difference of about 9.3, 27 being so > fast is showed 0.00 and 28 being so slow I thought it > crashed... > > > > > Given you have some 40 times as much memory as me (and 40 squared is > > 1600), is it looking more and more like some subtle heap corruption? > I'd try it on the other computers, but none of them have Eu installed, > and the binder keeps saying somthing about cho isn't a command... The .bat file had gotten messed up... Fixed now. > > > > > Regards, > > Pete > > > >
13. Re: Dramatic slowdown -ping Rob
- Posted by Robert Craig <rds at RapidEuphoria.com> Nov 29, 2004
- 610 views
As promised, I looked into the performance issue raised by Pete Lomax a while back. I don't think there is any bug in Euphoria itself (ref counts etc.). I rather think WATCOM's storage allocator for DOS has some very quirky behavior. In exw.exe I don't use the WATCOM allocator, I use Microsoft's. I earlier reported no problem with ex.exe for 2.5, but I think I must have been using exw.exe for 2.5. exw runs the code below very fast regardless of the size of the sequences. People noticed a big slowdown with ex.exe when moving from 27 to 28. I believe that's because I allocate 50% more space when the size becomes 28. For speed in handling small sequences, I map to a set of standard memory block sizes: 16, 24, 32, 48, 64, 96, 128 bytes, 192 bytes, 256 etc. I think 27 would need 128, while 28 would need 192. However, strangely enough, when the size is increased from 28 to 100, it runs really fast, even though more processing is being done by Euphoria. I also noticed that when it runs slow, it actually runs fast to start with, and then at a certain point it starts taking longer and longer to allocate a block. I downloaded the WATCOM source, but I couldn't really understand how the allocator works. From the comments it did seem like it was fairly complex, with a lot of special cases, and multiple heaps, and there was special code added to deal with worst case situations. Below is Pete's code with a lot of mods to get better information.
-- Time profile based on 167 samples. -- Left margin shows the percentage of total execution time -- consumed by the statement(s) on that line. | |constant START = time() |constant INNER=28 --27 good, 28 bad, 50 bad, 100 good, why? |sequence table |table=repeat(repeat(0,INNER),1024) |integer used |used=1 |sequence t1, t2 |atom t |t = time() | |for i=1 to 45000 do | used+=1 0.60 | if remainder(i,3000) = 0 then 0.60 | printf(1, "%d: %.2f\n", {i, time()-t}) | t = time() | end if 0.60 | if used>length(table) then | t1 = repeat(0, INNER) | t2 = repeat(t1, 1024) 2.40 | table &= t2 | end if 95.81 | table[used][1]=99 -- here's where almost all | -- the 28-long sequence blocks are allocated |end for | |printf(1, "Elapsed time: %f\n", time() - START)
Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
14. Re: Dramatic slowdown -ping Rob
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Nov 30, 2004
- 570 views
On Sun, 28 Nov 2004 20:29:12 -0800, Robert Craig <guest at RapidEuphoria.com> wrote: >As promised, I looked into the performance issue raised >by Pete Lomax a while back. I don't think there is any >bug in Euphoria itself (ref counts etc.). >I rather think WATCOM's storage >allocator for DOS has some very quirky behavior. In exw.exe >I don't use the WATCOM allocator, I use Microsoft's. Thanks for looking into this. Now I know it's a problem in ex rather than exw, I'm no longer overly concerned about it. >I earlier reported no problem with ex.exe for 2.5, >but I think I must have been using exw.exe for 2.5. >exw runs the code below very fast regardless of the >size of the sequences. People noticed a big slowdown >with ex.exe when moving from 27 to 28. I believe that's because >I allocate 50% more space when the size becomes 28. >For speed in handling small sequences, I map to a set of >standard memory block sizes: >16, 24, 32, 48, 64, 96, 128 bytes, 192 bytes, 256 etc. >I think 27 would need 128, while 28 would need 192. > >However, strangely enough, when the size is increased from 28 to >100, it runs really fast, even though more processing is being >done by Euphoria. Sounds to me like Watcom has a problem with 192, which is not all that surprising since few C programs are likely to allocate tens of thousands of blocks of (just) that size. You've probably thought of this already, but you might want to consider avoiding the problem in ex by removing the 192 size (ie just in ex, leave it in exw and exu)? As I said, it does not really bother me personally, but I know that "speed is something of an obsession for you". Regards, Pete
15. Re: Dramatic slowdown -ping Rob
- Posted by Mario Steele <eumario at trilake.net> Nov 30, 2004
- 586 views
I have actually found something of much intrest.... I do not have a copy of 2.4 readily at my disposal, but this test has come up with some.... shall we say, interesting points. This problem at hand, may be due to Heap Corruption as Pete suggested, but it may be not what your expecting. I'm running a AMD Athlon XP 2.16 GHz Proccessor with 192 Megs of ram, and for the original source code as given by Pete, the results are as follows: ex.exe (27): 0.110000 ex.exe (28): 0.110000 However, I was going to test this with exw.exe, and needed to throw in a pause before exiting execution, so I use my normal, while get_key() = -1 do end while to the end of the code, so I could see the results. However, I forgot I put this in, when I went to re-test the code on ex.exe, and found a surprising result.
constant START = time() constant INNER=27 sequence table table=repeat(repeat(0,INNER),1024) integer used used=1 for i=1 to 45000 do used+=1 if used>length(table) then table&=repeat(repeat(0,INNER),1024) end if table[used][5]=used end for printf(1, "Elapsed time: %f\n", time() - START) while get_key() = -1 do end while
One would belive this would return the same result as before, however, these are the results I got: ex.exe (27): 2.030000 ex.exe (28): 2.090000 Now, taking from Rob's own admitance, it seems to me, that it's not the actual execution that's resulting in these times, but instead, the backend may be corrupting the heap when it releases the memory before executing the IL code generated by the frontend. There's definatly something going on, for the interpreter to be generating this much of a diffrence between the original code, to 1 loop inheritance added, with absolutely no code inside actually executed, except for get_key(). Just letting ya know, EuMario Pete Lomax wrote: > > On Sun, 28 Nov 2004 20:29:12 -0800, Robert Craig > <guest at RapidEuphoria.com> wrote: > > >As promised, I looked into the performance issue raised > >by Pete Lomax a while back. I don't think there is any > >bug in Euphoria itself (ref counts etc.). > >I rather think WATCOM's storage > >allocator for DOS has some very quirky behavior. In exw.exe > >I don't use the WATCOM allocator, I use Microsoft's. > Thanks for looking into this. Now I know it's a problem in ex rather > than exw, I'm no longer overly concerned about it. > >I earlier reported no problem with ex.exe for 2.5, > >but I think I must have been using exw.exe for 2.5. > >exw runs the code below very fast regardless of the > >size of the sequences. People noticed a big slowdown > >with ex.exe when moving from 27 to 28. I believe that's because > >I allocate 50% more space when the size becomes 28. > >For speed in handling small sequences, I map to a set of > >standard memory block sizes: > >16, 24, 32, 48, 64, 96, 128 bytes, 192 bytes, 256 etc. > >I think 27 would need 128, while 28 would need 192. > > > >However, strangely enough, when the size is increased from 28 to > >100, it runs really fast, even though more processing is being > >done by Euphoria. > Sounds to me like Watcom has a problem with 192, which is not all that > surprising since few C programs are likely to allocate tens of > thousands of blocks of (just) that size. You've probably thought of > this already, but you might want to consider avoiding the problem in > ex by removing the 192 size (ie just in ex, leave it in exw and exu)? > As I said, it does not really bother me personally, but I know that > "speed is something of an obsession for you". > > Regards, > Pete
16. Re: Dramatic slowdown -ping Rob
- Posted by Robert Craig <rds at RapidEuphoria.com> Nov 30, 2004
- 561 views
Mario Steele wrote: > > I have actually found something of much intrest.... > > I do not have a copy of 2.4 readily at my disposal, > but this test has come up with some.... shall we say, > interesting points. This problem at hand, may be due > to Heap Corruption as Pete suggested, but it may be > not what your expecting. > > I'm running a AMD Athlon XP 2.16 GHz Proccessor with > 192 Megs of ram, and for the original source code as > given by Pete, the results are as follows: > > ex.exe (27): 0.110000 > ex.exe (28): 0.110000 > > However, I was going to test this with exw.exe, and > needed to throw in a pause before exiting execution, > so I use my normal, while get_key() = -1 do end while > to the end of the code, so I could see the results. > > However, I forgot I put this in, when I went to re-test > the code on ex.exe, and found a surprising result. > > }}} <eucode> > constant START = time() > constant INNER=27 > sequence table > table=repeat(repeat(0,INNER),1024) > integer used > used=1 > > for i=1 to 45000 do > used+=1 > if used>length(table) then table&=repeat(repeat(0,INNER),1024) end > if > table[used][5]=used > end for > > printf(1, "Elapsed time: %f\n", time() - START) > while get_key() = -1 do end while > </eucode> {{{ > > One would belive this would return the same result as before, > however, these are the results I got: > > ex.exe (27): 2.030000 > ex.exe (28): 2.090000 > > Now, taking from Rob's own admitance, it seems to me, that > it's not the actual execution that's resulting in these > times, but instead, the backend may be corrupting the > heap when it releases the memory before executing the > IL code generated by the frontend. > > There's definatly something going on, for the interpreter > to be generating this much of a diffrence between the > original code, to 1 loop inheritance added, with absolutely > no code inside actually executed, except for get_key(). > > Just letting ya know, > EuMario I tried a few more experiments, and yes, it seems you can flip ex from fast to slow, by making other seemingly unrelated changes, besides just changing the sequence lengths. exw seems to always be fast. This is not a 2.5-specific issue. Pete originally reported it using 2.4, a few days before 2.5 came out. I have also confirmed it using 2.4, as have some other people. Watcom's allocator is complex, as are most allocators that try to achieve high performance. I don't think corruption is taking place, because the same, correct result is produced whether it's running fast or slow. Also, when it runs slow, it actually starts off fast and gradually slows down, suggesting that maybe it is searching through an increasing number of memory blocks, looking for a free block of the right size. I believe Watcom may be using a system where blocks are marked with a bit that indicates whether they are free or not. In some cases I suspect Watcom searches through all blocks, free or in use, looking for a free block of the right size. This would slow it down as more blocks are allocated, but I think they also use other algorithms in some cases. Most allocators search a linked list containing only free blocks. Thanks, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com