1. Dramatic slowdown -ping Rob

I've narrowed down the unexpected performance hit I mentioned earlier
today (Re: Contest Update (12-Nov-2004))

If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes
3.19 seconds, over tenfold performance drop. Why?

Does anyone else get these results? You may need to massage the
figures a little to account for memory/cpu differences.

Rob, if you manage to duplicate this performance hit on 2.4, do you
get the same on 2.5?

constant START = time()
constant INNER=28
sequence table
		table=repeat(repeat(0,INNER),1024)
integer used
		used=1

for i=1 to 45000 do
	used+=1
	if used>length(table) then table&=repeat(repeat(0,INNER),1024) end
if
	table[used][5]=used
end for

printf(1, "Elapsed time: %f\n", time() - START)


Regards,
Pete
PS the performance hit disappears if I initialise the table to 40,000
entries

new topic     » topic index » view message » categorize

2. Re: Dramatic slowdown -ping Rob

Pete Lomax wrote:
> 
> I've narrowed down the unexpected performance hit I mentioned earlier
> today (Re: Contest Update (12-Nov-2004))
> 
> If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes
> 3.19 seconds, over tenfold performance drop. Why?
> 
> Does anyone else get these results? You may need to massage the
> figures a little to account for memory/cpu differences.

Yes, I get the same effect. 

-- 
Derek Parnell
Melbourne, Australia

new topic     » goto parent     » topic index » view message » categorize

3. Re: Dramatic slowdown -ping Rob

Pete Lomax wrote:
> 
> I've narrowed down the unexpected performance hit I mentioned earlier
> today (Re: Contest Update (12-Nov-2004))
> 
> If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes
> 3.19 seconds, over tenfold performance drop. Why?
> 
> Does anyone else get these results? You may need to massage the
> figures a little to account for memory/cpu differences.
> 
> Rob, if you manage to duplicate this performance hit on 2.4, do you
> get the same on 2.5?
> 
> }}}
<eucode>
> constant START = time()
> constant INNER=28
> sequence table
> 		table=repeat(repeat(0,INNER),1024)
> integer used
> 		used=1
> 
> for i=1 to 45000 do
> 	used+=1
> 	if used>length(table) then table&=repeat(repeat(0,INNER),1024) end
> if
> 	table[used][5]=used
> end for
> 
> printf(1, "Elapsed time: %f\n", time() - START)
> </eucode>
{{{

> 
> Regards,
> Pete
> PS the performance hit disappears if I initialise the table to 40,000
> entries

On my 1.8 GHz Pentium-4 machine with 256 Mb RAM,
running ex.exe:

Euphoria 2.4  27  0.94
              28  1.70

Euphoria 2.5  27  0.05
              28  0.05

I couldn't believe this, so I dumped "table" into a file
and did a diff. There was no difference. The times
seemed reasonable. With 2.5 the prompt came back instantly.
With 2.4 it took a second or so.

I can't really explain this, other than to point out that
sometimes weird performance situations can develop with 
the malloc() heap. 2.5 frees up a lot of front-end data 
before running the back-end, but that couldn't explain everything.
Maybe I put in an optimization that I've forgotten about.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

4. Re: Dramatic slowdown -ping Rob

Pete Lomax wrote:

> I've narrowed down the unexpected performance hit I mentioned earlier
> today (Re: Contest Update (12-Nov-2004))
>
> If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes
> 3.19 seconds, over tenfold performance drop. Why?
>
> Does anyone else get these results? You may need to massage the
> figures a little to account for memory/cpu differences.

<snip>

I tested INNER in the range from 10 to 40. The following was the only
suspect effect that I saw here (Pentium II, 400 MHz, 256 MB RAM,
Windows 98, Eu 2.4).
Using the 'ex.exe' interpreter (in a DOS box):
- INNER = 27: 1.10
- INNER = 28: 1.70

The other steps in the tested range result in smaller time differences.
I didn't get such an effect at all when using the 'exw.exe' interpreter,
and I wasn't able to produce a dramatic effect such as yours.

Regards,
   Juergen

new topic     » goto parent     » topic index » view message » categorize

5. Re: Dramatic slowdown -ping Rob

These are my results on an Athlon 64 2.8Ghz with 2GB RAM:

Intepreter INNER time
exw 2.4    27    0.08
exw 2.4    28    0.08
ex 2.4     27    0.00
ex 2.4     28    9.30

Yeah... I'd call that dramatic...
(CRASH!!) I have to go now, one of the cats
knocked something over, and it sounded like
it broke...

Pete Lomax wrote:
> 
> I've narrowed down the unexpected performance hit I mentioned earlier
> today (Re: Contest Update (12-Nov-2004))
> 
> If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes
> 3.19 seconds, over tenfold performance drop. Why?
> 
> Does anyone else get these results? You may need to massage the
> figures a little to account for memory/cpu differences.
> 
> Rob, if you manage to duplicate this performance hit on 2.4, do you
> get the same on 2.5?
> 
> }}}
<eucode>
> constant START = time()
> constant INNER=28
> sequence table
> 		table=repeat(repeat(0,INNER),1024)
> integer used
> 		used=1
> 
> for i=1 to 45000 do
> 	used+=1
> 	if used>length(table) then table&=repeat(repeat(0,INNER),1024) end
> if
> 	table[used][5]=used
> end for
> 
> printf(1, "Elapsed time: %f\n", time() - START)
> </eucode>
{{{

> 
> Regards,
> Pete
> PS the performance hit disappears if I initialise the table to 40,000
> entries
> 
>

new topic     » goto parent     » topic index » view message » categorize

6. Re: Dramatic slowdown -ping Rob

CoJaBo wrote:
> 
> These are my results on an Athlon 64 2.8Ghz with 2GB RAM:
> 
> Intepreter INNER time
> exw 2.4    27    0.08
> exw 2.4    28    0.08
> ex 2.4     27    0.00
> ex 2.4     28    9.30
> 

Just a reminder, in case its relavent, I'm using exw for the contest.

-- 
Derek Parnell
Melbourne, Australia

new topic     » goto parent     » topic index » view message » categorize

7. Re: Dramatic slowdown -ping Rob

On Fri, 12 Nov 2004 22:11:07 -0800, Robert Craig
<guest at RapidEuphoria.com> wrote:

>Pete Lomax wrote:
>> 
>> If INNER is 27, this completes in 0.27 seconds. If it is 28, it takes
>> 3.19 seconds, over tenfold performance drop. Why?
>
>On my 1.8 GHz Pentium-4 machine with 256 Mb RAM,
>running ex.exe:
>
>Euphoria 2.4  27  0.94
>              28  1.70
>
>Euphoria 2.5  27  0.05
>              28  0.05
>
>I couldn't believe this
Absolutely, I'd need to see it before I believed it blink)
>, so I dumped "table" into a file
>and did a diff. There was no difference. The times
>seemed reasonable. With 2.5 the prompt came back instantly.
>With 2.4 it took a second or so.
>
>I can't really explain this, other than to point out that
>sometimes weird performance situations can develop with 
>the malloc() heap. 2.5 frees up a lot of front-end data 
>before running the back-end, but that couldn't explain everything.
Maybe I triggered a heapCompact, somehow, but I'm not sure why since I
haven't (deliberately) released thousands of unused heap slots

>Maybe I put in an optimization that I've forgotten about.
Wishful thinking, perhaps.

OK, you need to concentrate on 2.5, and you don't need to waste time
on this issue right now (indeed there is zero time pressure on me over
this), but I'd hate to think you drop this as some great "unsolveable
mystery" and never think of it again. If there is a genuine problem
and it has moved (from 27/28), it'll be a _long_ while before anyone
finds it again, and it might be much more serious/much more urgent.

Is there some kind of "API spy" program that can give further stats?

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

8. Re: Dramatic slowdown -ping Rob

On Sat, 13 Nov 2004 13:22:35 -0800, Derek Parnell
<guest at RapidEuphoria.com> wrote:

>Just a reminder, in case its relavent, I'm using exw for the contest.
I know you live other side of planet, but did you just hear a /loud/
DOH?

So far, I've been doing *all* my timings on ex.exe!
(My mistake, of course)
Pete

new topic     » goto parent     » topic index » view message » categorize

9. Re: Dramatic slowdown -ping Rob

On Sat, 13 Nov 2004 08:18:29 -0800, CoJaBo <guest at RapidEuphoria.com>
wrote:

>ex 2.4     27    0.00
>ex 2.4     28    9.30
>
>Yeah... I'd call that dramatic...
Wow, I moaned at 2x slower, that's like 1000-fold.

Given you have some 40 times as much memory as me (and 40 squared is
1600), is it looking more and more like some subtle heap corruption?

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

10. Re: Dramatic slowdown -ping Rob

Pete Lomax wrote:
> 
> On Sat, 13 Nov 2004 13:22:35 -0800, Derek Parnell
> <guest at RapidEuphoria.com> wrote:
> 
> >Just a reminder, in case its relavent, I'm using exw for the contest.
> I know you live other side of planet, but did you just hear a /loud/
> DOH?
Or a loud ARRGH!... I found what the cats knocked over:
a box with a large number of electronic components!
It made a considerable mess, but not even close to
when they got on top of the piano(which is only used
as a table) and sent everthing crashing th the inaccessible
floor behind it... Another time one of them even sent a fax!

> 
> So far, I've been doing *all* my timings on ex.exe!
> (My mistake, of course)
> Pete
> 
>

new topic     » goto parent     » topic index » view message » categorize

11. Re: Dramatic slowdown -ping Rob

Pete Lomax wrote:
> 
> On Sat, 13 Nov 2004 08:18:29 -0800, CoJaBo <guest at RapidEuphoria.com>
> wrote:
> 
> >ex 2.4     27    0.00
> >ex 2.4     28    9.30
> >
> >Yeah... I'd call that dramatic...
> Wow, I moaned at 2x slower, that's like 1000-fold.
On exw there is no noticeable slowdown, I wonder why?
But ex shows a difference of about 9.3, 27 being so
fast is showed 0.00 and 28 being so slow I thought it
crashed...

> 
> Given you have some 40 times as much memory as me (and 40 squared is
> 1600), is it looking more and more like some subtle heap corruption?
I'd try it on the other computers, but none of them have Eu installed,
and the binder keeps saying somthing about cho isn't a command...

> 
> Regards,
> Pete
> 
>

new topic     » goto parent     » topic index » view message » categorize

12. Re: Dramatic slowdown -ping Rob

CoJaBo wrote:
> 
> Pete Lomax wrote:
> > 
> > On Sat, 13 Nov 2004 08:18:29 -0800, CoJaBo <guest at RapidEuphoria.com>
> > wrote:
> > 
> > >ex 2.4     27    0.00
> > >ex 2.4     28    9.30
> > >
> > >Yeah... I'd call that dramatic...
> > Wow, I moaned at 2x slower, that's like 1000-fold.
> On exw there is no noticeable slowdown, I wonder why?
> But ex shows a difference of about 9.3, 27 being so
> fast is showed 0.00 and 28 being so slow I thought it
> crashed...
> 
> > 
> > Given you have some 40 times as much memory as me (and 40 squared is
> > 1600), is it looking more and more like some subtle heap corruption?
> I'd try it on the other computers, but none of them have Eu installed,
> and the binder keeps saying somthing about cho isn't a command...
The .bat file had gotten messed up... Fixed now.

> 
> > 
> > Regards,
> > Pete
> > 
> >

new topic     » goto parent     » topic index » view message » categorize

13. Re: Dramatic slowdown -ping Rob

As promised, I looked into the performance issue raised
by Pete Lomax a while back. I don't think there is any
bug in Euphoria itself (ref counts etc.). 
I rather think WATCOM's storage 
allocator for DOS has some very quirky behavior. In exw.exe
I don't use the WATCOM allocator, I use Microsoft's.
I earlier reported no problem with ex.exe for 2.5,
but I think I must have been using exw.exe for 2.5.
exw runs the code below very fast regardless of the 
size of the sequences. People noticed a big slowdown
with ex.exe when moving from 27 to 28. I believe that's because 
I allocate 50% more space when the size becomes 28. 
For speed in handling small sequences, I map to a set of 
standard memory block sizes:
16, 24, 32, 48, 64, 96, 128 bytes, 192 bytes, 256 etc.
I think 27 would need 128, while 28 would need 192.

However, strangely enough, when the size is increased from 28 to 
100, it runs really fast, even though more processing is being
done by Euphoria.

I also noticed that when it runs slow, it actually runs fast
to start with, and then at a certain point it starts taking longer
and longer to allocate a block.

I downloaded the WATCOM source, but I couldn't really understand
how the allocator works. From the comments it did seem like it
was fairly complex, with a lot of special cases, and multiple heaps,
and there was special code added to deal with worst case situations.

Below is Pete's code with a lot of mods to get better information.

-- Time profile based on 167 samples.
-- Left margin shows the percentage of total execution time
-- consumed by the statement(s) on that line.

       |
       |constant START = time()
       |constant INNER=28 --27 good, 28 bad, 50 bad, 100 good, why?
       |sequence table
       |table=repeat(repeat(0,INNER),1024)
       |integer used
       |used=1
       |sequence t1, t2
       |atom t
       |t = time()
       |
       |for i=1 to 45000 do
       |    used+=1
  0.60 |    if remainder(i,3000) = 0 then
  0.60 |        printf(1, "%d: %.2f\n", {i, time()-t})
       |        t = time()
       |    end if
  0.60 |    if used>length(table) then
       |        t1 = repeat(0, INNER)
       |        t2 = repeat(t1, 1024)
  2.40 |        table &= t2
       |    end if
 95.81 |    table[used][1]=99  -- here's where almost all 
       |                       -- the 28-long sequence blocks are allocated
       |end for
       |
       |printf(1, "Elapsed time: %f\n", time() - START)


Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

14. Re: Dramatic slowdown -ping Rob

On Sun, 28 Nov 2004 20:29:12 -0800, Robert Craig
<guest at RapidEuphoria.com> wrote:

>As promised, I looked into the performance issue raised
>by Pete Lomax a while back. I don't think there is any
>bug in Euphoria itself (ref counts etc.). 
>I rather think WATCOM's storage 
>allocator for DOS has some very quirky behavior. In exw.exe
>I don't use the WATCOM allocator, I use Microsoft's.
Thanks for looking into this. Now I know it's a problem in ex rather
than exw, I'm no longer overly concerned about it.
>I earlier reported no problem with ex.exe for 2.5,
>but I think I must have been using exw.exe for 2.5.
>exw runs the code below very fast regardless of the 
>size of the sequences. People noticed a big slowdown
>with ex.exe when moving from 27 to 28. I believe that's because 
>I allocate 50% more space when the size becomes 28. 
>For speed in handling small sequences, I map to a set of 
>standard memory block sizes:
>16, 24, 32, 48, 64, 96, 128 bytes, 192 bytes, 256 etc.
>I think 27 would need 128, while 28 would need 192.
>
>However, strangely enough, when the size is increased from 28 to 
>100, it runs really fast, even though more processing is being
>done by Euphoria.
Sounds to me like Watcom has a problem with 192, which is not all that
surprising since few C programs are likely to allocate tens of
thousands of blocks of (just) that size. You've probably thought of
this already, but you might want to consider avoiding the problem in
ex by removing the 192 size (ie just in ex, leave it in exw and exu)?
As I said, it does not really bother me personally, but I know that
"speed is something of an obsession for you".

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

15. Re: Dramatic slowdown -ping Rob

I have actually found something of much intrest....

I do not have a copy of 2.4 readily at my disposal,
but this test has come up with some.... shall we say,
interesting points.  This problem at hand, may be due
to Heap Corruption as Pete suggested, but it may be
not what your expecting.

I'm running a AMD Athlon XP 2.16 GHz Proccessor with
192 Megs of ram, and for the original source code as
given by Pete, the results are as follows:

ex.exe (27): 0.110000
ex.exe (28): 0.110000

However, I was going to test this with exw.exe, and
needed to throw in a pause before exiting execution,
so I use my normal, while get_key() = -1 do end while
to the end of the code, so I could see the results.

However, I forgot I put this in, when I went to re-test
the code on ex.exe, and found a surprising result.

constant START = time()
constant INNER=27
sequence table
		table=repeat(repeat(0,INNER),1024)
integer used
		used=1

for i=1 to 45000 do
	used+=1
	if used>length(table) then table&=repeat(repeat(0,INNER),1024) end
if
	table[used][5]=used
end for

printf(1, "Elapsed time: %f\n", time() - START)
while get_key() = -1 do end while


One would belive this would return the same result as before,
however, these are the results I got:

ex.exe (27): 2.030000
ex.exe (28): 2.090000

Now, taking from Rob's own admitance, it seems to me, that
it's not the actual execution that's resulting in these
times, but instead, the backend may be corrupting the
heap when it releases the memory before executing the
IL code generated by the frontend.

There's definatly something going on, for the interpreter
to be generating this much of a diffrence between the
original code, to 1 loop inheritance added, with absolutely
no code inside actually executed, except for get_key().

Just letting ya know,
EuMario


Pete Lomax wrote:
> 
> On Sun, 28 Nov 2004 20:29:12 -0800, Robert Craig
> <guest at RapidEuphoria.com> wrote:
> 
> >As promised, I looked into the performance issue raised
> >by Pete Lomax a while back. I don't think there is any
> >bug in Euphoria itself (ref counts etc.). 
> >I rather think WATCOM's storage 
> >allocator for DOS has some very quirky behavior. In exw.exe
> >I don't use the WATCOM allocator, I use Microsoft's.
> Thanks for looking into this. Now I know it's a problem in ex rather
> than exw, I'm no longer overly concerned about it.
> >I earlier reported no problem with ex.exe for 2.5,
> >but I think I must have been using exw.exe for 2.5.
> >exw runs the code below very fast regardless of the 
> >size of the sequences. People noticed a big slowdown
> >with ex.exe when moving from 27 to 28. I believe that's because 
> >I allocate 50% more space when the size becomes 28. 
> >For speed in handling small sequences, I map to a set of 
> >standard memory block sizes:
> >16, 24, 32, 48, 64, 96, 128 bytes, 192 bytes, 256 etc.
> >I think 27 would need 128, while 28 would need 192.
> >
> >However, strangely enough, when the size is increased from 28 to 
> >100, it runs really fast, even though more processing is being
> >done by Euphoria.
> Sounds to me like Watcom has a problem with 192, which is not all that
> surprising since few C programs are likely to allocate tens of
> thousands of blocks of (just) that size. You've probably thought of
> this already, but you might want to consider avoiding the problem in
> ex by removing the 192 size (ie just in ex, leave it in exw and exu)?
> As I said, it does not really bother me personally, but I know that
> "speed is something of an obsession for you".
> 
> Regards,
> Pete

new topic     » goto parent     » topic index » view message » categorize

16. Re: Dramatic slowdown -ping Rob

Mario Steele wrote:
> 
> I have actually found something of much intrest....
> 
> I do not have a copy of 2.4 readily at my disposal,
> but this test has come up with some.... shall we say,
> interesting points.  This problem at hand, may be due
> to Heap Corruption as Pete suggested, but it may be
> not what your expecting.
> 
> I'm running a AMD Athlon XP 2.16 GHz Proccessor with
> 192 Megs of ram, and for the original source code as
> given by Pete, the results are as follows:
> 
> ex.exe (27): 0.110000
> ex.exe (28): 0.110000
> 
> However, I was going to test this with exw.exe, and
> needed to throw in a pause before exiting execution,
> so I use my normal, while get_key() = -1 do end while
> to the end of the code, so I could see the results.
> 
> However, I forgot I put this in, when I went to re-test
> the code on ex.exe, and found a surprising result.
> 
> }}}
<eucode>
> constant START = time()
> constant INNER=27
> sequence table
> 		table=repeat(repeat(0,INNER),1024)
> integer used
> 		used=1
> 
> for i=1 to 45000 do
> 	used+=1
> 	if used>length(table) then table&=repeat(repeat(0,INNER),1024) end
> if
> 	table[used][5]=used
> end for
> 
> printf(1, "Elapsed time: %f\n", time() - START)
> while get_key() = -1 do end while
> </eucode>
{{{

> 
> One would belive this would return the same result as before,
> however, these are the results I got:
> 
> ex.exe (27): 2.030000
> ex.exe (28): 2.090000
> 
> Now, taking from Rob's own admitance, it seems to me, that
> it's not the actual execution that's resulting in these
> times, but instead, the backend may be corrupting the
> heap when it releases the memory before executing the
> IL code generated by the frontend.
> 
> There's definatly something going on, for the interpreter
> to be generating this much of a diffrence between the
> original code, to 1 loop inheritance added, with absolutely
> no code inside actually executed, except for get_key().
> 
> Just letting ya know,
> EuMario

I tried a few more experiments, and yes, it seems
you can flip ex from fast to slow, by making other 
seemingly unrelated changes, besides just changing 
the sequence lengths. exw seems to always be fast.

This is not a 2.5-specific issue.
Pete originally reported it using 2.4,
a few days before 2.5 came out. I have also
confirmed it using 2.4, as have some other people.

Watcom's allocator is complex, as are most allocators 
that try to achieve high performance. I don't think
corruption is taking place, because the same, correct
result is produced whether it's running fast or slow.
Also, when it runs slow, it actually starts off fast and gradually
slows down, suggesting that maybe it is searching through 
an increasing number of memory blocks, looking for a free block of
the right size. I believe Watcom may be using a system where
blocks are marked with a bit that indicates whether they are free
or not. In some cases I suspect Watcom searches
through all blocks, free or in use, looking for a free block
of the right size. This would slow it down as more blocks are 
allocated, but I think they also use other algorithms in some cases.
Most allocators search a linked list containing only free blocks.

Thanks,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu