1. sloooooww
Could someone take a look at this code, and maybe tell me why it's so slow?
And possibly even let me know how I could speed it up? Thanks
---Code begins here---
-- fireplay.ex : inspired by Gerhard Piran's Burn
-- Jiri babor
-- J.babor at gns.cri.nz
-- version 1.00 97-01-13
include graphics.e
include get.e
constant
a=#A0000,
ymin=10, ymax= 199, -- top & bottom of fire window
true=1, false=0
integer cc, junk, key, plasma, cp, finished, xmin, xmax
sequence buff, pal, par
procedure write_buff()
for y = ymin to ymax-2 do
poke(a+320*y+xmin,buff[y-ymin+1])
end for
end procedure
procedure burn() -- fire calculations
if plasma then -- plasma-like fire
for r = 2 to ymax-ymin do
for c = 2 to xmax-xmin do
cc = floor((buff[r][c-1]+buff[r][c]+
buff[r][c+1]+buff[r+1][c])/4)
if cc>=par[2][3] then cc=cc-par[2][3] end if
buff[r-1][c]=cc
end for
end for
else -- curtain-like fire
for r = ymax-ymin to 2 by -1 do
for c = 2 to xmax-xmin do
cc = floor((buff[r][c-1]+buff[r][c]+
buff[r][c+1]+buff[r+1][c])/4)
if cc>0 then cc=cc-1 end if
buff[r-1][c]=cc
end for
end for
end if
end procedure
procedure spark()
integer x,y
if rand(par[3][3])-1 then
x=rand(xmax-xmin-21)+10
y=rand(ymax-ymin-40)+20
buff[y][x]=255
while rand(2)-1 do
y=y+1
buff[y][x]=par[1][3]
end while
end if
end procedure
procedure seed_fire()
-- set randomly the bottom two lines to start the fire & to keep it going
integer imin
imin=floor(0.1*par[4][3]*par[1][3])
for i = 2 to xmax-xmin do
buff[ymax-ymin][i]=imin+rand(par[1][3]-imin)
end for
buff[ymax-ymin+1]=buff[ymax-ymin]
end procedure
procedure control()
if key=27 then -- esc: exit
finished=true
end if
end procedure
procedure set_palette()
-- set yellow-red-slate fire palette
pal=repeat({0,0,0},256)
for i = 1 to 32 do
pal[i]=floor({0,(i-1)/2,(i-1)/2})
pal[i+32]=floor({(i-1)/2,15,16-i/2})
end for
for i = 1 to 64 do
pal[i+64]=floor({15+3*i/4,15,0})
pal[i+128]=floor({63,15+3*i/4,0})
pal[i+192]=floor({63,63,15+3*i/4})
end for
all_palette(pal)
end procedure
-- main
------------------------------------------------------------------------
-- fire parameters: defaults
-- min max cur def
par = {{ 45, 255, 255,255}, -- fire intensity
{ 1, 5, 1, 1}, -- fire attenuation
{ 1, 10, 10, 10}, -- sparks
{ 0, 9, 0, 0}, -- fire smoothness at base
{ 200, 320, 320,600}} -- fire window width (def: 40/320/100/100)
cp=1 -- currently active parameter
finished= false
plasma=true
xmin=(320-par[5][3])/2
xmax=xmin+par[5][3]-1
buff=repeat(repeat(0,xmax-xmin+1),ymax-ymin+1)
--introduction()
junk = graphics_mode(19)
set_palette()
--n=0 -- frame count
while not finished do -- main loop
seed_fire()
burn() -- main fire routine
write_buff() -- show updated buffer
-- if n>80 then
-- spark()
-- end if
-- n=n+1 -- increment frame count
key=get_key()
if key!=-1 then
control()
end if
end while
junk = graphics_mode(-1)
--END CODE--
2. Re: sloooooww
Optimization, my friend, optimization!
> procedure write_buff()
> for y = ymin to ymax-2 do
> poke(a+320*y+xmin,buff[y-ymin+1])
> end for
> end procedure
How about adding 320 instead of multiplying by it? i.e.
poke(a+q+xmin, ...)
q = q + 320
This would be much, much faster. And since write_buff() gets called a lot ...
Also see about stuffing some of your calculations in a lookup table, or
at least sticking predictable values in a table to reduce the amount of
multiplications you have to do. Mults and divs kill execution time.
--
Cameron Kaiser
http://www.sserv.com/
spectre at sserv.com
--
3. Re: sloooooww
Cameron Kaiser wrote:
>Optimization, my friend, optimization!
>> procedure write_buff()
>> for y = ymin to ymax-2 do
>> poke(a+320*y+xmin,buff[y-ymin+1])
>> end for
>> end procedure
>
>How about adding 320 instead of multiplying by it? i.e.
> poke(a+q+xmin, ...)
> q = q + 320
>
>This would be much, much faster. And since write_buff() gets called a lot
...
>Also see about stuffing some of your calculations in a lookup table, or
>at least sticking predictable values in a table to reduce the amount of
>multiplications you have to do. Mults and divs kill execution time.
I'm sorry, but that is the most ridiculous thing I have ever heard. Adding
320 to y in procedure write_buff will seriously screw the program. You
*have* to multiply in
order to get each line in its proper place (ie. to advance down the screen).
Adding
might draw everything on a single line, or crash the program.
Also, write_buff does *not* get called "a lot" when you compare it to other
parts of the code. It only gets called once for every new screen completed.
By doing
execution and time profiles, you can see that for an approximate 30 second
run,
write_buff is only called 86 times, which results in a total of 16,168 pokes
to the vga
screen. If that was the only thing running, it would be much faster than we
see now.
Write_buff only takes about 10% of the processing time. Over 85% of the
processing is done inside procedure burn(). Here is an execution count
profile, and
a time profile from procedures write_buf and burn:
Execution count:
|procedure write_buff()
86 | for y = ymin to ymax-2 do
16168 | poke(a+320*y+xmin,buff[y-ymin+1])
16168 | end for
86 |end procedure
|
|procedure burn() -- fire calculations
86 | if plasma then -- plasma-like fire
86 | for r = 2 to ymax-ymin do
16168 | for c = 2 to xmax-xmin do
5141424 | cc = floor((buff[r][c-1]+buff[r][c]+
| buff[r][c+1]+buff[r+1][c])/4)
5141424 | if cc>=par[2][3] then cc=cc-par[2][3] end if
5141424 | buff[r-1][c]=cc
5141424 | end for
16168 | end for
| else -- curtain-like fire
| -- omitted, never executed
| end if
86 |end procedure
Time profile:
|procedure write_buff()
| for y = ymin to ymax-2 do
10.10 | poke(a+320*y+xmin,buff[y-ymin+1])
| end for
|end procedure
|
|procedure burn() -- fire calculations
| if plasma then -- plasma-like fire
| for r = 2 to ymax-ymin do
0.06 | for c = 2 to xmax-xmin do
62.56 | cc = floor((buff[r][c-1]+buff[r][c]+
| buff[r][c+1]+buff[r+1][c])/4)
14.35 | if cc>=par[2][3] then cc=cc-par[2][3] end if
9.02 | buff[r-1][c]=cc
1.97 | end for
| end for
| else -- curtain-like fire
| -- omitted, never executed
| end if
|end procedure
As you can see, burn is doing a *huge* amount of addition and division, doing
over
5,141,424 additions and divisions. (just to be sure, i did profiles with
write_buff
commented out. the extra speed increase only gained 14 more calls to burn().
obviously, write_buff is *not* the bottleneck that Cameron believes it to
be.) The
trick, then, is to make the calculations in burn go faster. A *lot* faster.
I wanted to see how fast euphoria could do division, so here's a sample
program:
atom s, e
integer t, r
t = 9
s = time()
for x = 1 to 5141424 do
r = floor(x/t)
end for
e = time()
printf(1,"%d",(e - s))
This little baby runs in about 3 seconds on my p133! Why then does burn()
run so slow when 5 million divides take about 3 seconds? This interested me,
so I modified
my little program to this:
atom s, e
atom t, r, c
t = 7
c = 13
s = time()
for x = 1 to 5141424 do
t = t + c
c = c + t
r = floor((x + t + c)/4)
end for
e = time()
printf(1,"%d",(e - s))
Now, it takes a whoping 47 seconds to run! The number of divisions did not
increase, but additions were thrown into the mix. So how do we speed up
burn?
Easy, get rid of the additions! Except now we don't get pretty fire...
Anybody know how to get rid of enough calculations inside burn to make it
fast?
Without degrading the quality of the fire??
James Powell
(Damn, this was a long post! Probably not much help, either...
4. Re: sloooooww
> >Optimization, my friend, optimization!
> >> procedure write_buff()
> >> for y = ymin to ymax-2 do
> >> poke(a+320*y+xmin,buff[y-ymin+1])
> >> end for
> >> end procedure
> >
> >How about adding 320 instead of multiplying by it? i.e.
> > poke(a+q+xmin, ...)
> > q = q + 320
> >
> >This would be much, much faster. And since write_buff() gets called a lot
>
> I'm sorry, but that is the most ridiculous thing I have ever heard. Adding
> 320 to y in procedure write_buff will seriously screw the program. You
> *have* to multiply in
> order to get each line in its proper place (ie. to advance down the screen).
> Adding
> might draw everything on a single line, or crash the program.
I think you misunderstand me. q is just a buffer variable. Instead of
multiplying y times 320 every time, start q off with ymin * 320. Then add 320
to q each loop iteration. Run the code first before you criticize it, okay?
In a sense, this is an unrolling-the-loop, since mults are simply repeated
adds.
However, your point about write_buff()'s call frequency is well taken. I
should have done a proper execution profile.
--
Cameron Kaiser
http://www.sserv.com/
spectre at sserv.com
--
5. Re: sloooooww
Cameron Kaiser wrote:
> I think you misunderstand me. q is just a buffer variable. Instead of
> multiplying y times 320 every time, start q off with ymin * 320. Then
> add 320
> to q each loop iteration. Run the code first before you criticize it,
> okay?
> In a sense, this is an unrolling-the-loop, since mults are simply
> repeated
> adds.
>
> However, your point about write_buff()'s call frequency is well taken.
> I
> should have done a proper execution profile.
There are three things, so slow in Euphoria that trying another way
is worth the trouble:
* Calling other functions or procedures in the main loop...
(optimize them either to include the read code or to call the
machine_proc or machine_func yourself)
* Composing a temporary sequence (example:
my_routine ({x,y,s,{e,r,t}})
is slower than...
my_seq = {0,0,0{0,0,0}}
..inner loop...
my_seq[1] = x
my_seq[2] = y
etc.
End of example)
* Heavy calculations, avoid atoms, always use integers, avoid
connversions, use floor, simply never have an atom, you could just
figure out a way to calculate the base and exponent seperately (iy your
Einstein) for your calculation, and then make an atom out of it when you
return, need or use the value.
I know these are just easily made comments and have no real effect
on the code, but i'll promise, i'll look into it tomorrow and rewrite it
as optimized as i possible can.
6. Re: sloooooww
--------------6AD2306A6804
klepto wrote:
> Could someone take a look at this code, and maybe tell me why it's so slow?
> And possibly even let me know how I could speed it up? Thanks
Here's my attempt at making it faster, but I cheated I think. I used
assembler. I tried optimizing in plain Euphoria and couldn't get it to
go much faster.
from Pete
--------------6AD2306A6804
7. Re: sloooooww
Sorry about that last post. I forgot that attaching files doesn't work
well with the listserv. Anyway I uploaded the zip to
Again, sorry
--
_____ _____ _____
________ /\ \ /\ \ /\ \
/ \ \ / \____\ / \____\ / \____\
/ _ \____\ / / ___/_ / /____/ / / ___/_
/ / \ [___] / / /\____\ / \ \ / / /\____\
\ \_/ / / \ \/ / ___/_\ \ \ \ \/ / ___/_
\ /____/ \ / /\ \\/\ \ \ \ / /\ \
\ \ \ \ \/ \____\ \ \ \ \ \/ \____\
\ \ \ \ / / \ \____\ \ / /
\ \____\ \ / / \ / / \ / /
\ / / \ / / \/____/ \ / /
\/____/ \/____/xseal at harborside.com\/____/
8. Re: sloooooww
Pete wrote:
>
>klepto wrote:
>> Could someone take a look at this code, and maybe tell me why it's so slow?
>> And possibly even let me know how I could speed it up? Thanks
>
>Here's my attempt at making it faster, but I cheated I think. I used
>assembler. I tried optimizing in plain Euphoria and couldn't get it to
>go much faster.
>
Another nice work, Pete,
(Why is your work always nice?, i'm nearly bored to have to say that
"Pete, your work is very nice.... blah blah..."
)
i also attempted to speed up klepto's source with plane euphoria, but,
finally concluded from some timing test that "cheating(using machine code)"
is the *only* way to achieve it.
anyway your's are so fast...
one suggestion in your work: Could you modify your source to hide a few
bottom lines of fire? i think it seems better to see if a few bottom
lines near seed line are hidden. i tried ymax=203. it worked, however, leaves
CW error when exit.
Bye! -- from Lee, woo seob.
p.s. i hope you have received successfully the
svga virtual page library zip that i sent.