1. sloooooww

Could someone take a look at this code, and maybe tell me why it's so slow?
And possibly even let me know how I could speed it up? Thanks

---Code begins here---
-- fireplay.ex : inspired by Gerhard Piran's Burn
-- Jiri babor
-- J.babor at gns.cri.nz
-- version 1.00   97-01-13

include graphics.e
include get.e

constant
   a=#A0000,
   ymin=10,    ymax= 199,        -- top & bottom of fire window
   true=1,     false=0
integer cc, junk, key, plasma, cp, finished, xmin, xmax
sequence buff, pal, par

procedure write_buff()
   for y = ymin to ymax-2 do
      poke(a+320*y+xmin,buff[y-ymin+1])
   end for
end procedure

procedure burn()     -- fire calculations
   if plasma then    -- plasma-like fire
      for r = 2 to ymax-ymin do
         for c = 2 to xmax-xmin do
            cc = floor((buff[r][c-1]+buff[r][c]+
                        buff[r][c+1]+buff[r+1][c])/4)
            if cc>=par[2][3] then cc=cc-par[2][3] end if
            buff[r-1][c]=cc
         end for
      end for
   else              -- curtain-like fire
      for r = ymax-ymin to 2 by -1 do
         for c = 2 to xmax-xmin do
            cc = floor((buff[r][c-1]+buff[r][c]+
                        buff[r][c+1]+buff[r+1][c])/4)
            if cc>0 then cc=cc-1 end if
            buff[r-1][c]=cc
         end for
      end for
   end if
end procedure

procedure spark()
   integer x,y
   if rand(par[3][3])-1 then
      x=rand(xmax-xmin-21)+10
      y=rand(ymax-ymin-40)+20
      buff[y][x]=255
      while rand(2)-1 do
         y=y+1
         buff[y][x]=par[1][3]
      end while
   end if
end procedure

procedure seed_fire()
   -- set randomly the bottom two lines to start the fire & to keep it going
   integer imin
   imin=floor(0.1*par[4][3]*par[1][3])
   for i = 2 to xmax-xmin do
      buff[ymax-ymin][i]=imin+rand(par[1][3]-imin)
   end for
   buff[ymax-ymin+1]=buff[ymax-ymin]
end procedure

procedure control()
   if key=27 then             -- esc: exit
      finished=true
   end if
end procedure

procedure set_palette()
   -- set yellow-red-slate fire palette
   pal=repeat({0,0,0},256)
   for i = 1 to 32 do
      pal[i]=floor({0,(i-1)/2,(i-1)/2})
      pal[i+32]=floor({(i-1)/2,15,16-i/2})
   end for
   for i = 1 to 64 do
      pal[i+64]=floor({15+3*i/4,15,0})
      pal[i+128]=floor({63,15+3*i/4,0})
      pal[i+192]=floor({63,63,15+3*i/4})
   end for
   all_palette(pal)
end procedure


-- main
------------------------------------------------------------------------

-- fire parameters: defaults
--         min  max  cur  def
par = {{    45, 255, 255,255},    -- fire intensity
       {     1,   5,   1,  1},    -- fire attenuation
       {     1,  10,  10, 10},    -- sparks
       {     0,   9,   0,  0},    -- fire smoothness at base
       {    200, 320, 320,600}}    -- fire window width (def: 40/320/100/100)
cp=1        -- currently active parameter
finished= false
plasma=true
xmin=(320-par[5][3])/2
xmax=xmin+par[5][3]-1
buff=repeat(repeat(0,xmax-xmin+1),ymax-ymin+1)

--introduction()
junk = graphics_mode(19)

set_palette()

--n=0                        -- frame count
while not finished do      -- main loop
   seed_fire()
   burn()                  -- main fire routine
   write_buff()            -- show updated buffer
--   if n>80 then
--    spark()
--   end if
--   n=n+1                   -- increment frame count
   key=get_key()
   if key!=-1 then
    control()
   end if
end while

junk = graphics_mode(-1)

--END CODE--

new topic     » topic index » view message » categorize

2. Re: sloooooww

Optimization, my friend, optimization!

> procedure write_buff()
>    for y = ymin to ymax-2 do
>       poke(a+320*y+xmin,buff[y-ymin+1])
>    end for
> end procedure

How about adding 320 instead of multiplying by it? i.e.

        poke(a+q+xmin, ...)
        q = q + 320

This would be much, much faster. And since write_buff() gets called a lot ...
Also see about stuffing some of your calculations in a lookup table, or
at least sticking predictable values in a table to reduce the amount of
multiplications you have to do. Mults and divs kill execution time.

--
Cameron Kaiser
http://www.sserv.com/
spectre at sserv.com
--

new topic     » goto parent     » topic index » view message » categorize

3. Re: sloooooww

Cameron Kaiser wrote:

>Optimization, my friend, optimization!
>> procedure write_buff()
>>    for y = ymin to ymax-2 do
>>       poke(a+320*y+xmin,buff[y-ymin+1])
>>    end for
>> end procedure
>
>How about adding 320 instead of multiplying by it? i.e.
>        poke(a+q+xmin, ...)
>       q = q + 320
>
>This would be much, much faster. And since write_buff() gets called a lot
...
>Also see about stuffing some of your calculations in a lookup table, or
>at least sticking predictable values in a table to reduce the amount of
>multiplications you have to do. Mults and divs kill execution time.


I'm sorry, but that is the most ridiculous thing I have ever heard.  Adding
320 to y in procedure write_buff will seriously screw the program.  You
*have* to multiply in
order to get each line in its proper place (ie. to advance down the screen).
 Adding
might draw everything on a single line, or crash the program.

Also, write_buff does *not* get called "a lot" when you compare it to other
parts of the code.  It only gets called once for every new screen completed.
 By doing
execution and time profiles, you can see that for an approximate 30 second
run,
write_buff is only called 86 times, which results in a total of 16,168 pokes
to the vga
screen.  If that was the only thing running, it would be much faster than we
see now.

Write_buff only takes about 10% of the processing time.  Over 85% of the
processing is done inside procedure burn().  Here is an execution count
profile, and
a time profile from procedures write_buf and burn:
    Execution count:
       |procedure write_buff()
    86 |   for y = ymin to ymax-2 do
 16168 |      poke(a+320*y+xmin,buff[y-ymin+1])
 16168 |   end for
    86 |end procedure
       |
       |procedure burn()     -- fire calculations
    86 |   if plasma then    -- plasma-like fire
    86 |      for r = 2 to ymax-ymin do
 16168 |         for c = 2 to xmax-xmin do
5141424 |            cc = floor((buff[r][c-1]+buff[r][c]+
       |                        buff[r][c+1]+buff[r+1][c])/4)
5141424 |            if cc>=par[2][3] then cc=cc-par[2][3] end if
5141424 |            buff[r-1][c]=cc
5141424 |         end for
 16168 |      end for
       |   else              -- curtain-like fire
       | -- omitted, never executed
       |   end if
    86 |end procedure

        Time profile:
       |procedure write_buff()
       |   for y = ymin to ymax-2 do
 10.10 |      poke(a+320*y+xmin,buff[y-ymin+1])
       |   end for
       |end procedure
       |
       |procedure burn()     -- fire calculations
       |   if plasma then    -- plasma-like fire
       |      for r = 2 to ymax-ymin do
  0.06 |         for c = 2 to xmax-xmin do
 62.56 |            cc = floor((buff[r][c-1]+buff[r][c]+
       |                        buff[r][c+1]+buff[r+1][c])/4)
 14.35 |            if cc>=par[2][3] then cc=cc-par[2][3] end if
  9.02 |            buff[r-1][c]=cc
  1.97 |         end for
       |      end for
       |   else              -- curtain-like fire
       |  -- omitted, never executed
       |   end if
       |end procedure

As you can see, burn is doing a *huge* amount of addition and division, doing
over
5,141,424 additions and divisions.  (just to be sure, i did profiles with
write_buff
commented out.  the extra speed increase only gained 14 more calls to burn().
obviously, write_buff is *not* the bottleneck that Cameron believes it to
be.)  The
trick, then, is to make the calculations in burn go faster.  A *lot* faster.

I wanted to see how fast euphoria could do division, so here's a sample
program:

atom s, e
integer t, r
t = 9
s = time()
for x = 1 to 5141424 do
    r = floor(x/t)
end for
e = time()
printf(1,"%d",(e - s))

This little baby runs in about 3 seconds on my p133!  Why then does burn()
run so slow when 5 million divides take about 3 seconds?  This interested me,
so I modified
my little program to this:
atom s, e
atom t, r, c
t = 7
c = 13
s = time()
for x = 1 to 5141424 do
    t = t + c
    c = c + t
    r = floor((x + t + c)/4)
end for
e = time()
printf(1,"%d",(e - s))

Now, it takes a whoping 47 seconds to run!  The number of divisions did not
increase, but additions were thrown into the mix.  So how do we speed up
burn?
Easy, get rid of the additions!  Except now we don't get pretty fire...

Anybody know how to get rid of enough calculations inside burn to make it
fast?
Without degrading the quality of the fire??

James Powell
(Damn, this was a long post!  Probably not much help, either...  smile

new topic     » goto parent     » topic index » view message » categorize

4. Re: sloooooww

> >Optimization, my friend, optimization!
> >> procedure write_buff()
> >>    for y = ymin to ymax-2 do
> >>       poke(a+320*y+xmin,buff[y-ymin+1])
> >>    end for
> >> end procedure
> >
> >How about adding 320 instead of multiplying by it? i.e.
> >        poke(a+q+xmin, ...)
> >       q = q + 320
> >
> >This would be much, much faster. And since write_buff() gets called a lot
>
> I'm sorry, but that is the most ridiculous thing I have ever heard.  Adding
> 320 to y in procedure write_buff will seriously screw the program.  You
> *have* to multiply in
> order to get each line in its proper place (ie. to advance down the screen).
>  Adding
> might draw everything on a single line, or crash the program.

I think you misunderstand me. q is just a buffer variable. Instead of
multiplying y times 320 every time, start q off with ymin * 320. Then add 320
to q each loop iteration. Run the code first before you criticize it, okay? smile
In a sense, this is an unrolling-the-loop, since mults are simply repeated
adds.

However, your point about write_buff()'s call frequency is well taken. I
should have done a proper execution profile.

--
Cameron Kaiser
http://www.sserv.com/
spectre at sserv.com
--

new topic     » goto parent     » topic index » view message » categorize

5. Re: sloooooww

Cameron Kaiser wrote:

> I think you misunderstand me. q is just a buffer variable. Instead of
> multiplying y times 320 every time, start q off with ymin * 320. Then
> add 320
> to q each loop iteration. Run the code first before you criticize it,
> okay? smile
> In a sense, this is an unrolling-the-loop, since mults are simply
> repeated
> adds.
>
> However, your point about write_buff()'s call frequency is well taken.
> I
> should have done a proper execution profile.

    There are three things, so slow in Euphoria that trying another way
is worth the trouble:

    * Calling other functions or procedures in the main loop...
(optimize them either to include the read code or to call the
machine_proc or machine_func yourself)
    * Composing a temporary sequence  (example:

            my_routine ({x,y,s,{e,r,t}})

        is slower than...

            my_seq = {0,0,0{0,0,0}}

            ..inner loop...
            my_seq[1] = x
            my_seq[2] = y
            etc.

    End of example)

    *  Heavy calculations, avoid atoms, always use integers, avoid
connversions, use floor, simply never have an atom, you could just
figure out a way to calculate the base and exponent seperately (iy your
Einstein) for your calculation, and then make an atom out of it when you
return, need or use the value.

    I know these are just easily made comments and have no real effect
on the code, but i'll promise, i'll look into it tomorrow and rewrite it
as optimized as i possible can.

new topic     » goto parent     » topic index » view message » categorize

6. Re: sloooooww

--------------6AD2306A6804

klepto wrote:
> Could someone take a look at this code, and maybe tell me why it's so slow?
> And possibly even let me know how I could speed it up? Thanks

Here's my attempt at making it faster, but I cheated I think.  I used
assembler.  I tried optimizing in plain Euphoria and couldn't get it to
go much faster.

from Pete

--------------6AD2306A6804

new topic     » goto parent     » topic index » view message » categorize

7. Re: sloooooww

Sorry about that last post.  I forgot that attaching files doesn't work
well with the listserv.  Anyway I uploaded the zip to
Again, sorry
--
                   _____         _____        _____
    ________      /\    \       /\    \      /\    \
   /   \    \    /  \____\     /  \____\    /  \____\
  /  _  \____\  /   / ___/_   /   /____/   /   / ___/_
 /  / \  [___] /   / /\____\ /    \    \  /   / /\____\
 \  \_/ /    / \   \/ / ___/_\     \    \ \   \/ / ___/_
  \    /____/   \    / /\    \\/\   \    \ \    / /\    \
   \   \    \    \   \/  \____\  \   \    \ \   \/  \____\
    \   \    \    \      /    /   \   \____\ \      /    /
     \   \____\    \    /    /     \  /    /  \    /    /
      \  /    /     \  /    /       \/____/    \  /    /
       \/____/       \/____/xseal at harborside.com\/____/

new topic     » goto parent     » topic index » view message » categorize

8. Re: sloooooww

Pete wrote:
>
>klepto wrote:
>> Could someone take a look at this code, and maybe tell me why it's so slow?
>> And possibly even let me know how I could speed it up? Thanks
>
>Here's my attempt at making it faster, but I cheated I think.  I used
>assembler.  I tried optimizing in plain Euphoria and couldn't get it to
>go much faster.
>

Another nice work, Pete,
(Why is your work always nice?, i'm nearly bored to have to say that
"Pete, your work is very nice.... blah blah..." smile)

i also attempted to speed up klepto's source with plane euphoria, but,
finally concluded from some timing test that "cheating(using machine code)"
is the *only* way to achieve it.

anyway your's are so fast...
one suggestion in your work: Could you modify your source to hide a few
bottom lines of fire? i think it seems better to see if a few bottom
lines near seed line are hidden. i tried ymax=203. it worked, however, leaves
CW error when exit.

Bye! -- from Lee, woo seob.
p.s. i hope you have received successfully the
svga virtual page library zip that i sent.

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu