1. sloooooww
- Posted by klepto <darkrain at PLAZMA.NET> Oct 23, 1997
- 684 views
Could someone take a look at this code, and maybe tell me why it's so slow? And possibly even let me know how I could speed it up? Thanks

---Code begins here---
-- fireplay.ex : inspired by Gerhard Piran's Burn
-- Jiri babor
-- J.babor at gns.cri.nz
-- version 1.00 97-01-13

include graphics.e
include get.e

constant a=#A0000,
    ymin=10, ymax= 199,     -- top & bottom of fire window
    true=1, false=0

integer cc, junk, key, plasma, cp, finished, xmin, xmax
sequence buff, pal, par

procedure write_buff()
    for y = ymin to ymax-2 do
        poke(a+320*y+xmin,buff[y-ymin+1])
    end for
end procedure

procedure burn()    -- fire calculations
    if plasma then      -- plasma-like fire
        for r = 2 to ymax-ymin do
            for c = 2 to xmax-xmin do
                cc = floor((buff[r][c-1]+buff[r][c]+
                            buff[r][c+1]+buff[r+1][c])/4)
                if cc>=par[2][3] then cc=cc-par[2][3] end if
                buff[r-1][c]=cc
            end for
        end for
    else                -- curtain-like fire
        for r = ymax-ymin to 2 by -1 do
            for c = 2 to xmax-xmin do
                cc = floor((buff[r][c-1]+buff[r][c]+
                            buff[r][c+1]+buff[r+1][c])/4)
                if cc>0 then cc=cc-1 end if
                buff[r-1][c]=cc
            end for
        end for
    end if
end procedure

procedure spark()
    integer x,y
    if rand(par[3][3])-1 then
        x=rand(xmax-xmin-21)+10
        y=rand(ymax-ymin-40)+20
        buff[y][x]=255
        while rand(2)-1 do
            y=y+1
            buff[y][x]=par[1][3]
        end while
    end if
end procedure

procedure seed_fire()
    -- set randomly the bottom two lines to start the fire & to keep it going
    integer imin
    imin=floor(0.1*par[4][3]*par[1][3])
    for i = 2 to xmax-xmin do
        buff[ymax-ymin][i]=imin+rand(par[1][3]-imin)
    end for
    buff[ymax-ymin+1]=buff[ymax-ymin]
end procedure

procedure control()
    if key=27 then      -- esc: exit
        finished=true
    end if
end procedure

procedure set_palette()
    -- set yellow-red-slate fire palette
    pal=repeat({0,0,0},256)
    for i = 1 to 32 do
        pal[i]=floor({0,(i-1)/2,(i-1)/2})
        pal[i+32]=floor({(i-1)/2,15,16-i/2})
    end for
    for i = 1 to 64 do
        pal[i+64]=floor({15+3*i/4,15,0})
        pal[i+128]=floor({63,15+3*i/4,0})
        pal[i+192]=floor({63,63,15+3*i/4})
    end for
    all_palette(pal)
end procedure

-- main ------------------------------------------------------------------------

-- fire parameters: defaults
--        min  max  cur  def
par = {{  45, 255, 255, 255},   -- fire intensity
       {   1,   5,   1,   1},   -- fire attenuation
       {   1,  10,  10,  10},   -- sparks
       {   0,   9,   0,   0},   -- fire smoothness at base
       { 200, 320, 320, 600}}   -- fire window width (def: 40/320/100/100)

cp=1                    -- currently active parameter
finished= false
plasma=true
xmin=(320-par[5][3])/2
xmax=xmin+par[5][3]-1
buff=repeat(repeat(0,xmax-xmin+1),ymax-ymin+1)

--introduction()
junk = graphics_mode(19)
set_palette()
--n=0                   -- frame count
while not finished do   -- main loop
    seed_fire()
    burn()              -- main fire routine
    write_buff()        -- show updated buffer
--  if n>80 then
--      spark()
--  end if
--  n=n+1               -- increment frame count
    key=get_key()
    if key!=-1 then
        control()
    end if
end while
junk = graphics_mode(-1)
--END CODE--
2. Re: sloooooww
- Posted by Cameron Kaiser <spectre at WWW2.BUOY.COM> Oct 23, 1997
- 705 views
Optimization, my friend, optimization!

> procedure write_buff()
>     for y = ymin to ymax-2 do
>         poke(a+320*y+xmin,buff[y-ymin+1])
>     end for
> end procedure

How about adding 320 instead of multiplying by it? i.e.

    poke(a+q+xmin, ...)
    q = q + 320

This would be much, much faster. And since write_buff() gets called a lot ...

Also see about stuffing some of your calculations in a lookup table, or at least sticking predictable values in a table to reduce the number of multiplications you have to do. Mults and divs kill execution time.

--
Cameron Kaiser
http://www.sserv.com/
spectre at sserv.com
--
3. Re: sloooooww
- Posted by James Powell <PowellSoft at AOL.COM> Oct 23, 1997
- 695 views
Cameron Kaiser wrote:
>Optimization, my friend, optimization!
>> procedure write_buff()
>>     for y = ymin to ymax-2 do
>>         poke(a+320*y+xmin,buff[y-ymin+1])
>>     end for
>> end procedure
>
>How about adding 320 instead of multiplying by it? i.e.
>    poke(a+q+xmin, ...)
>    q = q + 320
>
>This would be much, much faster. And since write_buff() gets called a lot ...
>Also see about stuffing some of your calculations in a lookup table, or
>at least sticking predictable values in a table to reduce the amount of
>multiplications you have to do. Mults and divs kill execution time.

I'm sorry, but that is the most ridiculous thing I have ever heard. Adding 320 to y in procedure write_buff will seriously screw the program. You *have* to multiply in order to get each line in its proper place (i.e. to advance down the screen). Adding might draw everything on a single line, or crash the program.

Also, write_buff does *not* get called "a lot" when you compare it to other parts of the code. It only gets called once for every new screen completed. By doing execution and time profiles, you can see that for an approximate 30 second run, write_buff is only called 86 times, which results in a total of 16,168 pokes to the VGA screen. If that was the only thing running, it would be much faster than we see now. write_buff only takes about 10% of the processing time. Over 85% of the processing is done inside procedure burn().
Here is an execution count profile, and a time profile, from procedures write_buff and burn:

Execution count:

        |procedure write_buff()
     86 |    for y = ymin to ymax-2 do
  16168 |        poke(a+320*y+xmin,buff[y-ymin+1])
  16168 |    end for
     86 |end procedure
        |
        |procedure burn() -- fire calculations
     86 |    if plasma then -- plasma-like fire
     86 |        for r = 2 to ymax-ymin do
  16168 |            for c = 2 to xmax-xmin do
5141424 |                cc = floor((buff[r][c-1]+buff[r][c]+
        |                            buff[r][c+1]+buff[r+1][c])/4)
5141424 |                if cc>=par[2][3] then cc=cc-par[2][3] end if
5141424 |                buff[r-1][c]=cc
5141424 |            end for
  16168 |        end for
        |    else -- curtain-like fire
        |        -- omitted, never executed
        |    end if
     86 |end procedure

Time profile (percent of run time):

        |procedure write_buff()
        |    for y = ymin to ymax-2 do
  10.10 |        poke(a+320*y+xmin,buff[y-ymin+1])
        |    end for
        |end procedure
        |
        |procedure burn() -- fire calculations
        |    if plasma then -- plasma-like fire
        |        for r = 2 to ymax-ymin do
   0.06 |            for c = 2 to xmax-xmin do
  62.56 |                cc = floor((buff[r][c-1]+buff[r][c]+
        |                            buff[r][c+1]+buff[r+1][c])/4)
  14.35 |                if cc>=par[2][3] then cc=cc-par[2][3] end if
   9.02 |                buff[r-1][c]=cc
   1.97 |            end for
        |        end for
        |    else -- curtain-like fire
        |        -- omitted, never executed
        |    end if
        |end procedure

As you can see, burn is doing a *huge* amount of addition and division: the inner loop body runs 5,141,424 times. (Just to be sure, I did profiles with write_buff commented out. The extra speed only gained 14 more calls to burn(). Obviously, write_buff is *not* the bottleneck that Cameron believes it to be.)

The trick, then, is to make the calculations in burn go faster. A *lot* faster. I wanted to see how fast Euphoria could do division, so here's a sample program:

    atom s, e
    integer t, r
    t = 9
    s = time()
    for x = 1 to 5141424 do
        r = floor(x/t)
    end for
    e = time()
    printf(1,"%d",(e - s))

This little baby runs in about 3 seconds on my P133! Why then does burn() run so slow when 5 million divides take about 3 seconds?
This interested me, so I modified my little program to this:

    atom s, e
    atom t, r, c
    t = 7
    c = 13
    s = time()
    for x = 1 to 5141424 do
        t = t + c
        c = c + t
        r = floor((x + t + c)/4)
    end for
    e = time()
    printf(1,"%d",(e - s))

Now, it takes a whopping 47 seconds to run! The number of divisions did not increase, but additions were thrown into the mix. So how do we speed up burn? Easy, get rid of the additions! Except now we don't get pretty fire...

Anybody know how to get rid of enough calculations inside burn to make it fast? Without degrading the quality of the fire??

James Powell

(Damn, this was a long post! Probably not much help, either...)
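
One thing worth checking in the second benchmark: t and c grow by a factor of roughly 2.6 on every pass, so they leave Euphoria's 31-bit integer range (about plus or minus 1.07 billion) after roughly twenty iterations, and the remaining five million passes are done on floating-point atoms. That may account for much of the 47 seconds. A rough sketch of a closer comparison, assuming the goal is to time three integer adds and one divide per pass, keeps every operand small, much like the 0..255 values stored in buff. The names v1 to v4 are made up for illustration.

```euphoria
-- Hypothetical variant of the benchmark above: three adds and one
-- divide per pass, with every value kept inside the integer range.
atom s, e
integer r, v1, v2, v3, v4

v1 = 7
v2 = 13
v3 = 200
v4 = 255
s = time()
for x = 1 to 5141424 do
    r = floor((v1 + v2 + v3 + v4)/4)
end for
e = time()
printf(1, "%d\n", e - s)    -- elapsed whole seconds
```

This still leaves out the sequence subscripting that burn() performs on each pass, so it is only a lower bound on what the inner loop could cost.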
4. Re: sloooooww
- Posted by Cameron Kaiser <spectre at WWW2.BUOY.COM> Oct 23, 1997
- 708 views
> >Optimization, my friend, optimization!
> >> procedure write_buff()
> >>     for y = ymin to ymax-2 do
> >>         poke(a+320*y+xmin,buff[y-ymin+1])
> >>     end for
> >> end procedure
> >
> >How about adding 320 instead of multiplying by it? i.e.
> >    poke(a+q+xmin, ...)
> >    q = q + 320
> >
> >This would be much, much faster. And since write_buff() gets called a lot
>
> I'm sorry, but that is the most ridiculous thing I have ever heard. Adding
> 320 to y in procedure write_buff will seriously screw the program. You
> *have* to multiply in order to get each line in its proper place (ie. to
> advance down the screen). Adding might draw everything on a single line,
> or crash the program.

I think you misunderstand me. q is just a buffer variable. Instead of multiplying y times 320 every time, start q off with ymin * 320. Then add 320 to q each loop iteration. Run the code first before you criticize it, okay?

Strictly speaking this is strength reduction rather than unrolling the loop: the multiply is replaced by a running add, since mults are simply repeated adds.

However, your point about write_buff()'s call frequency is well taken. I should have done a proper execution profile.

--
Cameron Kaiser
http://www.sserv.com/
spectre at sserv.com
--
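
The running-offset idea can be checked without touching the screen. The sketch below is only an illustration: it reuses the constants from the posted program (with xmin taken as 0, its value for the default 320-pixel window), folds a and xmin into q rather than adding them on each pass, and counts how often the running address disagrees with the multiplied one. The variable mismatches is a made-up name.

```euphoria
-- Sketch only: compares the multiplied address with a running offset q.
-- Nothing is poked to video memory here.
constant a = #A0000, ymin = 10, ymax = 199, xmin = 0
integer q, mismatches

q = a + 320*ymin + xmin     -- one multiply, done before the loop
mismatches = 0
for y = ymin to ymax-2 do
    if q != a + 320*y + xmin then
        mismatches = mismatches + 1
    end if
    -- in write_buff() this line would be: poke(q, buff[y-ymin+1])
    q = q + 320             -- step down one screen row
end for
printf(1, "%d mismatches\n", mismatches)
```

Since the two addresses agree on every row, the change is safe; given the profile figures earlier in the thread, though, it can only trim the roughly 10% of run time spent in write_buff().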
5. Re: sloooooww
- Posted by Ralf Nieuwenhuijsen <nieuwen at XS4ALL.NL> Oct 24, 1997
- 728 views
Cameron Kaiser wrote:
> I think you misunderstand me. q is just a buffer variable. Instead of
> multiplying y times 320 every time, start q off with ymin * 320. Then
> add 320 to q each loop iteration. Run the code first before you
> criticize it, okay?
> In a sense, this is an unrolling-the-loop, since mults are simply
> repeated adds.
>
> However, your point about write_buff()'s call frequency is well taken.
> I should have done a proper execution profile.

There are three things so slow in Euphoria that trying another way is worth the trouble:

* Calling other functions or procedures in the main loop. (Optimize them either by including the real code inline or by calling machine_proc or machine_func yourself.)
* Composing a temporary sequence. Example: my_routine ({x,y,s,{e,r,t}}) is slower than setting up my_seq = {0,0,0,{0,0,0}} once and then, in the inner loop, assigning my_seq[1] = x, my_seq[2] = y, etc.
* Heavy calculations. Avoid atoms, always use integers, avoid conversions, use floor. Simply never have an atom: you could just figure out a way to calculate the base and exponent separately (if you're Einstein) for your calculation, and then make an atom out of it when you return, need or use the value.

I know these are just easily made comments and have no real effect on the code, but I promise I'll look into it tomorrow and rewrite it as optimized as I possibly can.
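
The second point in the list above can be sketched as follows. This is an illustration only: my_routine, my_seq and total are hypothetical names, the routine does trivial work, and whether the second style is measurably faster is the claim being made, not something shown here; it is worth timing both loops on the interpreter in question.

```euphoria
-- Hypothetical routine and names, shown only to contrast the two call styles.
sequence my_seq
integer total

procedure my_routine(sequence args)
    total = total + args[1] + args[2]
end procedure

total = 0

-- Style 1: a new temporary sequence is built on every pass
for i = 1 to 1000 do
    my_routine({i, 2*i, 0, {0, 0, 0}})
end for

-- Style 2: allocate once, then overwrite elements inside the loop
my_seq = {0, 0, 0, {0, 0, 0}}
for i = 1 to 1000 do
    my_seq[1] = i
    my_seq[2] = 2*i
    my_routine(my_seq)
end for

printf(1, "%d\n", total)
```

Both loops pass the same values to my_routine, so the two styles are interchangeable as far as results go.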
6. Re: sloooooww
- Posted by Pete Eberlein <xseal at HARBORSIDE.COM> Oct 24, 1997
- 710 views
klepto wrote:
> Could someone take a look at this code, and maybe tell me why it's so slow?
> And possibly even let me know how I could speed it up? Thanks

Here's my attempt at making it faster, but I cheated I think. I used assembler. I tried optimizing in plain Euphoria and couldn't get it to go much faster.

from Pete
7. Re: sloooooww
- Posted by Pete Eberlein <xseal at HARBORSIDE.COM> Oct 24, 1997
- 678 views
Sorry about that last post. I forgot that attaching files doesn't work well with the listserv. Anyway I uploaded the zip to

Again, sorry

--
xseal at harborside.com
8. Re: sloooooww
- Posted by Lee woo seob <wslee at HHI.CO.KR> Oct 27, 1997
- 698 views
Pete wrote:
>
>klepto wrote:
>> Could someone take a look at this code, and maybe tell me why it's so slow?
>> And possibly even let me know how I could speed it up? Thanks
>
>Here's my attempt at making it faster, but I cheated I think. I used
>assembler. I tried optimizing in plain Euphoria and couldn't get it to
>go much faster.
>

Another nice work, Pete. (Why is your work always nice? I'm nearly bored of having to say "Pete, your work is very nice.... blah blah...")

I also attempted to speed up klepto's source with plain Euphoria, but finally concluded from some timing tests that "cheating" (using machine code) is the *only* way to achieve it. Anyway, yours is so fast...

One suggestion for your work: could you modify your source to hide a few bottom lines of the fire? I think it looks better if the few bottom lines near the seed line are hidden. I tried ymax=203. It worked, but it leaves a CW error on exit.

Bye!
-- from Lee, woo seob.

p.s. I hope you have successfully received the svga virtual page library zip that I sent.