1. which is faster?
- Posted by don cole <doncole at pacbell.net> May 08, 2007
- 683 views
Hello everybody. Has anybody benchmarked this:
function won_lost(sequence a, sequence b)
    if equal(a,b) then
        return "Won"
    end if
    return "Lost"
end function

-------------------------or-----------------------

function won_lost(sequence a, sequence b)
    if equal(a,b) then
        return "Won"
    else
        return "Lost"
    end if
end function
Thanks Bernie, Don Cole
2. Re: which is faster?
- Posted by egg <egege at rambler.ru> May 08, 2007
- 724 views
the second is faster
3. Re: which is faster?
- Posted by egg <egege at rambler.ru> May 08, 2007
- 650 views
You may use this simple speed comparer:

include sort.e
include machine.e
include misc.e
include get.e
include graphics.e

atom start, finish
integer cycles, x, y
sequence s, t, a

s = "Да"
t = "Нет"
cycles = 10000000

puts(1,"Start...\n")
start = time()

function won_lost1(sequence a, sequence b)
    if equal(a,b) then
        return "Won"
    end if
    return "Lost"
end function

-------------------------or-----------------------

function won_lost2(sequence a, sequence b)
    if equal(a,b) then
        return "Won"
    else
        return "Lost"
    end if
end function

for i=1 to cycles do    -- construction 1
    a = won_lost1(s,t)
end for
finish = time() - start
position(23,2)
printf(1,"time1 = %f sec\n ",finish)

start = time()
for i=1 to cycles do    -- construction 2
    a = won_lost2(s,t)
end for
finish = time() - start
printf(1,"time2 = %f sec\n",finish)

x = wait_key()
4. Re: which is faster?
- Posted by egg <egege at rambler.ru> May 08, 2007
- 664 views
Sorry, there was a mistake. It must be done like so:

include sort.e
include machine.e
include misc.e
include get.e
include graphics.e

atom start, finish
integer cycles, x, y
sequence s, t, a

s = "Yes"
t = "No"
cycles = 10000000

puts(1,"Start...\n")

function won_lost1(sequence a, sequence b)
    if equal(a,b) then
        return "Won"
    end if
    return "Lost"
end function

-------------------------or-----------------------

function won_lost2(sequence a, sequence b)
    if equal(a,b) then
        return "Won"
    else
        return "Lost"
    end if
end function

start = time()
for i=1 to cycles do    -- construction 1
    a = won_lost1(s,t)
end for
finish = time() - start
position(23,2)
printf(1,"time1 = %f sec\n ",finish)

start = time()
for i=1 to cycles do    -- construction 2
    a = won_lost2(s,t)
end for
finish = time() - start
printf(1,"time2 = %f sec\n",finish)

x = wait_key()
5. Re: which is faster?
- Posted by don cole <doncole at pacbell.net> May 09, 2007
- 658 views
Hello egg, Thanks for the trouble. Don Cole
6. which is faster?
- Posted by Jules <jdavy at d?l.pi?ex.com> Nov 08, 2007
- 641 views
I'm squaring values in a loop, is it faster to use (s[i]-a)*(s[i]-a) or power((s[i]-a), 2)? Thanks.
7. Re: which is faster?
- Posted by Jules <jdavy at dsl.p?pex.co?> Nov 08, 2007
- 644 views
this is what I want to optimise:
global function st_dev(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= m
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sd = sqrt(sum/(length(s)-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function
can this be improved?
8. Re: which is faster?
- Posted by Jules <jdavy at ds?.pipex.c?m> Nov 08, 2007
- 613 views
that should have been:
function st_dev(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= m
    for i = 1 to l do
        -- sum of squared deviations
        sum += power(s[i], 2)
    end for
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function
9. Re: which is faster?
- Posted by CChris <christian.cuvier at agricu?ture.gouv.?r> Nov 08, 2007
- 630 views
I didn't quote the original message because the formatting was very messy and unreadable; I don't know why for sure. Instead of subtracting m from every element, just do
m = mean(s)
for i = 1 to l do
    -- sum of squared deviations
    sum += power(s[i], 2)
end for
sum -= m*m   -- same result, but avoids the s -= m statement
I didn't check whether this was better:
atom si
--...
m = mean(s)
for i = 1 to l do
    -- sum of squared deviations
    si = s[i]
    sum += si*si
end for
sum -= m*m
but I'd bet it's faster than calling power(). Also, computing the sum of elements of s inside the for loop will shave the call to mean() and speed things even more. CChris
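As a small illustration of that last suggestion (an added, self-contained sketch, not part of the original post), here is a one-pass form that folds the mean computation into the same loop:

-- illustrative one-pass version: no separate mean() call needed
sequence s
atom sum, sum_sq, si, m
integer l

s = {2, 4, 4, 4, 5, 5, 7, 9}
l = length(s)
sum = 0
sum_sq = 0
for i = 1 to l do
    si = s[i]
    sum += si            -- running total of the elements
    sum_sq += si*si      -- running total of the squares
end for
m = sum/l                -- mean, from the same pass
sum_sq -= l*m*m          -- sum of squared deviations
printf(1, "mean = %f, sum of squared deviations = %f\n", {m, sum_sq})

This is essentially what the single-pass variants later in this thread (e.g. st_dev4) do.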
10. Re: which is faster?
- Posted by CChris <christian.cuvier at agric?l?ure.gouv.fr> Nov 08, 2007
- 623 views
Oops, that should be sum -= l*m*m. CChris
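(The algebra behind the correction: since m = (s[1]+...+s[l])/l, expanding gives

    sum of (s[i]-m)^2  =  sum of s[i]^2 - 2*m*(sum of s[i]) + l*m^2
                       =  sum of s[i]^2 - l*m^2

so the term to subtract from the plain sum of squares is l*m*m, not m*m.)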
11. Re: which is faster?
- Posted by Michael J. Sabal <m_sabal at ya?oo.com> Nov 08, 2007
- 630 views
Jules wrote:
> 
> I'm squaring values in a loop, is it faster to use (s[i]-a)*(s[i]-a) or power((s[i]-a), 2)?
> Thanks.

The interpreter optimizes power(x,2) to x*x, so at runtime they are the same.
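If you want to check that claim on your own machine, here is a quick self-contained timing sketch (illustrative only, using just the built-in time(), rand() and power()):

-- time power(x,2) against x*x over the same data
atom t, junk
sequence data
data = repeat(0, 1000000)
for i = 1 to length(data) do
    data[i] = rand(1000)
end for

t = time()
for i = 1 to length(data) do
    junk = power(data[i], 2)
end for
printf(1, "power(x,2): %f sec\n", time()-t)

t = time()
for i = 1 to length(data) do
    junk = data[i]*data[i]
end for
printf(1, "x*x:        %f sec\n", time()-t)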
12. Re: which is faster?
- Posted by CChris <christian.cuvier at a?ricu?ture.gouv.fr> Nov 08, 2007
- 650 views
Here is a benchmark to prove and disprove stuff....
sequence s, s1

-- data
constant num_runs=10000, len_data=10000
s = repeat(0,num_runs)
s1 = repeat(0,len_data)

-- generate random data
for k=1 to num_runs do
    for i=1 to len_data do
        s1[i] = rand(12345+k)
    end for
    s[k] = s1
end for
puts(1,"Data initialised, starting benchmark...\n")

function mean(sequence s)
-- adds everything up in s, returns average
    atom result
    result = s[1]
    for i=2 to length(s) do
        result += s[i]
    end for
    return result/length(s)
end function

global function st_dev0(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= m
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev1(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sum -= m*m*l
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev2(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd*sd
    end for
    sum -= m*m*l
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev3(sequence s, atom X)
    atom m, sum, sd, m2
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    m2 = m*m
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd*sd
    end for
    sum -= m2
    sd = sqrt((sum/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev4(sequence s, atom X)
    atom m, sum, sum_sq, sd, m2
    integer l
    sum_sq = 0.0
    sum = 0.0
    l = length(s)
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd
        sum_sq += sd*sd
    end for
    m = sum/l
    m2 = m*m
    sum_sq -= m2
    sd = sqrt((sum_sq/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev5(sequence s, atom X)
    atom m, sum, sum_sq, sd, m2
    integer l
    sum_sq = 0.0
    sum = 0.0
    l = length(s)
    for i = 1 to length(s) do
        sum += s[i]
        sum_sq += power(s[i],2)
    end for
    m = sum/l
    m2 = m*m
    sum_sq -= m2
    sd = sqrt((sum_sq/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

atom t, junk

t = time()
for i=1 to length(s) do
    junk = st_dev0(s[i],0.0)
end for
printf(1,"Variant form #0 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev1(s[i],0.0)
end for
printf(1,"Variant form #1 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev2(s[i],0.0)
end for
printf(1,"Variant form #2 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev3(s[i],0.0)
end for
printf(1,"Variant form #3 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev4(s[i],0.0)
end for
printf(1,"Variant form #4 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev5(s[i],0.0)
end for
printf(1,"Variant form #5 took %3.6gs.\n",time()-t)

?machine_func(26,0)
On an AMD 3300+, 2.0 GHz, 960 MB RAM under WinXP SP2 Pro, I'm getting the following output from ex.exe:

Data initialised, starting benchmark...
Variant form #0 took 75.46s.
Variant form #1 took 14.89s.
Variant form #2 took 16.81s.
Variant form #3 took 16.97s.
Variant form #4 took 16.37s.
Variant form #5 took 17.68s.

I thought that precomputing s[i] would speed things up, but this didn't occur. Not sure why. I didn't try to vary the order of tests.

CChris
13. Re: which is faster?
- Posted by Jules <jdavy at dsl.pipe?.?om> Nov 08, 2007
- 664 views
wow thanks! I'm amazed at the comparative slowness of variant #0...
14. Re: which is faster?
- Posted by Jules <jdavy at dsl.p?p?x.com> Nov 08, 2007
- 624 views
On my system (AMD Athlon dual-core 6000+, running Linux):

Variant form #0 took 26.53s.
Variant form #1 took 10.9s.
Variant form #2 took 12.16s.
Variant form #3 took 12.12s.
Variant form #4 took 11.2s.
Variant form #5 took 12.49s.
15. Re: which is faster?
- Posted by Matt Lewis <matthewwalkerlewis at gmail.??m> Nov 08, 2007
- 630 views
- Last edited Nov 09, 2007
CChris wrote:
> 
> I thought that precomputing s[i] would speed things up, but this didn't occur.
> Not sure why.

I think you didn't use s[i] enough to make the difference. Also, the
interpreter is smart enough to reuse the temp variable that it creates.
Here's the IL that's generated when you don't try to cache s[i]:

40: 058 119          # STARTLINE: string.exw(119)<<sum += power(s[i], 2)>>
42: 025 171 178 179  # RHS_SUBS: [s:171] sub [i:178] => [_temp_:179]
46: 013 179 179 179  # MULTIPLY: [_temp_:179], [_temp_:179] => [_temp_:179]
50: 011 174 179 174  # PLUS: [sum:174], [_temp_:179] => [sum:174]

Here's the IL when you *do* cache s[i]:

52: 058 138          # STARTLINE: string.exw(138)<<sd = s[i]>>
54: 025 181 189 185  # RHS_SUBS: [s:181] sub [i:189] => [sd:185]
58: 101 185          # ATOM_CHECK: [sd:185]
60: 087 185          # DISPLAY_VAR: [sd:185]
62: 058 139          # STARTLINE: string.exw(139)<<sum += sd*sd>>
64: 013 185 185 190  # MULTIPLY: [sd:185], [sd:185] => [_temp_:190]
68: 011 184 190 184  # PLUS: [sum:184], [_temp_:190] => [sum:184]

The STARTLINE and DISPLAY_VAR codes are due to with trace (which I put in
so we could see the actual lines for clarity) and don't really affect
the algorithm (assuming that a production release would make sure that
without trace was in effect).

But the bottom line is that you've added an extra ATOM_CHECK op that
doesn't happen when you use it straight. I think the interpreter is able
to do this because it's on the same line (though I'll admit that the
logic of when to use temps and when to save them is still opaque to
me, so Rob is probably the only one who could explain this without poring
over the code).

Matt
16. Re: which is faster?
- Posted by Robert Craig <rds at RapidEuphor?a.com> Nov 08, 2007
- 662 views
- Last edited Nov 09, 2007
Matt Lewis wrote:
> CChris wrote:
> > I thought that precomputing s[i] would speed things up, but this didn't
> > occur.
> > Not sure why.
> 
> I think you didn't use s[i] enough to make the difference. Also, the
> interpreter is smart enough to reuse the temp variable that it creates.
> Here's the IL that's generated when you don't try to cache s[i]:
> 
> 40: 058 119          # STARTLINE: string.exw(119)<<sum += power(s[i], 2)>>
> 42: 025 171 178 179  # RHS_SUBS: [s:171] sub [i:178] => [_temp_:179]
> 46: 013 179 179 179  # MULTIPLY: [_temp_:179], [_temp_:179] => [_temp_:179]
> 50: 011 174 179 174  # PLUS: [sum:174], [_temp_:179] => [sum:174]
> 
> Here's the IL when you *do* cache s[i]:
> 
> 52: 058 138          # STARTLINE: string.exw(138)<<sd = s[i]>>
> 54: 025 181 189 185  # RHS_SUBS: [s:181] sub [i:189] => [sd:185]
> 58: 101 185          # ATOM_CHECK: [sd:185]
> 60: 087 185          # DISPLAY_VAR: [sd:185]
> 62: 058 139          # STARTLINE: string.exw(139)<<sum += sd*sd>>
> 64: 013 185 185 190  # MULTIPLY: [sd:185], [sd:185] => [_temp_:190]
> 68: 011 184 190 184  # PLUS: [sum:184], [_temp_:190] => [sum:184]
> 
> The STARTLINE and DISPLAY_VAR codes are due to with trace (which I put in
> so we could see the actual lines for clarity) and don't really affect
> the algorithm (assuming that a production release would make sure that
> without trace was in effect).
> 
> But the bottom line is that you've added an extra ATOM_CHECK op that
> doesn't happen when you use it straight. I think the interpreter is able
> to do this because it's on the same line (though I'll admit that the
> logic of when to use temps and when to save them is still opaque to
> me, so Rob is probably the only one who could explain this without poring
> over the code).

When type_check is in force (by default) then
the interpreter has to do the ATOM_CHECK when you
assign to a variable declared as atom. If you say
"without type_check", that check should be eliminated.

As Michael Sabal noted, power(x, 2) is optimized
to x * x (see emit.e), so you get:

temp = s[i]
temp = temp * temp

It's smart enough to re-use the same temp.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
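To make that concrete, here is a small self-contained sketch of the directive Rob mentions (added for illustration only; the actual timing effect will depend on your code and interpreter version):

without type_check  -- turn off run-time type checks such as ATOM_CHECK

atom sd, sum
sequence s

s = {1, 2, 3, 4}
sum = 0
for i = 1 to length(s) do
    sd = s[i]        -- with type_check off, this assignment skips the ATOM_CHECK
    sum += sd*sd
end for
? sum                -- prints 30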
17. Re: which is faster?
- Posted by CChris <christian.cuvier at ?gri?ulture.gouv.fr> Nov 09, 2007
- 642 views
Matt Lewis wrote:
> 
> CChris wrote:
> > 
> > I thought that precomputing s[i] would speed things up, but this didn't
> > occur.
> > Not sure why.
> 
> I think you didn't use s[i] enough to make the difference. Also, the
> interpreter is smart enough to reuse the temp variable that it creates.
> Here's the IL that's generated when you don't try to cache s[i]:
> 
> 40: 058 119          # STARTLINE: string.exw(119)<<sum += power(s[i], 2)>>
> 42: 025 171 178 179  # RHS_SUBS: [s:171] sub [i:178] => [_temp_:179]
> 46: 013 179 179 179  # MULTIPLY: [_temp_:179], [_temp_:179] => [_temp_:179]
> 50: 011 174 179 174  # PLUS: [sum:174], [_temp_:179] => [sum:174]
> 
> Here's the IL when you *do* cache s[i]:
> 
> 52: 058 138          # STARTLINE: string.exw(138)<<sd = s[i]>>
> 54: 025 181 189 185  # RHS_SUBS: [s:181] sub [i:189] => [sd:185]
> 58: 101 185          # ATOM_CHECK: [sd:185]
> 60: 087 185          # DISPLAY_VAR: [sd:185]
> 62: 058 139          # STARTLINE: string.exw(139)<<sum += sd*sd>>
> 64: 013 185 185 190  # MULTIPLY: [sd:185], [sd:185] => [_temp_:190]
> 68: 011 184 190 184  # PLUS: [sum:184], [_temp_:190] => [sum:184]
> 
> The STARTLINE and DISPLAY_VAR codes are due to with trace (which I put in
> so we could see the actual lines for clarity) and don't really affect
> the algorithm (assuming that a production release would make sure that
> without trace was in effect).
> 
> But the bottom line is that you've added an extra ATOM_CHECK op that
> doesn't happen when you use it straight. I think the interpreter is able
> to do this because it's on the same line (though I'll admit that the
> logic of when to use temps and when to save them is still opaque to
> me, so Rob is probably the only one who could explain this without poring
> over the code).
> 
> Matt

I see... using an object sd instead of atom sd would remove the check....

CChris
18. Re: which is faster?
- Posted by Chris Bensler <eu at creati?e?ortal.ca> Nov 09, 2007
- 657 views
CChris wrote:
> 
> Here is a benchmark to prove and disprove stuff....

Try adding this bit of test code to verify that the variations are accurate.

sequence test
test = {
     st_dev0(s[1],0)
    ,st_dev1(s[1],0)
    ,st_dev2(s[1],0)
    ,st_dev3(s[1],0)
    ,st_dev4(s[1],0)
    ,st_dev5(s[1],0)
}
test = (test = test[1])
test[1] = 0
for i = 2 to length(test) do
    if test[i] = 0 then
        test[1] += 1
        printf(1,"st_dev%d() != st_dev0()\n",{i-1})
    end if
end for
if test[1] then
    machine_proc(26,{})
    abort(1)
end if

Chris Bensler
Code is Alchemy
19. Re: which is faster?
- Posted by CChris <christian.cuvier at a?riculture.gouv.?r> Nov 09, 2007
- 632 views
Chris Bensler wrote:
> 
> CChris wrote:
> > 
> > Here is a benchmark to prove and disprove stuff....
> 
> Try adding this bit of test code to verify that the variations are accurate.
> 
> sequence test
> test = {
>      st_dev0(s[1],0)
>     ,st_dev1(s[1],0)
>     ,st_dev2(s[1],0)
>     ,st_dev3(s[1],0)
>     ,st_dev4(s[1],0)
>     ,st_dev5(s[1],0)
> }
> test = (test = test[1])
> test[1] = 0
> for i = 2 to length(test) do
>     if test[i] = 0 then
>         test[1] += 1
>         printf(1,"st_dev%d() != st_dev0()\n",{i-1})
>     end if
> end for
> if test[1] then
>     machine_proc(26,{})
>     abort(1)
> end if
> 
> Chris Bensler
> Code is Alchemy

I had tried this, and all values differ, though by very tiny amounts. This is known as roundoff error accumulating. Adding N numbers with some fuzz in a ]-eps..+eps[ range will result in a total error whose magnitude is on the order of eps*sqrt(N). With 53-bit precision on floating-point numbers and N=10000, you can easily expect the last few bits to differ when you change the method of computation. But there is hardly any point in estimating a deviation with many significant digits, is there?

You can compute a mean with more accuracy by doing:

m += ((s[i] - m)/i)

starting with m = s[1]. The roundoff error buildup is way slower. But because of the many integer divisions, the algorithm is way slower too.

CChris
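For illustration, here is a minimal self-contained sketch of that incremental update rule (added for clarity, not part of the original post):

-- incremental mean: m += (s[i]-m)/i, starting from m = s[1]
function running_mean(sequence s)
    atom m
    m = s[1]
    for i = 2 to length(s) do
        m += (s[i] - m) / i
    end for
    return m
end function

? running_mean({1, 2, 3, 4, 5})   -- prints 3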
20. Re: which is faster?
- Posted by CChris <christian.cuvier at ag?iculture.?ouv.fr> Nov 09, 2007
- 623 views
Robert Craig wrote:
> 
> Matt Lewis wrote:
> > CChris wrote:
> > > I thought that precomputing s[i] would speed things up, but this didn't
> > > occur.
> > > Not sure why.
> > 
> > I think you didn't use s[i] enough to make the difference. Also, the
> > interpreter is smart enough to reuse the temp variable that it creates.
> > Here's the IL that's generated when you don't try to cache s[i]:
> > 
> > 40: 058 119          # STARTLINE: string.exw(119)<<sum += power(s[i], 2)>>
> > 42: 025 171 178 179  # RHS_SUBS: [s:171] sub [i:178] => [_temp_:179]
> > 46: 013 179 179 179  # MULTIPLY: [_temp_:179], [_temp_:179] => [_temp_:179]
> > 50: 011 174 179 174  # PLUS: [sum:174], [_temp_:179] => [sum:174]
> > 
> > Here's the IL when you *do* cache s[i]:
> > 
> > 52: 058 138          # STARTLINE: string.exw(138)<<sd = s[i]>>
> > 54: 025 181 189 185  # RHS_SUBS: [s:181] sub [i:189] => [sd:185]
> > 58: 101 185          # ATOM_CHECK: [sd:185]
> > 60: 087 185          # DISPLAY_VAR: [sd:185]
> > 62: 058 139          # STARTLINE: string.exw(139)<<sum += sd*sd>>
> > 64: 013 185 185 190  # MULTIPLY: [sd:185], [sd:185] => [_temp_:190]
> > 68: 011 184 190 184  # PLUS: [sum:184], [_temp_:190] => [sum:184]
> > 
> > The STARTLINE and DISPLAY_VAR codes are due to with trace (which I put in
> > so we could see the actual lines for clarity) and don't really affect
> > the algorithm (assuming that a production release would make sure that
> > without trace was in effect).
> > 
> > But the bottom line is that you've added an extra ATOM_CHECK op that
> > doesn't happen when you use it straight. I think the interpreter is able
> > to do this because it's on the same line (though I'll admit that the
> > logic of when to use temps and when to save them is still opaque to
> > me, so Rob is probably the only one who could explain this without poring
> > over the code).
> 
> When type_check is in force (by default) then
> the interpreter has to do the ATOM_CHECK when you
> assign to a variable declared as atom. If you say
> "without type_check", that check should be eliminated.
> 
> As Michael Sabal noted, power(x, 2) is optimized
> to x * x (see emit.e), so you get:
> 
> temp = s[i]
> temp = temp * temp
> 
> It's smart enough to re-use the same temp.
> 
> Regards,
> Rob Craig
> Rapid Deployment Software
> http://www.RapidEuphoria.com

Indeed, I had forgotten about that type_check one... Here is a modified benchmark, with quite different output:
sequence s, s1

-- data
constant num_runs=10000, len_data=10000
s = repeat(0,num_runs)
s1 = repeat(0,len_data)

without type_check

-- generate random data
for k=1 to num_runs do
    for i=1 to len_data do
        s1[i] = rand(12345+k)
    end for
    s[k] = s1
end for
puts(1,"Data initialised, starting benchmark...\n")

function mean(sequence s)
-- adds everything up in s, returns average
    atom result
    result = s[1]
    for i=2 to length(s) do
        result += s[i]
    end for
    return result/length(s)
end function

global function st_dev0(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= m
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev1(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sum -= m*m*l
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev2(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd*sd
    end for
    sum -= m*m*l
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev3(sequence s, atom X)
    atom m, sum, sd, m2
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    m2 = m*m
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd*sd
    end for
    sum -= m2
    sd = sqrt((sum/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev4(sequence s, atom X)
    atom m, sum, sum_sq, sd, m2
    integer l
    sum_sq = 0.0
    sum = 0.0
    l = length(s)
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd
        sum_sq += sd*sd
    end for
    m = sum/l
    m2 = m*m
    sum_sq -= m2
    sd = sqrt((sum_sq/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev5(sequence s, atom X)
    atom m, sum, sum_sq, sd, m2
    integer l
    object x
    sum_sq = 0.0
    sum = 0.0
    l = length(s)
    for i = 1 to length(s) do
        x = s[i]
        sum += x
        sum_sq += x*x
    end for
    m = sum/l
    m2 = m*m
    sum_sq -= m2
    sd = sqrt((sum_sq/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

atom t, junk

t = time()
for i=1 to length(s) do
    junk = st_dev0(s[i],0.0)
end for
printf(1,"Variant form #0 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev1(s[i],0.0)
end for
printf(1,"Variant form #1 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev2(s[i],0.0)
end for
printf(1,"Variant form #2 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev3(s[i],0.0)
end for
printf(1,"Variant form #3 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev4(s[i],0.0)
end for
printf(1,"Variant form #4 took %3.6gs.\n",time()-t)

t = time()
for i=1 to length(s) do
    junk = st_dev5(s[i],0.0)
end for
printf(1,"Variant form #5 took %3.6gs.\n",time()-t)

?machine_func(26,0)
Note that variant #5 was modified. This now results in:

Data initialised, starting benchmark...
Variant form #0 took 70.96s.
Variant form #1 took 13.35s.
Variant form #2 took 13.51s.
Variant form #3 took 13.62s.
Variant form #4 took 12.42s.
Variant form #5 took 12.41s.

You will notice how the call to a helper function, while it makes the code clearer and easier to maintain, induces a 7-8% speed penalty. I vote for macros in Eu. Of course, they would have to be carefully managed: the C way of doing it has syntactic problems which have been discussed at length. The mixin() construct from D would probably be a better idea.

CChris
21. Re: which is faster?
- Posted by Fernando Bauer <fmbauer at h?tm?il.com> Nov 10, 2007
- 642 views
Hi,

I think the reason for the relatively low speed of variant 0 is that, commonly, the computed mean is not a Euphoria integer. The subtraction then has to use floating point, which is slower, and squaring the deviations again involves floating-point operations. But if the sequence (data) contains mainly Euphoria integers (as in the case shown), it is possible to round the mean and obtain the integer deviations and their squares more quickly; the effect of using the rounded mean is compensated for at the end. Below, variants 6, 7, 8 and 9 use this trick and are compared with the other implementations. The magnitude of the mean (offset) can influence the speed of the variants (1, 2, 3, 4, 5) that don't subtract the mean. Finally, in the case of non-integers, the simulations show that it's better not to use the trick.
function mean(sequence s)
-- adds everything up in s, returns average
    atom result
    result = s[1]
    for i=2 to length(s) do
        result += s[i]
    end for
    return result/length(s)
end function

global function st_dev0(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= m
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev1(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sum -= m*m*l
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev2(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd*sd
    end for
    sum -= m*m*l
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev3(sequence s, atom X)
    atom m, sum, sd, m2
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    m2 = m*m
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd*sd
    end for
    sum -= m2
    sd = sqrt((sum/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev4(sequence s, atom X)
    atom m, sum, sum_sq, sd, m2
    integer l
    sum_sq = 0.0
    sum = 0.0
    l = length(s)
    for i = 1 to length(s) do
        sd = s[i]
        sum += sd
        sum_sq += sd*sd
    end for
    m = sum/l
    m2 = m*m
    sum_sq -= m2
    sd = sqrt((sum_sq/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev5(sequence s, atom X)
    atom m, sum, sum_sq, sd, m2
    integer l
    object x
    sum_sq = 0.0
    sum = 0.0
    l = length(s)
    for i = 1 to length(s) do
        x = s[i]
        sum += x
        sum_sq += x*x
    end for
    m = sum/l
    m2 = m*m
    sum_sq -= m2
    sd = sqrt((sum_sq/(l-1)) - m2)
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev6(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= floor(m)
    for i = 1 to length(s) do
        sum += power(s[i], 2)
    end for
    sum -= l*power(m-floor(m),2)
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev7(sequence s, atom X)
    atom m, sum, sd, m1
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    m1 = floor(m)
    for i = 1 to length(s) do
        sum += power(s[i]-m1, 2)
    end for
    sum -= l*power(m-m1,2)
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev8(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s -= floor(m)
    s *= s
    for i = 1 to length(s) do
        sum += s[i]
    end for
    sum -= l*power(m-floor(m),2)
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

global function st_dev9(sequence s, atom X)
    atom m, sum, sd
    integer l
    sum = 0
    l = length(s)
    m = mean(s)
    s = power(s-floor(m),2)
    for i = 1 to length(s) do
        sum += s[i]
    end for
    sum -= l*power(m-floor(m),2)
    sd = sqrt(sum/(l-1))
    if X = 0 then
        return sd
    else
        -- return the z-score
        return (X - m)/sd
    end if
end function

atom t, t0

procedure show(integer i)
    printf(1,"Variant form #%d took %5.2fs. (%5.1f%%)\n", {i, t, 100*t/t0})
end procedure

procedure test(sequence title, sequence s)
    atom junk
    printf(1,"%s\n", {title})

    t = time()
    for i=1 to length(s) do
        junk = st_dev0(s[i],0.0)
    end for
    t = time()-t
    t0 = t
    show(0)

    t = time()
    for i=1 to length(s) do
        junk = st_dev1(s[i],0.0)
    end for
    t = time()-t
    show(1)

    t = time()
    for i=1 to length(s) do
        junk = st_dev2(s[i],0.0)
    end for
    t = time()-t
    show(2)

    t = time()
    for i=1 to length(s) do
        junk = st_dev3(s[i],0.0)
    end for
    t = time()-t
    show(3)

    t = time()
    for i=1 to length(s) do
        junk = st_dev4(s[i],0.0)
    end for
    t = time()-t
    show(4)

    t = time()
    for i=1 to length(s) do
        junk = st_dev5(s[i],0.0)
    end for
    t = time()-t
    show(5)

    t = time()
    for i=1 to length(s) do
        junk = st_dev6(s[i],0.0)
    end for
    t = time()-t
    show(6)

    t = time()
    for i=1 to length(s) do
        junk = st_dev7(s[i],0.0)
    end for
    t = time()-t
    show(7)

    t = time()
    for i=1 to length(s) do
        junk = st_dev8(s[i],0.0)
    end for
    t = time()-t
    show(8)

    t = time()
    for i=1 to length(s) do
        junk = st_dev9(s[i],0.0)
    end for
    t = time()-t
    show(9)

    puts(1,'\n')
end procedure

sequence s, s1

-- data
constant num_runs=5000, len_data=6000
s = repeat(0,num_runs)
s1 = repeat(0,len_data)

-- generate random data
for k=1 to num_runs do
    for i=1 to len_data do
        s1[i] = rand(12345+k)
    end for
    s[k] = s1
end for
puts(1,"Data initialised, starting benchmark...\n")

test("Original random data",s)
test("Data 50%=0 50%=1",repeat((repeat(0,len_data/2)&repeat(1,len_data/2)), num_runs))
test("Data 50%=1000 50%=1001",repeat((repeat(1000,len_data/2)&repeat(1001,len_data/2)), num_runs))
test("Data 50%=10000 50%=10001",repeat((repeat(10000,len_data/2)&repeat(10001,len_data/2)), num_runs))
test("Data 50%=0.1 50%=1.1",repeat((repeat(0.1,len_data/2)&repeat(1.1,len_data/2)), num_runs))

?machine_func(26,0)
My results:

Data initialised, starting benchmark...
Original random data
Variant form #0 took  4.47s. (100.0%)
Variant form #1 took  2.24s. ( 50.1%)
Variant form #2 took  2.84s. ( 63.5%)
Variant form #3 took  2.86s. ( 64.0%)
Variant form #4 took  2.84s. ( 63.5%)
Variant form #5 took  2.41s. ( 53.9%)
Variant form #6 took  2.51s. ( 56.2%)
Variant form #7 took  2.35s. ( 52.6%)
Variant form #8 took  2.83s. ( 63.3%)
Variant form #9 took  2.84s. ( 63.5%)

Data 50%=0 50%=1
Variant form #0 took  4.47s. (100.0%)
Variant form #1 took  1.48s. ( 33.1%)
Variant form #2 took  2.10s. ( 47.0%)
Variant form #3 took  2.09s. ( 46.8%)
Variant form #4 took  2.00s. ( 44.7%)
Variant form #5 took  1.64s. ( 36.7%)
Variant form #6 took  1.59s. ( 35.6%)
Variant form #7 took  1.49s. ( 33.3%)
Variant form #8 took  1.98s. ( 44.3%)
Variant form #9 took  2.00s. ( 44.7%)

Data 50%=1000 50%=1001
Variant form #0 took  4.46s. (100.0%)
Variant form #1 took  2.06s. ( 46.2%)
Variant form #2 took  2.69s. ( 60.3%)
Variant form #3 took  2.70s. ( 60.5%)
Variant form #4 took  2.64s. ( 59.2%)
Variant form #5 took  2.27s. ( 50.9%)
Variant form #6 took  1.59s. ( 35.7%)
Variant form #7 took  1.50s. ( 33.6%)
Variant form #8 took  1.97s. ( 44.2%)
Variant form #9 took  1.98s. ( 44.4%)

Data 50%=10000 50%=10001
Variant form #0 took  4.49s. (100.0%)
Variant form #1 took  2.18s. ( 48.6%)
Variant form #2 took  2.83s. ( 63.0%)
Variant form #3 took  2.82s. ( 62.8%)
Variant form #4 took  2.79s. ( 62.1%)
Variant form #5 took  2.46s. ( 54.8%)
Variant form #6 took  1.62s. ( 36.1%)
Variant form #7 took  1.52s. ( 33.9%)
Variant form #8 took  1.98s. ( 44.1%)
Variant form #9 took  2.00s. ( 44.5%)

Data 50%=0.1 50%=1.1
Variant form #0 took  5.34s. (100.0%)
Variant form #1 took  4.30s. ( 80.5%)
Variant form #2 took  4.66s. ( 87.3%)
Variant form #3 took  4.68s. ( 87.6%)
Variant form #4 took  4.52s. ( 84.6%)
Variant form #5 took  4.16s. ( 77.9%)
Variant form #6 took  5.31s. ( 99.4%)
Variant form #7 took  5.45s. (102.1%)
Variant form #8 took  5.14s. ( 96.3%)
Variant form #9 took  5.32s. ( 99.6%)

Regards,
Fernando