1. Programming Shootout: reverse-complement
- Posted by James W Hofmann <jhofmann at ucsc.edu> Jan 03, 2007
- 690 views
We seem pretty close to filling up all the benchmarks in the Programming Shootout. I've just come back to Euphoria after trying it a few years ago, liking it, but ultimately leaving it because it wasn't widely used or supported. Now, I'd like to try to change that; I still like the language. So to help refresh myself I did this one, which is based on the fasta benchmark. There are probably some ways to further optimize it, but I think I've done an OK job there. Not as good on code style though. The "anticipate certain subsets of data" optimization might disqualify it too, but that wouldn't be hard to rip out. I've only tested it in Windows but it will probably work fine on Linux. (I'll find out soon since I've got a new distro to try)
--reverse complements of DNA sequences (for the Great Programming Shootout) --By James W. Hofmann 2007 --This code is public domain code. You may reuse, redistribute or modify it --however you wish. sequence argc sequence in,dseq,headers,lseq integer char integer file integer head -- the header we're on integer charcount, newlinecount integer ALLOCATE -- guessed length of input sequence constant MAXLINE = 60 -- specified line length ALLOCATE = 10000 -- guess how many chars needed; adjustable for different N-size in = {repeat(0,ALLOCATE),repeat(0,ALLOCATE),repeat(0,ALLOCATE)} headers = {{},{},{}} dseq = {{},{},{}} -- optimization for each of the three sequence types dseq[1] = {"ACGT", "TGCA"} -- uppercase only dseq[2] = {"acgtMRWSYKVHDBN", "TGCAKYWSRMBDHVN"} -- lowercase acgt, uppercase ambiguous codes dseq[3] = {"acgt", "TGCA"} -- lowercase only lseq = {1,1,1} -- real length of each sequence head = 0 charcount = 0 -- overall position in a sequence's output newlinecount = 0 -- we have to recreate newlines when reversing argc = command_line() file = open(argc[3],"rb") function getComp(integer code, integer headerstyle) for m=1 to length(dseq[headerstyle][1]) do if code=dseq[headerstyle][1][m] then code=dseq[headerstyle][2][m] return code end if end for return code -- don't change unrecognized stuff end function while 1 do char = getc(file) if char=-1 then exit -- EOF elsif char='>' then head+=1 while char!='\n' do -- write header if char>13 then -- take only printable stuff here headers[head]&=char end if char = getc(file) end while headers[head]&=char -- add LF (but not CR) else if char>13 then -- a real character lseq[head]+=1 in[head][lseq[head]]=getComp(char,head) -- convert chars one at a time -- we will add newlines when we write end if if lseq[head]>=ALLOCATE then -- anticipate more space being needed in[1] = in[1] & repeat(0,ALLOCATE) in[2] = in[2] & repeat(0,ALLOCATE) in[3] = in[3] & repeat(0,ALLOCATE) ALLOCATE *= 2 -- this expands the expected allocation with larger requests end if end if end while close(file) file = 1 for seqnum=1 to 3 do puts(file, headers[seqnum]) newlinecount = lseq[seqnum]-MAXLINE+1 for n = lseq[seqnum] to 1 by -1 do puts(file, in[seqnum][n]) if n<=newlinecount and n>2 then -- n>2 lets the end of the sequence -- terminate naturally puts(file,'\n') newlinecount-=MAXLINE end if end for puts(file,'\n') end for close(file)
2. Re: Programming Shootout: reverse-complement
- Posted by Salix <salix at freemail.hu> Jan 03, 2007
- 667 views
James W Hofmann wrote: > > We seem pretty close to filling up all the benchmarks in the Programming > Shootout. I don't know how close you are as the last entry in the archive is from Jason and dated 02/06. Anyway I highly support the idea and to show my appreciation let me provide you the following masterpiece. ))
-- partial-sums benchmark (for the Great Programming Shootout) -- By András Szabó 2007 -- This code is public domain code. You may reuse, redistribute or modify it -- however you wish. -- -- about the partial-sums benchmark -- diff program output N = 25000 with this output file to check your program is correct before contributing. -- (Programs may use a single-loop or several-loops; programs may cache recomputed values in local variables) -- Each program should use the same naive iterative double-precision algorithms to calculate partial-sums of the series. without trace without warning without profile constant N=25000 sequence res res={0,0,0,0,0,0,0,0,0} for k=1 to N by 1 do res[1]+=power((2/3),k-1) res[2]+=power(k,-0.5) res[3]+=1/(k*(k+1)) res[4]+=1/(power(k,3)*power(sin(k),2)) res[5]+=1/(power(k,3)*power(cos(k),2)) res[6]+=1/k res[7]+=1/power(k,2) res[8]+=power(-1,k+1)/k res[9]+=power(-1,k+1)/(2*k-1) end for integer fn fn=open("partialsums-output.txt","w") printf(fn,"%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f",res) close(fn)
Regards, Salix
3. Re: Programming Shootout: reverse-complement
- Posted by Matt Lewis <matthewwalkerlewis at gmail.com> Jan 03, 2007
- 694 views
Salix wrote: > > James W Hofmann wrote: > > > > We seem pretty close to filling up all the benchmarks in the Programming > > Shootout. > > I don't know how close you are as the last entry in the archive is from > Jason and dated 02/06. Anyway I highly support the idea and to show my > appreciation let me provide you the following masterpiece. )) > Here's a quick optimization of that (took out subscripting, and cached a couple of recomputed values). I get about a 30-40% improvement:
atom res1, res2, res3, res4, res5, res6, res7, res8, res9 res1 = 0.0 res2 = 0.0 res3 = 0.0 res4 = 0.0 res5 = 0.0 res6 = 0.0 res7 = 0.0 res8 = 0.0 res9 = 0.0 atom temp for k=1 to N by 1 do res1+=power((2/3),k-1) res2+=power(k,-0.5) res3+=1/(k*(k+1)) temp = 1/(power(k,3)*power(sin(k),2)) res4+=temp res5+=temp res6+=1/k res7+=1/power(k,2) temp = power(-1,k+1) res8+=temp/k res9+=temp/(2*k-1) end for
4. Re: Programming Shootout: reverse-complement
- Posted by Jason Gade <jaygade at yahoo.com> Jan 03, 2007
- 678 views
James W Hofmann wrote: > > We seem pretty close to filling up all the benchmarks in the Programming > Shootout. I've just come back to Euphoria after trying it a few years ago, > liking it, but ultimately leaving it because it wasn't widely used or > supported. Now, I'd like to try to change that; I still like the language. > So to help refresh myself I did this one, which is based on the fasta > benchmark. > > There are probably some ways to further optimize it, but I think I've done > an OK job there. Not as good on code style though. The "anticipate certain > subsets of data" optimization might disqualify it too, but that wouldn't be > hard to rip out. > > I've only tested it in Windows but it will probably work fine on Linux. > (I'll find out soon since I've got a new distro to try) <snipped code> Cool! I haven't looked at the shootout in awhile but I was realizing that now that EU is open source we can upload our work to the shootout site. Plus didn't someone do a Debian package for EU which was also one of the upload requirements? If we get uploaded that will be some good advertising for us. I haven't looked at the shootout website in awhile. I know I need to weed out the obsolete tests and add the newer ones. I want to work on this some more but I'm not sure when. -- "Any programming problem can be solved by adding a level of indirection." --anonymous "Any performance problem can be solved by removing a level of indirection." --M. Haertel "Premature optimization is the root of all evil in programming." --C.A.R. Hoare j.
5. Re: Programming Shootout: reverse-complement
- Posted by Jason Gade <jaygade at yahoo.com> Jan 03, 2007
- 685 views
Okay, doing a quick survey right now here is what is done compared to the current test set: Test.....................Euphoria Version ackermann (obsolete).....Yes chameneos................ cheap-concurrency........ count-words (obsolete)...Yes fannkuch.................Yes fasta....................Yes harmonic (obsolete)......Yes k-nucleotide............. mandelbrot...............Yes meteor-contest (new)..... n-body...................Yes nsieve...................Yes nsieve-bits.............. partial-sums.............(right above but not included yet) pidigits................. recursive................ regex-dna................ reverse-complement.......(right above but not included yet) spectral-norm............ random...................(Yes--not included in testing but needed for a test) startup..................Yes sum-file.................Yes takfp (obsolete).........Yes I'll try to test and add the two programs above by James W. Hoffman, Salix, and Matt Lewis. I should probably go through all of the programs and remove any timing routines from them. I never got around to benchmarking EU 2.4 vs. 2.5 vs 3.0 vs. OOEU. Eventually! We should have enough done to submit to the Shootout site. -- "Any programming problem can be solved by adding a level of indirection." --anonymous "Any performance problem can be solved by removing a level of indirection." --M. Haertel "Premature optimization is the root of all evil in programming." --C.A.R. Hoare j.
6. Re: Programming Shootout: reverse-complement
- Posted by Jason Gade <jaygade at yahoo.com> Jan 03, 2007
- 653 views
I just noticed the meteor-contest is just that--a contest. No programs have entered yet. I guess it would be cool if we were one of the first... -- "Any programming problem can be solved by adding a level of indirection." --anonymous "Any performance problem can be solved by removing a level of indirection." --M. Haertel "Premature optimization is the root of all evil in programming." --C.A.R. Hoare j.
7. Re: Programming Shootout: reverse-complement
- Posted by Jason Gade <jaygade at yahoo.com> Jan 04, 2007
- 661 views
Matt Lewis wrote: > > Salix wrote: > > > > James W Hofmann wrote: > > > > > > We seem pretty close to filling up all the benchmarks in the Programming > > > Shootout. > > > > I don't know how close you are as the last entry in the archive is from > > Jason and dated 02/06. Anyway I highly support the idea and to show my > > appreciation let me provide you the following masterpiece. )) > > > > Here's a quick optimization of that (took out subscripting, and cached > a couple of recomputed values). I get about a 30-40% improvement: > > }}} <eucode> > atom res1, res2, res3, res4, res5, res6, res7, res8, res9 > res1 = 0.0 > res2 = 0.0 > res3 = 0.0 > res4 = 0.0 > res5 = 0.0 > res6 = 0.0 > res7 = 0.0 > res8 = 0.0 > res9 = 0.0 > atom temp > for k=1 to N by 1 do > res1+=power((2/3),k-1) > res2+=power(k,-0.5) > res3+=1/(k*(k+1)) > temp = 1/(power(k,3)*power(sin(k),2)) > res4+=temp > res5+=temp > res6+=1/k > res7+=1/power(k,2) > temp = power(-1,k+1) > res8+=temp/k > res9+=temp/(2*k-1) > end for > </eucode> {{{ Dilemma on this... Should the program be the highest performance (as above) or should it use sequences like James' did to show "Euphoria's uniqueness"? What's your opinions? -- "Any programming problem can be solved by adding a level of indirection." --anonymous "Any performance problem can be solved by removing a level of indirection." --M. Haertel "Premature optimization is the root of all evil in programming." --C.A.R. Hoare j.
8. Re: Programming Shootout: reverse-complement
- Posted by Salix <salix at freemail.hu> Jan 04, 2007
- 664 views
- Last edited Jan 05, 2007
Jason Gade wrote: > > Dilemma on this... > > Should the program be the highest performance (as above) or should it use > sequences > like James' did to show "Euphoria's uniqueness"? What's your opinions? > Performance comes first. I would even change the name of the variables from "temp" to "t" and "res1" to "r1". Rgds, Salix
9. Re: Programming Shootout: reverse-complement
- Posted by Jason Gade <jaygade at yahoo.com> Jan 04, 2007
- 672 views
- Last edited Jan 05, 2007
Salix wrote: > > Jason Gade wrote: > > > > Dilemma on this... > > > > Should the program be the highest performance (as above) or should it use > > sequences > > like James' did to show "Euphoria's uniqueness"? What's your opinions? > > > > Performance comes first. I would even change the name of the variables from > "temp" to "t" and "res1" to "r1". > > Rgds, > > Salix Okay, but the variable names should have almost zero effect on the speed and the clarity in this case *is* more important. Well, not that "res1" is any more clear than "r1" but... -- "Any programming problem can be solved by adding a level of indirection." --anonymous "Any performance problem can be solved by removing a level of indirection." --M. Haertel "Premature optimization is the root of all evil in programming." --C.A.R. Hoare j.