1. Programming Shootout: reverse-complement

We seem pretty close to filling up all the benchmarks in the Programming 
Shootout. I've just come back to Euphoria after trying it a few years ago,
liking it, but ultimately leaving it because it wasn't widely used or 
supported. Now, I'd like to try to change that; I still like the language.
So to help refresh myself I did this one, which is based on the fasta
benchmark.

There are probably some ways to further optimize it, but I think I've done
an OK job there. Not as good on code style though. The "anticipate certain
subsets of data" optimization might disqualify it too, but that wouldn't be
hard to rip out.

I've only tested it in Windows but it will probably work fine on Linux.
(I'll find out soon since I've got a new distro to try)

--reverse complements of DNA sequences (for the Great Programming Shootout)
--By James W. Hofmann 2007
--This code is public domain code. You may reuse, redistribute or modify it
--however you wish.

sequence argc
sequence in,dseq,headers,lseq
integer char
integer file
integer head -- the header we're on
integer charcount, newlinecount
integer ALLOCATE -- guessed length of input sequence
constant MAXLINE = 60 -- specified line length
ALLOCATE = 10000 -- guess how many chars needed; adjustable for different N-size
in = {repeat(0,ALLOCATE),repeat(0,ALLOCATE),repeat(0,ALLOCATE)}
headers = {{},{},{}}
dseq = {{},{},{}}

-- optimization for each of the three sequence types

dseq[1] = {"ACGT",
        "TGCA"} -- uppercase only
dseq[2] = {"acgtMRWSYKVHDBN",
        "TGCAKYWSRMBDHVN"} -- lowercase acgt, uppercase ambiguous codes 
dseq[3] = {"acgt",
        "TGCA"} -- lowercase only

lseq = {1,1,1} -- real length of each sequence
head = 0
charcount = 0 -- overall position in a sequence's output
newlinecount = 0 -- we have to recreate newlines when reversing
argc = command_line()

file = open(argc[3],"rb")

function getComp(integer code, integer headerstyle)
		for m=1 to length(dseq[headerstyle][1]) do
			if code=dseq[headerstyle][1][m] then
				code=dseq[headerstyle][2][m]
				return code
			end if
		end for	
		return code -- don't change unrecognized stuff
end function

while 1 do
    char = getc(file)    
    if char=-1 then
        exit   -- EOF
	elsif char='>' then
		head+=1
		while char!='\n' do -- write header
			if char>13 then -- take only printable stuff here
				headers[head]&=char				
			end if
			char = getc(file)
		end while
		headers[head]&=char -- add LF (but not CR)
	else
		if char>13 then -- a real character
		    lseq[head]+=1
		    in[head][lseq[head]]=getComp(char,head) -- convert chars one at a time
		    -- we will add newlines when we write
		end if
			
		if lseq[head]>=ALLOCATE then -- anticipate more space being needed
			in[1] = in[1] & repeat(0,ALLOCATE)
			in[2] = in[2] & repeat(0,ALLOCATE)
			in[3] = in[3] & repeat(0,ALLOCATE)
			ALLOCATE *= 2 -- this expands the expected allocation with larger requests
		end if

	end if
end while

close(file)

file = 1

for seqnum=1 to 3 do
	
	puts(file, headers[seqnum])

	newlinecount = lseq[seqnum]-MAXLINE+1
	for n = lseq[seqnum] to 1 by -1 do
		puts(file, in[seqnum][n])
		if n<=newlinecount and n>2 then -- n>2 lets the end of the sequence 
										-- terminate naturally
			puts(file,'\n')
			newlinecount-=MAXLINE
		end if
	end for

	puts(file,'\n')

end for

close(file)


new topic     » topic index » view message » categorize

2. Re: Programming Shootout: reverse-complement

James W Hofmann wrote:
> 
> We seem pretty close to filling up all the benchmarks in the Programming 
> Shootout. 

I don't know how close you are as the last entry in the archive is from 
Jason and dated 02/06. Anyway I highly support the idea and to show my 
appreciation let me provide you the following masterpiece. smile))

-- partial-sums benchmark (for the Great Programming Shootout)
-- By András Szabó 2007
-- This code is public domain code. You may reuse, redistribute or modify it
-- however you wish.
--
-- about the partial-sums benchmark
-- diff program output N = 25000 with this output file to check your program is
correct before contributing.
-- (Programs may use a single-loop or several-loops; programs may cache
recomputed values in local variables)
-- Each program should use the same naive iterative double-precision algorithms
to calculate partial-sums of the series.

without trace
without warning
without profile

constant N=25000
sequence res
res={0,0,0,0,0,0,0,0,0}

for k=1 to N by 1 do
	res[1]+=power((2/3),k-1)
	res[2]+=power(k,-0.5)
	res[3]+=1/(k*(k+1))
	res[4]+=1/(power(k,3)*power(sin(k),2))
	res[5]+=1/(power(k,3)*power(cos(k),2))
 	res[6]+=1/k
 	res[7]+=1/power(k,2)
 	res[8]+=power(-1,k+1)/k
 	res[9]+=power(-1,k+1)/(2*k-1)
end for

integer fn
fn=open("partialsums-output.txt","w")
printf(fn,"%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f\n%.9f",res)
close(fn)


Regards, 

Salix

new topic     » goto parent     » topic index » view message » categorize

3. Re: Programming Shootout: reverse-complement

Salix wrote:
> 
> James W Hofmann wrote:
> > 
> > We seem pretty close to filling up all the benchmarks in the Programming 
> > Shootout. 
> 
> I don't know how close you are as the last entry in the archive is from 
> Jason and dated 02/06. Anyway I highly support the idea and to show my 
> appreciation let me provide you the following masterpiece. smile))
> 

Here's a quick optimization of that (took out subscripting, and cached
a couple of recomputed values).  I get about a 30-40% improvement:

atom res1, res2, res3, res4, res5, res6, res7, res8, res9
res1 = 0.0
res2 = 0.0
res3 = 0.0
res4 = 0.0
res5 = 0.0
res6 = 0.0
res7 = 0.0
res8 = 0.0
res9 = 0.0
atom temp
for k=1 to N by 1 do
	res1+=power((2/3),k-1)
	res2+=power(k,-0.5)
	res3+=1/(k*(k+1))
	temp = 1/(power(k,3)*power(sin(k),2))
	res4+=temp
	res5+=temp
 	res6+=1/k
 	res7+=1/power(k,2)
 	temp = power(-1,k+1)
 	res8+=temp/k
 	res9+=temp/(2*k-1)
end for


new topic     » goto parent     » topic index » view message » categorize

4. Re: Programming Shootout: reverse-complement

James W Hofmann wrote:
> 
> We seem pretty close to filling up all the benchmarks in the Programming 
> Shootout. I've just come back to Euphoria after trying it a few years ago,
> liking it, but ultimately leaving it because it wasn't widely used or 
> supported. Now, I'd like to try to change that; I still like the language.
> So to help refresh myself I did this one, which is based on the fasta
> benchmark.
> 
> There are probably some ways to further optimize it, but I think I've done
> an OK job there. Not as good on code style though. The "anticipate certain
> subsets of data" optimization might disqualify it too, but that wouldn't be
> hard to rip out.
> 
> I've only tested it in Windows but it will probably work fine on Linux.
> (I'll find out soon since I've got a new distro to try)
<snipped code>

Cool! I haven't looked at the shootout in awhile but I was realizing that now
that EU is open source we can upload our work to the shootout site. Plus didn't
someone do a Debian package for EU which was also one of the upload requirements?
If we get uploaded that will be some good advertising for us.

I haven't looked at the shootout website in awhile. I know I need to weed out
the obsolete tests and add the newer ones.

I want to work on this some more but I'm not sure when.

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.

new topic     » goto parent     » topic index » view message » categorize

5. Re: Programming Shootout: reverse-complement

Okay, doing a quick survey right now here is what is done compared to the
current test set:

Test.....................Euphoria Version
ackermann (obsolete).....Yes
chameneos................
cheap-concurrency........
count-words (obsolete)...Yes
fannkuch.................Yes
fasta....................Yes
harmonic (obsolete)......Yes
k-nucleotide.............
mandelbrot...............Yes
meteor-contest (new).....
n-body...................Yes
nsieve...................Yes
nsieve-bits..............
partial-sums.............(right above but not included yet)
pidigits.................
recursive................
regex-dna................
reverse-complement.......(right above but not included yet)
spectral-norm............
random...................(Yes--not included in testing but needed for a test)
startup..................Yes
sum-file.................Yes
takfp (obsolete).........Yes

I'll try to test and add the two programs above by James W. Hoffman, Salix, and
Matt Lewis. I should probably go through all of the programs and remove any
timing routines from them.

I never got around to benchmarking EU 2.4 vs. 2.5 vs 3.0 vs. OOEU. Eventually!

We should have enough done to submit to the Shootout site.
 
--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Programming Shootout: reverse-complement

I just noticed the meteor-contest is just that--a contest. No programs have
entered yet.

I guess it would be cool if we were one of the first...

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.

new topic     » goto parent     » topic index » view message » categorize

7. Re: Programming Shootout: reverse-complement

Matt Lewis wrote:
> 
> Salix wrote:
> > 
> > James W Hofmann wrote:
> > > 
> > > We seem pretty close to filling up all the benchmarks in the Programming 
> > > Shootout. 
> > 
> > I don't know how close you are as the last entry in the archive is from 
> > Jason and dated 02/06. Anyway I highly support the idea and to show my 
> > appreciation let me provide you the following masterpiece. smile))
> > 
> 
> Here's a quick optimization of that (took out subscripting, and cached
> a couple of recomputed values).  I get about a 30-40% improvement:
> 
> }}}
<eucode>
> atom res1, res2, res3, res4, res5, res6, res7, res8, res9
> res1 = 0.0
> res2 = 0.0
> res3 = 0.0
> res4 = 0.0
> res5 = 0.0
> res6 = 0.0
> res7 = 0.0
> res8 = 0.0
> res9 = 0.0
> atom temp
> for k=1 to N by 1 do
> 	res1+=power((2/3),k-1)
> 	res2+=power(k,-0.5)
> 	res3+=1/(k*(k+1))
> 	temp = 1/(power(k,3)*power(sin(k),2))
> 	res4+=temp
> 	res5+=temp
>  	res6+=1/k
>  	res7+=1/power(k,2)
>  	temp = power(-1,k+1)
>  	res8+=temp/k
>  	res9+=temp/(2*k-1)
> end for
> </eucode>
{{{


Dilemma on this...

Should the program be the highest performance (as above) or should it use
sequences like James' did to show "Euphoria's uniqueness"? What's your opinions?

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.

new topic     » goto parent     » topic index » view message » categorize

8. Re: Programming Shootout: reverse-complement

Jason Gade wrote:
> 
> Dilemma on this...
> 
> Should the program be the highest performance (as above) or should it use
> sequences
> like James' did to show "Euphoria's uniqueness"? What's your opinions?
> 

Performance comes first. I would even change the name of the variables from
"temp" to "t" and "res1" to "r1".

Rgds, 

Salix

new topic     » goto parent     » topic index » view message » categorize

9. Re: Programming Shootout: reverse-complement

Salix wrote:
> 
> Jason Gade wrote:
> > 
> > Dilemma on this...
> > 
> > Should the program be the highest performance (as above) or should it use
> > sequences
> > like James' did to show "Euphoria's uniqueness"? What's your opinions?
> > 
> 
> Performance comes first. I would even change the name of the variables from
> "temp" to "t" and "res1" to "r1".
> 
> Rgds, 
> 
> Salix

Okay, but the variable names should have almost zero effect on the speed and the
clarity in this case *is* more important.

Well, not that "res1" is any more clear than "r1" but...

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu