1. What am I doing wrong...
- Posted by PQ <quistnet at HOTMAIL.COM> Jan 06, 2000
- 455 views
Hi,

Something EU this time. While I was studying some ASM things like boot sectors etc., I remembered something I created in EU using ASM, but it isn't very good:
(I'm not very good at ASM, I'm trying to understand it)
(I used ASM.E by Pete Eberlein)
(If someone's version of ASM.E produces an error because of the values passed to resolve_param(), remove the ", fASM"s)

-- Begin: MATH.E --
include asm.e

atom f1,f2,f3
f1=allocate(8) f2=allocate(8) f3=allocate(8)

global function ADD(atom A, atom B)
object fASM,ANS
fASM=get_asm(
    "pusha "&
    "fld qword ptr [f1] "&
    "fld qword ptr [f2] "&
    "fadd st(1) "&
    "fst qword ptr [f3] "&
    "popa "&
    "ret")
    poke(f1, atom_to_float64(A))
    poke(f2, atom_to_float64(B))
    resolve_param("f1", fASM, f1)
    resolve_param("f2", fASM, f2)
    resolve_param("f3", fASM, f3)
    call(fASM)
    ANS=float64_to_atom(peek({f3,8}))
    free(fASM)
    return ANS
end function

global function SUB(atom A, atom B)
object fASM,ANS
fASM=get_asm(
    "pusha "&
    "fld qword ptr [f1] "&
    "fld qword ptr [f2] "&
    "fsub st(1) "&
    "fst qword ptr [f3] "&
    "popa "&
    "ret")
    poke(f1, atom_to_float64(A))
    poke(f2, atom_to_float64(B))
    resolve_param("f1", fASM, f1)
    resolve_param("f2", fASM, f2)
    resolve_param("f3", fASM, f3)
    call(fASM)
    ANS=float64_to_atom(peek({f3,8}))
    free(fASM)
    return ANS
end function

global function SQRT(atom A)
object fASM,ANS
fASM=get_asm(
    "pusha "&
    "fld qword ptr [f1] "&
    "fsqrt "&
    "fst qword ptr [f3] "&
    "popa "&
    "ret")
    poke(f1, atom_to_float64(A))
    resolve_param("f1", fASM, f1)
    resolve_param("f3", fASM, f3)
    call(fASM)
    ANS=float64_to_atom(peek({f3,8}))
    free(fASM)
    return ANS
end function

global function MULT(atom A, atom B)
object fASM,ANS
fASM=get_asm(
    "pusha "&
    "fld qword ptr [f1] "&
    "fld qword ptr [f2] "&
    "fmul st(1) "&
    "fst qword ptr [f3] "&
    "popa "&
    "ret")
    poke(f1, atom_to_float64(A))
    poke(f2, atom_to_float64(B))
    resolve_param("f1", fASM, f1)
    resolve_param("f2", fASM, f2)
    resolve_param("f3", fASM, f3)
    call(fASM)
    ANS=float64_to_atom(peek({f3,8}))
    free(fASM)
    return ANS
end function
-- END: MATH.E --

-- Begin: TEST.EX --
include math.e

atom tmp1,tmp2,tmp3,tmp4,tmp5,tmp6,tmp7,tmp8
atom t1,t2

t1=time()
for a=1 to 10000 do
    tmp1=5+7
    tmp2=7-5
    tmp3=sqrt(5)
    tmp4=5*7
end for
t1=time()-t1

t2=time()
for a=1 to 1 do

    tmp5=ADD(5,7) -- <<
    tmp6=SUB(5,7) -- << Remove these lines and everything will "look" alright.
    tmp7=SQRT(5) -- <<

    tmp5=ADD(5,7)
    tmp6=SUB(5,7)
    tmp7=SQRT(5)
    tmp8=MULT(5,7)

end for
t2=time()-t2

puts(1,"Normal :\n")
printf(1,"%f s\n",{t1})
printf(1,"7+5=\t%f\n",{tmp1})
printf(1,"7-5=\t%f\n",{tmp2})
printf(1,"SQRT 5=\t%f\n",{tmp3})
printf(1,"7*5=\t%f\n",{tmp4})

puts(1,"\n")

puts(1,"ASM :\n")
printf(1,"%f s\n",{t2})
printf(1,"7+5=\t%f\n",{tmp5})
printf(1,"7-5=\t%f\n",{tmp6})
printf(1,"SQRT 5=\t%f\n",{tmp7})
printf(1,"7*5=\t%f\n",{tmp8})
--END: TEST.EX --

As you can see, if I use these functions too often, they don't return the right values.
What am I doing wrong, or what is wrong?

Suggestions for comments are also welcome, I'm not very good at that either.

Thanks,

PQ
QC
2. Re: What am I doing wrong...
- Posted by Pete Eberlein <xseal at HARBORSIDE.COM> Jan 06, 2000
- 493 views
Hi PQ,

My suggested changes are posted in between the quoted code:

>(I used ASM.E by Pete Eberlein)
>(If someone's version of ASM.E produces an error because of the values passed to resolve_param(), remove the ", fASM"s)

The newer versions of asm.e will produce this error. The address of the machine code is no longer required.

>-- Begin: MATH.E --
>include asm.e
>
>atom f1,f2,f3
>f1=allocate(8) f2=allocate(8) f3=allocate(8)
>
>global function ADD(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
>    "pusha "&
>    "fld qword ptr [f1] "&
>    "fld qword ptr [f2] "&
>    "fadd st(1) "&
>    "fst qword ptr [f3] "&

faddp st(1), st      ; add st(0) to st(1) and pop st(0)
fstp qword ptr [f3]  ; store st(0) in memory, then pop st(0)

>    "popa "&
>    "ret")
>    poke(f1, atom_to_float64(A))
>    poke(f2, atom_to_float64(B))
>    resolve_param("f1", fASM, f1)
>    resolve_param("f2", fASM, f2)
>    resolve_param("f3", fASM, f3)
>    call(fASM)
>    ANS=float64_to_atom(peek({f3,8}))
>    free(fASM)
>    return ANS
>end function
>
>global function SUB(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
>    "pusha "&
>    "fld qword ptr [f1] "&
>    "fld qword ptr [f2] "&
>    "fsub st(1) "&
>    "fst qword ptr [f3] "&

fsubrp st(1), st     ; subtract st(1) from st(0),
                     ; store result in st(1) and pop st(0)
fstp qword ptr [f3]  ; store st(0) in memory, then pop st(0)

>    "popa "&
>    "ret")
>    poke(f1, atom_to_float64(A))
>    poke(f2, atom_to_float64(B))
>    resolve_param("f1", fASM, f1)
>    resolve_param("f2", fASM, f2)
>    resolve_param("f3", fASM, f3)
>    call(fASM)
>    ANS=float64_to_atom(peek({f3,8}))
>    free(fASM)
>    return ANS
>end function
>
>global function SQRT(atom A)
>object fASM,ANS
>fASM=get_asm(
>    "pusha "&
>    "fld qword ptr [f1] "&
>    "fsqrt "&
>    "fst qword ptr [f3] "&

fstp qword ptr [f3]  ; store st(0) in memory, then pop st(0)

>    "popa "&
>    "ret")
>    poke(f1, atom_to_float64(A))
>    resolve_param("f1", fASM, f1)
>    resolve_param("f3", fASM, f3)
>    call(fASM)
>    ANS=float64_to_atom(peek({f3,8}))
>    free(fASM)
>    return ANS
>end function
>
>global function MULT(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
>    "pusha "&
>    "fld qword ptr [f1] "&
>    "fld qword ptr [f2] "&
>    "fmul st(1) "&
>    "fst qword ptr [f3] "&

fmulp st(1), st      ; multiply st(1) by st(0) and pop st(0)
fstp qword ptr [f3]  ; store st(0) in memory, then pop st(0)

>    "popa "&
>    "ret")
>    poke(f1, atom_to_float64(A))
>    poke(f2, atom_to_float64(B))
>    resolve_param("f1", fASM, f1)
>    resolve_param("f2", fASM, f2)
>    resolve_param("f3", fASM, f3)
>    call(fASM)
>    ANS=float64_to_atom(peek({f3,8}))
>    free(fASM)
>    return ANS
>end function
>-- END: MATH.E --
>
>-- Begin: TEST.EX --
>include math.e
>
>atom tmp1,tmp2,tmp3,tmp4,tmp5,tmp6,tmp7,tmp8
>atom t1,t2
>
>t1=time()
>for a=1 to 10000 do
>    tmp1=5+7
>    tmp2=7-5
>    tmp3=sqrt(5)
>    tmp4=5*7
>end for
>t1=time()-t1
>
>t2=time()
>for a=1 to 1 do
>
>    tmp5=ADD(5,7) -- <<
>    tmp6=SUB(5,7) -- << Remove these lines and everything will "look" alright.
>    tmp7=SQRT(5) -- <<
>
>    tmp5=ADD(5,7)
>    tmp6=SUB(5,7)
>    tmp7=SQRT(5)
>    tmp8=MULT(5,7)
>
>end for
>t2=time()-t2
>
>puts(1,"Normal :\n")
>printf(1,"%f s\n",{t1})
>printf(1,"7+5=\t%f\n",{tmp1})
>printf(1,"7-5=\t%f\n",{tmp2})
>printf(1,"SQRT 5=\t%f\n",{tmp3})
>printf(1,"7*5=\t%f\n",{tmp4})
>
<snip test.ex>

>As you can see, if I use these functions too often, they don't return the right values.
>What am I doing wrong, or what is wrong.

You are forgetting to pop values from the floating-point stack, which leaves it in a mess for the functions called later. For each fld, you should have one instruction that does a pop (these usually end with the letter p). The FPU register stack holds only eight values, so if you forget to pop them, the stack will overflow and later asm operations will return NaN.

>Also some suggestions for comments are also welcome, I'm not very good at that either.

It would be much faster to move the get_asm and resolve_param calls outside of the function.
For example, this is how I would rewrite the ADD function:

constant fADD=get_asm(
    "pusha "&
    "fld qword ptr [f1] "&
    "fld qword ptr [f2] "&
    "faddp st(1),st "&
    "fstp qword ptr [f3] "&
    "popa "&
    "ret")
resolve_param("f1", f1)
resolve_param("f2", f2)
resolve_param("f3", f3)

global function ADD(atom A, atom B)
    poke(f1, atom_to_float64(A))
    poke(f2, atom_to_float64(B))
    call(fADD)
    return float64_to_atom(peek({f3,8}))
end function

Although, even after converting all the functions this way, the Normal operations are 50 times faster than the ASM routines. This is due to the overhead of atom_to_float64, poke, and call.

I have asked Rob Craig to add define_c_proc and define_c_func for ASM routines, which I think would be a great benefit for speed. Proposed new routines:

define_mach_proc(atom address, sequence arg_sizes)
define_mach_func(atom address, sequence arg_sizes, atom return_type)

The address defines the location in memory of the machine code, and the code would of course have to follow the calling conventions of the current architecture.

>Thanks,
>
>PQ
>QC
>

Pete
http://www.harborside.com/home/x/xseal/euphoria/
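
Editor's illustration: the stack-overflow explanation in the reply above can be tried out with a small Python model. This is only a sketch, not asm.e or real hardware. It assumes an eight-register x87 stack with masked exceptions, where a load into an occupied register produces NaN, and it ignores whatever the Euphoria interpreter itself does with the FPU between calls. The names `X87Stack`, `call_routine`, `ROUTINES` and `run` are made up for this example.

```python
import math
import operator


class X87Stack:
    """Toy model of the eight-register x87 stack (masked exceptions assumed)."""
    SIZE = 8

    def __init__(self):
        self.regs = [0.0] * self.SIZE
        self.used = [False] * self.SIZE
        self.top = 0

    def st(self, i=0):
        """Read st(i), counted from the current top of stack."""
        return self.regs[(self.top + i) % self.SIZE]

    def fld(self, value):
        """Push a value; pushing onto an occupied register yields NaN."""
        self.top = (self.top - 1) % self.SIZE
        if self.used[self.top]:
            value = math.nan  # stack overflow
        self.regs[self.top] = value
        self.used[self.top] = True

    def pop(self):
        """Mark st(0) as empty and move the top of stack up by one."""
        self.used[self.top] = False
        self.top = (self.top + 1) % self.SIZE

    def depth(self):
        return sum(self.used)


def call_routine(fpu, op, args, balanced):
    """Load the arguments, apply op, store the result.

    balanced=False mimics the original fadd/fst style (nothing is popped);
    balanced=True mimics the faddp/fstp style (one pop per fld).
    """
    for value in args:
        fpu.fld(value)
    if len(args) == 2:
        result = op(fpu.st(0), fpu.st(1))
        if balanced:
            fpu.pop()  # the "p" in faddp / fsubrp / fmulp
    else:
        result = op(fpu.st(0))
    fpu.regs[fpu.top] = result
    if balanced:
        fpu.pop()  # fstp instead of fst
    return result


# SUB uses st(0) - st(1), i.e. B - A, matching "fsub st(1)" in the post.
ROUTINES = {
    "ADD": (operator.add, (5.0, 7.0)),
    "SUB": (operator.sub, (5.0, 7.0)),
    "SQRT": (math.sqrt, (5.0,)),
    "MULT": (operator.mul, (5.0, 7.0)),
}


def run(names, balanced):
    """Run the named routines on a fresh stack; return results and final depth."""
    fpu = X87Stack()
    results = [call_routine(fpu, *ROUTINES[n], balanced) for n in names]
    return results, fpu.depth()


if __name__ == "__main__":
    long_test = ["ADD", "SUB", "SQRT", "ADD", "SUB", "SQRT", "MULT"]
    print("unbalanced:", run(long_test, balanced=False))
    print("balanced:  ", run(long_test, balanced=True))
```

In this model the unbalanced routines leave seven registers occupied after ADD, SUB, SQRT, MULT, so the four-call test still "looks" alright. With the three extra calls in front, the eighth register fills during the second SUB and every result from that point on is NaN. The balanced version ends each call with an empty stack.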