1. What am I doing wrong...
Hi,
Something EU this time. While I was studying some ASM things like boot sectors etc., I remembered something I created in EU using ASM, but it isn't very good:
(I'm not very good at ASM, I'm trying to understand it)
(I used ASM.E by Pete Eberlein)
(If someone's version of ASM.E produces an error because of the values passed to resolve_param(), remove the ", fASM"s)
-- Begin: MATH.E --
include asm.e
atom f1,f2,f3
f1=allocate(8) f2=allocate(8) f3=allocate(8)
global function ADD(atom A, atom B)
object fASM,ANS
fASM=get_asm(
"pusha "&
"fld qword ptr [f1] "&
"fld qword ptr [f2] "&
"fadd st(1) "&
"fst qword ptr [f3] "&
"popa "&
"ret")
poke(f1, atom_to_float64(A))
poke(f2, atom_to_float64(B))
resolve_param("f1", fASM, f1)
resolve_param("f2", fASM, f2)
resolve_param("f3", fASM, f3)
call(fASM)
ANS=float64_to_atom(peek({f3,8}))
free(fASM)
return ANS
end function
global function SUB(atom A, atom B)
object fASM,ANS
fASM=get_asm(
"pusha "&
"fld qword ptr [f1] "&
"fld qword ptr [f2] "&
"fsub st(1) "&
"fst qword ptr [f3] "&
"popa "&
"ret")
poke(f1, atom_to_float64(A))
poke(f2, atom_to_float64(B))
resolve_param("f1", fASM, f1)
resolve_param("f2", fASM, f2)
resolve_param("f3", fASM, f3)
call(fASM)
ANS=float64_to_atom(peek({f3,8}))
free(fASM)
return ANS
end function
global function SQRT(atom A)
object fASM,ANS
fASM=get_asm(
"pusha "&
"fld qword ptr [f1] "&
"fsqrt "&
"fst qword ptr [f3] "&
"popa "&
"ret")
poke(f1, atom_to_float64(A))
resolve_param("f1", fASM, f1)
resolve_param("f3", fASM, f3)
call(fASM)
ANS=float64_to_atom(peek({f3,8}))
free(fASM)
return ANS
end function
global function MULT(atom A, atom B)
object fASM,ANS
fASM=get_asm(
"pusha "&
"fld qword ptr [f1] "&
"fld qword ptr [f2] "&
"fmul st(1) "&
"fst qword ptr [f3] "&
"popa "&
"ret")
poke(f1, atom_to_float64(A))
poke(f2, atom_to_float64(B))
resolve_param("f1", fASM, f1)
resolve_param("f2", fASM, f2)
resolve_param("f3", fASM, f3)
call(fASM)
ANS=float64_to_atom(peek({f3,8}))
free(fASM)
return ANS
end function
-- END: MATH.E --
-- Begin: TEST.EX --
include math.e
atom tmp1,tmp2,tmp3,tmp4,tmp5,tmp6,tmp7,tmp8
atom t1,t2
t1=time()
for a=1 to 10000 do
tmp1=5+7
tmp2=7-5
tmp3=sqrt(5)
tmp4=5*7
end for
t1=time()-t1
t2=time()
for a=1 to 1 do
tmp5=ADD(5,7) -- <<
tmp6=SUB(5,7) -- << Remove these lines and everything will "look" alright.
tmp7=SQRT(5) -- <<
tmp5=ADD(5,7)
tmp6=SUB(5,7)
tmp7=SQRT(5)
tmp8=MULT(5,7)
end for
t2=time()-t2
puts(1,"Normal :\n")
printf(1,"%f s\n",{t1})
printf(1,"7+5=\t%f\n",{tmp1})
printf(1,"7-5=\t%f\n",{tmp2})
printf(1,"SQRT 5=\t%f\n",{tmp3})
printf(1,"7*5=\t%f\n",{tmp4})
puts(1,"\n")
puts(1,"ASM :\n")
printf(1,"%f s\n",{t2})
printf(1,"7+5=\t%f\n",{tmp5})
printf(1,"7-5=\t%f\n",{tmp6})
printf(1,"SQRT 5=\t%f\n",{tmp7})
printf(1,"7*5=\t%f\n",{tmp8})
--END: TEST.EX --
As you can see, if I use these functions too often, they don't return the right values.
What am I doing wrong, or what is wrong?
Suggestions for comments are also welcome, I'm not very good at that either.
Thanks,
PQ
QC
2. Re: What am I doing wrong...
Hi PQ,
My suggested changes are posted in between the quoted code. Each unquoted instruction replaces the corresponding quoted instruction just above it:
>(I used ASM.E by Pete Eberlein)
>(If someones version of ASM.E produces an error because of the entered
>values for resolve_param(), remove the ", fASM" 's)
Newer versions of asm.e will produce this error: resolve_param() no longer
takes the address of the machine code.
>-- Begin: MATH.E --
>include asm.e
>
>atom f1,f2,f3
>f1=allocate(8) f2=allocate(8) f3=allocate(8)
>
>global function ADD(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fld qword ptr [f2] "&
> "fadd st(1) "&
> "fst qword ptr [f3] "&
faddp st(1), st ; add st(0) to st(1) and pop st(0)
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)
> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> poke(f2, atom_to_float64(B))
> resolve_param("f1", fASM, f1)
> resolve_param("f2", fASM, f2)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>
>global function SUB(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fld qword ptr [f2] "&
> "fsub st(1) "&
> "fst qword ptr [f3] "&
fsubrp st(1), st ; subtract st(1) from st(0),
; store result in st(1) and pop st(0)
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)
> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> poke(f2, atom_to_float64(B))
> resolve_param("f1", fASM, f1)
> resolve_param("f2", fASM, f2)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>
>global function SQRT(atom A)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fsqrt "&
> "fst qword ptr [f3] "&
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)
> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> resolve_param("f1", fASM, f1)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>
>global function MULT(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fld qword ptr [f2] "&
> "fmul st(1) "&
> "fst qword ptr [f3] "&
fmulp st(1), st ; multiply st(1) by st(0) and pop st(0)
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)
> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> poke(f2, atom_to_float64(B))
> resolve_param("f1", fASM, f1)
> resolve_param("f2", fASM, f2)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>-- END: MATH.E --
>
>-- Begin: TEST.EX --
>include math.e
>
>atom tmp1,tmp2,tmp3,tmp4,tmp5,tmp6,tmp7,tmp8
>atom t1,t2
>
>t1=time()
>for a=1 to 10000 do
> tmp1=5+7
> tmp2=7-5
> tmp3=sqrt(5)
> tmp4=5*7
>end for
>t1=time()-t1
>
>t2=time()
>for a=1 to 1 do
>
> tmp5=ADD(5,7) -- <<
> tmp6=SUB(5,7) -- << Remove these lines and everything will "look" alright.
> tmp7=SQRT(5) -- <<
>
> tmp5=ADD(5,7)
> tmp6=SUB(5,7)
> tmp7=SQRT(5)
> tmp8=MULT(5,7)
>
>end for
>t2=time()-t2
>
>puts(1,"Normal :\n")
>printf(1,"%f s\n",{t1})
>printf(1,"7+5=\t%f\n",{tmp1})
>printf(1,"7-5=\t%f\n",{tmp2})
>printf(1,"SQRT 5=\t%f\n",{tmp3})
>printf(1,"7*5=\t%f\n",{tmp4})
>
<snip test.ex>
>As you can see, if I use these functions too often, they don't return the
>right values.
>What am I doing wrong, or what is wrong.
You are forgetting to pop values from the floating-point stack, leaving it a
mess for later functions being called. For each fld, you should have one
instruction that does a pop (usually one ending with the letter p).
The x87 stack only has eight registers, and as written ADD, SUB and MULT each
leave two values on it and SQRT leaves one. If you forget to pop the values,
the stack will overflow and later asm operations will return NaN.
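To see why the seven calls in TEST.EX are enough to cause an overflow, the stack depth can be tallied by hand. The following is only a bookkeeping sketch in plain Euphoria; no machine code is run. It assumes the FPU stack is empty before the first call and that the x87 stack has eight registers. The names FPU_REGS, ORIGINAL, FIXED and first_overflow are made up for this illustration.

```euphoria
-- Bookkeeping sketch: how many x87 registers stay occupied after each call.
-- Assumption: the FPU stack is empty before the first call.
constant FPU_REGS = 8   -- the x87 register stack holds eight values

-- {name, number of fld instructions, number of popping instructions}
constant ORIGINAL = {{"ADD", 2, 0}, {"SUB", 2, 0}, {"SQRT", 1, 0}, {"MULT", 2, 0}}
constant FIXED    = {{"ADD", 2, 2}, {"SUB", 2, 2}, {"SQRT", 1, 1}, {"MULT", 2, 2}}

-- Returns the position in calls of the first call whose loads no longer
-- fit on the stack, or 0 if every call fits.
function first_overflow(sequence routines, sequence calls)
    integer depth
    sequence r
    depth = 0
    for i = 1 to length(calls) do
        r = routines[calls[i]]
        -- every routine does all of its loads before any pop
        if depth + r[2] > FPU_REGS then
            return i
        end if
        depth = depth + r[2] - r[3]
    end for
    return 0
end function

-- 1=ADD 2=SUB 3=SQRT 4=MULT, in the order TEST.EX calls them
printf(1, "original, all 7 calls:  %d\n", {first_overflow(ORIGINAL, {1,2,3,1,2,3,4})})
printf(1, "original, last 4 calls: %d\n", {first_overflow(ORIGINAL, {1,2,3,4})})
printf(1, "popping, all 7 calls:   %d\n", {first_overflow(FIXED, {1,2,3,1,2,3,4})})
```

Under these assumptions the tally reports call 5 (the second SUB) for the original routines and 0 for the other two cases, which matches the observation that removing the three marked lines makes everything "look" alright.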
>Also some suggestions for comments are also welcome, I'm not very good at
that either.
It would be much faster to move the get_asm and resolve_param calls outside
of the function. For example, this is how I would rewrite the ADD function:
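-- assemble once at load time instead of on every call
constant fADD=get_asm(
    "pusha "&
    "fld qword ptr [f1] "&
    "fld qword ptr [f2] "&
    "faddp st(1),st "&
    "fstp qword ptr [f3] "&
    "popa "&
    "ret")
-- newer asm.e: resolve_param() no longer takes the code address
resolve_param("f1", f1)
resolve_param("f2", f2)
resolve_param("f3", f3)
global function ADD(atom A, atom B)
    -- write both operands into the buffers the machine code reads
    poke(f1, atom_to_float64(A))
    poke(f2, atom_to_float64(B))
    call(fADD)
    -- read the 8-byte result back and convert it to an atom
    return float64_to_atom(peek({f3,8}))
end function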
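The same pattern should carry over to the other routines. As a sketch only, here is SQRT converted the same way, written as a standalone file. It assumes a newer asm.e where resolve_param() takes just the name and the value, as in the ADD example above, and it uses only instructions that already appear in this thread. The name fSQRT is chosen here for illustration.

```euphoria
include machine.e   -- allocate, atom_to_float64, float64_to_atom
include asm.e       -- Pete Eberlein's assembler (a newer version is assumed)

atom f1, f3
f1 = allocate(8)    -- operand buffer
f3 = allocate(8)    -- result buffer

-- assembled once, when the file is loaded
constant fSQRT = get_asm(
    "pusha "&
    "fld qword ptr [f1] "&      -- push the operand
    "fsqrt "&                   -- st(0) = sqrt(st(0))
    "fstp qword ptr [f3] "&     -- store the result and pop it
    "popa "&
    "ret")
resolve_param("f1", f1)
resolve_param("f3", f3)

global function SQRT(atom A)
    poke(f1, atom_to_float64(A))
    call(fSQRT)
    return float64_to_atom(peek({f3,8}))
end function
```

There is one fld and one popping instruction (fstp), so the FPU stack is left as it was found.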
Even after converting all the functions this way, though, the Normal
operations are still 50 times faster than the ASM routines. This is due to the
overhead of atom_to_float64, poke, and call. I have asked Rob Craig to add
equivalents of define_c_proc and define_c_func for ASM routines, which I think
would be a great benefit for speed. Proposed new routines:
define_mach_proc(atom address, sequence arg_sizes)
define_mach_func(atom address, sequence arg_sizes, atom return_type)
The address defines the location in memory of the machine code, and the code
would of course have to follow the calling conventions of the current
architecture.
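The overhead mentioned above can be measured without any machine code at all. The sketch below times a plain addition against the poke, peek and conversion round trip that every wrapper has to do. It assumes the classic include layout where machine.e provides allocate, atom_to_float64 and float64_to_atom. REPS and buf are names chosen for this example, and the timings will vary from machine to machine.

```euphoria
include machine.e   -- allocate, free, atom_to_float64, float64_to_atom

constant REPS = 100000
atom buf, t, x
buf = allocate(8)

-- plain interpreter arithmetic
t = time()
for i = 1 to REPS do
    x = 5 + 7
end for
printf(1, "plain add:         %f s\n", {time() - t})

-- only the conversions a machine-code wrapper needs; nothing is called
t = time()
for i = 1 to REPS do
    poke(buf, atom_to_float64(5))
    x = float64_to_atom(peek({buf, 8}))
end for
printf(1, "poke/peek/convert: %f s\n", {time() - t})

free(buf)
```

If the second loop is already far slower than the first, no amount of tuning inside the machine code will make up for it, which is the argument for letting the interpreter pass arguments directly.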
>Thanks,
>
>PQ
>QC
>
Pete
http://www.harborside.com/home/x/xseal/euphoria/