1. What am I doing wrong...

Hi,

Something EU this time. While I was studying some ASM topics such as boot sectors, I remembered something I created in EU using ASM, but it isn't very good:
(I'm not very good at ASM; I'm trying to understand it.)
(I used ASM.E by Pete Eberlein.)
(If someone's version of ASM.E produces an error because of the values passed to resolve_param(), remove the ", fASM" arguments.)

-- Begin: MATH.E --
include asm.e

atom f1,f2,f3
f1=allocate(8) f2=allocate(8) f3=allocate(8)

global function ADD(atom A, atom B)
object fASM,ANS
fASM=get_asm(
 "pusha "&
 "fld qword ptr [f1] "&
 "fld qword ptr [f2] "&
 "fadd st(1) "&
 "fst qword ptr [f3] "&
 "popa "&
 "ret")
 poke(f1, atom_to_float64(A))
 poke(f2, atom_to_float64(B))
 resolve_param("f1", fASM, f1)
 resolve_param("f2", fASM, f2)
 resolve_param("f3", fASM, f3)
 call(fASM)
 ANS=float64_to_atom(peek({f3,8}))
 free(fASM)
 return ANS
end function

global function SUB(atom A, atom B)
object fASM,ANS
fASM=get_asm(
 "pusha "&
 "fld qword ptr [f1] "&
 "fld qword ptr [f2] "&
 "fsub st(1) "&
 "fst qword ptr [f3] "&
 "popa "&
 "ret")
 poke(f1, atom_to_float64(A))
 poke(f2, atom_to_float64(B))
 resolve_param("f1", fASM, f1)
 resolve_param("f2", fASM, f2)
 resolve_param("f3", fASM, f3)
 call(fASM)
 ANS=float64_to_atom(peek({f3,8}))
 free(fASM)
 return ANS
end function

global function SQRT(atom A)
object fASM,ANS
fASM=get_asm(
 "pusha "&
 "fld qword ptr [f1] "&
 "fsqrt "&
 "fst qword ptr [f3] "&
 "popa "&
 "ret")
 poke(f1, atom_to_float64(A))
 resolve_param("f1", fASM, f1)
 resolve_param("f3", fASM, f3)
 call(fASM)
 ANS=float64_to_atom(peek({f3,8}))
 free(fASM)
 return ANS
end function

global function MULT(atom A, atom B)
object fASM,ANS
fASM=get_asm(
 "pusha "&
 "fld qword ptr [f1] "&
 "fld qword ptr [f2] "&
 "fmul st(1) "&
 "fst qword ptr [f3] "&
 "popa "&
 "ret")
 poke(f1, atom_to_float64(A))
 poke(f2, atom_to_float64(B))
 resolve_param("f1", fASM, f1)
 resolve_param("f2", fASM, f2)
 resolve_param("f3", fASM, f3)
 call(fASM)
 ANS=float64_to_atom(peek({f3,8}))
 free(fASM)
 return ANS
end function
-- END: MATH.E --

-- Begin: TEST.EX --
include math.e

atom tmp1,tmp2,tmp3,tmp4,tmp5,tmp6,tmp7,tmp8
atom  t1,t2

t1=time()
for a=1 to 10000 do
 tmp1=5+7
 tmp2=7-5
 tmp3=sqrt(5)
 tmp4=5*7
end for
t1=time()-t1

t2=time()
for a=1 to 1 do

 tmp5=ADD(5,7)  -- <<
 tmp6=SUB(5,7)  -- << Remove these lines and everything will "look" alright.
 tmp7=SQRT(5)  -- <<

 tmp5=ADD(5,7)
 tmp6=SUB(5,7)
 tmp7=SQRT(5)
 tmp8=MULT(5,7)

end for
t2=time()-t2

puts(1,"Normal :\n")
printf(1,"%f s\n",{t1})
printf(1,"7+5=\t%f\n",{tmp1})
printf(1,"7-5=\t%f\n",{tmp2})
printf(1,"SQRT 5=\t%f\n",{tmp3})
printf(1,"7*5=\t%f\n",{tmp4})

puts(1,"\n")

puts(1,"ASM :\n")
printf(1,"%f s\n",{t2})
printf(1,"7+5=\t%f\n",{tmp5})
printf(1,"7-5=\t%f\n",{tmp6})
printf(1,"SQRT 5=\t%f\n",{tmp7})
printf(1,"7*5=\t%f\n",{tmp8})
--END: TEST.EX --

As you can see, if I call these functions too often, they don't return the right values.
What am I doing wrong, or what is wrong?

Suggestions for comments are also welcome; I'm not very good at that either.

Thanks,
PQ
QC


2. Re: What am I doing wrong...

Hi PQ,

My suggested changes are posted in between the quoted code:

>(I used ASM.E by Pete Eberlein)
>(If someones version of ASM.E produces an error because of the entered values for resolve_param(), remove the ", fASM" 's)

Newer versions of asm.e will produce this error: resolve_param() no longer
takes the address of the machine code.

>-- Begin: MATH.E --
>include asm.e
>
>atom f1,f2,f3
>f1=allocate(8) f2=allocate(8) f3=allocate(8)
>
>global function ADD(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fld qword ptr [f2] "&
> "fadd st(1) "&
> "fst qword ptr [f3] "&

faddp st(1), st  ; add st(0) to st(1) and pop st(0)
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)

> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> poke(f2, atom_to_float64(B))
> resolve_param("f1", fASM, f1)
> resolve_param("f2", fASM, f2)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>
>global function SUB(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fld qword ptr [f2] "&
> "fsub st(1) "&
> "fst qword ptr [f3] "&

fsubrp st(1), st ; subtract st(1) from st(0),
   ; store result in st(1) and pop st(0)
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)

> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> poke(f2, atom_to_float64(B))
> resolve_param("f1", fASM, f1)
> resolve_param("f2", fASM, f2)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>
>global function SQRT(atom A)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fsqrt "&
> "fst qword ptr [f3] "&

fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)

> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> resolve_param("f1", fASM, f1)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>
>global function MULT(atom A, atom B)
>object fASM,ANS
>fASM=get_asm(
> "pusha "&
> "fld qword ptr [f1] "&
> "fld qword ptr [f2] "&
> "fmul st(1) "&
> "fst qword ptr [f3] "&

fmulp st(1), st  ; multiply st(1) by st(0) and pop st(0)
fstp qword ptr [f3] ; store st(0) in memory, then pop st(0)

> "popa "&
> "ret")
> poke(f1, atom_to_float64(A))
> poke(f2, atom_to_float64(B))
> resolve_param("f1", fASM, f1)
> resolve_param("f2", fASM, f2)
> resolve_param("f3", fASM, f3)
> call(fASM)
> ANS=float64_to_atom(peek({f3,8}))
> free(fASM)
> return ANS
>end function
>-- END: MATH.E --
>
>-- Begin: TEST.EX --
>include math.e
>
>atom tmp1,tmp2,tmp3,tmp4,tmp5,tmp6,tmp7,tmp8
>atom  t1,t2
>
>t1=time()
>for a=1 to 10000 do
> tmp1=5+7
> tmp2=7-5
> tmp3=sqrt(5)
> tmp4=5*7
>end for
>t1=time()-t1
>
>t2=time()
>for a=1 to 1 do
>
> tmp5=ADD(5,7)  -- <<
> tmp6=SUB(5,7)  -- << Remove these lines and everything will "look" alright.
> tmp7=SQRT(5)  -- <<
>
> tmp5=ADD(5,7)
> tmp6=SUB(5,7)
> tmp7=SQRT(5)
> tmp8=MULT(5,7)
>
>end for
>t2=time()-t2
>
>puts(1,"Normal :\n")
>printf(1,"%f s\n",{t1})
>printf(1,"7+5=\t%f\n",{tmp1})
>printf(1,"7-5=\t%f\n",{tmp2})
>printf(1,"SQRT 5=\t%f\n",{tmp3})
>printf(1,"7*5=\t%f\n",{tmp4})
>

<snip test.ex>

>As you can see, if I use these functions too often, they don't return the right values.
>What am I doing wrong, or what is wrong.

You are forgetting to pop values from the floating-point stack, leaving it a
mess for any function called later.  For each fld, you should have one
instruction that does a pop (these usually end with the letter p).
The FPU stack has only eight registers, so if you forget to pop the values,
it will overflow and later asm operations will return NaN.
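
To see why removing the first three calls makes the results "look" alright, it helps to count registers. The following is a minimal sketch in Python (not Euphoria, and not a real FPU emulation). It assumes the FPU stack is empty when TEST.EX starts and that nothing else pushes onto it between calls. The names first_overflow, LEAKY and FIXED are made up for this illustration.

```python
# Toy model of x87 register-stack depth. This is NOT a real FPU emulation.
FPU_REGISTERS = 8  # the x87 FPU has eight stack registers, st(0)..st(7)

# (loads, pops) per call. The "leaky" numbers are read off the original
# MATH.E: two fld and no pop for ADD/SUB/MULT, one fld and no pop for SQRT.
LEAKY = {"ADD": (2, 0), "SUB": (2, 0), "SQRT": (1, 0), "MULT": (2, 0)}

# With faddp/fsubrp/fmulp plus fstp, every load is matched by a pop.
FIXED = {"ADD": (2, 2), "SUB": (2, 2), "SQRT": (1, 1), "MULT": (2, 2)}


def first_overflow(calls, table):
    """Return the 0-based index of the first call that overflows, or None."""
    depth = 0  # assumption: the stack starts out empty
    for index, name in enumerate(calls):
        loads, pops = table[name]
        if depth + loads > FPU_REGISTERS:
            return index  # a fld here would hit an occupied register
        depth += loads - pops
    return None


test_ex = ["ADD", "SUB", "SQRT", "ADD", "SUB", "SQRT", "MULT"]
print(first_overflow(test_ex, LEAKY))      # 4, i.e. the second SUB
print(first_overflow(test_ex[3:], LEAKY))  # None, the four calls stop at depth 7
print(first_overflow(test_ex, FIXED))      # None
```

Under those assumptions the original code runs out of registers in the second SUB, while the four-call version ends at depth 7 and only appears to work. A real run may differ if the interpreter itself leaves values on the FPU stack or clears it.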


>Also some suggestions for comments are also welcome, I'm not very good at that either.

It would be much faster to move the get_asm and resolve_param calls outside
of the function, so the code is assembled once rather than on every call.
For example, this is how I would rewrite the ADD function:

constant fADD=get_asm(
 "pusha "&
 "fld qword ptr [f1] "&
 "fld qword ptr [f2] "&
 "faddp st(1),st "&
 "fstp qword ptr [f3] "&
 "popa "&
 "ret")
resolve_param("f1", f1)
resolve_param("f2", f2)
resolve_param("f3", f3)

global function ADD(atom A, atom B)
 poke(f1, atom_to_float64(A))
 poke(f2, atom_to_float64(B))
 call(fADD)
 return float64_to_atom(peek({f3,8}))
end function


Even after converting all the functions this way, though, the Normal
operations are still 50 times faster than the ASM routines.  This is due to
the overhead of atom_to_float64, poke, and call.  I have asked Rob Craig to
add equivalents of define_c_proc and define_c_func for ASM routines, which I
think would be a great benefit for speed.  Proposed new routines:
define_mach_proc(atom address, sequence arg_sizes)
define_mach_func(atom address, sequence arg_sizes, atom return_type)
The address gives the location in memory of the machine code, and the code
would of course have to follow the calling conventions of the current
architecture.
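
As a rough picture of where the per-call overhead goes, here is a sketch in Python (not Euphoria) of the round trip each call makes. The atoms are converted to 8-byte IEEE 754 sequences and written to memory, the code runs, and 8 bytes are read back and converted again. The two helper functions only mimic the Euphoria routines of the same name. The dict called memory and the function add_via_memory are hypothetical stand-ins for the allocate(8) blocks and the ADD wrapper.

```python
import struct


def atom_to_float64(value):
    """Stand-in for the Euphoria routine: 8 little-endian IEEE 754 bytes."""
    return list(struct.pack("<d", value))


def float64_to_atom(byte_list):
    """Stand-in for the Euphoria routine: 8 bytes back to a number."""
    return struct.unpack("<d", bytes(byte_list))[0]


memory = {}  # hypothetical stand-in for the blocks f1, f2 and f3


def add_via_memory(a, b):
    memory["f1"] = atom_to_float64(a)  # poke(f1, atom_to_float64(A))
    memory["f2"] = atom_to_float64(b)  # poke(f2, atom_to_float64(B))
    # call(fADD): the machine code would load both values, add them,
    # and store the sum at f3. Modeled here in plain Python.
    total = float64_to_atom(memory["f1"]) + float64_to_atom(memory["f2"])
    memory["f3"] = atom_to_float64(total)
    return float64_to_atom(memory["f3"])  # float64_to_atom(peek({f3,8}))


print(atom_to_float64(5))    # the 8 bytes that get poked for the value 5
print(add_via_memory(5, 7))
```

Every call pays for three conversions and three memory transfers around a single addition, which fits the observation that the wrapper, not the arithmetic, dominates the timing.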

>Thanks,
>
>PQ
>QC
>

Pete
http://www.harborside.com/home/x/xseal/euphoria/
