RE: ?? shift left




> -----Original Message-----
> From: Bernie Ryan [mailto:xotron at localnet.com]

>   And that is why I said that it should be built in.
>   I still can't see why RDS didn't implement it in the
>   interpreter; all it takes is the SHL or SHR instruction
>   in inline assembler. There is too much overhead and loss of
>   speed in using power or any shift function.
>   I used assembler in my libraries to do it, but it still
>   uses a function because there is no easy way to use assembler in Euphoria.

It sure seems like there would be, but I'm not sure how much you're actually
losing.  I haven't seen your asm, but I ran a little test myself (attached),
comparing functions that call asm against functions that use power().

As you can see, the functions all take integers, which speeds things up a
little.  Poking the arguments into memory slows the asm functions down, of
course, so I also tested without doing that.  I even changed them to
procedures, so that the routines did nothing but call the asm, and
a * power( 2, b ) or a / power( 2, b ) still beat the asm code.  Oh, yeah, I
also used floor() whenever I divided, to ensure an integral return value.

Looks like Rob has already inlined the shifts through the
power/multiplication/division operators.  OK, you could cut the overhead of
testing for powers of two, but even so, you're probably not gaining much.
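
To make the equivalence concrete, here is a rough C sketch of what "shifting through multiplication and division" amounts to.  The function names are mine and purely illustrative; nothing here is taken from the Euphoria interpreter, and it only covers unsigned 32-bit values that don't overflow.  The count is masked to 0..31 because a C shift by 32 or more is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* a * 2^b, valid as long as the result fits in 32 bits. */
static uint32_t mul_pow2(uint32_t a, unsigned b) {
    return a << (b & 31u);   /* mask keeps the count in the defined range */
}

/* floor(a / 2^b) for unsigned a. */
static uint32_t div_pow2(uint32_t a, unsigned b) {
    return a >> (b & 31u);
}

/* The "is this a power of two?" test an interpreter would have to pay for
   before it could swap a multiply or divide for a shift. */
static int is_pow2(uint32_t n) {
    return n != 0 && (n & (n - 1u)) == 0;
}

/* Small self-check comparing the shifts with plain arithmetic. */
static void shift_equivalence_check(void) {
    assert(mul_pow2(5u, 3u) == 5u * 8u);
    assert(div_pow2(41u, 3u) == 41u / 8u);
    assert(is_pow2(64u) && !is_pow2(48u));
}
```

Calling shift_equivalence_check() from main() is enough to exercise it; the power-of-two test is the overhead mentioned above.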

Next, I translated to C (using Borland).  At the beginning of _0sal(), I
added:

_0 = 1;
_0 <<= _b;     /* _0 = 2^b */
_a *= _0;
return _a;

_0shr():
_a >>= _b;     /* shift in place */
return _a;

and added the respective code for the other functions that were using asm.
I found that the speed increase was about 10% for sal/sar and about 25-30%
for shl/shr.  I suppose you could say it's a significant difference
(especially with shl/shr), but I wonder how much of a difference it would
make in practice, i.e., how often are you shifting bits around?  Maybe
there's a lot of this in graphics code, and it *would* make a difference.
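
For anyone unsure what separates the sar and shr variants, here is a small C sketch of the two right shifts.  Again, the names are my own and not part of the translator's output.  An arithmetic shift keeps the sign (it is floor division by 2^b), while a logical shift treats the bits as unsigned and fills with zeros.  Left shifts don't have this split: on x86, SAL and SHL are the same operation.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right: floor(a / 2^b), sign preserved.
   Written with ~ so it never right-shifts a negative value, which
   C leaves implementation-defined. */
static int32_t sar32(int32_t a, unsigned b) {
    b &= 31u;
    return a < 0 ? ~(~a >> b) : a >> b;
}

/* Logical shift right: the bit pattern is treated as unsigned,
   so zeros come in from the top. */
static uint32_t shr32(int32_t a, unsigned b) {
    return (uint32_t)a >> (b & 31u);
}

/* Small self-check: the two agree for non-negative input only. */
static void right_shift_check(void) {
    assert(sar32(40, 3) == 5);
    assert(shr32(40, 3) == 5u);
    assert(sar32(-8, 1) == -4);
    assert(shr32(-8, 1) == 0x7FFFFFFCu);
}
```

The benchmark only feeds in positive values from rand(), so both variants give the same answers there; the difference only shows up with negative numbers.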

I suppose that if you're really interested in speed, you'll be translating
to C anyways, and you can hack the func yourself.

Matt Lewis


Attachment: SHIFT.EX

-- shift.ex
without warning
include asm.e
include get.e
include print.e

constant

asm_sal = get_asm( 
    "mov eax, num \n" &
    "sal eax, bits \n" &
    "mov [@num], eax \n" &
    "ret " 
    
    ),
sal_num = get_param( "num" ) + asm_sal,
sal_bits = get_param( "bits" ) + asm_sal,

asm_sar = get_asm(
    "mov eax, num \n" &
    "sar eax, bits \n" &
    "mov [@num], eax \n" &
    "ret " 
    ),
sar_num = asm_sar + get_param( "num" ),
sar_bits = asm_sar + get_param( "bits" ),

asm_shl = get_asm( 
    "mov eax, num \n" &
    "shl eax, bits \n" &
    "mov [@num], eax \n" &
    "ret " 
    
    ),
shl_num = get_param( "num" ) + asm_shl,
shl_bits = get_param( "bits" ) + asm_shl,

asm_shr = get_asm( 
    "mov eax, num \n" &
    "shr eax, bits \n" &
    "mov [@num], eax \n" &
    "ret " 
    
    ),
shr_num = get_param( "num" ) + asm_shr,
shr_bits = get_param( "bits" ) + asm_shr



function sal( integer a, integer b )
    poke4( sal_num , a )
    poke( sal_bits, b )
    call( asm_sal)
    return peek4u( sal_num )
end function

function salp( integer a, integer b )
    return a * power(2,b)
end function

function sar( integer a, integer b )
    poke4( sar_num, a )
    poke( sar_bits, b )
    call( asm_sar )
    return peek4u( sar_num )
end function

function sarp( integer a, integer b )
    return floor(a / power( 2, b ))
end function

function shl( integer a , integer b )
    poke4( shl_num, a )
    poke( shl_bits, b )
    call( asm_shl )
    return peek4u( shl_num )
end function

function shr( integer a , integer b )
    poke4( shr_num, a )
    poke( shr_bits, b )
    call( asm_shr )
    return peek4u( shr_num )
end function

function shlp( integer a, integer b )
    return a * power(2, b)
end function

function shrp( integer a, integer b )
    return floor(a / power( 2, b ))
end function

atom t, x, id
sequence times
times = repeat( {0,""}, 8 )

times[1][2] = "sal"
times[2][2] = "salp"
times[3][2] = "sar"
times[4][2] = "sarp"
times[5][2] = "shl"
times[6][2] = "shlp"
times[7][2] = "shr"
times[8][2] = "shrp"

constant rep = 100000

for i = 1 to length( times ) do
    id = routine_id( times[i][2] )
    t = time()
    for j = 1 to rep do
        x = call_func( id, { rand(1073741823),rand( 32 ) })
    end for
    times[i][1] = time()-t
end for


for i = 1 to length(times) do
    print(1, times[i] )
    puts(1,"\n")
end for
free( asm_sal )
free( asm_sar )
free( asm_shl )
free( asm_shr )



