RE: ?? shift left
- Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> May 14, 2001
> -----Original Message-----
> From: Bernie Ryan [mailto:xotron at localnet.com]

> And that is why I said that it should be built in.
> I still can't see why RDS didn't implement it in the
> interpreter; all it takes is the SHL or SHR instruction
> in inline assembler. There is too much overhead and loss of
> speed by using power or any shift function.
> I used assembler in my libraries to do it, but it still
> uses a function because there is no easy way to use assembler in Euphoria.

It sure seems like there would be, but I'm not sure how much you're actually losing. I'm not sure what your asm looks like, but I ran a little test myself (see the attached shift.ex), comparing functions that call asm against functions that use power(). As you can see, the functions all take integers, which speeds things up a little. Poking the arguments, of course, slows the asm functions down, so I also tested without doing that. I even changed them to procedures, so that the routines only call the asm, but a plain a * power(2, b) or a / power(2, b) still beat the asm code. Oh, yeah, I also used floor() whenever I divided, to ensure an integral return value.

Looks like Rob has already inlined the shifts through the power/multiplication/division operators. OK, you could cut the overhead of testing for powers of two, but even so, you're probably not gaining much.

Next, I translated to C (using Borland). At the beginning of _0sal(), I added:

    _0 = 2;
    _0 << _b;
    _a *= _0;
    return _a;

_0shr():

    _a >> _b;
    return _a;

and added the respective code for the other 4 functions that were using asm. I found that the speed increase was about 10% for sal/sar and about 25-30% for shl/shr. I suppose you could say it's a significant difference (especially with shl/shr), but I wonder how much of a difference this would make in practice, i.e., how often are you shifting bits around?
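For anyone trying the C route, here is a minimal sketch of the equivalence the test relies on: for unsigned 32-bit values, a left shift by b matches multiplying by 2^b (modulo 2^32), and a right shift matches a floored division by 2^b. The names shl32, shr32, shlp32 and shrp32 are hypothetical helpers made up for this illustration; they are not the translator's generated _0shl()/_0shr() code. The sketch uses the compound forms <<= and >>=, since a bare "x << b;" statement in C computes a value and then discards it. Masking the count to 0..31 is an assumption borrowed from how the x86 shift instructions treat the count; in C itself, shifting a 32-bit value by 32 or more is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Logical shift left; the count is masked to 0..31 because shifting a
   32-bit value by 32 or more is undefined behavior in C. */
static uint32_t shl32(uint32_t a, unsigned b)
{
    a <<= (b & 31u);   /* compound assignment stores the shifted value */
    return a;
}

/* Logical shift right, same masking. */
static uint32_t shr32(uint32_t a, unsigned b)
{
    a >>= (b & 31u);
    return a;
}

/* Power-of-two counterpart of shl32, in the spirit of shlp():
   a * 2^b, truncated to 32 bits. */
static uint32_t shlp32(uint32_t a, unsigned b)
{
    uint64_t p = (uint64_t)1 << (b & 31u);   /* 2^b as a 64-bit value */
    return (uint32_t)((uint64_t)a * p);      /* wraps modulo 2^32 */
}

/* Power-of-two counterpart of shr32, in the spirit of shrp():
   unsigned integer division truncates, which equals floor() here. */
static uint32_t shrp32(uint32_t a, unsigned b)
{
    uint64_t p = (uint64_t)1 << (b & 31u);
    return (uint32_t)(a / p);
}
```

With these definitions, shl32(5, 3) and shlp32(5, 3) both give 40, and shr32(40, 3) and shrp32(40, 3) both give 5. The signed cases (sal/sar) are left out on purpose, because right-shifting a negative signed value is implementation-defined in C.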
Maybe there's a lot of this in graphics code, and it *would* make a difference. I suppose that if you're really interested in speed, you'll be translating to C anyway, and you can hack the function yourself.

Matt Lewis

Attachment: SHIFT.EX

-- shift.ex
-- Times asm-based shifts against power()-based equivalents.
without warning
include asm.e
include get.e
include print.e

-- Each asm routine loads num, shifts it by bits, and stores the result
-- back into num. The *_num and *_bits constants are the addresses of
-- those parameters inside the assembled code.
constant
    asm_sal = get_asm(
        "mov eax, num \n" &
        "sal eax, bits \n" &
        "mov [@num], eax \n" &
        "ret " ),
    sal_num  = get_param( "num" ) + asm_sal,
    sal_bits = get_param( "bits" ) + asm_sal,

    asm_sar = get_asm(
        "mov eax, num \n" &
        "sar eax, bits \n" &
        "mov [@num], eax \n" &
        "ret " ),
    sar_num  = asm_sar + get_param( "num" ),
    sar_bits = asm_sar + get_param( "bits" ),

    asm_shl = get_asm(
        "mov eax, num \n" &
        "shl eax, bits \n" &
        "mov [@num], eax \n" &
        "ret " ),
    shl_num  = get_param( "num" ) + asm_shl,
    shl_bits = get_param( "bits" ) + asm_shl,

    asm_shr = get_asm(
        "mov eax, num \n" &
        "shr eax, bits \n" &
        "mov [@num], eax \n" &
        "ret " ),
    shr_num  = get_param( "num" ) + asm_shr,
    shr_bits = get_param( "bits" ) + asm_shr

-- asm versions: poke the arguments, call the code, peek the result
function sal( integer a, integer b )
    poke4( sal_num, a )
    poke( sal_bits, b )
    call( asm_sal )
    return peek4u( sal_num )
end function

-- power() version of sal
function salp( integer a, integer b )
    return a * power( 2, b )
end function

function sar( integer a, integer b )
    poke4( sar_num, a )
    poke( sar_bits, b )
    call( asm_sar )
    return peek4u( sar_num )  -- read the result from sar's own parameter
end function

-- power() version of sar; floor() keeps the result integral
function sarp( integer a, integer b )
    return floor( a / power( 2, b ) )
end function

function shl( integer a, integer b )
    poke4( shl_num, a )
    poke( shl_bits, b )
    call( asm_shl )
    return peek4u( shl_num )
end function

function shr( integer a, integer b )
    poke4( shr_num, a )
    poke( shr_bits, b )
    call( asm_shr )
    return peek4u( shr_num )
end function

function shlp( integer a, integer b )
    return a * power( 2, b )
end function

function shrp( integer a, integer b )
    return floor( a / power( 2, b ) )
end function

atom t, x, id
sequence times

-- times[i] = { elapsed seconds, routine name }
times = repeat( {0,""}, 8 )
times[1][2] = "sal"
times[2][2] = "salp"
times[3][2] = "sar"
times[4][2] = "sarp"
times[5][2] = "shl"
times[6][2] = "shlp"
times[7][2] = "shr"
times[8][2] = "shrp"

constant rep = 100000

-- Time rep calls of each routine with random arguments
for i = 1 to length( times ) do
    id = routine_id( times[i][2] )
    t = time()
    for j = 1 to rep do
        x = call_func( id, { rand( 1073741823 ), rand( 32 ) } )
    end for
    times[i][1] = time() - t
end for

-- Report the results
for i = 1 to length( times ) do
    print( 1, times[i] )
    puts( 1, "\n" )
end for

-- Release the memory holding the assembled routines
free( asm_sal )
free( asm_sar )
free( asm_shl )
free( asm_shr )