1. fast bit rotation
Happy New Year to you all - it's already 2004 (1.40 am) here in the Godzone!
I am appealing to all speed merchants and optimization wizards (and
witches?, for Kat's sake): I need a really fast *single bit* rotation of an
integer, in either direction. Please...
jiri
2. Re: fast bit rotation
jiri babor wrote:
> Happy New Year to you all - it's already 2004 (1.40 am) here in the
> Godzone!
Blimey. I have a whole ten-and-a-bit hours to go yet.
> I am appealing to all speed merchants and optimization wizards (and
> witches?, for Kat's sake): I need a really fast *single bit* rotation
> of an integer, in either direction. Please...
I'm not sure there's a fast native way. How about:
function rotate_left_1(integer n, integer bitwidth)
atom mask mask = power(2,bitwidth)
return and_bits(n*2,mask-1)+(0!=and_bits(n,floor(mask/2)))
end function
function rotate_right_1(integer n, integer bitwidth)
atom mask mask = power(2,bitwidth)
return and_bits(floor(n/2),mask-1)+floor(mask/2)*and_bits(n,1)
end function
They simplify right down if you have a fixed bit width. e.g.:
-- for 8 bits width:
function rotate8_left_1(integer n)
return and_bits(n*2,255)+(0!=and_bits(n,128))
end function
function rotate8_right_1(integer n, integer bitwidth)
return and_bits(floor(n/2),255)+128*and_bits(n,1)
end function
Carl
--
[ Carl R White == aka () = The Domain of Cyrek = ]
[ Cyrek the Illogical /\ www.cyreksoft.yorks.com ]
3. Re: fast bit rotation
Thanks, Carl, I'll play with it. jiri
----- Original Message -----
From: "Carl W." <euphoria at cyreksoft.yorks.com>
To: <EUforum at topica.com>
Sent: 01 January 2004 2:14 AM
Subject: Re: fast bit rotation
>
>
> jiri babor wrote:
>
> > Happy New Year to you all - it's already 2004 (1.40 am) here in the
> > Godzone!
>
> Blimey. I have a whole ten-and-a-bit hours to go yet.
>
> > I am appealing to all speed merchants and optimization wizards (and
> > witches?, for Kat's sake): I need a really fast *single bit* rotation
> > of an integer, in either direction. Please...
>
> I'm not sure there's a fast native way. How about:
>
> function rotate_left_1(integer n, integer bitwidth)
> atom mask mask = power(2,bitwidth)
> return and_bits(n*2,mask-1)+(0!=and_bits(n,floor(mask/2)))
> end function
>
> function rotate_right_1(integer n, integer bitwidth)
> atom mask mask = power(2,bitwidth)
> return and_bits(floor(n/2),mask-1)+floor(mask/2)*and_bits(n,1)
> end function
>
> They simplify right down if you have a fixed bit width. e.g.:
>
> -- for 8 bits width:
> function rotate8_left_1(integer n)
> return and_bits(n*2,255)+(0!=and_bits(n,128))
> end function
>
> function rotate8_right_1(integer n, integer bitwidth)
> return and_bits(floor(n/2),255)+128*and_bits(n,1)
> end function
>
> Carl
>
> --
> [ Carl R White == aka () = The Domain of Cyrek = ]
> [ Cyrek the Illogical /\ www.cyreksoft.yorks.com ]
4. Re: fast bit rotation
------- <code>
include machine.e
constant rotate_left = allocate(15), rotate_left_param = rotate_left + 2
poke(rotate_left, {
#60, -- 0: pusha
#B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2)
#D1,#C0, -- 6: rol eax, 1
#A3,#6A,#0A,#82,#82, -- 8: mov [@param], eax
#61, -- D: popa
#C3}) -- E: ret
poke4(rotate_left + 9, rotate_left_param) -- @param
constant rotate_right = allocate(15), rotate_right_param = rotate_right + 2
poke(rotate_right, {
#60, -- 0: pusha
#B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2)
#D1,#C8, -- 6: ror eax, 1
#A3,#6A,#3A,#C2,#83, -- 8: mov [@param], eax
#61, -- D: popa
#C3}) -- E: ret
poke4(rotate_right + 9, rotate_right_param) -- @param
-- To use:
poke4(rotate_left_param, i) -- i = an integer
call(rotate_left)
i = peek4s(rotate_left_param)
poke4(rotate_right_param, j) -- j = an integer
call(rotate_right)
j = peek4s(rotate_right_param)
-- Before closing your program, you should call:
free(rotate_left)
free(rotate_right)
------- </code>
Perhaps there are faster methods than this, but on my laptop (Pentium II
433MHz, 64MB RAM), I can do 10_000_000 (ten million) rotations in 19.11
seconds, which is more than 500_000 rotations per second.
--
Tommy Carlier
tommy online: http://users.pandora.be/tommycarlier
5. Re: fast bit rotation
hello jiri,
Do you want object operations possible?
example:
sequence s
s = {34, 234, 12, {105}, ...} -- set of integers. could be any depth.
left_rotate_all(s, 5) -- rotate each integer by 5?
What I have below is without object operations.
If you want object operations then it looks like a modification of l4()
would be the best fit. Others are welcome to out do me.
I only did left rotations below... rights would be similar or
you could simply do inverse left instead.
Example: shift right 1 = shift left 8-1 or 7
-----------------------------------
-- 8-bit, Bit rotation to the left.
function l1(integer i, integer r)
integer result
result = i
for A = 1 to r do
result *= 2
if (result > 255) then
result -= 255
end if
end for
return result
end function
function l2(integer i, integer r)
integer result
result = i
for A = 1 to r do
result *= 2
end for
if (result > 255) then
result += (and_bits(result, #FF00)/#100)
result = and_bits(result, #FF)
end if
return result
end function
function l3(integer i, integer r)
integer result
result = i * power(2, r)
if (result > 255) then
result += (and_bits(result, #FF00)/#100)
result = and_bits(result, #FF)
end if
return result
end function
sequence rotation_table
rotation_table = repeat(repeat(0, 7), 256)
for i = 0 to 255 do
for r = 1 to 7 do
rotation_table[i+1][r] = l1(i, r)
end for
end for
constant
POWER_TABLE = {2, 4, 8, 16, 32, 64, 128, 256},
ROTATION_TABLE = rotation_table
rotation_table = repeat(repeat(0, 256), 7)
for i = 0 to 255 do
for r = 1 to 7 do
rotation_table[r][i+1] = l1(i, r)
end for
end for
constant
ROTATION_TABLE2 = rotation_table
function l4(integer i, integer r)
integer result
result = i * POWER_TABLE[r]
if (result > 255) then
result += (and_bits(result, #FF00)/#100)
result = and_bits(result, #FF)
end if
return result
end function
function l5(integer i, integer r)
return ROTATION_TABLE[i+1][r]
end function
function l6(integer i, integer r)
return ROTATION_TABLE2[r][i+1]
end function
-----------------------
atom t
integer j, c, d, l
l = 0
d = 5
c = 0
t = time() + d
while (t > time()) do
for i = 0 to 255 do
for A = 1 to 7 do
j = l1(i, A)
end for
end for
c += 1
end while
l += 1
? {l, c/d}
c = 0
t = time() + d
while (t > time()) do
for i = 0 to 255 do
for A = 1 to 7 do
j = l2(i, A)
end for
end for
c += 1
end while
l += 1
? {l, c/d}
c = 0
t = time() + d
while (t > time()) do
for i = 0 to 255 do
for A = 1 to 7 do
j = l3(i, A)
end for
end for
c += 1
end while
l += 1
? {l, c/d}
c = 0
t = time() + d
while (t > time()) do
for i = 0 to 255 do
for A = 1 to 7 do
j = l4(i, A)
end for
end for
c += 1
end while
l += 1
? {l, c/d}
c = 0
t = time() + d
while (t > time()) do
for i = 0 to 255 do
for A = 1 to 7 do
j = l5(i, A)
end for
end for
c += 1
end while
l += 1
? {l, c/d}
c = 0
t = time() + d
while (t > time()) do
for i = 0 to 255 do
for A = 1 to 7 do
j = l6(i, A)
end for
end for
c += 1
end while
l += 1
? {l, c/d}
machine_proc(26, 1)
-------------------------------------
Lucius L. Hilley III
----- Original Message -----
From: "jiri babor" <jbabor at paradise.net.nz>
To: <EUforum at topica.com>
Sent: Wednesday, December 31, 2003 07:41 AM
Subject: fast bit rotation
>
>
> Happy New Year to you all - it's already 2004 (1.40 am) here in the
Godzone!
>
> I am appealing to all speed merchants and optimization wizards (and
> witches?, for Kat's sake): I need a really fast *single bit* rotation of
an
> integer, in either direction. Please...
>
> jiri
>
>
>
> TOPICA - Start your own email discussion group. FREE!
>
>
6. Re: fast bit rotation
Another thing you could try is to build tables of pre-computed shifts,
eg:
sequence shl, shr
shl=repeat(0,256)
shr=repeat(0,256)
for i = 0 to 255 do
if i<=127 then
shl[i+1]=i*2
else
shl[i+1]=and_bits(i,#7F)*2+1
end if
shr[shl[i+1]+1]=i
end for
Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
You can do your own test/timings, though I got 3.9 million/sec on a
233MHz PII with 48Mb ram, sorry Tommy, CNR
)
Pete
PS No warranty on the above code, btw.
7. Re: fast bit rotation
Pete wrote:
> Another thing you could try is to build tables of pre-computed shifts,
> eg:
>
> sequence shl, shr
> shl=repeat(0,256)
> shr=repeat(0,256)
> for i = 0 to 255 do
> if i<=127 then
> shl[i+1]=i*2
> else
> shl[i+1]=and_bits(i,#7F)*2+1
> end if
> shr[shl[i+1]+1]=i
> end for
>
> Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
>
> You can do your own test/timings, though I got 3.9 million/sec on a
> 233MHz PII with 48Mb ram, sorry Tommy, CNR
)
Of course lookup tables are a lot faster. But the sequences above only
work on 8-bit values between 0 and 255, and shift instead of rotate, and I
think that jiri asked for integer (=32-bit) rotations. If you only need
8-bit rotations, you should definitely go for the pre-computed lookup
sequences. Perhaps my machine-code version (which BTW only works on
Intel-compatible microprocessors) could even be tweaked a little further:
does anybody know if it's OK to replace PUSHA/POPA with PUSH EAX/POP EAX,
if I only modify EAX? I think it would definitely be faster.
>
> Pete
> PS No warranty on the above code, btw.
--
Tommy Carlier
tommy online: http://users.pandora.be/tommycarlier
8. Re: fast bit rotation
Using a precomputed table is 2 of my examples.
One arrangement of 7x256 another of 256x7.
Except for the precomputed tables, any of my examples could
be modified for 32-bit integers and extended to work on sequences
to rotate all of the values within. Making the assumption that
all values within the sequence are not floats.
Lucius L. Hilley III
----- Original Message -----
From: "Pete Lomax" <petelomax at blueyonder.co.uk>
To: <EUforum at topica.com>
Sent: Wednesday, December 31, 2003 10:59 AM
Subject: Re: fast bit rotation
>
>
> Another thing you could try is to build tables of pre-computed shifts,
> eg:
>
> sequence shl, shr
> shl=repeat(0,256)
> shr=repeat(0,256)
> for i = 0 to 255 do
> if i<=127 then
> shl[i+1]=i*2
> else
> shl[i+1]=and_bits(i,#7F)*2+1
> end if
> shr[shl[i+1]+1]=i
> end for
>
> Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
>
> You can do your own test/timings, though I got 3.9 million/sec on a
> 233MHz PII with 48Mb ram, sorry Tommy, CNR
)
>
> Pete
> PS No warranty on the above code, btw.
>
9. Re: fast bit rotation
On 31 Dec 2003, at 15:59, Pete Lomax wrote:
>
>
> Another thing you could try is to build tables of pre-computed shifts,
> eg:
>
> sequence shl, shr
> shl=repeat(0,256)
> shr=repeat(0,256)
> for i = 0 to 255 do
> if i<=127 then
> shl[i+1]=i*2
> else
> shl[i+1]=and_bits(i,#7F)*2+1
> end if
> shr[shl[i+1]+1]=i
> end for
>
> Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
>
> You can do your own test/timings, though I got 3.9 million/sec on a
> 233MHz PII with 48Mb ram, sorry Tommy, CNR
)
But those are bytes, not 31 (or 32) bit full integers. When scaled up:
sequence shl, shr
atom time1, time2, int
int = 65535 * 256 ---- 16776960
shl=repeat(0,65536*256)
shr=repeat(0,65536*256)
time1 = time()
for i = 0 to int do
if i<=127 then
shl[i+1]=i*2
else
shl[i+1]=and_bits(i,#7F)*2+1
end if
shr[shl[i+1]+1]=i
end for
time2 = time()
puts(1,sprintf("%d",time2-time1))
time1 = getc(0)
i got 29 sec for that run, or 578,515 shifts per sec, on a K6-2-233 with 384
megs of ram. Good thing i had the ram, too, because the program used over
131 megabytes of ram. It used 87% of cpu clocks running on win95B.
Kat
10. Re: fast bit rotation
On Wed, 31 Dec 2003 17:41:14 +0100, Tommy Carlier
<tommy.carlier at pandora.be> wrote:
>Pete wrote:
>But the sequences above only
>work on 8-bit values between 0 and 255
Oops. That was rather stupid of me.
Pete
11. Re: fast bit rotation
On Wed, 31 Dec 2003 12:26:34 -0600, Kat <gertie at visionsix.com> wrote:
>But those are bytes, not 31 (or 32) bit full integers. When scaled up:
>shl=repeat(0,65536*256)
Whoops, looks like some of my stupidity rubbed off: that's only 3
bytes or 24-bit integers, btw. I don't think you can buy a 32-bit
machine with the needed 32GB memory, so don't bother trying it
>i got 29 sec for that run, or 578,515 shifts per sec
Yeah, it's not relevant now, but I was actually timing table lookup,
not the initialise.
I'll get me coat... and me dunce's hat.
Pete
12. Re: fast bit rotation
Thanks, guys, for your tremendous response. Tommy's machine code looks most
promising. Currently I am looking at 24 bit integers, later I might even be
forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and
similar 8-bit solutions are interesting, but not of much practical use.
And thank you, Robert, for your explanation. I still think any
self-respecting language should have fast shifts and rotations built into
it.
jiri
13. Re: fast bit rotation
----- Original Message -----
From: "jiri babor" <jbabor at paradise.net.nz>
To: <EUforum at topica.com>
Subject: Re: fast bit rotation
>
>
> Thanks, guys, for your tremendous response. Tommy's machine code looks most
> promising. Currently I am looking at 24 bit integers, later I might even be
> forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and
> similar 8-bit solutions are interesting, but not of much practical use.
Doesn't the INTEGER type only support 30-bit integers? I make the range for
INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It seems that
Euphoria uses 30-bits to hold the value, one bit to hold the sign and the 32nd
bit to hold the flag to tell if its an integer or something else? Just guessing.
> And thank you, Robert, for your explanation. I still think any
> self-respecting language should have fast shifts and rotations built into
> it.
Why is this Jiri? I would have thought that they are only a few programming
problems where bit shifting/rotating is of much use. Low-level device
interfacing, cryptography, image/audio manipulating, low-RAM environments (8-bit
embedded systems), ... What are you working on that needs it?
--
Derek
14. Re: fast bit rotation
Derek wrote:
>Doesn't the INTEGER type only support 30-bit integers? I make the range for
INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It seems that
Euphoria uses 30-bits to hold the value, one bit to hold the sign and the
32nd bit to hold the flag to tell if its an integer or something else? Just
guessing.
I guess it's just semantics. Commonly we talk about 32-bit integers, not 31,
even when all 32 bits are used to hold *signed* integers in just about any
other language.
>And thank you, Robert, for your explanation. I still think any
self-respecting language should have fast shifts and rotations built into
it.
Why is this Jiri? I would have thought that they are only a few programming
problems where bit shifting/rotating is of much use. Low-level device
interfacing, cryptography, image/audio manipulating, low-RAM environments
(8-bit embedded systems), ...
I suppose it depends on your hobby. One of mine happens to be data
compression, which you left out of your otherwise good account. In many
algorithms, especially for lossless compression, you fiddle bits all the
time. I know it's a well trodden turf, but still pretty interesting.
>What are you working on that needs it?
After a couple of years I resurrected my 'dream language' project that I am
sketching in Euphoria. For that I need fast tables. In the past I used
sorted tables with binary searches. Now I want to try hashed tables, and at
the moment I am evaluating several faster hashing algorithms. Many of them
use shifts and/or rotations to introduce some extra noise into the hashing
functions to achieve more even bucket distribution. I have already one or
two really interesting and reasonably flexible schemes going, probably good
enough for general consumption - but only after a bit more testing...
jiri
15. Re: fast bit rotation
Derek wrote:
> ----- Original Message -----
> From: "jiri babor"
>
>> Thanks, guys, for your tremendous response. Tommy's machine code looks most
>> promising. Currently I am looking at 24 bit integers, later I might even be
>> forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and
>> similar 8-bit solutions are interesting, but not of much practical use.
>
> Doesn't the INTEGER type only support 30-bit integers? I make the range
> for INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It
> seems that Euphoria uses 30-bits to hold the value, one bit to hold the
> sign
I think the sign bit must be counted, too. So 30 bits + 1 bit = 31 bits.
If we add power(2,30) to both the minimum value and the maximum value
that you mentioned above, that does not alter the range. Then we get an
unsigned integer type from 0 to (power(2,31)-1.
Now it's pretty clear, that the range consists of 31 bit. Whether or not
one bit is used to hold the sign, is a different question, IMHO.
> and the 32nd bit to hold the flag to tell if its an integer or
> something else? Just guessing.
<snip>
Regards,
Juergen
--
/"\ ASCII ribbon campain | Money is the root of all evil.
\ / against HTML in | Send 20 Dollars for more info.
X e-mail and news, |
/ \ and unneeded MIME | http://home.arcor.de/luethje/prog/
16. Re: fast bit rotation
Tommy wrote:
<snip>
> Perhaps my machine-code version (which BTW only works on
> Intel-compatible microprocessors)
I think Euphoria itself "only" runs on Intel-compatible processors, no?
> could even be tweaked a little further:
> does anybody know if it's OK to replace PUSHA/POPA with PUSH EAX/POP EAX,
> if I only modify EAX? I think it would definitely be faster.
AFAIK it's not necessary to preserve the registers EAX, ECX and EDX.
When you look at the demo program 'callmach.ex' (Euphoria 2.4), you'll
see in Example #2, that RDS also uses EAX without preserving it.
Your code only uses EAX, so I think it's not necessary to preserve any
register at all.
To try it out, just replace #60 and #61 with #90 (NOP) in your code.
It works fine for me.
BTW, Rob:
Is there a typo in 'callmach.ex' Example #2, or do I misunderstand
something?
Concerning add_code(), it reads in the comment:
"12 (4+8) (hex C) bytes are popped off the stack before returning",
but the statement that you use is:
"#C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack"
Regards,
Juergen
--
/"\ ASCII ribbon campain | This message has been ROT-13 encrypted
\ / against HTML in | twice for higher security.
X e-mail and news, |
/ \ and unneeded MIME | http://home.arcor.de/luethje/prog/
17. Re: fast bit rotation
Tommy wrote:
> ------- <code>
> include machine.e
>
> constant rotate_left = allocate(15), rotate_left_param = rotate_left + 2
> poke(rotate_left, {
> #60, -- 0: pusha
> #B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2)
> #D1,#C0, -- 6: rol eax, 1
> #A3,#6A,#0A,#82,#82, -- 8: mov [@param], eax
> #61, -- D: popa
> #C3}) -- E: ret
> poke4(rotate_left + 9, rotate_left_param) -- @param
>
> constant rotate_right = allocate(15), rotate_right_param = rotate_right + 2
> poke(rotate_right, {
> #60, -- 0: pusha
> #B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2)
> #D1,#C8, -- 6: ror eax, 1
> #A3,#6A,#3A,#C2,#83, -- 8: mov [@param], eax
> #61, -- D: popa
> #C3}) -- E: ret
> poke4(rotate_right + 9, rotate_right_param) -- @param
>
> -- To use:
> poke4(rotate_left_param, i) -- i = an integer
> call(rotate_left)
> i = peek4s(rotate_left_param)
>
> poke4(rotate_right_param, j) -- j = an integer
> call(rotate_right)
> j = peek4s(rotate_right_param)
>
> -- Before closing your program, you should call:
> free(rotate_left)
> free(rotate_right)
> ------- </code>
I do not think that it is necessary or important to free the allocated
space for the code under normal circumstances:
- When the program terminates, all allocated space is given back to the
O/S anyhow.
- The space that is occupied by normal Euphoria routines also is not
freed during runtime of the program.
Well, a machine-code routine could gain an advantage here, but that's
not important unless the routine is big.
- If the machine-code routine is defined inside an include file, the
statement for freeing the code space is inside the main program file.
IMHO this is not elegant coding style.
These are just some remarks, so that nobody thinks that I've _forgotten_
to free the space in my code below.
> Perhaps there are faster methods than this, but on my laptop (Pentium II
> 433MHz, 64MB RAM), I can do 10_000_000 (ten million) rotations in 19.11
> seconds, which is more than 500_000 rotations per second.
In Euphoria 2.4, it's now possible to define the parameters and return
value for a machine-code routine. I like it, because in my opinion it's
more elegant than using the "old" method, but I don't know which method
is faster.
Inspired by your code, I modified Example #2 of the demo program
'callmach.ex' by RDS:
-----------------------------[ begin code ]-----------------------------
-- needs Euphoria 2.4 or newer
include machine.e
include dll.e
-- Define the function ROTATE_LEFT();
-- use define_c_func() to allow parameters to be passed to machine code.
constant
ROL_CODE = {
-- first int argument is at stack offset +4, 2nd int is at +8
#8B, #44, #24, #04, -- mov eax, [esp+4]
#8A, #4C, #24, #08, -- mov cl, [esp+8]
#D3, #C0, -- rol eax, cl
#C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack
},
ROL_SPACE = allocate(length(ROL_CODE))
poke(ROL_SPACE, ROL_CODE)
global constant
ROTATE_LEFT = define_c_func("", ROL_SPACE, {C_INT, C_INT}, C_INT)
-- Demo
integer n, count, res
n = 9
count = 2
res = c_func(ROTATE_LEFT, {n, count})
printf(1, "The result of 'ROTATE_LEFT(%d,%d)' is: %d\n", {n, count, res})
------------------------------[ end code ]------------------------------
It seems to work, but there is no warranty on the above code.
I have some experience with writing 16 bit ASM code, but unfortunately
not too much regarding 32 bit ASM code.
Especially I'm not sure, whether the statement "mov cl, [esp+8]" is
correct. Maybe it should be "mov ecx, [esp+8]" or something?
Concerning preserving the registers see my other post, please.
BTW:
Who is the maintainer of the *very great* Euphoria assembler, Pete
Eberlein or Mic? It seems that the assembler does not understand a
command such as 'ret 8'.
It would be very nice, if you could improve the assembler in this regard.
Regards,
Juergen
--
/"\ ASCII ribbon campain | This message has been ROT-13 encrypted
\ / against HTML in | twice for higher security.
X e-mail and news, |
/ \ and unneeded MIME | http://home.arcor.de/luethje/prog/
18. Re: fast bit rotation
This is a multi-part message in MIME format.
--------------A472BA832284E0E9474EDCC4
I programmed 'true euphorian' rotation functions rotl(objekt x) and
rotr(object x), see include file ROTATION.E and test file ROTATION.EXW
of attached zip file. To run it under DOS, rename *.EXW to *.EX.
On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW
1.91 micro seconds for rotr() and
1.63 micro seconds for rotl() per call.
Rolf
--------------A472BA832284E0E9474EDCC4
Content-Type: application/x-zip-compressed;
name="Rotation.zip"
19. Re: fast bit rotation
Rolf wrote:
> I programmed 'true euphorian' rotation functions rotl(objekt x) and
> rotr(object x), see include file ROTATION.E and test file ROTATION.EXW
> of attached zip file. To run it under DOS, rename *.EXW to *.EX.
>
> On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW
> 1.91 micro seconds for rotr() and
> 1.63 micro seconds for rotl() per call.
You can even gain speed by omitting 'pusha' and 'popa'.
I like the idea of wrapping the ASM functions by 'true euphorian'
functions, especially in order to be able to apply them to any Euphoria
object.
However, the main disadvantage of Tommy's routines IMHO is the point,
that they use a fixed count of '1' for any rotation. A versatile library
routine IMHO should allow the user to specify the count as parameter, so
that it would look somehow like:
function rotate_left(value, count)
Regards,
Juergen
--
/"\ ASCII ribbon campain | This message has been ROT-13 encrypted
\ / against HTML in | twice for higher security.
X e-mail and news, |
/ \ and unneeded MIME | http://home.arcor.de/luethje/prog/
20. Re: fast bit rotation
Me wrote:
<big snip>
> -----------------------------[ begin code ]-----------------------------
> -- needs Euphoria 2.4 or newer
>
> include machine.e
> include dll.e
>
> -- Define the function ROTATE_LEFT();
> -- use define_c_func() to allow parameters to be passed to machine code.
> constant
> ROL_CODE = {
> -- first int argument is at stack offset +4, 2nd int is at +8
> #8B, #44, #24, #04, -- mov eax, [esp+4]
> #8A, #4C, #24, #08, -- mov cl, [esp+8]
> #D3, #C0, -- rol eax, cl
> #C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack
> },
> ROL_SPACE = allocate(length(ROL_CODE))
>
> poke(ROL_SPACE, ROL_CODE)
>
> global constant
> ROTATE_LEFT = define_c_func("", ROL_SPACE, {C_INT, C_INT}, C_INT)
>
>
> -- Demo
> integer n, count, res
>
> n = 9
> count = 2
> res = c_func(ROTATE_LEFT, {n, count})
> printf(1, "The result of 'ROTATE_LEFT(%d,%d)' is: %d\n", {n, count, res})
> ------------------------------[ end code ]------------------------------
>
> It seems to work, but there is no warranty on the above code.
> I have some experience with writing 16 bit ASM code, but unfortunately
> not too much regarding 32 bit ASM code.
> Especially I'm not sure, whether the statement "mov cl, [esp+8]" is
> correct.
I know now, that it is correct.
Regards,
Juergen
--
/"\ ASCII ribbon campain | This message has been ROT-13 encrypted
\ / against HTML in | twice for higher security.
X e-mail and news, |
/ \ and unneeded MIME | http://home.arcor.de/luethje/prog/
21. Re: fast bit rotation
Juergen Luethje wrote:
> However, the main disadvantage of Tommy's routines IMHO is the point,
> that they use a fixed count of '1' for any rotation. A versatile library
> routine IMHO should allow the user to specify the count as parameter, so
> that it would look somehow like:
> function rotate_left(value, count)
Juergen, I do agree, but that slows down the speed considerably. By the
way, do you know a web resource for looking up 386 machine instructions?
I don't have a book here.
Rolf
22. Re: fast bit rotation
>From: Rolf Schr=F6der <rolf at rschr.de>
>Subject: Re: fast bit rotation
>
>Juergen Luethje wrote:
>
> > However, the main disadvantage of Tommy's routines IMHO is the point,
> > that they use a fixed count of '1' for any rotation. A versatile librar=
y
> > routine IMHO should allow the user to specify the count as parameter, s=
o
> > that it would look somehow like:
> > function rotate_left(value, count)
>
>Juergen, I do agree, but that slows down the speed considerably. By the
>way, do you know a web resource for looking up 386 machine instructions?
>I don't have a book here.
>
>Rolf
>
Try something like sandpile http://www.sandpile.org/ or even the Intel=
=20
website.
23. Re: fast bit rotation
Juergen Luethje wrote:
> BTW, Rob:
> Is there a typo in 'callmach.ex' Example #2, or do I misunderstand
> something?
>
> Concerning add_code(), it reads in the comment:
> "12 (4+8) (hex C) bytes are popped off the stack before returning",
> but the statement that you use is:
> "#C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack"
Yes. Delete the comment about 12 (4+8) bytes.
I think it was from an earlier version of the code.
Thanks,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
24. Re: fast bit rotation
Tommy wrote:
> Juergen Luethje wrote:
>>
>> Rolf wrote:
>>
>>> I programmed 'true euphorian' rotation functions rotl(objekt x) and
>>> rotr(object x), see include file ROTATION.E and test file ROTATION.EXW
>>> of attached zip file. To run it under DOS, rename *.EXW to *.EX.
>>>
>>> On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW
>>> 1.91 micro seconds for rotr() and
>>> 1.63 micro seconds for rotl() per call.
>>
>> You can even gain speed by omitting 'pusha' and 'popa'.
>>
>> I like the idea of wrapping the ASM functions by 'true euphorian'
>> functions, especially in order to be able to apply them to any Euphoria
>> object.
>>
>> However, the main disadvantage of Tommy's routines IMHO is the point,
>> that they use a fixed count of '1' for any rotation. A versatile library
>> routine IMHO should allow the user to specify the count as parameter, so
>> that it would look somehow like:
>> function rotate_left(value, count)
>
> Of course it only rotates a fixed count of '1': I didn't mean to create
> a versatile library routine, just the fastest possible solution to
> jiri's problem, which was integer rotation (left or right) of fixed
> count 1 (I think :D).
Sorry, I didn't want to offend you or something. I thought Jiri wanted
some versatile library routines.
> And I believe that if you want the highest
> performance possible, my routine (no real Euphoria-function) is perhaps
> a bit faster: no stack popping, just 1 poke, 2 moves, a rotation, a
> return and 1 peek.
> Haven't compared the performance of the 2 routines, but I don't need to.
> I know they both perform very well and I also know that maintainability,
> ease of use and readable code are much more important than that tiny bit
> of extra performance.
> So Rolf and Juergen, your routines are in general much better than what
> I pulled together in 5 minutes.
Please don't forget: I was somehow inspired by your code, and Rolf's
routines use yours as a basis.
> BTW: wouldn't it be nice if someone made a library of such very general
> high-performance routines?
I already started modifying your ASM routines, so that they take 'count'
as a second parameter. Then I'll wrap them by pure euphoria functions,
in order to be able to apply them to any Euphoria object -- very similar
to what Rolf did.
I'll do the same with my ASM routines, compare the speed, make some
tweaks here and there ...
I also plan do write some SHIFT functions.
> Or perhaps Robert could add some of those
> routines to machine.e or one of the general Euphoria-libraries? If you
> see how many people could use shifts, rotates and other routines like
> that.
That would be nice.
<snip>
Regards,
Juergen
--
/"\ ASCII ribbon campain | This message has been ROT-13 encrypted
\ / against HTML in | twice for higher security.
X e-mail and news, |
/ \ and unneeded MIME | http://home.arcor.de/luethje/prog/
25. Re: fast bit rotation
Tommy Carlier wrote:
> Robert, couldn't you consider making get_bytes a
> low-level, builtin routine to dramatically improve its performance? I
> think get_bytes is a really standard routine that is probably quite
> frequently used.
Yes, I'll consider it, but I don't think it
would "dramatically" improve performance.
Thanks,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com