1. fast bit rotation
- Posted by jiri babor <jbabor at paradise.net.nz> Dec 31, 2003
- 489 views
Happy New Year to you all - it's already 2004 (1.40 am) here in the Godzone! I am appealing to all speed merchants and optimization wizards (and witches?, for Kat's sake): I need a really fast *single bit* rotation of an integer, in either direction. Please... jiri
2. Re: fast bit rotation
- Posted by "Carl R. White" <euphoria at cyreksoft.yorks.com> Dec 31, 2003
- 446 views
jiri babor wrote: > Happy New Year to you all - it's already 2004 (1.40 am) here in the > Godzone! Blimey. I have a whole ten-and-a-bit hours to go yet. > I am appealing to all speed merchants and optimization wizards (and > witches?, for Kat's sake): I need a really fast *single bit* rotation > of an integer, in either direction. Please... I'm not sure there's a fast native way. How about: function rotate_left_1(integer n, integer bitwidth) atom mask mask = power(2,bitwidth) return and_bits(n*2,mask-1)+(0!=and_bits(n,floor(mask/2))) end function function rotate_right_1(integer n, integer bitwidth) atom mask mask = power(2,bitwidth) return and_bits(floor(n/2),mask-1)+floor(mask/2)*and_bits(n,1) end function They simplify right down if you have a fixed bit width. e.g.: -- for 8 bits width: function rotate8_left_1(integer n) return and_bits(n*2,255)+(0!=and_bits(n,128)) end function function rotate8_right_1(integer n, integer bitwidth) return and_bits(floor(n/2),255)+128*and_bits(n,1) end function Carl -- [ Carl R White == aka () = The Domain of Cyrek = ] [ Cyrek the Illogical /\ www.cyreksoft.yorks.com ]
3. Re: fast bit rotation
- Posted by jiri babor <jbabor at paradise.net.nz> Dec 31, 2003
- 433 views
Thanks, Carl, I'll play with it. jiri ----- Original Message ----- From: "Carl W." <euphoria at cyreksoft.yorks.com> To: <EUforum at topica.com> Sent: 01 January 2004 2:14 AM Subject: Re: fast bit rotation > > > jiri babor wrote: > > > Happy New Year to you all - it's already 2004 (1.40 am) here in the > > Godzone! > > Blimey. I have a whole ten-and-a-bit hours to go yet. > > > I am appealing to all speed merchants and optimization wizards (and > > witches?, for Kat's sake): I need a really fast *single bit* rotation > > of an integer, in either direction. Please... > > I'm not sure there's a fast native way. How about: > > function rotate_left_1(integer n, integer bitwidth) > atom mask mask = power(2,bitwidth) > return and_bits(n*2,mask-1)+(0!=and_bits(n,floor(mask/2))) > end function > > function rotate_right_1(integer n, integer bitwidth) > atom mask mask = power(2,bitwidth) > return and_bits(floor(n/2),mask-1)+floor(mask/2)*and_bits(n,1) > end function > > They simplify right down if you have a fixed bit width. e.g.: > > -- for 8 bits width: > function rotate8_left_1(integer n) > return and_bits(n*2,255)+(0!=and_bits(n,128)) > end function > > function rotate8_right_1(integer n, integer bitwidth) > return and_bits(floor(n/2),255)+128*and_bits(n,1) > end function > > Carl > > -- > [ Carl R White == aka () = The Domain of Cyrek = ] > [ Cyrek the Illogical /\ www.cyreksoft.yorks.com ]
4. Re: fast bit rotation
- Posted by Tommy Carlier <tommy.carlier at pandora.be> Dec 31, 2003
- 432 views
------- <code> include machine.e constant rotate_left = allocate(15), rotate_left_param = rotate_left + 2 poke(rotate_left, { #60, -- 0: pusha #B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2) #D1,#C0, -- 6: rol eax, 1 #A3,#6A,#0A,#82,#82, -- 8: mov [@param], eax #61, -- D: popa #C3}) -- E: ret poke4(rotate_left + 9, rotate_left_param) -- @param constant rotate_right = allocate(15), rotate_right_param = rotate_right + 2 poke(rotate_right, { #60, -- 0: pusha #B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2) #D1,#C8, -- 6: ror eax, 1 #A3,#6A,#3A,#C2,#83, -- 8: mov [@param], eax #61, -- D: popa #C3}) -- E: ret poke4(rotate_right + 9, rotate_right_param) -- @param -- To use: poke4(rotate_left_param, i) -- i = an integer call(rotate_left) i = peek4s(rotate_left_param) poke4(rotate_right_param, j) -- j = an integer call(rotate_right) j = peek4s(rotate_right_param) -- Before closing your program, you should call: free(rotate_left) free(rotate_right) ------- </code> Perhaps there are faster methods than this, but on my laptop (Pentium II 433MHz, 64MB RAM), I can do 10_000_000 (ten million) rotations in 19.11 seconds, which is more than 500_000 rotations per second. -- Tommy Carlier tommy online: http://users.pandora.be/tommycarlier
5. Re: fast bit rotation
- Posted by "Lucius L. Hilley III" <L3Euphoria at bellsouth.net> Dec 31, 2003
- 436 views
hello jiri, Do you want object operations possible? example: sequence s s = {34, 234, 12, {105}, ...} -- set of integers. could be any depth. left_rotate_all(s, 5) -- rotate each integer by 5? What I have below is without object operations. If you want object operations then it looks like a modification of l4() would be the best fit. Others are welcome to out do me. I only did left rotations below... rights would be similar or you could simply do inverse left instead. Example: shift right 1 = shift left 8-1 or 7 ----------------------------------- -- 8-bit, Bit rotation to the left. function l1(integer i, integer r) integer result result = i for A = 1 to r do result *= 2 if (result > 255) then result -= 255 end if end for return result end function function l2(integer i, integer r) integer result result = i for A = 1 to r do result *= 2 end for if (result > 255) then result += (and_bits(result, #FF00)/#100) result = and_bits(result, #FF) end if return result end function function l3(integer i, integer r) integer result result = i * power(2, r) if (result > 255) then result += (and_bits(result, #FF00)/#100) result = and_bits(result, #FF) end if return result end function sequence rotation_table rotation_table = repeat(repeat(0, 7), 256) for i = 0 to 255 do for r = 1 to 7 do rotation_table[i+1][r] = l1(i, r) end for end for constant POWER_TABLE = {2, 4, 8, 16, 32, 64, 128, 256}, ROTATION_TABLE = rotation_table rotation_table = repeat(repeat(0, 256), 7) for i = 0 to 255 do for r = 1 to 7 do rotation_table[r][i+1] = l1(i, r) end for end for constant ROTATION_TABLE2 = rotation_table function l4(integer i, integer r) integer result result = i * POWER_TABLE[r] if (result > 255) then result += (and_bits(result, #FF00)/#100) result = and_bits(result, #FF) end if return result end function function l5(integer i, integer r) return ROTATION_TABLE[i+1][r] end function function l6(integer i, integer r) return ROTATION_TABLE2[r][i+1] end function ----------------------- atom t integer j, c, d, l l = 0 d = 5 c = 0 t = time() + d while (t > time()) do for i = 0 to 255 do for A = 1 to 7 do j = l1(i, A) end for end for c += 1 end while l += 1 ? {l, c/d} c = 0 t = time() + d while (t > time()) do for i = 0 to 255 do for A = 1 to 7 do j = l2(i, A) end for end for c += 1 end while l += 1 ? {l, c/d} c = 0 t = time() + d while (t > time()) do for i = 0 to 255 do for A = 1 to 7 do j = l3(i, A) end for end for c += 1 end while l += 1 ? {l, c/d} c = 0 t = time() + d while (t > time()) do for i = 0 to 255 do for A = 1 to 7 do j = l4(i, A) end for end for c += 1 end while l += 1 ? {l, c/d} c = 0 t = time() + d while (t > time()) do for i = 0 to 255 do for A = 1 to 7 do j = l5(i, A) end for end for c += 1 end while l += 1 ? {l, c/d} c = 0 t = time() + d while (t > time()) do for i = 0 to 255 do for A = 1 to 7 do j = l6(i, A) end for end for c += 1 end while l += 1 ? {l, c/d} machine_proc(26, 1) ------------------------------------- Lucius L. Hilley III ----- Original Message ----- From: "jiri babor" <jbabor at paradise.net.nz> To: <EUforum at topica.com> Sent: Wednesday, December 31, 2003 07:41 AM Subject: fast bit rotation > > > Happy New Year to you all - it's already 2004 (1.40 am) here in the Godzone! > > I am appealing to all speed merchants and optimization wizards (and > witches?, for Kat's sake): I need a really fast *single bit* rotation of an > integer, in either direction. Please... > > jiri > > > > TOPICA - Start your own email discussion group. FREE! > >
6. Re: fast bit rotation
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Dec 31, 2003
- 476 views
Another thing you could try is to build tables of pre-computed shifts, eg: sequence shl, shr shl=repeat(0,256) shr=repeat(0,256) for i = 0 to 255 do if i<=127 then shl[i+1]=i*2 else shl[i+1]=and_bits(i,#7F)*2+1 end if shr[shl[i+1]+1]=i end for Use k=shl[k+1] when k is 0..255, to avoid zero index problems. You can do your own test/timings, though I got 3.9 million/sec on a 233MHz PII with 48Mb ram, sorry Tommy, CNR ) Pete PS No warranty on the above code, btw.
7. Re: fast bit rotation
- Posted by Tommy Carlier <tommy.carlier at pandora.be> Dec 31, 2003
- 453 views
Pete wrote: > Another thing you could try is to build tables of pre-computed shifts, > eg: > > sequence shl, shr > shl=repeat(0,256) > shr=repeat(0,256) > for i = 0 to 255 do > if i<=127 then > shl[i+1]=i*2 > else > shl[i+1]=and_bits(i,#7F)*2+1 > end if > shr[shl[i+1]+1]=i > end for > > Use k=shl[k+1] when k is 0..255, to avoid zero index problems. > > You can do your own test/timings, though I got 3.9 million/sec on a > 233MHz PII with 48Mb ram, sorry Tommy, CNR ) Of course lookup tables are a lot faster. But the sequences above only work on 8-bit values between 0 and 255, and shift instead of rotate, and I think that jiri asked for integer (=32-bit) rotations. If you only need 8-bit rotations, you should definitely go for the pre-computed lookup sequences. Perhaps my machine-code version (which BTW only works on Intel-compatible microprocessors) could even be tweaked a little further: does anybody know if it's OK to replace PUSHA/POPA with PUSH EAX/POP EAX, if I only modify EAX? I think it would definitely be faster. > > Pete > PS No warranty on the above code, btw. -- Tommy Carlier tommy online: http://users.pandora.be/tommycarlier
8. Re: fast bit rotation
- Posted by "Lucius L. Hilley III" <L3Euphoria at bellsouth.net> Dec 31, 2003
- 452 views
Using a precomputed table is 2 of my examples. One arrangement of 7x256 another of 256x7. Except for the precomputed tables, any of my examples could be modified for 32-bit integers and extended to work on sequences to rotate all of the values within. Making the assumption that all values within the sequence are not floats. Lucius L. Hilley III ----- Original Message ----- From: "Pete Lomax" <petelomax at blueyonder.co.uk> To: <EUforum at topica.com> Sent: Wednesday, December 31, 2003 10:59 AM Subject: Re: fast bit rotation > > > Another thing you could try is to build tables of pre-computed shifts, > eg: > > sequence shl, shr > shl=repeat(0,256) > shr=repeat(0,256) > for i = 0 to 255 do > if i<=127 then > shl[i+1]=i*2 > else > shl[i+1]=and_bits(i,#7F)*2+1 > end if > shr[shl[i+1]+1]=i > end for > > Use k=shl[k+1] when k is 0..255, to avoid zero index problems. > > You can do your own test/timings, though I got 3.9 million/sec on a > 233MHz PII with 48Mb ram, sorry Tommy, CNR ) > > Pete > PS No warranty on the above code, btw. >
9. Re: fast bit rotation
- Posted by "Kat" <gertie at visionsix.com> Dec 31, 2003
- 445 views
On 31 Dec 2003, at 15:59, Pete Lomax wrote: > > > Another thing you could try is to build tables of pre-computed shifts, > eg: > > sequence shl, shr > shl=repeat(0,256) > shr=repeat(0,256) > for i = 0 to 255 do > if i<=127 then > shl[i+1]=i*2 > else > shl[i+1]=and_bits(i,#7F)*2+1 > end if > shr[shl[i+1]+1]=i > end for > > Use k=shl[k+1] when k is 0..255, to avoid zero index problems. > > You can do your own test/timings, though I got 3.9 million/sec on a > 233MHz PII with 48Mb ram, sorry Tommy, CNR ) But those are bytes, not 31 (or 32) bit full integers. When scaled up: sequence shl, shr atom time1, time2, int int = 65535 * 256 ---- 16776960 shl=repeat(0,65536*256) shr=repeat(0,65536*256) time1 = time() for i = 0 to int do if i<=127 then shl[i+1]=i*2 else shl[i+1]=and_bits(i,#7F)*2+1 end if shr[shl[i+1]+1]=i end for time2 = time() puts(1,sprintf("%d",time2-time1)) time1 = getc(0) i got 29 sec for that run, or 578,515 shifts per sec, on a K6-2-233 with 384 megs of ram. Good thing i had the ram, too, because the program used over 131 megabytes of ram. It used 87% of cpu clocks running on win95B. Kat
10. Re: fast bit rotation
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Dec 31, 2003
- 445 views
On Wed, 31 Dec 2003 17:41:14 +0100, Tommy Carlier <tommy.carlier at pandora.be> wrote: >Pete wrote: >But the sequences above only >work on 8-bit values between 0 and 255 Oops. That was rather stupid of me. Pete
11. Re: fast bit rotation
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Dec 31, 2003
- 451 views
On Wed, 31 Dec 2003 12:26:34 -0600, Kat <gertie at visionsix.com> wrote: >But those are bytes, not 31 (or 32) bit full integers. When scaled up: >shl=repeat(0,65536*256) Whoops, looks like some of my stupidity rubbed off: that's only 3 bytes or 24-bit integers, btw. I don't think you can buy a 32-bit machine with the needed 32GB memory, so don't bother trying it >i got 29 sec for that run, or 578,515 shifts per sec Yeah, it's not relevant now, but I was actually timing table lookup, not the initialise. I'll get me coat... and me dunce's hat. Pete
12. Re: fast bit rotation
- Posted by jiri babor <jbabor at paradise.net.nz> Dec 31, 2003
- 439 views
Thanks, guys, for your tremendous response. Tommy's machine code looks most promising. Currently I am looking at 24 bit integers, later I might even be forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and similar 8-bit solutions are interesting, but not of much practical use. And thank you, Robert, for your explanation. I still think any self-respecting language should have fast shifts and rotations built into it. jiri
13. Re: fast bit rotation
- Posted by "Derek Parnell" <ddparnell at bigpond.com> Jan 01, 2004
- 444 views
----- Original Message ----- From: "jiri babor" <jbabor at paradise.net.nz> To: <EUforum at topica.com> Subject: Re: fast bit rotation > > > Thanks, guys, for your tremendous response. Tommy's machine code looks most > promising. Currently I am looking at 24 bit integers, later I might even be > forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and > similar 8-bit solutions are interesting, but not of much practical use. Doesn't the INTEGER type only support 30-bit integers? I make the range for INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It seems that Euphoria uses 30-bits to hold the value, one bit to hold the sign and the 32nd bit to hold the flag to tell if its an integer or something else? Just guessing. > And thank you, Robert, for your explanation. I still think any > self-respecting language should have fast shifts and rotations built into > it. Why is this Jiri? I would have thought that they are only a few programming problems where bit shifting/rotating is of much use. Low-level device interfacing, cryptography, image/audio manipulating, low-RAM environments (8-bit embedded systems), ... What are you working on that needs it? -- Derek
14. Re: fast bit rotation
- Posted by jiri babor <jbabor at paradise.net.nz> Jan 01, 2004
- 452 views
Derek wrote: >Doesn't the INTEGER type only support 30-bit integers? I make the range for INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It seems that Euphoria uses 30-bits to hold the value, one bit to hold the sign and the 32nd bit to hold the flag to tell if its an integer or something else? Just guessing. I guess it's just semantics. Commonly we talk about 32-bit integers, not 31, even when all 32 bits are used to hold *signed* integers in just about any other language. >And thank you, Robert, for your explanation. I still think any self-respecting language should have fast shifts and rotations built into it. Why is this Jiri? I would have thought that they are only a few programming problems where bit shifting/rotating is of much use. Low-level device interfacing, cryptography, image/audio manipulating, low-RAM environments (8-bit embedded systems), ... I suppose it depends on your hobby. One of mine happens to be data compression, which you left out of your otherwise good account. In many algorithms, especially for lossless compression, you fiddle bits all the time. I know it's a well trodden turf, but still pretty interesting. >What are you working on that needs it? After a couple of years I resurrected my 'dream language' project that I am sketching in Euphoria. For that I need fast tables. In the past I used sorted tables with binary searches. Now I want to try hashed tables, and at the moment I am evaluating several faster hashing algorithms. Many of them use shifts and/or rotations to introduce some extra noise into the hashing functions to achieve more even bucket distribution. I have already one or two really interesting and reasonably flexible schemes going, probably good enough for general consumption - but only after a bit more testing... jiri
15. Re: fast bit rotation
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jan 01, 2004
- 444 views
Derek wrote: > ----- Original Message ----- > From: "jiri babor" > >> Thanks, guys, for your tremendous response. Tommy's machine code looks most >> promising. Currently I am looking at 24 bit integers, later I might even be >> forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and >> similar 8-bit solutions are interesting, but not of much practical use. > > Doesn't the INTEGER type only support 30-bit integers? I make the range > for INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It > seems that Euphoria uses 30-bits to hold the value, one bit to hold the > sign I think the sign bit must be counted, too. So 30 bits + 1 bit = 31 bits. If we add power(2,30) to both the minimum value and the maximum value that you mentioned above, that does not alter the range. Then we get an unsigned integer type from 0 to (power(2,31)-1. Now it's pretty clear, that the range consists of 31 bit. Whether or not one bit is used to hold the sign, is a different question, IMHO. > and the 32nd bit to hold the flag to tell if its an integer or > something else? Just guessing. <snip> Regards, Juergen -- /"\ ASCII ribbon campain | Money is the root of all evil. \ / against HTML in | Send 20 Dollars for more info. X e-mail and news, | / \ and unneeded MIME | http://home.arcor.de/luethje/prog/
16. Re: fast bit rotation
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jan 01, 2004
- 443 views
Tommy wrote: <snip> > Perhaps my machine-code version (which BTW only works on > Intel-compatible microprocessors) I think Euphoria itself "only" runs on Intel-compatible processors, no? > could even be tweaked a little further: > does anybody know if it's OK to replace PUSHA/POPA with PUSH EAX/POP EAX, > if I only modify EAX? I think it would definitely be faster. AFAIK it's not necessary to preserve the registers EAX, ECX and EDX. When you look at the demo program 'callmach.ex' (Euphoria 2.4), you'll see in Example #2, that RDS also uses EAX without preserving it. Your code only uses EAX, so I think it's not necessary to preserve any register at all. To try it out, just replace #60 and #61 with #90 (NOP) in your code. It works fine for me. BTW, Rob: Is there a typo in 'callmach.ex' Example #2, or do I misunderstand something? Concerning add_code(), it reads in the comment: "12 (4+8) (hex C) bytes are popped off the stack before returning", but the statement that you use is: "#C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack" Regards, Juergen -- /"\ ASCII ribbon campain | This message has been ROT-13 encrypted \ / against HTML in | twice for higher security. X e-mail and news, | / \ and unneeded MIME | http://home.arcor.de/luethje/prog/
17. Re: fast bit rotation
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jan 01, 2004
- 438 views
Tommy wrote: > ------- <code> > include machine.e > > constant rotate_left = allocate(15), rotate_left_param = rotate_left + 2 > poke(rotate_left, { > #60, -- 0: pusha > #B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2) > #D1,#C0, -- 6: rol eax, 1 > #A3,#6A,#0A,#82,#82, -- 8: mov [@param], eax > #61, -- D: popa > #C3}) -- E: ret > poke4(rotate_left + 9, rotate_left_param) -- @param > > constant rotate_right = allocate(15), rotate_right_param = rotate_right + 2 > poke(rotate_right, { > #60, -- 0: pusha > #B8,#00,#00,#00,#00, -- 1: mov eax, dword param (2) > #D1,#C8, -- 6: ror eax, 1 > #A3,#6A,#3A,#C2,#83, -- 8: mov [@param], eax > #61, -- D: popa > #C3}) -- E: ret > poke4(rotate_right + 9, rotate_right_param) -- @param > > -- To use: > poke4(rotate_left_param, i) -- i = an integer > call(rotate_left) > i = peek4s(rotate_left_param) > > poke4(rotate_right_param, j) -- j = an integer > call(rotate_right) > j = peek4s(rotate_right_param) > > -- Before closing your program, you should call: > free(rotate_left) > free(rotate_right) > ------- </code> I do not think that it is necessary or important to free the allocated space for the code under normal circumstances: - When the program terminates, all allocated space is given back to the O/S anyhow. - The space that is occupied by normal Euphoria routines also is not freed during runtime of the program. Well, a machine-code routine could gain an advantage here, but that's not important unless the routine is big. - If the machine-code routine is defined inside an include file, the statement for freeing the code space is inside the main program file. IMHO this is not elegant coding style. These are just some remarks, so that nobody thinks that I've _forgotten_ to free the space in my code below. > Perhaps there are faster methods than this, but on my laptop (Pentium II > 433MHz, 64MB RAM), I can do 10_000_000 (ten million) rotations in 19.11 > seconds, which is more than 500_000 rotations per second. In Euphoria 2.4, it's now possible to define the parameters and return value for a machine-code routine. I like it, because in my opinion it's more elegant than using the "old" method, but I don't know which method is faster. Inspired by your code, I modified Example #2 of the demo program 'callmach.ex' by RDS: -----------------------------[ begin code ]----------------------------- -- needs Euphoria 2.4 or newer include machine.e include dll.e -- Define the function ROTATE_LEFT(); -- use define_c_func() to allow parameters to be passed to machine code. constant ROL_CODE = { -- first int argument is at stack offset +4, 2nd int is at +8 #8B, #44, #24, #04, -- mov eax, [esp+4] #8A, #4C, #24, #08, -- mov cl, [esp+8] #D3, #C0, -- rol eax, cl #C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack }, ROL_SPACE = allocate(length(ROL_CODE)) poke(ROL_SPACE, ROL_CODE) global constant ROTATE_LEFT = define_c_func("", ROL_SPACE, {C_INT, C_INT}, C_INT) -- Demo integer n, count, res n = 9 count = 2 res = c_func(ROTATE_LEFT, {n, count}) printf(1, "The result of 'ROTATE_LEFT(%d,%d)' is: %d\n", {n, count, res}) ------------------------------[ end code ]------------------------------ It seems to work, but there is no warranty on the above code. I have some experience with writing 16 bit ASM code, but unfortunately not too much regarding 32 bit ASM code. Especially I'm not sure, whether the statement "mov cl, [esp+8]" is correct. Maybe it should be "mov ecx, [esp+8]" or something? Concerning preserving the registers see my other post, please. BTW: Who is the maintainer of the *very great* Euphoria assembler, Pete Eberlein or Mic? It seems that the assembler does not understand a command such as 'ret 8'. It would be very nice, if you could improve the assembler in this regard. Regards, Juergen -- /"\ ASCII ribbon campain | This message has been ROT-13 encrypted \ / against HTML in | twice for higher security. X e-mail and news, | / \ and unneeded MIME | http://home.arcor.de/luethje/prog/
18. Re: fast bit rotation
- Posted by Rolf =?iso-8859-1?Q?Schr=F6der?= <rolf at rschr.de> Jan 01, 2004
- 430 views
This is a multi-part message in MIME format. --------------A472BA832284E0E9474EDCC4 I programmed 'true euphorian' rotation functions rotl(objekt x) and rotr(object x), see include file ROTATION.E and test file ROTATION.EXW of attached zip file. To run it under DOS, rename *.EXW to *.EX. On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW 1.91 micro seconds for rotr() and 1.63 micro seconds for rotl() per call. Rolf --------------A472BA832284E0E9474EDCC4 Content-Type: application/x-zip-compressed; name="Rotation.zip"
19. Re: fast bit rotation
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jan 01, 2004
- 433 views
Rolf wrote: > I programmed 'true euphorian' rotation functions rotl(objekt x) and > rotr(object x), see include file ROTATION.E and test file ROTATION.EXW > of attached zip file. To run it under DOS, rename *.EXW to *.EX. > > On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW > 1.91 micro seconds for rotr() and > 1.63 micro seconds for rotl() per call. You can even gain speed by omitting 'pusha' and 'popa'. I like the idea of wrapping the ASM functions by 'true euphorian' functions, especially in order to be able to apply them to any Euphoria object. However, the main disadvantage of Tommy's routines IMHO is the point, that they use a fixed count of '1' for any rotation. A versatile library routine IMHO should allow the user to specify the count as parameter, so that it would look somehow like: function rotate_left(value, count) Regards, Juergen -- /"\ ASCII ribbon campain | This message has been ROT-13 encrypted \ / against HTML in | twice for higher security. X e-mail and news, | / \ and unneeded MIME | http://home.arcor.de/luethje/prog/
20. Re: fast bit rotation
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jan 01, 2004
- 433 views
Me wrote: <big snip> > -----------------------------[ begin code ]----------------------------- > -- needs Euphoria 2.4 or newer > > include machine.e > include dll.e > > -- Define the function ROTATE_LEFT(); > -- use define_c_func() to allow parameters to be passed to machine code. > constant > ROL_CODE = { > -- first int argument is at stack offset +4, 2nd int is at +8 > #8B, #44, #24, #04, -- mov eax, [esp+4] > #8A, #4C, #24, #08, -- mov cl, [esp+8] > #D3, #C0, -- rol eax, cl > #C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack > }, > ROL_SPACE = allocate(length(ROL_CODE)) > > poke(ROL_SPACE, ROL_CODE) > > global constant > ROTATE_LEFT = define_c_func("", ROL_SPACE, {C_INT, C_INT}, C_INT) > > > -- Demo > integer n, count, res > > n = 9 > count = 2 > res = c_func(ROTATE_LEFT, {n, count}) > printf(1, "The result of 'ROTATE_LEFT(%d,%d)' is: %d\n", {n, count, res}) > ------------------------------[ end code ]------------------------------ > > It seems to work, but there is no warranty on the above code. > I have some experience with writing 16 bit ASM code, but unfortunately > not too much regarding 32 bit ASM code. > Especially I'm not sure, whether the statement "mov cl, [esp+8]" is > correct. I know now, that it is correct. Regards, Juergen -- /"\ ASCII ribbon campain | This message has been ROT-13 encrypted \ / against HTML in | twice for higher security. X e-mail and news, | / \ and unneeded MIME | http://home.arcor.de/luethje/prog/
21. Re: fast bit rotation
- Posted by Rolf =?iso-8859-1?Q?Schr=F6der?= <rolf at rschr.de> Jan 01, 2004
- 444 views
Juergen Luethje wrote: > However, the main disadvantage of Tommy's routines IMHO is the point, > that they use a fixed count of '1' for any rotation. A versatile library > routine IMHO should allow the user to specify the count as parameter, so > that it would look somehow like: > function rotate_left(value, count) Juergen, I do agree, but that slows down the speed considerably. By the way, do you know a web resource for looking up 386 machine instructions? I don't have a book here. Rolf
22. Re: fast bit rotation
- Posted by "Elliott S. de Andrade" <quantum_analyst at hotmail.com> Jan 01, 2004
- 423 views
>From: Rolf Schr=F6der <rolf at rschr.de> >Subject: Re: fast bit rotation > >Juergen Luethje wrote: > > > However, the main disadvantage of Tommy's routines IMHO is the point, > > that they use a fixed count of '1' for any rotation. A versatile librar= y > > routine IMHO should allow the user to specify the count as parameter, s= o > > that it would look somehow like: > > function rotate_left(value, count) > >Juergen, I do agree, but that slows down the speed considerably. By the >way, do you know a web resource for looking up 386 machine instructions? >I don't have a book here. > >Rolf > Try something like sandpile http://www.sandpile.org/ or even the Intel= =20 website.
23. Re: fast bit rotation
- Posted by Robert Craig <rds at RapidEuphoria.com> Jan 02, 2004
- 448 views
Juergen Luethje wrote: > BTW, Rob: > Is there a typo in 'callmach.ex' Example #2, or do I misunderstand > something? > > Concerning add_code(), it reads in the comment: > "12 (4+8) (hex C) bytes are popped off the stack before returning", > but the statement that you use is: > "#C2, #08, #00 -- ret 8 -- pop 8 bytes off the stack" Yes. Delete the comment about 12 (4+8) bytes. I think it was from an earlier version of the code. Thanks, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
24. Re: fast bit rotation
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jan 02, 2004
- 452 views
Tommy wrote: > Juergen Luethje wrote: >> >> Rolf wrote: >> >>> I programmed 'true euphorian' rotation functions rotl(objekt x) and >>> rotr(object x), see include file ROTATION.E and test file ROTATION.EXW >>> of attached zip file. To run it under DOS, rename *.EXW to *.EX. >>> >>> On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW >>> 1.91 micro seconds for rotr() and >>> 1.63 micro seconds for rotl() per call. >> >> You can even gain speed by omitting 'pusha' and 'popa'. >> >> I like the idea of wrapping the ASM functions by 'true euphorian' >> functions, especially in order to be able to apply them to any Euphoria >> object. >> >> However, the main disadvantage of Tommy's routines IMHO is the point, >> that they use a fixed count of '1' for any rotation. A versatile library >> routine IMHO should allow the user to specify the count as parameter, so >> that it would look somehow like: >> function rotate_left(value, count) > > Of course it only rotates a fixed count of '1': I didn't mean to create > a versatile library routine, just the fastest possible solution to > jiri's problem, which was integer rotation (left or right) of fixed > count 1 (I think :D). Sorry, I didn't want to offend you or something. I thought Jiri wanted some versatile library routines. > And I believe that if you want the highest > performance possible, my routine (no real Euphoria-function) is perhaps > a bit faster: no stack popping, just 1 poke, 2 moves, a rotation, a > return and 1 peek. > Haven't compared the performance of the 2 routines, but I don't need to. > I know they both perform very well and I also know that maintainability, > ease of use and readable code are much more important than that tiny bit > of extra performance. > So Rolf and Juergen, your routines are in general much better than what > I pulled together in 5 minutes. Please don't forget: I was somehow inspired by your code, and Rolf's routines use yours as a basis. > BTW: wouldn't it be nice if someone made a library of such very general > high-performance routines? I already started modifying your ASM routines, so that they take 'count' as a second parameter. Then I'll wrap them by pure euphoria functions, in order to be able to apply them to any Euphoria object -- very similar to what Rolf did. I'll do the same with my ASM routines, compare the speed, make some tweaks here and there ... I also plan do write some SHIFT functions. > Or perhaps Robert could add some of those > routines to machine.e or one of the general Euphoria-libraries? If you > see how many people could use shifts, rotates and other routines like > that. That would be nice. <snip> Regards, Juergen -- /"\ ASCII ribbon campain | This message has been ROT-13 encrypted \ / against HTML in | twice for higher security. X e-mail and news, | / \ and unneeded MIME | http://home.arcor.de/luethje/prog/
25. Re: fast bit rotation
- Posted by Robert Craig <rds at RapidEuphoria.com> Jan 02, 2004
- 455 views
Tommy Carlier wrote: > Robert, couldn't you consider making get_bytes a > low-level, builtin routine to dramatically improve its performance? I > think get_bytes is a really standard routine that is probably quite > frequently used. Yes, I'll consider it, but I don't think it would "dramatically" improve performance. Thanks, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com