1. fast bit rotation

Happy New Year to you all - it's already 2004 (1.40 am) here in the Godzone!

I am appealing to all speed merchants and optimization wizards (and
witches?,  for Kat's sake): I need a really fast *single bit* rotation of an
integer, in either direction. Please...

jiri

new topic     » topic index » view message » categorize

2. Re: fast bit rotation

jiri babor wrote:

> Happy New Year to you all - it's already 2004 (1.40 am) here in the
 > Godzone!

Blimey. I have a whole ten-and-a-bit hours to go yet.

> I am appealing to all speed merchants and optimization wizards (and
> witches?,  for Kat's sake): I need a really fast *single bit* rotation
> of an integer, in either direction. Please...

I'm not sure there's a fast native way. How about:

function rotate_left_1(integer n, integer bitwidth)
     atom mask mask = power(2,bitwidth)
     return and_bits(n*2,mask-1)+(0!=and_bits(n,floor(mask/2)))
end function

function rotate_right_1(integer n, integer bitwidth)
     atom mask mask = power(2,bitwidth)
     return and_bits(floor(n/2),mask-1)+floor(mask/2)*and_bits(n,1)
end function

They simplify right down if you have a fixed bit width. e.g.:

-- for 8 bits width:
function rotate8_left_1(integer n)
     return and_bits(n*2,255)+(0!=and_bits(n,128))
end function

function rotate8_right_1(integer n, integer bitwidth)
     return and_bits(floor(n/2),255)+128*and_bits(n,1)
end function

Carl

-- 
[ Carl R White == aka () = The Domain of Cyrek = ]
[ Cyrek the Illogical /\ www.cyreksoft.yorks.com ]

new topic     » goto parent     » topic index » view message » categorize

3. Re: fast bit rotation

Thanks, Carl, I'll play with it. jiri


----- Original Message ----- 
From: "Carl W." <euphoria at cyreksoft.yorks.com>
To: <EUforum at topica.com>
Sent: 01 January 2004 2:14 AM
Subject: Re: fast bit rotation


> 
> 
> jiri babor wrote:
> 
> > Happy New Year to you all - it's already 2004 (1.40 am) here in the
>  > Godzone!
> 
> Blimey. I have a whole ten-and-a-bit hours to go yet.
> 
> > I am appealing to all speed merchants and optimization wizards (and
> > witches?,  for Kat's sake): I need a really fast *single bit* rotation
> > of an integer, in either direction. Please...
> 
> I'm not sure there's a fast native way. How about:
> 
> function rotate_left_1(integer n, integer bitwidth)
>      atom mask mask = power(2,bitwidth)
>      return and_bits(n*2,mask-1)+(0!=and_bits(n,floor(mask/2)))
> end function
> 
> function rotate_right_1(integer n, integer bitwidth)
>      atom mask mask = power(2,bitwidth)
>      return and_bits(floor(n/2),mask-1)+floor(mask/2)*and_bits(n,1)
> end function
> 
> They simplify right down if you have a fixed bit width. e.g.:
> 
> -- for 8 bits width:
> function rotate8_left_1(integer n)
>      return and_bits(n*2,255)+(0!=and_bits(n,128))
> end function
> 
> function rotate8_right_1(integer n, integer bitwidth)
>      return and_bits(floor(n/2),255)+128*and_bits(n,1)
> end function
> 
> Carl
> 
> -- 
> [ Carl R White == aka () = The Domain of Cyrek = ]
> [ Cyrek the Illogical /\ www.cyreksoft.yorks.com ]

new topic     » goto parent     » topic index » view message » categorize

4. Re: fast bit rotation

------- <code>
include machine.e

constant rotate_left = allocate(15), rotate_left_param = rotate_left + 2
poke(rotate_left, {
     #60,                    --    0: pusha
     #B8,#00,#00,#00,#00,    --    1: mov eax, dword param (2)
     #D1,#C0,                --    6: rol eax, 1
     #A3,#6A,#0A,#82,#82,    --    8: mov [@param], eax
     #61,                    --    D: popa
     #C3})                   --    E: ret
poke4(rotate_left + 9, rotate_left_param) -- @param

constant rotate_right = allocate(15), rotate_right_param = rotate_right + 2
poke(rotate_right, {
     #60,                    --    0: pusha
     #B8,#00,#00,#00,#00,    --    1: mov eax, dword param (2)
     #D1,#C8,                --    6: ror eax, 1
     #A3,#6A,#3A,#C2,#83,    --    8: mov [@param], eax
     #61,                    --    D: popa
     #C3})                   --    E: ret
poke4(rotate_right + 9, rotate_right_param) -- @param

-- To use:
poke4(rotate_left_param, i) -- i = an integer
call(rotate_left)
i = peek4s(rotate_left_param)

poke4(rotate_right_param, j) -- j = an integer
call(rotate_right)
j = peek4s(rotate_right_param)

-- Before closing your program, you should call:
free(rotate_left)
free(rotate_right)
------- </code>

Perhaps there are faster methods than this, but on my laptop (Pentium II 
433MHz, 64MB RAM), I can do 10_000_000 (ten million) rotations in 19.11 
seconds, which is more than 500_000 rotations per second.
-- 

Tommy Carlier
tommy online: http://users.pandora.be/tommycarlier

new topic     » goto parent     » topic index » view message » categorize

5. Re: fast bit rotation

hello jiri,

    Do you want object operations possible?
example:
sequence s
s = {34, 234, 12, {105}, ...} -- set of integers.  could be any depth.
left_rotate_all(s, 5) -- rotate each integer by 5?

What I have below is without object operations.
If you want object operations then it looks like a modification of l4()
would be the best fit.  Others are welcome to out do me.

I only did left rotations below... rights would be similar or
you could simply do inverse left instead.
Example:  shift right 1 = shift left 8-1 or 7

-----------------------------------
-- 8-bit, Bit rotation to the left.
function l1(integer i, integer r)
  integer result

  result = i
  for A = 1 to r do
    result *= 2
    if (result > 255) then
      result -= 255
    end if
  end for

  return result
end function

function l2(integer i, integer r)
  integer result

  result = i
  for A = 1 to r do
    result *= 2
  end for

  if (result > 255) then
    result += (and_bits(result, #FF00)/#100)
    result = and_bits(result, #FF)
  end if

  return result
end function

function l3(integer i, integer r)
  integer result

  result = i * power(2, r)

  if (result > 255) then
    result += (and_bits(result, #FF00)/#100)
    result = and_bits(result, #FF)
  end if

  return result
end function

sequence rotation_table

rotation_table = repeat(repeat(0, 7), 256)

for i = 0 to 255 do
  for r = 1 to 7 do
    rotation_table[i+1][r] = l1(i, r)
  end for
end for

constant
  POWER_TABLE = {2, 4, 8, 16, 32, 64, 128, 256},
  ROTATION_TABLE = rotation_table

rotation_table = repeat(repeat(0, 256), 7)

for i = 0 to 255 do
  for r = 1 to 7 do
    rotation_table[r][i+1] = l1(i, r)
  end for
end for

constant
  ROTATION_TABLE2 = rotation_table

function l4(integer i, integer r)
  integer result

  result = i * POWER_TABLE[r]

  if (result > 255) then
    result += (and_bits(result, #FF00)/#100)
    result = and_bits(result, #FF)
  end if

  return result
end function

function l5(integer i, integer r)
  return ROTATION_TABLE[i+1][r]
end function

function l6(integer i, integer r)
  return ROTATION_TABLE2[r][i+1]
end function

-----------------------
atom t
integer j, c, d, l

l = 0
d = 5
c = 0
t = time() + d
while (t > time()) do
  for i = 0 to 255 do
    for A = 1 to 7 do
      j = l1(i, A)
    end for
  end for
  c += 1
end while
l += 1
? {l, c/d}

c = 0
t = time() + d
while (t > time()) do
  for i = 0 to 255 do
    for A = 1 to 7 do
      j = l2(i, A)
    end for
  end for
  c += 1
end while
l += 1
? {l, c/d}

c = 0
t = time() + d
while (t > time()) do
  for i = 0 to 255 do
    for A = 1 to 7 do
      j = l3(i, A)
    end for
  end for
  c += 1
end while
l += 1
? {l, c/d}

c = 0
t = time() + d
while (t > time()) do
  for i = 0 to 255 do
    for A = 1 to 7 do
      j = l4(i, A)
    end for
  end for
  c += 1
end while
l += 1
? {l, c/d}

c = 0
t = time() + d
while (t > time()) do
  for i = 0 to 255 do
    for A = 1 to 7 do
      j = l5(i, A)
    end for
  end for
  c += 1
end while
l += 1
? {l, c/d}

c = 0
t = time() + d
while (t > time()) do
  for i = 0 to 255 do
    for A = 1 to 7 do
      j = l6(i, A)
    end for
  end for
  c += 1
end while
l += 1
? {l, c/d}

machine_proc(26, 1)
-------------------------------------

        Lucius L. Hilley III

----- Original Message ----- 
From: "jiri babor" <jbabor at paradise.net.nz>
To: <EUforum at topica.com>
Sent: Wednesday, December 31, 2003 07:41 AM
Subject: fast bit rotation


>
>
> Happy New Year to you all - it's already 2004 (1.40 am) here in the
Godzone!
>
> I am appealing to all speed merchants and optimization wizards (and
> witches?,  for Kat's sake): I need a really fast *single bit* rotation of
an
> integer, in either direction. Please...
>
> jiri
>
>
>
> TOPICA - Start your own email discussion group. FREE!
>
>

new topic     » goto parent     » topic index » view message » categorize

6. Re: fast bit rotation

Another thing you could try is to build tables of pre-computed shifts,
eg:

sequence shl, shr
shl=repeat(0,256)
shr=repeat(0,256)
for i = 0 to 255 do
	if i<=127 then
		shl[i+1]=i*2
	else
		shl[i+1]=and_bits(i,#7F)*2+1
	end if
	shr[shl[i+1]+1]=i
end for

Use k=shl[k+1] when k is 0..255, to avoid zero index problems.

You can do your own test/timings, though I got 3.9 million/sec on a
233MHz PII with 48Mb ram, sorry Tommy, CNR blink)

Pete
PS No warranty on the above code, btw.

new topic     » goto parent     » topic index » view message » categorize

7. Re: fast bit rotation

Pete wrote:
> Another thing you could try is to build tables of pre-computed shifts,
> eg:
>
> sequence shl, shr
> shl=repeat(0,256)
> shr=repeat(0,256)
> for i = 0 to 255 do
> if i<=127 then
> shl[i+1]=i*2
> else
> shl[i+1]=and_bits(i,#7F)*2+1
> end if
> shr[shl[i+1]+1]=i
> end for
>
> Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
>
> You can do your own test/timings, though I got 3.9 million/sec on a
> 233MHz PII with 48Mb ram, sorry Tommy, CNR blink)

Of course lookup tables are a lot faster. But the sequences above only 
work on 8-bit values between 0 and 255, and shift instead of rotate, and I 
think that jiri asked for integer (=32-bit) rotations. If you only need 
8-bit rotations, you should definitely go for the pre-computed lookup 
sequences. Perhaps my machine-code version (which BTW only works on 
Intel-compatible microprocessors) could even be tweaked a little further: 
does anybody know if it's OK to replace PUSHA/POPA with PUSH EAX/POP EAX, 
if I only modify EAX? I think it would definitely be faster.

>
> Pete
> PS No warranty on the above code, btw.

-- 

Tommy Carlier
tommy online: http://users.pandora.be/tommycarlier

new topic     » goto parent     » topic index » view message » categorize

8. Re: fast bit rotation

Using a precomputed table is 2 of my examples.
One arrangement of 7x256 another of 256x7.

Except for the precomputed tables, any of my examples could
be modified for 32-bit integers and extended to work on sequences
to rotate all of the values within.  Making the assumption that
all values within the sequence are not floats.

        Lucius L. Hilley III

----- Original Message ----- 
From: "Pete Lomax" <petelomax at blueyonder.co.uk>
To: <EUforum at topica.com>
Sent: Wednesday, December 31, 2003 10:59 AM
Subject: Re: fast bit rotation


> 
> 
> Another thing you could try is to build tables of pre-computed shifts,
> eg:
> 
> sequence shl, shr
> shl=repeat(0,256)
> shr=repeat(0,256)
> for i = 0 to 255 do
> if i<=127 then
> shl[i+1]=i*2
> else
> shl[i+1]=and_bits(i,#7F)*2+1
> end if
> shr[shl[i+1]+1]=i
> end for
> 
> Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
> 
> You can do your own test/timings, though I got 3.9 million/sec on a
> 233MHz PII with 48Mb ram, sorry Tommy, CNR blink)
> 
> Pete
> PS No warranty on the above code, btw.
>

new topic     » goto parent     » topic index » view message » categorize

9. Re: fast bit rotation

On 31 Dec 2003, at 15:59, Pete Lomax wrote:

> 
> 
> Another thing you could try is to build tables of pre-computed shifts,
> eg:
> 
> sequence shl, shr
> shl=repeat(0,256)
> shr=repeat(0,256)
> for i = 0 to 255 do
>  if i<=127 then
>   shl[i+1]=i*2
>  else
>   shl[i+1]=and_bits(i,#7F)*2+1
>  end if
>  shr[shl[i+1]+1]=i
> end for
> 
> Use k=shl[k+1] when k is 0..255, to avoid zero index problems.
> 
> You can do your own test/timings, though I got 3.9 million/sec on a
> 233MHz PII with 48Mb ram, sorry Tommy, CNR blink)

But those are bytes, not 31 (or 32) bit full integers. When scaled up:

sequence shl, shr
atom time1, time2, int
int = 65535 * 256 ---- 16776960
shl=repeat(0,65536*256)
shr=repeat(0,65536*256)

time1 = time()
for i = 0 to int do
 if i<=127 then
  shl[i+1]=i*2
 else
  shl[i+1]=and_bits(i,#7F)*2+1
 end if
 shr[shl[i+1]+1]=i
end for
time2 = time()

puts(1,sprintf("%d",time2-time1))
time1 = getc(0)

i got 29 sec for that run, or 578,515 shifts per sec, on a K6-2-233 with 384 
megs of ram. Good thing i had the ram, too, because the program used over 
131 megabytes of ram. It used 87% of cpu clocks running on win95B.

Kat

new topic     » goto parent     » topic index » view message » categorize

10. Re: fast bit rotation

On Wed, 31 Dec 2003 17:41:14 +0100, Tommy Carlier
<tommy.carlier at pandora.be> wrote:

>Pete wrote:
>But the sequences above only 
>work on 8-bit values between 0 and 255
Oops. That was rather stupid of me.

Pete

new topic     » goto parent     » topic index » view message » categorize

11. Re: fast bit rotation

On Wed, 31 Dec 2003 12:26:34 -0600, Kat <gertie at visionsix.com> wrote:

>But those are bytes, not 31 (or 32) bit full integers. When scaled up:
>shl=repeat(0,65536*256)
Whoops, looks like some of my stupidity rubbed off: that's only 3
bytes or 24-bit integers, btw. I don't think you can buy a 32-bit
machine with the needed 32GB memory, so don't bother trying it blink
>i got 29 sec for that run, or 578,515 shifts per sec
Yeah, it's not relevant now, but I was actually timing table lookup,
not the initialise.

I'll get me coat... and me dunce's hat.
Pete

new topic     » goto parent     » topic index » view message » categorize

12. Re: fast bit rotation

Thanks, guys, for your tremendous response. Tommy's machine code looks most
promising. Currently I am looking at 24 bit integers, later I might even be
forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and
similar 8-bit solutions are interesting, but not of much practical use.
And thank you, Robert, for your explanation. I still think any
self-respecting language should have fast shifts and rotations built into
it.

jiri

new topic     » goto parent     » topic index » view message » categorize

13. Re: fast bit rotation

----- Original Message ----- 
From: "jiri babor" <jbabor at paradise.net.nz>
To: <EUforum at topica.com>
Subject: Re: fast bit rotation


> 
> 
> Thanks, guys, for your tremendous response. Tommy's machine code looks most
> promising. Currently I am looking at 24 bit integers, later I might even be
> forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and
> similar 8-bit solutions are interesting, but not of much practical use.

Doesn't the INTEGER type only support 30-bit integers? I make the range for
INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It seems that
Euphoria uses 30-bits to hold the value, one bit to hold the sign and the 32nd
bit to hold the flag to tell if its an integer or something else? Just guessing.

> And thank you, Robert, for your explanation. I still think any
> self-respecting language should have fast shifts and rotations built into
> it.

Why is this Jiri? I would have thought that they are only a few programming
problems where bit shifting/rotating is of much use. Low-level device
interfacing, cryptography, image/audio manipulating, low-RAM environments (8-bit
embedded systems), ... What are you working on that needs it?

-- 
Derek

new topic     » goto parent     » topic index » view message » categorize

14. Re: fast bit rotation

Derek wrote:

>Doesn't the INTEGER type only support 30-bit integers? I make the range for
INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It seems that
Euphoria uses 30-bits to hold the value, one bit to hold the sign and the
32nd bit to hold the flag to tell if its an integer or something else? Just
guessing.

I guess it's just semantics. Commonly we talk about 32-bit integers, not 31,
even when all 32 bits are used to hold *signed* integers in just about any
other language.

>And thank you, Robert, for your explanation. I still think any
self-respecting language should have fast shifts and rotations built into
it.

Why is this Jiri? I would have thought that they are only a few programming
problems where bit shifting/rotating is of much use. Low-level device
interfacing, cryptography, image/audio manipulating, low-RAM environments
(8-bit embedded systems), ...

I suppose it depends on your hobby. One of mine happens to be data
compression, which you left out of your otherwise good account. In many
algorithms, especially for lossless compression, you fiddle bits all the
time. I know it's a well trodden turf, but still pretty interesting.

>What are you working on that needs it?

After a couple of years I resurrected my 'dream language' project that I am
sketching in Euphoria. For that I need fast tables. In the past I used
sorted tables with binary searches. Now I want to try hashed tables, and at
the moment I am evaluating several faster hashing algorithms. Many of them
use shifts and/or rotations to introduce some extra noise into the hashing
functions to achieve more even bucket distribution. I have already one or
two really interesting and reasonably flexible schemes going, probably good
enough for general consumption - but only after a bit more testing...

jiri

new topic     » goto parent     » topic index » view message » categorize

15. Re: fast bit rotation

Derek wrote:

> ----- Original Message ----- 
> From: "jiri babor"
>
>> Thanks, guys, for your tremendous response. Tommy's machine code looks most
>> promising. Currently I am looking at 24 bit integers, later I might even be
>> forced beyond the Euphoria's 31-bit limit. So, sorry, look up tables and
>> similar 8-bit solutions are interesting, but not of much practical use.
>
> Doesn't the INTEGER type only support 30-bit integers? I make the range
> for INTEGER types to be from (-power(2,30)) to (power(2,30) - 1). It
> seems that Euphoria uses 30-bits to hold the value, one bit to hold the
> sign

I think the sign bit must be counted, too. So 30 bits + 1 bit = 31 bits.
If we add power(2,30) to both the minimum value and the maximum value
that you mentioned above, that does not alter the range. Then we get an
unsigned integer type from 0 to (power(2,31)-1.
Now it's pretty clear, that the range consists of 31 bit. Whether or not
one bit is used to hold the sign, is a different question, IMHO.

> and the 32nd bit to hold the flag to tell if its an integer or
> something else? Just guessing.

<snip>

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campain  |  Money is the root of all evil.
 \ /  against HTML in       |  Send 20 Dollars for more info.
  X   e-mail and news,      |
 / \  and unneeded MIME     |  http://home.arcor.de/luethje/prog/

new topic     » goto parent     » topic index » view message » categorize

16. Re: fast bit rotation

Tommy wrote:

<snip>

> Perhaps my machine-code version (which BTW only works on
> Intel-compatible microprocessors)

I think Euphoria itself "only" runs on Intel-compatible processors, no?

> could even be tweaked a little further:
> does anybody know if it's OK to replace PUSHA/POPA with PUSH EAX/POP EAX,
> if I only modify EAX? I think it would definitely be faster.

AFAIK it's not necessary to preserve the registers EAX, ECX and EDX.
When you look at the demo program 'callmach.ex' (Euphoria 2.4), you'll
see in Example #2, that RDS also uses EAX without preserving it.
Your code only uses EAX, so I think it's not necessary to preserve any
register at all.
To try it out, just replace #60 and #61 with #90 (NOP) in your code.
It works fine for me.


BTW, Rob:
Is there a typo in 'callmach.ex' Example #2, or do I misunderstand
something?

Concerning add_code(), it reads in the comment:
"12 (4+8) (hex C) bytes are popped off the stack before returning",
but the statement that you use is:
"#C2, #08, #00              -- ret 8  -- pop 8 bytes off the stack"


Regards,
   Juergen

-- 
 /"\  ASCII ribbon campain  |  This message has been ROT-13 encrypted
 \ /  against HTML in       |  twice for higher security.
  X   e-mail and news,      |
 / \  and unneeded MIME     |  http://home.arcor.de/luethje/prog/

new topic     » goto parent     » topic index » view message » categorize

17. Re: fast bit rotation

Tommy wrote:

> ------- <code>
> include machine.e
>
> constant rotate_left = allocate(15), rotate_left_param = rotate_left + 2
> poke(rotate_left, {
>      #60,                    --    0: pusha
>      #B8,#00,#00,#00,#00,    --    1: mov eax, dword param (2)
>      #D1,#C0,                --    6: rol eax, 1
>      #A3,#6A,#0A,#82,#82,    --    8: mov [@param], eax
>      #61,                    --    D: popa
>      #C3})                   --    E: ret
> poke4(rotate_left + 9, rotate_left_param) -- @param
>
> constant rotate_right = allocate(15), rotate_right_param = rotate_right + 2
> poke(rotate_right, {
>      #60,                    --    0: pusha
>      #B8,#00,#00,#00,#00,    --    1: mov eax, dword param (2)
>      #D1,#C8,                --    6: ror eax, 1
>      #A3,#6A,#3A,#C2,#83,    --    8: mov [@param], eax
>      #61,                    --    D: popa
>      #C3})                   --    E: ret
> poke4(rotate_right + 9, rotate_right_param) -- @param
>
> -- To use:
> poke4(rotate_left_param, i) -- i = an integer
> call(rotate_left)
> i = peek4s(rotate_left_param)
>
> poke4(rotate_right_param, j) -- j = an integer
> call(rotate_right)
> j = peek4s(rotate_right_param)
>
> -- Before closing your program, you should call:
> free(rotate_left)
> free(rotate_right)
> ------- </code>

I do not think that it is necessary or important to free the allocated
space for the code under normal circumstances:
- When the program terminates, all allocated space is given back to the
  O/S anyhow.
- The space that is occupied by normal Euphoria routines also is not
  freed during runtime of the program.
  Well, a machine-code routine could gain an advantage here, but that's
  not important unless the routine is big.
- If the machine-code routine is defined inside an include file, the
  statement for freeing the code space is inside the main program file.
  IMHO this is not elegant coding style.

These are just some remarks, so that nobody thinks that I've _forgotten_
to free the space in my code below. smile

> Perhaps there are faster methods than this, but on my laptop (Pentium II
> 433MHz, 64MB RAM), I can do 10_000_000 (ten million) rotations in 19.11
> seconds, which is more than 500_000 rotations per second.


In Euphoria 2.4, it's now possible to define the parameters and return
value for a machine-code routine. I like it, because in my opinion it's
more elegant than using the "old" method, but I don't know which method
is faster.

Inspired by your code, I modified Example #2 of the demo program
'callmach.ex' by RDS:

-----------------------------[ begin code ]-----------------------------
-- needs Euphoria 2.4 or newer

include machine.e
include dll.e

-- Define the function ROTATE_LEFT();
-- use define_c_func() to allow parameters to be passed to machine code.
constant
   ROL_CODE = {
       -- first int argument is at stack offset +4, 2nd int is at +8
      #8B, #44, #24, #04,      -- mov  eax, [esp+4]
      #8A, #4C, #24, #08,      -- mov  cl,  [esp+8]
      #D3, #C0,                -- rol  eax, cl
      #C2, #08, #00            -- ret 8  -- pop 8 bytes off the stack
   },
   ROL_SPACE = allocate(length(ROL_CODE))

poke(ROL_SPACE, ROL_CODE)

global constant
   ROTATE_LEFT = define_c_func("", ROL_SPACE, {C_INT, C_INT}, C_INT)


-- Demo
integer n, count, res

n = 9
count = 2
res = c_func(ROTATE_LEFT, {n, count})
printf(1, "The result of 'ROTATE_LEFT(%d,%d)' is: %d\n", {n, count, res})
------------------------------[ end code ]------------------------------

It seems to work, but there is no warranty on the above code. smile
I have some experience with writing 16 bit ASM code, but unfortunately
not too much regarding 32 bit ASM code.
Especially I'm not sure, whether the statement  "mov cl, [esp+8]" is
correct. Maybe it should be  "mov ecx, [esp+8]" or something?

Concerning preserving the registers see my other post, please.


BTW:
Who is the maintainer of the *very great* Euphoria assembler, Pete
Eberlein or Mic? It seems that the assembler does not understand a
command such as 'ret 8'.
It would be very nice, if you could improve the assembler in this regard.

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campain  |  This message has been ROT-13 encrypted
 \ /  against HTML in       |  twice for higher security.
  X   e-mail and news,      |
 / \  and unneeded MIME     |  http://home.arcor.de/luethje/prog/

new topic     » goto parent     » topic index » view message » categorize

18. Re: fast bit rotation

This is a multi-part message in MIME format.
--------------A472BA832284E0E9474EDCC4

I programmed 'true euphorian' rotation functions rotl(objekt x) and
rotr(object x), see include file ROTATION.E and test file ROTATION.EXW
of attached zip file. To run it under DOS, rename *.EXW to *.EX.

On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW
1.91 micro seconds for rotr() and
1.63 micro seconds for rotl() per call.

Rolf
--------------A472BA832284E0E9474EDCC4
Content-Type: application/x-zip-compressed;
 name="Rotation.zip"

new topic     » goto parent     » topic index » view message » categorize

19. Re: fast bit rotation

Rolf wrote:

> I programmed 'true euphorian' rotation functions rotl(objekt x) and
> rotr(object x), see include file ROTATION.E and test file ROTATION.EXW
> of attached zip file. To run it under DOS, rename *.EXW to *.EX.
>
> On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW
> 1.91 micro seconds for rotr() and
> 1.63 micro seconds for rotl() per call.

You can even gain speed by omitting 'pusha' and 'popa'. smile

I like the idea of wrapping the ASM functions by 'true euphorian'
functions, especially in order to be able to apply them to any Euphoria
object.

However, the main disadvantage of Tommy's routines IMHO is the point,
that they use a fixed count of '1' for any rotation. A versatile library
routine IMHO should allow the user to specify the count as parameter, so
that it would look somehow like:
   function rotate_left(value, count)

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campain  |  This message has been ROT-13 encrypted
 \ /  against HTML in       |  twice for higher security.
  X   e-mail and news,      |
 / \  and unneeded MIME     |  http://home.arcor.de/luethje/prog/

new topic     » goto parent     » topic index » view message » categorize

20. Re: fast bit rotation

Me wrote:

<big snip>

> -----------------------------[ begin code ]-----------------------------
> -- needs Euphoria 2.4 or newer
>
> include machine.e
> include dll.e
>
> -- Define the function ROTATE_LEFT();
> -- use define_c_func() to allow parameters to be passed to machine code.
> constant
>    ROL_CODE = {
>        -- first int argument is at stack offset +4, 2nd int is at +8
>       #8B, #44, #24, #04,      -- mov  eax, [esp+4]
>       #8A, #4C, #24, #08,      -- mov  cl,  [esp+8]
>       #D3, #C0,                -- rol  eax, cl
>       #C2, #08, #00            -- ret 8  -- pop 8 bytes off the stack
>    },
>    ROL_SPACE = allocate(length(ROL_CODE))
>
> poke(ROL_SPACE, ROL_CODE)
>
> global constant
>    ROTATE_LEFT = define_c_func("", ROL_SPACE, {C_INT, C_INT}, C_INT)
>
>
> -- Demo
> integer n, count, res
>
> n = 9
> count = 2
> res = c_func(ROTATE_LEFT, {n, count})
> printf(1, "The result of 'ROTATE_LEFT(%d,%d)' is: %d\n", {n, count, res})
> ------------------------------[ end code ]------------------------------
>
> It seems to work, but there is no warranty on the above code. smile
> I have some experience with writing 16 bit ASM code, but unfortunately
> not too much regarding 32 bit ASM code.
> Especially I'm not sure, whether the statement  "mov cl, [esp+8]" is
> correct.

I know now, that it is correct. smile

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campain  |  This message has been ROT-13 encrypted
 \ /  against HTML in       |  twice for higher security.
  X   e-mail and news,      |
 / \  and unneeded MIME     |  http://home.arcor.de/luethje/prog/

new topic     » goto parent     » topic index » view message » categorize

21. Re: fast bit rotation

Juergen Luethje wrote:
 
> However, the main disadvantage of Tommy's routines IMHO is the point,
> that they use a fixed count of '1' for any rotation. A versatile library
> routine IMHO should allow the user to specify the count as parameter, so
> that it would look somehow like:
>    function rotate_left(value, count)

Juergen, I do agree, but that slows down the speed considerably. By the
way, do you know a web resource for looking up 386 machine instructions?
I don't have a book here.

Rolf

new topic     » goto parent     » topic index » view message » categorize

22. Re: fast bit rotation

>From: Rolf Schr=F6der <rolf at rschr.de>
>Subject: Re: fast bit rotation
>
>Juergen Luethje wrote:
>
> > However, the main disadvantage of Tommy's routines IMHO is the point,
> > that they use a fixed count of '1' for any rotation. A versatile librar=
y
> > routine IMHO should allow the user to specify the count as parameter, s=
o
> > that it would look somehow like:
> >    function rotate_left(value, count)
>
>Juergen, I do agree, but that slows down the speed considerably. By the
>way, do you know a web resource for looking up 386 machine instructions?
>I don't have a book here.
>
>Rolf
>
   Try something like sandpile http://www.sandpile.org/ or even the Intel=
=20
website.

new topic     » goto parent     » topic index » view message » categorize

23. Re: fast bit rotation

Juergen Luethje wrote:
> BTW, Rob:
> Is there a typo in 'callmach.ex' Example #2, or do I misunderstand
> something?
> 
> Concerning add_code(), it reads in the comment:
> "12 (4+8) (hex C) bytes are popped off the stack before returning",
> but the statement that you use is:
> "#C2, #08, #00              -- ret 8  -- pop 8 bytes off the stack"

Yes. Delete the comment about 12 (4+8) bytes.
I think it was from an earlier version of the code.

Thanks,
    Rob Craig
    Rapid Deployment Software
    http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

24. Re: fast bit rotation

Tommy wrote:

> Juergen Luethje wrote:
>>
>> Rolf wrote:
>>
>>> I programmed 'true euphorian' rotation functions rotl(objekt x) and
>>> rotr(object x), see include file ROTATION.E and test file ROTATION.EXW
>>> of attached zip file. To run it under DOS, rename *.EXW to *.EX.
>>>
>>> On my AMD K6-2, 400MHz I found (under NT 4.0) with ROTATION.EXW
>>> 1.91 micro seconds for rotr() and
>>> 1.63 micro seconds for rotl() per call.
>>
>> You can even gain speed by omitting 'pusha' and 'popa'. smile
>>
>> I like the idea of wrapping the ASM functions by 'true euphorian'
>> functions, especially in order to be able to apply them to any Euphoria
>> object.
>>
>> However, the main disadvantage of Tommy's routines IMHO is the point,
>> that they use a fixed count of '1' for any rotation. A versatile library
>> routine IMHO should allow the user to specify the count as parameter, so
>> that it would look somehow like:
>>    function rotate_left(value, count)
>
> Of course it only rotates a fixed count of '1': I didn't mean to create
> a versatile library routine, just the fastest possible solution to
> jiri's problem, which was integer rotation (left or right) of fixed
> count 1 (I think :D).

Sorry, I didn't want to offend you or something. I thought Jiri wanted
some versatile library routines.

> And I believe that if you want the highest
> performance possible, my routine (no real Euphoria-function) is perhaps
> a bit faster: no stack popping, just 1 poke, 2 moves, a rotation, a
> return and 1 peek.
> Haven't compared the performance of the 2 routines, but I don't need to.
> I know they both perform very well and I also know that maintainability,
> ease of use and readable code are much more important than that tiny bit
> of extra performance.
> So Rolf and Juergen, your routines are in general much better than what
> I pulled together in 5 minutes.

Please don't forget: I was somehow inspired by your code, and Rolf's
routines use yours as a basis. smile

> BTW: wouldn't it be nice if someone made a library of such very general
> high-performance routines?

I already started modifying your ASM routines, so that they take 'count'
as a second parameter. Then I'll wrap them by pure euphoria functions,
in order to be able to apply them to any Euphoria object -- very similar
to what Rolf did.
I'll do the same with my ASM routines, compare the speed, make some
tweaks here and there ...
I also plan do write some SHIFT functions.

> Or perhaps Robert could add some of those
> routines to machine.e or one of the general Euphoria-libraries? If you
> see how many people could use shifts, rotates and other routines like
> that.

That would be nice.

<snip>

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campain  |  This message has been ROT-13 encrypted
 \ /  against HTML in       |  twice for higher security.
  X   e-mail and news,      |
 / \  and unneeded MIME     |  http://home.arcor.de/luethje/prog/

new topic     » goto parent     » topic index » view message » categorize

25. Re: fast bit rotation

Tommy Carlier wrote:
> Robert, couldn't you consider making get_bytes a 
> low-level, builtin routine to dramatically improve its performance? I 
> think get_bytes is a really standard routine that is probably quite 
> frequently used.

Yes, I'll consider it, but I don't think it
would "dramatically" improve performance.

Thanks,
    Rob Craig
    Rapid Deployment Software
    http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu