1. bottlenecks :)

What do you do when your bottlenecks are
peek()
poke()
and_bits()
or_bits()

I am doing peek() and poke() on a maximize
area of 320 bytes at a time.

I am doing and_bits and or_bits on 2-D sequences.
The max size of the 2-D sequences is 320x200.
similiar in shape as:
sequence seq
seq = repeat(repeat(0, 320), 200)

my and_bits routine is thus:
all sequences are 2-D SUCH AS:
{{1, 2}, {3, 4}}

and_bits({0's, 255's,...}, {0 to 255,...})
or_bits({0 to 255,...}, {0 to 255,...})
{0's,255's} = a sequence that has only 0's and 255's
{0 to 255,...} = a sequence that has many values
        of which each value can be
        anything from 0 to 255.

I am not use to thinking of something as fast as these
commands as being bottlenecks.
Any help would be appreciated.

--Lucius Lamar Hilley III
--  E-mail at luciuslhilleyiii at juno.com
--  I support transferring of files less than 60K.
--  I can Decode both UU and Base64 format.

new topic     » topic index » view message » categorize

2. Re: bottlenecks :)

I suspect that the problem is the conversion to and from Euphoria data
types. Depending on what you are doing, you may be able to skip past a lot
of Euphoria's type casting.

You can now use mem_copy() to move raw memory from one buffer location to
the next. This will speed up peek() and poke(), since there is no conversion
to sequences in between. I did that to my graphic screen save and restore,
and got a phenomenal speed increase.

There is also a mem_fill() type command, for initializing memory (instead of
building arrays).

You might also consider writing your own assembly AND and OR routines that
access the memory buffer directly, and again bypass the conversion to
Euphoria data types. You might find a prototype in some of Jacques' code.

If you are using disk read/writes to read and store the data (I'm thinking
this is related to your compression program), you might also consider
Jacques disk code to write directly to your buffers, again skipping
Euphoria's type conversion routines.

I haven't tried running or timing this, but often another option you have is
to build a table lookup for your functions (ANDs and ORs). It would be 256 *
256 bytes long per table, but once built, you could use the two value to
index into the table. For example, something like this (i haven't bothered
to test the code):

 -- START OF CODE

 -- build OR lookup table
   sequence or_table
   or_table = repeat( 256, repeat( 256, 0 ) )
   for i = 0 to 255 do
      for j = 0 to 255 do
         or_table[i+i][j+i] = or_bits( i, j )
      end for
   end for


 -- build AND lookup table
   sequence and_table
   or_table = repeat( 256, repeat( 256, 0 ) )
   for i = 0 to 255 do
      for j = 0 to 255 do
         and_table[i+i][j+i] = and_bits( i, j )
      end for
   end for

 -- example of usage
 -- AND( 3 + 0 )
   x = and_table[3+1][0+1]

 -- END OF CODE

Notice that you have to add 1 to each index, since an index of zero in
Euphoria will not work.

If this turned out to give you the speed you wanted, you could always load
the table to disk instead of calculating it. You could then load it into a
buffer, and use something like:

     x = peek( and_table_buffer + ( x1 * 256 ) + x2 )

You could probably optimize the * 256 using a series of * 2's, which I think
Euphoria performs as shifts automatically (I think...).

Hope this helps.

 -- David Cuny

new topic     » goto parent     » topic index » view message » categorize

3. Re: bottlenecks :)

David Cuny writes:

<several plausible suggestions for speeding up Hilley's code>

> I haven't tried running or timing this, but often another option you have is
> to build a table lookup for your functions (ANDs and ORs). It would be 256 *
> 256 bytes long per table, but once built, you could use the two value to
> index into the table. For example, something like this (i haven't bothered
> to test the code):

I really doubt that this will buy you anything.
and_bits, or_bits etc. are really quite fast. They're
on a par with + or - of integers. They're probably
slightly faster than a single subscript operation on a sequence,
let alone two subscript operations.

Mr. Hilley: I would suggest that before you try
any radical solutions, such as suggested by David,
that you first put "with profile_time" at the top
of your main .ex file, and let us know which statements
have the highest percentages. How do you know that
bitwise operations and peek/poke are the "bottleneck"?


Regards,
  Rob Craig
  Rapid Deployment Software

P.S. The Euphoria Web page has not been accessible
for the past 2 days due to CompuServe's "ourworld"
server being broken again. I sent a nasty message to CompuServe
and we're preparing to move the page and files over to AOL.
I'll let you know when it's ready. We'll probably maintain
both the CompuServe and the AOL URL's for Euphoria, with almost
all the files and pages being hosted on AOL. It's my impression
that AOL will provide a more reliable service, for Web pages at least.
The new URL will be: http://members.aol.com/FilesEu/
but it won't be ready for a day or so.

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu