Re: Faster lower() command


Pete Lomax wrote:
> jacques deschênes wrote:
> > Why should the Euphoria interpreter create 5 or 6 temporary 1000000-long sequences?
> >   return o + (o>='A' and o<='Z')*32 -- works as well with atom or sequence
>
> which the interpreter executes as:
<eucode>
> tmp1 = (o>='A')       -- a 1000000-long sequence of 0s and 1s
> tmp2 = (o<='Z')       -- a 1000000-long sequence of 0s and 1s
> tmp1 = tmp1 and tmp2  -- ""                  ""
> tmp1 *= 32            -- "" of 0s and 32s
> tmp1 = o+tmp1         -- "" (the result)
> return tmp1
</eucode>


I took a quick look at the IL.
(First bind the program, then run
ex \euphoria\source\showil lower.exe
and see icode.lst.)
It only uses two temps, so it isn't wasteful in that regard.

> I get far more consistent results when called in a for loop, and I have
> noticed that tests like this tend to strongly favour whatever comes second;
> it's best to run them in separate programs in the name of equality.
> There might be a margin of say 5% for the 1-liner, but it is nothing like 4
> or 5 times faster as you might at first expect. The more you play with the
> above two code snippets, the more you realise they are essentially the same.

It shouldn't make much difference whether you have one
complicated statement or 5 simple ones. The IL
will be about the same.
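For concreteness, here is a hypothetical C sketch (mine, not from the thread; function names are made up) of the two strategies being compared: the one-pass for-loop lower(), and the sequence-ops one-liner emulated as one full pass per operation with temporary buffers, mirroring the IL steps quoted above. Both produce identical results; the difference is how many times the data travels between cache and RAM.

```c
#include <stdlib.h>
#include <string.h>

/* For-loop version: a single pass, one read and at most one write
   per character. */
void lower_loop(char *s, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] += 32;
}

/* Sequence-ops version, emulated: each operation is a full pass
   over the data, with temporaries, as in the quoted IL. */
void lower_seqops(char *s, size_t n) {
    char *tmp1 = malloc(n);
    char *tmp2 = malloc(n);
    for (size_t i = 0; i < n; i++) tmp1[i] = (s[i] >= 'A');      /* o >= 'A' */
    for (size_t i = 0; i < n; i++) tmp2[i] = (s[i] <= 'Z');      /* o <= 'Z' */
    for (size_t i = 0; i < n; i++) tmp1[i] = tmp1[i] && tmp2[i]; /* and      */
    for (size_t i = 0; i < n; i++) tmp1[i] *= 32;                /* * 32     */
    for (size_t i = 0; i < n; i++) s[i] += tmp1[i];              /* o + tmp1 */
    free(tmp1);
    free(tmp2);
}
```

Count the memory traffic: the loop version touches each byte once or twice, while the pass-per-op version reads and writes the full arrays five times, plus two allocations.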
 
> > I don't know how the loop is implemented inside the Euphoria interpreter,
> > but there is room for optimization!
> 
> I could enter a long rant about why sequence ops are such a bad idea, but I
> think I'll spare you today.

I think a major hidden factor here is the on-chip CPU cache.
The sequence-ops version can't make good use of the cache.
It reads and writes 1000000-long sequences many times,
and that data is far too big to fit in the on-chip data cache.
The sequence data must always be stored to and retrieved from RAM,
since even a secondary cache would not be big enough in this case.

The for-loop version does not make good use of the cache either,
but it only goes through the sequence data once.

I suspect the sequence-ops version would look much
better if the text sequences were of the length of 
typical text files, such as 80 characters or less.
However, if you have very small sequences, the overhead
of allocating and deallocating space in the sequence-ops
version will become significant. And when you have
really huge sequences, the operating system might have to
struggle a bit to find that much contiguous RAM, at least
on the first iteration. So there is probably a 
happy medium somewhere.

If you studied this carefully, with a range of sizes,
you might find that the sequence ops version has sharp 
degradation at the point where the data can't fit in the 
on-chip (32K-bytes?) cache anymore, and another degradation at the
point where the secondary cache (1Mb?) becomes too small
for all the data.

I personally use the for-loop version for
scanning typical text files, which average 
maybe 25-40 characters per line. The Translator
can speed up the for-loop version more than
it can the sequence-ops version.
 
Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com
