1. Faster lower() command

It's been a while since i've posted here. I recently put this together, from
code inside of the guru.ex file, and the current lower() command.

The one inside of guru.ex is used to handle sequence lowering, and atom
lowering is handled by the current function. I chose to use the current
command for atom lowering using the guru.ex method is quite slow with atoms.
function lower1( sequence x )
    integer c
    for i = 1 to length(x) do
        c = x[i]
        if c <= 'Z' then
            if c >= 'A' then
                x[i] = c + 32
            end if
        end if
    end for
return x
end function

function lower(object x)
    if sequence(x) then
        return lower1(x)
    else
        return x + (x >= 'A' and x <= 'Z') * TO_LOWER
    end if
end function


I put the sequence first, since I expect that lower() will mostly be used on
sequences. 

Any comments/questions/etc.?

new topic     » topic index » view message » categorize

2. Re: Faster lower() command

here is the power of euphoria

function lower(object o)
  return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
end function



regards,
Jacques D.


Jesse Adkins wrote:
> 
> It's been a while since i've posted here. I recently put this together, from
> code inside of the guru.ex file, and the current lower() command.
> 
> The one inside of guru.ex is used to handle sequence lowering, and atom
> lowering is handled by the current function. I chose to use the current
> command for atom lowering using the guru.ex method is quite slow with atoms.
> }}}
<eucode>
> function lower1( sequence x )
>     integer c
>     for i = 1 to length(x) do
>         c = x[i]
>         if c <= 'Z' then
>             if c >= 'A' then
>                 x[i] = c + 32
>             end if
>         end if
>     end for
> return x
> end function
> 
> function lower(object x)
>     if sequence(x) then
>         return lower1(x)
>     else
>         return x + (x >= 'A' and x <= 'Z') * TO_LOWER
>     end if
> end function
> </eucode>
{{{

> 
> I put the sequence first, since I expect that lower() will mostly be used on
> sequences. 
> 
> Any comments/questions/etc.?

new topic     » goto parent     » topic index » view message » categorize

3. Re: Faster lower() command

jacques deschênes wrote:
> 
> here is the power of euphoria
> 
> }}}
<eucode>
> function lower(object o)
>   return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
> end function
> </eucode>
{{{

> 
> 
> regards,
> Jacques D.

I'm well aware of that command, since it's what's in place already. The
command that I put up for display processes a sequence much faster than the
current code for lower(). A test I ran, with all letters in caps, ran about 
twice as fast as the current code.

I kept the original code for doing atoms, since the guru.ex code doesn't work
so well with atoms for some reason

I'm hoping the code I posted earlier could become part of Euphoria some day, 
despite the fact that it's a good deal larger than the current lower()
command.

-- Jesse Adkins.

new topic     » goto parent     » topic index » view message » categorize

4. Re: Faster lower() command

Hi didn't run any test but I'm surprised that a for loop written in euphoria is
faster than a loop internal to the interpreter!


Jesse Adkins wrote:
> 
> jacques deschênes wrote:
> > 
> > here is the power of euphoria
> > 
> > }}}
<eucode>
> > function lower(object o)
> >   return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
> > end function
> > </eucode>
{{{

> > 
> > 
> > regards,
> > Jacques D.
> 
> I'm well aware of that command, since it's what's in place already. The
> command that I put up for display processes a sequence much faster than the
> current code for lower(). A test I ran, with all letters in caps, ran about
> 
> twice as fast as the current code.
> 
> I kept the original code for doing atoms, since the guru.ex code doesn't work
> so well with atoms for some reason
> 
> I'm hoping the code I posted earlier could become part of Euphoria some day,
> 
> despite the fact that it's a good deal larger than the current lower()
> command.
> 
> -- Jesse Adkins.

new topic     » goto parent     » topic index » view message » categorize

5. Re: Faster lower() command

Hi Jesse,
As I was quite skeptic I made my own test here
function lower1( sequence x )
    integer c
    for i = 1 to length(x) do
        c = x[i]
        if c <= 'Z' then
            if c >= 'A' then
                x[i] = c + 32
            end if
        end if
    end for
return x
end function

function lower(object o)
  return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
end function


atom t
constant txt=repeat('A',1000000)
sequence l

t = time()
l = lower1(txt)
? time()-t
t = time()
l = lower(txt)
? time()-t

on my laptop
lower1() give .12sec
lower()  give .09sec
As I expected the for loop is slower than the interpreter internal loop.
I can't figure how lower1() can be faster on your machine.

regards,
Jacques Deschênes




Jesse Adkins wrote:
> 
> jacques deschênes wrote:
> > 
> > here is the power of euphoria
> > 
> > }}}
<eucode>
> > function lower(object o)
> >   return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
> > end function
> > </eucode>
{{{

> > 
> > 
> > regards,
> > Jacques D.
> 
> I'm well aware of that command, since it's what's in place already. The
> command that I put up for display processes a sequence much faster than the
> current code for lower(). A test I ran, with all letters in caps, ran about
> 
> twice as fast as the current code.
> 
> I kept the original code for doing atoms, since the guru.ex code doesn't work
> so well with atoms for some reason
> 
> I'm hoping the code I posted earlier could become part of Euphoria some day,
> 
> despite the fact that it's a good deal larger than the current lower()
> command.
> 
> -- Jesse Adkins.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Faster lower() command

jacques deschênes wrote:
> 
> Hi Jesse,
> As I was quite skeptic I made my own test here
> }}}
<eucode>
> function lower1( sequence x )
>     integer c
>     for i = 1 to length(x) do
>         c = x[i]
>         if c <= 'Z' then
>             if c >= 'A' then
>                 x[i] = c + 32
>             end if
>         end if
>     end for
> return x
> end function
> 
> function lower(object o)
>   return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
> end function
> 
> 
> atom t
> constant txt=repeat('A',1000000)
> sequence l
> 
> t = time()
> l = lower1(txt)
> ? time()-t
> t = time()
> l = lower(txt)
> ? time()-t
> 
> </eucode>
{{{

> on my laptop
> lower1() give .12sec
> lower()  give .09sec
> As I expected the for loop is slower than the interpreter internal loop.
> I can't figure how lower1() can be faster on your machine.
> 
> regards,
> Jacques Deschênes
> 
> 

I can confirm Jesse Adkins findings.
On my machine I get the following results:
Time for lower1()  .06 seconds
Time for lower()   .24 seconds

Reversing lower() and lower1() in the program made little difference.

I tried changing the statement "constant tct=repeat('A',1000000)" to
"constant tct=repeat('A',10000000)" to get more accurate timings. The time for
lower1() was reported as .61 seconds as expected. The time for lower() was taking
much longer than expected, then windows reported I was low on virtual memory. It
eventually finished after 82 seconds. There was almost continuous disk activity
during this time. Possibly not related but it would seem that lower() requires
considerably more memory. An incident like this is very rare for me.

This is with a 1.3GHZ Pentium4, Windows2000 Professional, 256MB RAM.

The lower1() function is very similar to lower_case() found in wild.e by Colin
Taylor back in Nov 25 2000. Search for "Faster Wildcard".

I have done similar tests in the past, all with similar results.

Larry Miller

new topic     » goto parent     » topic index » view message » categorize

7. Re: Faster lower() command

reran the test with a 100 times longer string (10000000)
and got
0.66 sec for lower1()
0.94 sec for lower()

effectively the for loop is faster on very long string!

new topic     » goto parent     » topic index » view message » categorize

8. Re: Faster lower() command

jacques deschênes wrote:
> on my laptop
> lower1() give .12sec
> lower()  give .09sec
I not only agree with other posters but am downright jaw-dropping astonished
that you got lower() to appear faster that lower1() in ANY circumstances!

My analysis was that lower() must create 5 or 6 * 1,000,000 - long sequences
before finally calculating the result.

> I can't figure how lower1() can be faster on your machine.

Are you by any chance on a Linux box while we're all on Windows?

Puzzled,
Pete

new topic     » goto parent     » topic index » view message » categorize

9. Re: Faster lower() command

To clarify for everyone, this is what I used for benchmarking.
constant ITERATIONS = 1000000
integer p
atom t0, loop_overhead

t0 = time()
for i = 1 to ITERATIONS do
    -- time an empty loop
end for
loop_overhead = time() - t0

t0 = time()
for i = 1 to ITERATIONS do
    p = power(2, 20)
end for
? (time() - t0 - loop_overhead)/ITERATIONS
-- calculates time (in seconds) for one call to power

It's the code directly from the documentation under the time() command. I
swapped out 'p = power(2, 20)' for my own command.

Quoting the mail I sent to Rob Craig...


Converting the sequence 
("A REALLY LONG SEQUENCE WITH JUNK INSIDE.")

Old -> (2.26 to 2.28 e-06)
New -> (1.34 to 1.36 e-06)


The benchmarks were done on Windows XP Home.


-- Jesse Adkins

new topic     » goto parent     » topic index » view message » categorize

10. Re: Faster lower() command

I'm using windows xp pro running on DELL inpiron 1501 with 1Go Ram
CPU AMD turion 64x2 1.60Ghz

Why euphoria interpreter should create 5 or 6 * 1000000 long sequence?
each character is represented by an integer so its a 4000000 bytes array

new test
function lower1( sequence x )
    integer c
    for i = 1 to length(x) do
        c = x[i]
        if c <= 'Z' then
            if c >= 'A' then
                x[i] = c + 32
            end if
        end if
    end for
return x
end function

function lower(object o)
  return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
end function



atom t
constant txt=repeat('A',1000000)
sequence l

t = time()
l = lower1(txt)
? time()-t
t = time()
l = lower(txt)
? time()-t

t = time()
for i = 1 to 1000 do
  l = lower1(txt)
end for
? time() - t

t = time()
for i = 1 to 1000 do
  l = lower(txt)
end for
? time() - t



result
0.13
0.11
64.2
95.41

This time its clear that lower1() is faster than lower()
I don't know how the loop is implemented inside euuphoria interpreter but
there is place for optimization!

regards,
Jacques Deschênes


Pete Lomax wrote:
> 
> jacques deschênes wrote:
> > on my laptop
> > lower1() give .12sec
> > lower()  give .09sec
> I not only agree with other posters but am downright jaw-dropping astonished
> that you got lower() to appear faster that lower1() in ANY circumstances!
> 
> My analysis was that lower() must create 5 or 6 * 1,000,000 - long sequences
> before finally calculating the result.
> 
> > I can't figure how lower1() can be faster on your machine.
> 
> Are you by any chance on a Linux box while we're all on Windows?
> 
> Puzzled,
> Pete

new topic     » goto parent     » topic index » view message » categorize

11. Re: Faster lower() command

Just a note--timing stuff on my laptop is unreliable because the hardware and
software adjusts the CPU speed depending on load.

Try it on a desktop and see if you get a difference.

--
"Any programming problem can be solved by adding a level of indirection."
--anonymous
"Any performance problem can be solved by removing a level of indirection."
--M. Haertel
"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare
j.

new topic     » goto parent     » topic index » view message » categorize

12. Re: Faster lower() command

jacques deschênes wrote:
> 
> I'm using windows xp pro running on DELL inpiron 1501 with 1Go Ram
> CPU AMD turion 64x2 1.60Ghz
> 
> Why euphoria interpreter should create 5 or 6 * 1000000 long sequence?
> each character is represented by an integer so its a 4000000 bytes array

let x be "ABC"
and our expression is:
  x = x + ((x >= 'A') and (x <= 'Z')) * 32

Break down the expression into simplest terms..

tmp1 = (x >= 'A')      -- {1,1,1} = ({65,66,67} >= 65)
tmp2 = (x <= 'Z')      -- {1,1,1} = ({65,66,67} <= 90)
tmp1 = (tmp1 and tmp2) -- {1,1,1} = ({1,1,1} and {1,1,1})
tmp1 = tmp1 * 32       -- {32,32,32} = {1,1,1} * 32
x = x + tmp1           -- {97,98,99} = {65,66,67} + {32,32,32}

2 tmp sequences would be required.

Chris Bensler
~ The difference between ordinary and extraordinary is that little extra ~
http://empire.iwireweb.com - Empire for Euphoria

new topic     » goto parent     » topic index » view message » categorize

13. Re: Faster lower() command

jacques deschênes wrote:
> Why euphoria interpreter should create 5 or 6 * 1000000 long sequence?
>   return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
tmp1 = (o>='A')  -- a 1000000 sequence of 0 and 1
tmp2 = (o<='Z')  -- a 1000000 sequence of 0 and 1
tmp1 = tmp1 and tmp2  -- ""                  ""
tmp1 *= 32            -- "" of 0 and 32
tmp1 = o+tmp1         -- "" (the result)
return tmp1


I get far more consistent results when called in a for loop, and I have noticed
tests like this tend to stongly favour whatever comes second, best to run them in
separate programs in the name of equality.
There might be a margin of say 5% for the 1-liner, but it is nothing like 4 or 5
times faster as you might at first expect. The more you play with the above two
code snippets, the more you realise they are essentially the same.

> I don't know how the loop is implemented inside euuphoria interpreter but
> there is place for optimization!

I could enter a long rant about why sequence ops are such a bad idea, but I
think I'll spare you today.

Chris Bensler wrote:
>Pete: peek() will accept peek({lp,0}) which will result in a null string
Thanks, I'll edit my copy of that routine.

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

14. Re: Faster lower() command

Pete Lomax wrote:
> jacques deschênes wrote:
> > Why euphoria interpreter should create 5 or 6 * 1000000 long sequence?
> >   return o + (o>='A' and o<='Z')*32 -- work as well with atom or sequence
> }}}
<eucode>
> tmp1 = (o>='A')  -- a 1000000 sequence of 0 and 1
> tmp2 = (o<='Z')  -- a 1000000 sequence of 0 and 1
> tmp1 = tmp1 and tmp2  -- ""                  ""
> tmp1 *= 32            -- "" of 0 and 32
> tmp1 = o+tmp1         -- "" (the result)
> return tmp1
> </eucode>
{{{


I took a quick look at the IL.
(First bind the program. Then run 
ex \euphoria\source\showil lower.exe
and see icode.lst)
It does only use two temps, so it isn't wasteful in that regard.

> I get far more consistent results when called in a for loop, and I have
> noticed
> tests like this tend to stongly favour whatever comes second, best to run them
> in separate programs in the name of equality.
> There might be a margin of say 5% for the 1-liner, but it is nothing like 4
> or 5 times faster as you might at first expect. The more you play with the
> above
> two code snippets, the more you realise they are essentially the same.

It shouldn't make much difference if you have one
complicated statement, or 5 simple ones. The IL
will be about the same.
 
> > I don't know how the loop is implemented inside euuphoria interpreter but
> > there is place for optimization!
> 
> I could enter a long rant about why sequence ops are such a bad idea, but I
> think I'll spare you today.

I think a major hidden factor here, is the on-chip CPU cache.
The sequence-ops version can't make good use of the cache.
It reads and writes 1000000-long sequences many times,
and the data is far too big to be stored in the on-chip data cache.
The sequence data must always be stored/retrieved from RAM, since
even a secondary cache would not be big enough in this case.

The for-loop version does not make good use of the cache either,
but it only goes through the sequence data once.

I suspect the sequence-ops version would look much
better if the text sequences were of the length of 
typical text files, such as 80 characters or less.
However, if you have very small sequences, the overhead
of allocating and deallocating space in the sequence-ops
version will become significant. And when you have
really huge sequences, the operating system might have to
struggle a bit to find that much contiguous RAM, at least
on the first iteration. So there is probably a 
happy medium somewhere.

If you studied this carefully, with a range of sizes,
you might find that the sequence ops version has sharp 
degradation at the point where the data can't fit in the 
on-chip (32K-bytes?) cache anymore, and another degradation at the
point where the secondary cache (1Mb?) becomes too small
for all the data.

I personally use the for-loop version for
scanning typical text files, which average 
maybe 25-40 characters per line. The Translator
can speed up the for-loop version more than
it can the sequence-ops version.
 
Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu