Re: Comparison of Euphoria vs Perl, Python, PHP
- Posted by Matt Lewis <matthewwalkerlewis at gma?l.c?m> Jun 12, 2008
c.k.lester wrote:

> Can anybody create a Euphoria program based on the Perl/Python/PHP version
> found on this page: http://www.skitoy.com/p/performance-of-python-php-and-perl/160
>
> It would be interesting to see how Euphoria fares against them these days.

Not so good. Now, I didn't do anything special to try to optimize; I just used map.e and regex.e. I suspect that some of the slowdown may be the conversion that has to happen between a Euphoria sequence and the C string before it gets passed to PCRE. Also, the native Perl hashes and Python dictionaries probably give them an edge.

Here were my results:

perl:                 real 0m26.361s   user 0m25.726s   sys 0m0.580s
python:               real 0m38.613s   user 0m37.818s   sys 0m0.760s
euphoria:             real 1m29.448s   user 1m28.722s   sys 0m0.620s
translated euphoria:  real 0m55.930s   user 0m55.231s   sys 0m0.636s

Here's the code:
#!/usr/bin/exu
include regex.e as re
include map.e as map
include get.e   -- for value()
include text.e  -- for trim_tail() (location may vary between stdlib snapshots)

integer in
sequence cmd
cmd = command_line()
in = open( cmd[3], "r" )

-- map of first token -> running count
map:map first
first = map:new( 1000 )

integer FULL
FULL = open( "full.txt", "w" )

regex MULTI_TOKEN, SINGLE_TOKEN
MULTI_TOKEN  = re:new( "^__MULTI_TOKEN__\\s+(\\S+)\\s+(.*)\\t?\\s*(\\d+)\\s*$" )
SINGLE_TOKEN = re:new( "^__SINGLE_TOKEN__\\s+(\\S+)\\s*\\t?\\s*(\\d+)\\s*$" )

object line
object matches
object one, two, three, three_plus

while sequence( line ) entry do
    line = trim_tail( line )
    matches = re:search( MULTI_TOKEN, line )
    if sequence( matches ) then
        -- matches holds {start,stop} index pairs for the whole match
        -- and then for each capture group
        one   = line[matches[2][1]..matches[2][2]]
        two   = line[matches[3][1]..matches[3][2]]
        three = value( line[matches[4][1]..matches[4][2]] )
        three = three[2]  -- value() returns {status, number}
        three_plus = three + map:get( first, one, 0 )
        first = map:put( first, one, three_plus )
        printf( FULL, "%s %s\t%d\n", {one, two, three} )
    else
        matches = re:search( SINGLE_TOKEN, line )
        if sequence( matches ) then
            one = line[matches[2][1]..matches[2][2]]
            two = value( line[matches[3][1]..matches[3][2]] )
            first = map:put( first, one, map:get( first, one, 0 ) + two[2] )
        else
            printf( 1, "Unknown: {%s}\n", {line} )
        end if
    end if
entry
    line = gets( in )
end while

close( FULL )
close( in )

-- dump the accumulated counts
integer FIRST
FIRST = open( "first.txt", "w" )
sequence keys
keys = map:keys( first )
for i = 1 to length( keys ) do
    printf( FIRST, "%s\t%d\n", {keys[i], map:get( first, keys[i], 0 )} )
end for
close( FIRST )
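As a quick sanity check on where the time actually goes, one could time the regex and the map in isolation. The sketch below is a hypothetical micro-benchmark along those lines (the iteration count and the sample line are made up, and it only leans on the calls already used above plus the time() builtin):

include regex.e as re
include map.e as map

constant ITERATIONS = 1000000
constant SAMPLE = "__SINGLE_TOKEN__ abcdefghij \t7"

regex r
r = re:new( "^__SINGLE_TOKEN__\\s+(\\S+)\\s*\\t?\\s*(\\d+)\\s*$" )

atom t
object matches

-- time the PCRE call on its own (including any sequence <-> C-string conversion)
t = time()
for i = 1 to ITERATIONS do
    matches = re:search( r, SAMPLE )
end for
printf( 1, "regex: %f seconds\n", time() - t )

-- time the map lookup/update on its own
map:map m
m = map:new( 1000 )
t = time()
for i = 1 to ITERATIONS do
    m = map:put( m, "abcdefghij", map:get( m, "abcdefghij", 0 ) + 1 )
end for
printf( 1, "map:   %f seconds\n", time() - t )

If the regex loop dominates, that points at the PCRE conversion overhead; if the map loop does, the native Perl hashes and Python dictionaries are the likelier explanation.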
And here's how I generated the test data:
include machine.e

set_rand( 271828183 )

-- build a random word of 2 to 21 lowercase letters
function make_word()
    integer len
    sequence word
    len = rand( 20 ) + 1
    word = ""
    for i = 1 to len do
        word &= 'a' + rand(26) - 1
    end for
    return word
end function

-- pre-generate a pool of 1000 words to draw from
sequence words
words = repeat( {}, 1000 )
for k = 1 to 1000 do
    words[k] = make_word()
end for

function get_word()
    return words[rand(1000)]
end function

integer fn
fn = open( "line_test.txt", "w" )
for i = 1 to 5000000 do
    if rand(2) = 1 then
        -- multi token
        printf( fn, "__MULTI_TOKEN__ %s %s %s\t%d\n",
                {get_word(), get_word(), get_word(), rand(20)} )
    else
        -- single token
        printf( fn, "__SINGLE_TOKEN__ %s \t%d\n", {get_word(), rand(20)} )
    end if
end for
close( fn )
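For reference, each of the five million lines in line_test.txt comes out in one of these two shapes (the words shown here are only illustrative; the real ones are random 2- to 21-letter strings). Note the single-token form carries a space before its tab, which is why the regexes above allow \s*\t?:

__MULTI_TOKEN__ wqbx kjhgfd zzyqm	14
__SINGLE_TOKEN__ wqbx 	7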