Re: Comparison of Euphoria vs Perl, Python, PHP

c.k.lester wrote:
> 
> Can anybody create a Euphoria program based on the Perl/Python/PHP version
> found on this page:
> http://www.skitoy.com/p/performance-of-python-php-and-perl/160
> 
> It would be interesting to see how Euphoria fares against them these days.

Not so good.  Now, I didn't do anything special to try to optimize; I
just used map.e and regex.e.  I suspect that some of the slowdown is the
conversion that has to happen between a Euphoria sequence and a C string
before it gets passed to PCRE.  The native Perl hashes and Python
dictionaries probably give them an edge, too.
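If anyone wants to test the PCRE conversion theory, something like this
(an untested sketch) times re:search alone on a fixed line, taking the
map and the file I/O out of the picture:

include regex.e as re

regex SINGLE_TOKEN
SINGLE_TOKEN = re:new( "^__SINGLE_TOKEN__\\s+(\\S+)\\s*\\t?\\s*(\\d+)\\s*$" )

sequence line
line = "__SINGLE_TOKEN__ abcdefg \t7"

atom t
object matches
t = time()
for i = 1 to 1000000 do
	-- each call has to hand PCRE a C string built from the sequence
	matches = re:search( SINGLE_TOKEN, line )
end for
printf( 1, "1000000 searches: %.2f seconds\n", {time() - t} )

If that alone accounts for a big chunk of the minute and a half, the
conversion theory holds up.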
 
Here were my results:

perl:
real    0m26.361s
user    0m25.726s
sys     0m0.580s

python:
real    0m38.613s
user    0m37.818s
sys     0m0.760s

euphoria:
real    1m29.448s
user    1m28.722s
sys     0m0.620s

translated euphoria:
real    0m55.930s
user    0m55.231s
sys     0m0.636s
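Each set of numbers above is the output of the shell's time command; I
ran them all along the lines of

time exu parse.ex line_test.txt

(parse.ex being whatever you save the code below as).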

Here's the code:
#!/usr/bin/exu

include regex.e as re
include map.e as map
include get.e	-- value()
include text.e	-- trim_tail()

integer in
sequence cmd
cmd = command_line()
in = open( cmd[3], "r" )	-- input file is the first command line argument

map:map first
first = map:new(1000)	-- running total per first word

integer FULL
FULL = open( "full.txt", "w" )

regex MULTI_TOKEN, SINGLE_TOKEN

-- __MULTI_TOKEN__ <word> <more words> [tab] <count>
MULTI_TOKEN  = re:new( "^__MULTI_TOKEN__\\s+(\\S+)\\s+(.*)\\t?\\s*(\\d+)\\s*$" )
-- __SINGLE_TOKEN__ <word> [tab] <count>
SINGLE_TOKEN = re:new( "^__SINGLE_TOKEN__\\s+(\\S+)\\s*\\t?\\s*(\\d+)\\s*$" )

object line
object matches
object one, two, three, three_plus

-- read-ahead loop: the entry block at the bottom runs before each test
while sequence( line ) entry do
	line = trim_tail( line )
	matches = re:search( MULTI_TOKEN, line )
	if sequence( matches ) then
		-- matches[1] spans the whole match; captures start at matches[2]
		one   = line[matches[2][1]..matches[2][2]]
		two   = line[matches[3][1]..matches[3][2]]
		three = value(line[matches[4][1]..matches[4][2]])
		three = three[2]	-- value() returns {status, value}
		three_plus = three + map:get( first, one, 0 )
		
		first = map:put( first, one, three_plus )
		printf( FULL, "%s %s\t%d\n", {one, two, three} )
		
	else
		matches = re:search( SINGLE_TOKEN, line )
		if sequence( matches ) then
			one   = line[matches[2][1]..matches[2][2]]
			two   = value(line[matches[3][1]..matches[3][2]])
			first = map:put( first, one, map:get( first, one, 0 ) + two[2] )
			
		else
			printf( 1, "Unknown: {%s}\n", {line} )
		end if
	end if
entry
	line = gets(in)
end while

close( FULL )
close( in )

integer FIRST

FIRST = open( "first.txt", "w" )
sequence keys
keys = map:keys( first )

for i = 1 to length( keys ) do
	printf( FIRST, "%s\t%d\n", {keys[i], map:get( first, keys[i], 0)})
end for
close( FIRST )


And here's how I generated the test data:
include machine.e	-- set_rand()

set_rand( 271828183 )	-- fixed seed so the test data is reproducible
function make_word()
	-- build a random lowercase word of 2 to 21 letters
	integer len
	sequence word
	len = rand( 20 ) + 1
	word = ""
	for i = 1 to len do
		word &= 'a' + rand(26) - 1
	end for
	return word
end function
sequence words
words = repeat( {}, 1000 )
for k = 1 to 1000 do
	words[k] = make_word()
end for

function get_word()
	return words[rand(1000)]
end function

integer fn
fn = open( "line_test.txt", "w" )
for i = 1 to 5000000 do
	if rand(2) = 1 then
		-- multi token
printf( fn, "__MULTI_TOKEN__ %s %s %s\t%d\n", {get_word(), get_word(),
get_word(), rand(20)} )
		
	else
		-- single token
		printf( fn, "__SINGLE_TOKEN__ %s \t%d\n", {get_word(), rand(20)} )
		
	end if
end for
close( fn )
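For reference, that produces 5 million lines, a roughly 50/50 mix of the
two formats.  The actual words depend on the seed, but the lines look
something like:

__MULTI_TOKEN__ kqpz mdjanw xf	14
__SINGLE_TOKEN__ bvgrh 	3

Note the tab before the trailing count; that's what the \t? in the
patterns is there for.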

