Re: Constructing sequences


This is similar to a problem I had. It seems best to avoid holding some kinds
of very large data structures in memory and instead use clever file
manipulation or the fast Euphoria Database System.
Try running the code below, increasing the MEMORY constant: 1000, 10000, 100000.
You can also keep Task Manager open and watch Mem Usage.
Here is what I got. Notice that the parsing functions do not return the same
result. (By the way, does anyone have a simple trick to get rid of the return
at the end of the string and the last comma?)

mySplit (my own)
0.06
1000
0.87
10000
87.93
100000
"abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","","abc",

Kparse (kparse.e)
0.06
1000
3.46
10000
384.81
100000
"abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","","abc
",

parse (Strtok-v2-1.e)
0.05
1000
3.51
10000
385.47
100000
"abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","abc
",

And here is the code. Notice my way of naming variables: o for objects, s for
strings, n for integers, l for lists, t for tables or trees (lists containing
lists). Any comments?
Also, the parsing call can be nested directly inside the call to append, so:
	l_line = mySplit(o_line, TAB)
	t_buffer = append(t_buffer, l_line)
is the same as:
	t_buffer = append(t_buffer, mySplit(o_line, TAB))
If you uncomment all three parsing calls, they all run, but only the result of
the last one is kept.

include kparse.e
include Strtok-v2-1.e
include file.e
include types.e

without warning

sequence s_line, l_line, t_buffer
integer n_file
object o_line
atom t
constant TRUE = 1
constant FALSE = 0
constant TAB = "\t"
constant MEMORY = 1000


function mySplit(string s_input, object o_limiter)
	sequence l_return, s_return, s_limiter, s_tail
	integer n_start, n_stop
	
	if atom(o_limiter) then
		s_limiter = {o_limiter}
	else
		s_limiter = o_limiter
	end if
	l_return = {}
	n_start = 1
	s_tail = s_input
	while TRUE do
		n_stop = match(s_limiter, s_tail)
		if n_stop then
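			-- n_stop is an index into s_tail, so slice s_tail, not s_input
			-- (slicing s_input only worked here because every field is "abc")
			s_return = s_tail[1..n_stop-1]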
			l_return = append(l_return, s_return)
			s_tail = s_tail[n_stop + length(s_limiter)..$]
		else
			if length(s_tail) > 1 then
				l_return = append(l_return, s_tail[n_start..$-1])
			end if
			exit
		end if
	end while
	return l_return
end function

-- create file
n_file = open("big.txt", "wb")
s_line = "abc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\t\tabc"
for i = 1 to MEMORY do
	puts(n_file, s_line)
	puts(n_file, "\n")
end for
close(n_file)

t = time()
n_file = open("big.txt", "rb")
t_buffer = {}
while 1 do
	o_line = gets(n_file)
	if atom(o_line) then
		exit
	end if
	--l_line = Kparse(o_line, TAB)
	--l_line = parse(o_line, TAB)
	l_line = mySplit(o_line, TAB)
	t_buffer = append(t_buffer, l_line)
end while
close(n_file)
? time() - t
? length(t_buffer)
? length(t_buffer[$-1])
for i = 1 to length(t_buffer[$-1]) do
	puts(1,"\"" & t_buffer[$-1][i] & "\",")
end for
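
On the question above about the return at the end of the string and the last comma, here is one possible approach, given only as a sketch. The names chomp and joinQuoted are made up for this example and are not part of kparse.e or Strtok-v2-1.e. It assumes every field is a plain string.

```euphoria
-- Hypothetical helpers, not part of the original program.

-- Remove one trailing line terminator ("\n", "\r\n" or "\r") if present.
function chomp(sequence s_in)
	integer n_len
	n_len = length(s_in)
	if n_len > 0 and s_in[n_len] = '\n' then
		n_len -= 1
	end if
	if n_len > 0 and s_in[n_len] = '\r' then
		n_len -= 1
	end if
	return s_in[1..n_len]
end function

-- Quote each field and put a comma between fields, not after the last one.
function joinQuoted(sequence l_fields)
	sequence s_out
	s_out = ""
	for i = 1 to length(l_fields) do
		if i > 1 then
			s_out &= ","
		end if
		s_out &= "\"" & l_fields[i] & "\""
	end for
	return s_out
end function

-- prints "abc","","abc"
puts(1, joinQuoted({"abc", "", chomp("abc\n")}) & "\n")
```

With Kparse or parse, the line could be passed as chomp(o_line) so the last field no longer carries the return. mySplit as posted already drops the last character of the final field, so it should be given the line unchanged. The print loop could then be replaced by puts(1, joinQuoted(t_buffer[$-1])).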

