Re: Constructing sequences
- Posted by Haflidi Asgrimsson <haflidi at prokaria.com> Jun 29, 2005
This is similar to a problem I had. It seems best to avoid holding some kinds of very large data structures in memory and to use clever file manipulation or the fast Euphoria Database System instead. Try running the code below, increasing the MEMORY constant: 1000, 10000, 100000. You can also keep Task Manager open and watch Mem Usage.

Here is what I got (elapsed seconds from time(), then the number of lines read, then the fields of the last line). Notice that the parsing functions do not return the same result. (By the way, does anyone have a simple trick to get rid of the return at the end of the string and the last comma?)

mySplit (my own)
    0.06      1000
    0.87      10000
    87.93     100000
"abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","","abc",

Kparse (kparse.e)
    0.06      1000
    3.46      10000
    384.81    100000
"abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","","abc
",

parse (Strtok-v2-1.e)
    0.05      1000
    3.51      10000
    385.47    100000
"abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","abc
",

And here is the code. Notice my way of naming variables: o for objects, s for strings, n for integers, l for lists, t for tables or trees (lists containing lists). Any comments?

Also, the parsing function is evaluated as part of the call to append, so:

    l_line = mySplit(o_line, TAB)
    t_buffer = append(t_buffer, l_line)

is the same as:

    t_buffer = append(t_buffer, mySplit(o_line, TAB))

You can uncomment all three parsing calls; it is the result of the last one that ends up in t_buffer.
include kparse.e
include Strtok-v2-1.e
include file.e
include types.e
without warning

sequence s_line, l_line, t_buffer
integer n_file
object o_line
atom t

constant TRUE = 1
constant FALSE = 0
constant TAB = "\t"
constant MEMORY = 1000    -- number of lines written to big.txt

-- Split s_input on o_limiter (an atom or a sequence).
-- The last character of the final field is dropped (the trailing return).
function mySplit(string s_input, object o_limiter)
    sequence l_return, s_return, s_limiter, s_tail
    integer n_start, n_stop
    if atom(o_limiter) then
        s_limiter = {o_limiter}
    else
        s_limiter = o_limiter
    end if
    l_return = {}
    n_start = 1
    s_tail = s_input
    while TRUE do
        n_stop = match(s_limiter, s_tail)
        if n_stop then
            -- n_stop indexes s_tail, so slice s_tail here. Slicing s_input
            -- only worked because every field in the test line is "abc".
            s_return = s_tail[n_start..n_stop-1]
            l_return = append(l_return, s_return)
            s_tail = s_tail[n_stop + length(s_limiter)..$]
        else
            if length(s_tail) > 1 then
                l_return = append(l_return, s_tail[n_start..$-1])
            end if
            exit
        end if
    end while
    return l_return
end function

-- create file
n_file = open("big.txt", "wb")
s_line = "abc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\t\tabc"
for i = 1 to MEMORY do
    puts(n_file, s_line)
    puts(n_file, "\n")
end for
close(n_file)

-- read the file back, parse every line and keep everything in memory
t = time()
n_file = open("big.txt", "rb")
t_buffer = {}
while 1 do
    o_line = gets(n_file)
    if atom(o_line) then
        exit    -- gets() returns -1 at end of file
    end if
    --l_line = Kparse(o_line, TAB)
    --l_line = parse(o_line, TAB)
    l_line = mySplit(o_line, TAB)
    t_buffer = append(t_buffer, l_line)
end while
close(n_file)

? time() - t
? length(t_buffer)
? length(t_buffer[$-1])
for i = 1 to length(t_buffer[$-1]) do
    puts(1, "\"" & t_buffer[$-1][i] & "\",")
end for
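On the question above about the return at the end of the string and the last comma, something along these lines might do as a rough sketch. The names chomp and joinQuoted are made up for illustration; they are not part of kparse.e or Strtok-v2-1.e. The idea is to strip any trailing "\r" or "\n" before parsing, and to write the comma before every field except the first instead of after each one.

```euphoria
-- Hypothetical helper: remove any trailing "\r" or "\n" characters.
function chomp(sequence s_text)
    -- "and" short-circuits in a while condition, so s_text[$] is
    -- never evaluated on an empty sequence
    while length(s_text) > 0 and find(s_text[$], "\r\n") do
        s_text = s_text[1..$-1]
    end while
    return s_text
end function

-- Hypothetical helper: quote each field and separate the fields with
-- commas, without leaving a comma after the last one.
function joinQuoted(sequence l_fields)
    sequence s_out
    s_out = ""
    for i = 1 to length(l_fields) do
        if i > 1 then
            s_out &= ","
        end if
        s_out &= "\"" & l_fields[i] & "\""
    end for
    return s_out
end function

-- prints: "abc","","abc"
puts(1, joinQuoted({"abc", "", chomp("abc\n")}) & "\n")
```

In the test program this would mean calling the parser on chomp(o_line) rather than o_line, and replacing the final for loop with puts(1, joinQuoted(t_buffer[$-1])). Note that mySplit already drops the last character of the line itself, so with mySplit the chomp step would cut one character too many unless that part of mySplit is changed.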