1. too much memory use!
- Posted by Kat <gertie at PELL.NET> Feb 19, 2002
- 531 views
Eu took 29 minutes, 36 sec to execute the following program, and used 142.7 megs of memory. The file it was reading is 12.1 megabytes.

    data = {}
    datafile = open(data_noun,"u")
    readline = gets(datafile) -- get a line
    while not atom(readline) do
        while find(readline[length(readline)],{10,13,32}) do
            readline = readline[1..length(readline)-1]
        end while
        junk_s = parse(readline,32)
        data = data & {junk_s}
        readline = gets(datafile) -- get another line
    end while
    close(datafile)
    trace(1) -- to hold the program while getting memory use data
    abort(0)

What am i doing that runs a 12meg file up to 142.7 megabytes? and takes 1/2 hour to do it?

How can i say data = glom(file) ?

Kat
2. Re: too much memory use!
- Posted by euman at bellsouth.net Feb 19, 2002
- 583 views
Hey Kat,

let's see, you're looking for these

    {10,13,32} or {Line Feed, Carriage Return, Space}

right?

What if you tried using find like this for starters:

    loc = find({10,13,32},readline[1..length(readline)])
    readline = readline[1..loc]

instead of:

    find(readline[length(readline)],{10,13,32})
    readline = readline[1..length(readline)-1]

Second, I have no idea what "parse" does, e.g. junk_s = parse(readline,32).

Third, you sure are making a lot of sequence copy and concat operations in this routine.

Finally, this is a bit strange coming from someone of your caliber, Kat:

    while find(readline[length(readline)],{10,13,32}) do
        readline = readline[1..length(readline)-1]
    end while

You're saying: while I find one of {10,13,32} at the end of readline, copy readline[1..length(readline)-1], until I don't find it.....hmmmm Kat

think about it!

Euman
3. Re: too much memory use!
- Posted by Kat <gertie at PELL.NET> Feb 19, 2002
- 513 views
On 20 Feb 2002, at 1:59, euman at bellsouth.net wrote:

> Hey Kat,
>
> lets see, your looking for these
>
> {10,13,32} or {Line Feed, Carriage Return, Space}
>
> right?
>
> What if you tried using find like this for starters:
>
> loc = find({10,13,32},readline[1..length(readline)])
> readline = readline[1..loc]

How do you know which of {10,13,32} is returned? The first match? I don't know, but it's likely to be the 32, not the 10.

> instead of:
>
> find(readline[length(readline)],{10,13,32})
> readline = readline[1..length(readline)-1]
>
> second, I have no idea what "parse" does
> eg, junk_s = parse(readline,32)

parse() is in strtok.e , in this case, it's breaking "1 2 3 4" into {"1","2","3","4"}. I need the bits later on, and it's better to break it once now than to break it repeatedly later.

> third, you sure are making alot of sequence copy,
> concat operations in this routine.
>
> finally,
>
> this is a bit strange coming from your caliber Kat.

Not when coding at 1am till 3am !

> while find(readline[length(readline)],{10,13,32}) do
> readline = readline[1..length(readline)-1]
> end while
>
> your saying while I find this {10,13,32} at the end
> of readline copy readline[1..to length(readline)]-1
> until I dont find it.....hmmmm Kat
>
> think about it!

I did, i still don't see where the extra 80megs of memory is being used. Just gets()ing the file till -1, and adding it to data is using 60 megs of memory, altho it's fast, about 20 seconds. Even 60 megs sounds like a lot for a 12meg file, since 32bits per char is 48megs.

I am now watching memory use scale up slowly while parse() is running after gets()ing the entire file first. I'd shut it down and change the code, but i already have 4 of those leftover virtual machines cluttering up memory, and i don't want to add more.

Kat
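For readers without strtok.e to hand, the splitting step described above (turning "1 2 3 4" into {"1","2","3","4"}) can be sketched as follows. This is only an illustration: split_on is a made-up name, it is not the strtok.e parse() routine, and skipping empty fields between repeated delimiters is an assumption about the wanted behaviour.

```euphoria
-- Hypothetical helper, NOT the parse() from strtok.e.
-- Splits a flat string on a single delimiter character and
-- skips empty fields caused by repeated delimiters (an assumption).
function split_on(sequence s, integer delim)
    sequence result
    integer start
    result = {}
    start = 1                       -- index where the current field begins
    for i = 1 to length(s) do
        if s[i] = delim then
            if i > start then       -- field is non-empty
                result = append(result, s[start..i-1])
            end if
            start = i + 1
        end if
    end for
    if start <= length(s) then      -- last field has no trailing delimiter
        result = append(result, s[start..length(s)])
    end if
    return result
end function
```

Each field is taken with one slice rather than being grown a character at a time, and the result is built with append(), which keeps the number of intermediate copies down.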
4. Re: too much memory use!
- Posted by "Carl R. White" <euphoria at carlw.legend.uk.com> Feb 20, 2002
- 521 views
----- Original Message -----
From: "Kat" <gertie at PELL.NET>
To: "EUforum" <EUforum at topica.com>
Subject: too much memory use!

> Eu took 29 minutes, 36 sec to execute the following program, and used
> 142.7Megs of memory. The file it was reading is 12.1 megabytes.
>
> data = {}
> datafile = open(data_noun,"u")
> readline = gets(datafile) -- get a line

Euphoria stores bytes unpacked in 4-byte integers. This means there's usually four times as much memory used as the number of bytes read in. So that increases your memory usage to 48 Meg already.

> while not atom(readline) do
> while find(readline[length(readline)],{10,13,32}) do readline =
> readline[1..length(readline)-1] end while

I can't see why this would burn too much memory. It's quite a clever piece of code, much like something I used to write when I still had a brain.

It has a bug though. What if readline is all spaces? Eventually there'll be an access of readline[0].

This is equivalent, and might not chew as much memory:

    integer i
    i = length(readline)
    while i > 0 and find(readline[i], "\n\r ") do
        i -= 1
    end while
    readline = readline[1..i]

> junk_s = parse(readline,32)

parse is an unknown quantity to me. Is it optimal for your needs, or would you benefit from writing your own splitter?

> data = data & {junk_s}

I'm not sure about whether Euphoria optimises this, so possibly use

    data &= {junk_s}

or

    data = append(data, junk_s) -- which I know _is_ optimised

instead.

Regardless: by the time you've read all your data in, the 48 Meg plus overheads for all the subsequences will quite easily add up to 60 Meg.

> readline = gets(datafile) -- get another line
> end while
> close(datafile)
> trace(1) -- to hold the program while getting memory use data

It could well be that the program uses less memory when trace mode is off. The opposite of the old "My program only works when debug mode is on!"

> abort(0)
>
> What am i doing that runs a 12meg file up to 142.7megabytes? and takes
> 1/2 hour to do it?
It depends whether Euphoria decides to swap to virtual memory. If it does, it will seriously increase the runtime. On my system, it's 19 times more time-expensive to read an integer from a sequence that overlaps disk than it is from a sequence that's memory-only.

If the four-fold increase for reading bytes into integers bothers you, you can always read your whole file into allocated memory and parse from there instead, but that's a whole new can of worms :)

Carl
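Putting Carl's suggestions together (the index-based trim that cannot reach readline[0], and append() instead of data & {...}) gives something close to the "data = glom(file)" asked for in the first post. The sketch below is illustrative only: read_lines and trim_tail are made-up names, and no timing or memory figures are claimed for it. The 4-bytes-per-character cost still applies, so a 12.1 megabyte file needs roughly 12.1 x 4 = 48 megabytes before any per-sequence overhead.

```euphoria
-- Sketch only: trim_tail() and read_lines() are illustrative names,
-- not library routines.

function trim_tail(sequence s)
    -- drop trailing line feeds, carriage returns and spaces
    integer i
    i = length(s)
    while i > 0 and find(s[i], "\n\r ") do
        i -= 1
    end while
    return s[1..i]      -- s[1..0] is the empty sequence, so all-space lines are safe
end function

function read_lines(sequence filename)
    -- returns a sequence of trimmed lines, or -1 if the file cannot be opened
    integer fn
    object line
    sequence lines
    lines = {}
    fn = open(filename, "r")
    if fn = -1 then
        return -1
    end if
    line = gets(fn)
    while sequence(line) do     -- gets() returns the atom -1 at end of file
        lines = append(lines, trim_tail(line))
        line = gets(fn)
    end while
    close(fn)
    return lines
end function

-- Possible use, with data_noun holding the file name as in the original post:
--   object data
--   data = read_lines(data_noun)
```

The trim does a single slice per line instead of one slice per trailing character, which is the point of Carl's rewrite.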
5. Re: too much memory use!
- Posted by Jiri Babor <jbabor at PARADISE.NET.NZ> Feb 20, 2002
- 506 views
Kat,

try the following quick fix instead of your strtok code. Let me know if it's any better, I have not got big enough text files to test it.

jiri

    constant false = 0, true = 1
    sequence data,word
    atom t
    integer c,f,inword

    data = {}
    word = {}
    inword = false -- flag
    f = open("kat1.eml", "rb")
    c = getc(f)
    while c != -1 do
        if find(c, {32,13,10}) then
            if inword then
                data = append(data, word)
                inword = false
                word = {}
            end if
        else
            inword = true
            word &= c
        end if
        c = getc(f)
    end while
    close(f)
    if inword then -- flush
        data = append(data, word)
    end if
6. Re: too much memory use!
- Posted by Jiri Babor <jbabor at PARADISE.NET.NZ> Feb 20, 2002
- 520 views
Kat,

if you have a memory-challenged system and want to avoid hard disk thrashing, there are a number of different schemes you can use. One of them is outlined below:

----------------------------------------------------------------------
Select a suitable word delimiter. I chose 'space' (ascii 32).
Determine size of the data file.
Allocate sufficient memory for output.
In a while loop
    isolate individual words
    poke them into the reserved memory
    separate them with your chosen delimiter
    update the memory pointer
    and keep at it until the end of input file is reached.
Flush the system.
----------------------------------------------------------------------

I am not sure what you intend to do with the data, but it is basically trivial to read it back into a sequence and write it to a file, as a whole or even as individual tokens.

I hope this makes some sense to you.

jiri

----------------------------------------------------------------------
    include machine.e -- allocate
    include file.e -- seek, where

    constant false = 0, true = 1
    constant d = 32 -- space chosen as delimiter
    sequence word
    atom a,p
    integer c,e,f,inword,size

    f = open("kat1.eml", "rb")

    -- get file size
    e = seek(f, -1) -- go to end of file
    size = where(f)
    e = seek(f, 0) -- go back to start of file

    -- allocate memory for output
    a = allocate(size + 1)
    if a = 0 then
        puts(1, "Memory allocation failed...\n")
        abort(1)
    end if

    -- initialize
    word = {}
    inword = false -- flag
    p = a -- current memory pointer

    -- main loop
    c = getc(f)
    while c != -1 do
        if find(c, {32,13,10}) then
            if inword then
                poke(p, word & d)
                p += length(word) + 1
                inword = false
                word = {}
            end if
        else
            inword = true
            word &= c
        end if
        c = getc(f)
    end while
    close(f)

    -- flush
    if inword then
        poke(p, word & d)
        p += length(word) + 1
    end if

    -- a little test - DO NOT try it with a 12 Mb file !
    puts(1, peek({a, p-a}))
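Jiri mentions that reading the packed words back into a sequence is basically trivial. One way that might look is sketched below; words_from_memory is a made-up name, and the sketch assumes the layout produced by the code above: one byte per character, every word followed by the delimiter d, with the data occupying addresses a up to p-1.

```euphoria
-- Sketch only: words_from_memory() is an illustrative name.
-- Assumes each word in memory is followed by the delimiter byte,
-- as written by poke(p, word & d) in the code above.
function words_from_memory(atom a, atom p, integer d)
    sequence words, word
    integer c
    words = {}
    word = {}
    for addr = a to p - 1 do
        c = peek(addr)              -- read one byte at a time
        if c = d then
            words = append(words, word)
            word = {}
        else
            word &= c
        end if
    end for
    return words
end function
```

Reading a byte at a time avoids peeking the whole region into one large sequence first. Note that the result is again an ordinary sequence at 4 bytes per character, so for a big file it only makes sense to pull back a part of the region at once. When the memory is no longer needed it can be released with free(a) from machine.e.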
7. Re: too much memory use!
- Posted by Kat <gertie at PELL.NET> Feb 20, 2002
- 562 views
On 21 Feb 2002, at 2:30, Jiri Babor wrote:

> Kat,
>
> try the following quick fix instead of your strtok code. Let me know if it's
> any better, I have not got big enough text files to test it.
>
> jiri
>
> constant false = 0, true = 1
> sequence data,word
> atom t
> integer c,f,inword
>
> data = {}
> word = {}
> inword = false -- flag
> f = open("kat1.eml", "rb")
> c = getc(f)
> while c != -1 do
>     if find(c, {32,13,10}) then
>         if inword then
>             data = append(data, word)
>             inword = false
>             word = {}
>         end if
>     else
>         inword = true
>         word &= c
>     end if
>     c = getc(f)
> end while
> close(f)
> if inword then -- flush
>     data = append(data, word)
> end if

Jiri, i gave it a try, it ran 1 hour and 24 minutes, and used 160megs of memory. Modified a bit as follows:

    data = {} -- the bulk file contents
    readline = "" -- one of the lines in the file
    word = ""
    datafile = open(data_noun,"rb") -- 12megs
    -- <Jiri's code>
    -- corrected for Jiri's "data" vs my "readline"
    c = getc(datafile)
    while c != -1 do
        if equal(c,32) then
            if inword then
                readline = append(readline, word)
                inword = false
                inline = true
                word = {}
            end if
        elsif find(c,{10,13}) and ( inline = true ) then
            data = append(data,readline)
            readline = ""
            inline = false -- do not land here if there is a 13 and a 10 on the same line!!
            inword = false
        else
            inword = true
            inline = true
            word &= c
        end if
        c = getc(datafile)
    end while
    -- <end Jiri's code>
    close(datafile)

I modified it because i need the data arranged like:

    {data -- one sequence
      {readline},{readline}, -- 75,000 of them
        {word},{word}, -- 5..1000 of them per readline
        }
      }
    }

    {indexes -- one sequence
      {readline},{readline}, -- 145,000 of them
        {word},{word}, -- 5..1000 of them per readline
        }
      }
    }

Then i re-index the whole mess. The indexes are the 8-digit words below, which need to be longer too, but that's another story. In pascal (on dos), i would have done this with a ramdrive and file pointers.
The first Eu program did it the same way, but with no ramdrive (i figured it would confuse win95), and at the end of the first day it was at about line 1000 of the 75,000 line file. Since windoze won't run 75 days, this way of re-indexing was not only too slow, but too <expletive> slow, and meant syncing the files in memory with those on the drive periodically, and determining after reboots where it left off before the reboot.

If i can't get this to run better, i will either forget indexing and do bruteforce searches (making for 30second lookup times), or look at the allocated memory schemes and any libs in the Eu archives to wrap them.

One file is arranged like (two lines picked at random):

00252962 04 n 04 decrease 0 diminution 0 reduction 0 step-down 0 027 @ 00252809 n 0000 ! 00260981 n 0101 ~ 00253565 n 0000 ~ 00254314 n 0000 ~ 00254503 n 0000 ~ 00254762 n 0000 ~ 00254954 n 0000 ~ 00255044 n 0000 ~ 00255167 n 0000 ~ 00255414 n 0000 ~ 00255692 n 0000 ~ 00256172 n 0000 ~ 00256313 n 0000 ~ 00258079 n 0000 ~ 00259254 n 0000 ~ 00259472 n 0000 ~ 00260041 n 0000 ~ 00260158 n 0000 ~ 00260295 n 0000 ~ 00260392 n 0000 ~ 00262205 n 0000 ~ 00309963 n 0000 ~ 00814987 n 0000 ~ 00833309 n 0000 ~ 11005389 n 0000 ~ 11007893 n 0000 ~ 11061536 n 0000 | the act of decreasing or reducing something

00253565 04 n 01 cut 5 006 @ 00252962 n 0000 ~ 00253801 n 0000 ~ 00253899 n 0000 ~ 00253992 n 0000 ~ 00254075 n 0000 ~ 00254227 n 0000 | the act of reducing the amount or number; "the mayor proposed extensive cuts in the city budget"

The other is arranged like (two lines picked at random):

amorousness n 2 2 @ ~ 2 0 06129685 06087784

amorpha n 1 3 @ ~ #m 1 0 10220950

One line may end in {32,32,10}, and the next may be {32,10} and the next may end in {}, so i must allow for anything. There are no null lines, or lines that trim() down to {}.
Kat
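The line endings Kat describes (trailing {32,32,10}, {32,10}, or nothing at all) can be handled by flushing the pending word whenever any separator arrives and flushing the pending line on 10 or 13. The sketch below is only one way to do that: read_records is a made-up name, it relies on Kat's statement that there are no empty lines, and it makes no claim about speed, since it still reads one character at a time with getc().

```euphoria
-- Sketch only: read_records() is an illustrative name.
-- Builds {line, line, ...} where each line is {word, word, ...}.
-- fn is a file number already returned by open().
function read_records(integer fn)
    sequence data, line, word
    integer c
    data = {}
    line = {}
    word = {}
    c = getc(fn)
    while c != -1 do
        if c = 32 or c = 10 or c = 13 then
            if length(word) then        -- any separator ends the current word
                line = append(line, word)
                word = {}
            end if
            if c != 32 and length(line) then
                -- end of line; the length test makes a 13,10 pair count once
                data = append(data, line)
                line = {}
            end if
        else
            word &= c
        end if
        c = getc(fn)
    end while
    -- flush whatever is pending when the file has no final line ending
    if length(word) then
        line = append(line, word)
    end if
    if length(line) then
        data = append(data, line)
    end if
    return data
end function
```

Testing length(word) and length(line) directly replaces the separate inword and inline flags, so a word that runs right up to the end of a line is not lost.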
8. Re: too much memory use!
- Posted by petelomax at blueyonder.co.uk Feb 20, 2002
- 562 views
Kat,

I found the very simple:

<CODE>
object t, t1
t=time()
t1=t+1
for j=1 to length(f2s) do
    for i=1 to length(f1s) do
        ......
        out_diff(r)
        if t1<time() then
            printf(1,"\rPacktime[%d][%d]: %d Duration : %d ",{i,j,packtime,time()-t})
            t1=time()+1
        end if
    end for
    t=time()-t
    printf(1,"\rPacktime: %d Duration : %d \n",{packtime,t})
end for
</CODE>

very helpful for understanding what long processes were doing, when and why, and for displaying in real time any sizes etc. I wanted to monitor.

Pete