1. too much memory use!

Eu took 29 minutes, 36 sec to execute the following program, and used 
142.7Megs of memory. The file it was reading is 12.1 megabytes. 

data = {}
datafile = open(data_noun,"u")
readline = gets(datafile) -- get a line
while not atom(readline) do
  while find(readline[length(readline)],{10,13,32}) do
    readline = readline[1..length(readline)-1]
  end while
  junk_s = parse(readline,32)
  data = data & {junk_s}
  readline = gets(datafile) -- get another line
end while
close(datafile)
trace(1) -- to hold the program while getting memory use data
abort(0)

What am i doing that runs a 12meg file up to 142.7 megabytes, and takes 
1/2 hour to do it?

How can i say data = glom(file) ?
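
In other words, something like the sketch below: a hypothetical glom() that 
slurps the whole file in one call (untested, and each byte still lands in a 
4-byte sequence element, so the 12meg file would still cost ~48megs):

function glom(sequence fname)
-- hypothetical whole-file reader: returns the file as one byte-sequence,
-- or -1 if the file can't be opened (sketch only, untested)
    integer fn, c
    sequence buf
    fn = open(fname, "rb")
    if fn = -1 then
        return -1
    end if
    buf = {}
    c = getc(fn)
    while c != -1 do
        buf &= c            -- append one byte
        c = getc(fn)
    end while
    close(fn)
    return buf
end function

data = glom(data_noun), with an atom() check for failure, would then 
replace the whole read loop.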

Kat


2. Re: too much memory use!

Hey Kat,

let's see, you're looking for these 

{10,13,32} or {Line Feed, Carriage Return, Space}

right?

What if you tried using find like this for starters:

loc = find({10,13,32},readline[1..length(readline)])
readline = readline[1..loc]

instead of:

find(readline[length(readline)],{10,13,32}) 
readline = readline[1..length(readline)-1]

second, I have no idea what "parse" does,
e.g. junk_s = parse(readline,32)

third, you sure are making a lot of sequence copy and
concat operations in this routine.

finally,

this is a bit strange coming from someone of your caliber, Kat.

while find(readline[length(readline)],{10,13,32}) do 
   readline = readline[1..length(readline)-1] 
end while

you're saying: while I find one of {10,13,32} at the end
of readline, copy readline[1..length(readline)-1],
until I don't find it.....hmmmm Kat

think about it!

Euman
euman at bellsouth.net

Q: Are we monetarily insane?
A: YES



3. Re: too much memory use!

On 20 Feb 2002, at 1:59, euman at bellsouth.net wrote:

> 
> Hey Kat,
> 
> let's see, you're looking for these 
> 
> {10,13,32} or {Line Feed, Carriage Return, Space}
> 
> right?
> 
> What if you tried using find like this for starters:
> 
> loc = find({10,13,32},readline[1..length(readline)])
> readline = readline[1..loc]

How do you know which of {10,13,32} is returned? The first match? I don't 
know, but it's likely to be the 32, not the 10.

> instead of:
> 
> find(readline[length(readline)],{10,13,32}) 
> readline = readline[1..length(readline)-1]
> 
> second, I have no idea what "parse" does
> e.g. junk_s = parse(readline,32)

parse() is in strtok.e; in this case, it's breaking "1 2 3 4" into 
{"1","2","3","4"}. I need the bits later on, and it's better to break it once 
now than to break it repeatedly later.
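
For anyone without strtok.e handy, parse() behaves roughly like this sketch 
(a reconstruction from the description, not the actual library source):

function parse(sequence s, integer delim)
-- split s into the subsequences separated by delim, so that
-- parse("1 2 3 4", 32) gives {"1","2","3","4"} (sketch, untested)
    sequence out, word
    out = {}
    word = {}
    for i = 1 to length(s) do
        if s[i] = delim then
            if length(word) > 0 then
                out = append(out, word)
                word = {}
            end if
        else
            word &= s[i]
        end if
    end for
    if length(word) > 0 then
        out = append(out, word)     -- flush the last word
    end if
    return out
end function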
 
> third, you sure are making a lot of sequence copy and
> concat operations in this routine.
> 
> finally,
> 
> this is a bit strange coming from someone of your caliber, Kat.

Not when coding at 1am till 3am!

> while find(readline[length(readline)],{10,13,32}) do 
>    readline = readline[1..length(readline)-1] 
> end while
> 
> you're saying: while I find one of {10,13,32} at the end
> of readline, copy readline[1..length(readline)-1],
> until I don't find it.....hmmmm Kat
> 
> think about it!

I did, i still don't see where the extra 80 megs of memory is being used. Just 
gets()ing the file till -1, and adding it to data, is using 60 megs of memory, 
altho it's fast, about 20 seconds. Even 60 megs sounds like a lot for a 
12meg file, since 32 bits per char is 48megs. I am now watching memory use 
scale up slowly while parse() is running after gets()ing the entire file first. 
I'd shut it down and change the code, but i already have 4 of those leftover 
virtual machines cluttering up memory, and i don't want to add more.

Kat
 


4. Re: too much memory use!

----- Original Message -----
From: "Kat" <gertie at PELL.NET>
To: "EUforum" <EUforum at topica.com>
Subject: too much memory use!


>
> Eu took 29 minutes, 36 sec to execute the following program, and used
> 142.7Megs of memory. The file it was reading is 12.1 megabytes.
>
> data = {}
> datafile = open(data_noun,"u")
> readline = gets(datafile) -- get a line

Euphoria stores bytes unpacked in 4-byte integers. This means there's
usually four times as much memory used as the number of bytes read in. So
that increases your memory usage to 48 Meg already.

> while not atom(readline) do
>   while find(readline[length(readline)],{10,13,32}) do
>     readline = readline[1..length(readline)-1]
>   end while

I can't see why this would burn too much memory. It's quite a clever piece
of code, much like something I used to write when I still had a brain.

It has a bug though. What if readline is all spaces? Eventually there'll be
an access of readline[0].

This is equivalent, and might not chew as much memory:

integer i
i = length(readline)
while i > 0 and find(readline[i], "\n\r ") do
    i -= 1                   -- scan back past trailing whitespace
end while
readline = readline[1..i]    -- one final slice instead of many

>   junk_s = parse(readline,32)

parse is an unknown quantity to me. Is it optimal for your needs, or would
you benefit from writing your own splitter?

>   data = data & {junk_s}

I'm not sure about whether Euphoria optimises this, so possibly use
  data &= {junk_s}
or
  data = append(data, junk_s) -- which I know _is_ optimised
instead.
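
(A quick way to check on any given interpreter is a throwaway timing loop; 
a sketch, with a made-up element count:)

atom t0
sequence s

t0 = time()
s = {}
for i = 1 to 100000 do
    s = s & {i}                 -- concatenate a one-element sequence
end for
printf(1, "concat: %.2f sec\n", {time() - t0})

t0 = time()
s = {}
for i = 1 to 100000 do
    s = append(s, i)            -- append directly
end for
printf(1, "append: %.2f sec\n", {time() - t0})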

Regardless: by the time you've read all your data in, the 48Meg plus
overheads for all the subsequences will quite easily add up to 60Meg.

>   readline = gets(datafile) -- get another line
> end while
> close(datafile)
> trace(1) -- to hold the program while getting memory use data

It could well be that the program uses less memory when trace mode is off -
the opposite of the old "My program only works when debug mode is on!"

> abort(0)
>
> What am i doing that runs a 12meg file up to 142.7megabytes? and takes
> 1/2 hour to do it?

It depends whether Euphoria decides to swap to virtual memory. If it does,
it will seriously increase the runtime. On my system, it's 19 times more
time-expensive to read an integer from a sequence that overlaps disk than it
is from a sequence that's memory-only.

If the four-fold increase for reading bytes into integers bothers you, you
can always read your whole file into allocated memory and parse from there
instead, but that's a whole new can of worms :)

Carl


5. Re: too much memory use!

Kat,

try the following quick fix instead of your strtok code. Let me know if it's
any better, I have not got big enough text files to test it.

jiri

constant false = 0, true = 1
sequence data,word
atom t
integer c,f,inword

data = {}
word = {}
inword = false                  -- flag
f = open("kat1.eml", "rb")
c = getc(f)
while c != -1 do
    if find(c, {32,13,10}) then
        if inword then
            data = append(data, word)
            inword = false
            word = {}
        end if
    else
        inword = true
        word &= c
    end if
    c = getc(f)
end while
close(f)
if inword then                  -- flush
    data = append(data, word)
end if



6. Re: too much memory use!

Kat,

if you have a memory-challenged system and want to avoid hard disk
thrashing, there are a number of different schemes you can use; one of
them is outlined below:

----------------------------------------------------------------------
Select a suitable word delimiter. I chose 'space' (ascii 32).

Determine size of the data file.

Allocate sufficient memory for output.

In a while loop
    isolate individual words
    poke them into the reserved memory
    separate them with your chosen delimiter
    update the memory pointer
and keep at it until the end of input file is reached.

Flush the system.
----------------------------------------------------------------------

I am not sure what you intend to do with the data, but it is basically
trivial to read it back into a sequence and write it to a file, as a
whole or even as individual tokens.

I hope this makes some sense to you.

jiri

----------------------------------------------------------------------
include machine.e               -- allocate
include file.e                  -- seek, where

constant false = 0, true = 1
constant d = 32                 -- space chosen as delimiter
sequence word
atom a,p
integer c,e,f,inword,size

f = open("kat1.eml", "rb")

-- get file size
e = seek(f, -1)                 -- go to end of file
size = where(f)
e = seek(f, 0)                  -- go back to start of file

-- allocate memory for output
a = allocate(size + 1)
if a = 0 then
    puts(1, "Memory allocation failed...\n")
    abort(1)
end if

-- initialize
word = {}
inword = false                  -- flag
p = a                           -- current memory pointer

-- main loop
c = getc(f)
while c != -1 do
    if find(c, {32,13,10}) then
        if inword then
            poke(p, word & d)
            p += length(word) + 1
            inword = false
            word = {}
        end if
    else
        inword = true
        word &= c
    end if
    c = getc(f)
end while
close(f)

-- flush
if inword then
    poke(p, word & d)
    p += length(word) + 1
end if

-- a little test - DO NOT try it with a 12 Mb file !
puts(1, peek({a, p-a}))
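
(Reading the poked output back, as mentioned above, is then just a couple 
of lines; a sketch, with a made-up output filename:)

sequence s
s = peek({a, p-a})              -- everything poked so far, as a sequence
f = open("words.out", "wb")     -- hypothetical output file
puts(f, s)
close(f)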


7. Re: too much memory use!

On 21 Feb 2002, at 2:30, Jiri Babor wrote:

> 
> Kat,
> 
> try the following quick fix instead of your strtok code. Let me know if it's
> any better, I have not got big enough text files to test it.
> 
> jiri
> 
> constant false = 0, true = 1
> sequence data,word
> atom t
> integer c,f,inword
> 
> data = {}
> word = {}
> inword = false                  -- flag
> f = open("kat1.eml", "rb")
> c = getc(f)
> while c != -1 do
>     if find(c, {32,13,10}) then
>         if inword then
>             data = append(data, word)
>             inword = false
>             word = {}
>         end if
>     else
>         inword = true
>         word &= c
>     end if
>     c = getc(f)
> end while
> close(f)
> if inword then                  -- flush
>     data = append(data, word)
> end if

Jiri, i gave it a try; it ran 1 hour and 24 minutes, and used 160 megs of memory.


Modified a bit as follows:

data = {} -- the bulk file contents
readline = "" -- one of the lines in the file
word = ""
datafile = open(data_noun,"rb") -- 12megs

-- <Jiri's code>
-- corrected for Jiri's "data" vs my "readline"

c = getc(datafile)
while c != -1 do

  if equal(c,32) then
     if inword then
       readline = append(readline, word)
       inword = false
       inline = true
       word = {}
     end if
  elsif find(c,{10,13}) and ( inline = true ) then
     data = append(data,readline)
     readline = ""
     inline = false -- do not land here if there is a 13 and a 10 on the same line!!
     inword = false
  else
     inword = true
     inline = true
     word &= c
  end if

  c = getc(datafile)

end while

-- <end Jiri's code>

close(datafile)

I modified it because i need the data arranged like:

{ data -- one sequence
   { readline }, { readline }, ... -- 75,000 of them, where each readline is
      { word }, { word }, ... -- 5..1000 of them per readline
}

{ indexes -- one sequence
   { readline }, { readline }, ... -- 145,000 of them, where each readline is
      { word }, { word }, ... -- 5..1000 of them per readline
}
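
With that layout, a plain double subscript reaches any word, e.g. (sketch):

word = data[20][3] -- the 3rd word of the 20th readline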


Then i re-index the whole mess. The indexes are the 8-digit words below, 
which need to be longer too, but that's another story. In pascal (on dos), i 
would have done this with a ramdrive and file pointers. The first Eu program 
did it the same way, but with no ramdrive (i figured it would confuse win95), 
and at the end of the first day it was at about line 1000 of the 75,000 line 
file. Since windoze won't run 75 days, this way of re-indexing was not only 
too slow, but too <expletive> slow, and it meant syncing the files in memory 
with those on the drive periodically, and determining after reboots where it 
left off before the reboot. If i can't get this to run better, i will either 
forget indexing and do bruteforce searches (making for 30 second lookup 
times), or look at the allocated memory schemes and any libs in the Eu 
archives to wrap them.

One file is arranged like (two lines picked at random):
00252962 04 n 04 decrease 0 diminution 0 reduction 0 step-down 0 027 @ 
00252809 n 0000 ! 00260981 n 0101 ~ 00253565 n 0000 ~ 00254314 n 0000 
~ 00254503 n 0000 ~ 00254762 n 0000 ~ 00254954 n 0000 ~ 00255044 n 
0000 ~ 00255167 n 0000 ~ 00255414 n 0000 ~ 00255692 n 0000 ~ 
00256172 n 0000 ~ 00256313 n 0000 ~ 00258079 n 0000 ~ 00259254 n 0000 
~ 00259472 n 0000 ~ 00260041 n 0000 ~ 00260158 n 0000 ~ 00260295 n 
0000 ~ 00260392 n 0000 ~ 00262205 n 0000 ~ 00309963 n 0000 ~ 
00814987 n 0000 ~ 00833309 n 0000 ~ 11005389 n 0000 ~ 11007893 n 0000 
~ 11061536 n 0000 | the act of decreasing or reducing something  
00253565 04 n 01 cut 5 006 @ 00252962 n 0000 ~ 00253801 n 0000 ~ 
00253899 n 0000 ~ 00253992 n 0000 ~ 00254075 n 0000 ~ 00254227 n 0000 
| the act of reducing the amount or number; "the mayor proposed extensive 
cuts in the city budget"  

The other is arranged like (two lines picked at random):
amorousness n 2 2 @ ~ 2 0 06129685 06087784  
amorpha n 1 3 @ ~ #m 1 0 10220950  

One line may end in {32,32,10}, and the next may be {32,10} and the next 
may end in {}, so i must allow for anything. There are no null lines, or lines 
that trim() down to {}.

Kat

 


8. Re: too much memory use!

Kat, I found the very simple:

<CODE>
object t, t1

t=time() t1=t+1

for j=1 to length(f2s) do
	for i=1 to length(f1s) do
		......
		out_diff(r)
		if t1<time() then
			printf(1,"\rPacktime[%d][%d]: %d Duration : %d ",{i,j,packtime,time()-t})
			t1=time()+1
		end if
	end for
	t=time()-t
	printf(1,"\rPacktime: %d Duration : %d          \n",{packtime,t})
end for
</CODE>

very helpful for understanding when/why/what long processes were doing, and
for displaying in realtime any sizes etc. I wanted to monitor.

Pete

