Out of memory error

Larry D Poos writes:
> When I ran it in a DOS window under Win3.1, it read the entire file,
> but the memory swapping during the sort was unacceptable. The program
> ran for 35 minutes (okay, so I went to lunch during the run, but I
> figured it would finish and the prompt would be waiting on me when I
> got back) before I killed it.
> If I reduce the size of the input file to around 1.4M, it takes
> roughly 203 seconds to sort the data running from DOS. Haven't tried
> this in Windows yet.

The filesort.ex demo program (less than 20 lines of code) is pretty fast
at sorting data up to a "reasonable" size. If you want to push the limits
of your machine's memory, you should consider the following:

   1. Euphoria stores every integer value in 4 bytes, i.e. characters
      are stored in 4 bytes each (not 1), so text in memory takes about
      four times the space it occupies on disk. This might change in
      the future, but for now it keeps the implementation simple and
      fast.

   2. filesort.ex makes a full copy of the input data, i.e.

             sorted_buffer = sort(buffer)

      creates a second, sorted copy of the input data. If you were to
      use something like great_sort in allsorts.ex you could sort
      buffer in place instead (a rough sketch of this one-copy idea
      follows the list below).

   3. The disk swapping feature of the DOS extender has its limitations,
      as you've discovered. Some algorithms, such as Euphoria's standard
      shell sort (sort.e), are not well suited to a least-recently-used
      (LRU) swapping algorithm. If your algorithm sweeps through entire
      large sequences, doing just a little bit of work on each element,
      you will not benefit from LRU. You'll be frequently accessing the
      disk, and performance can drop by orders of magnitude. I once
      experimented a bit with this, figuring that a "divide and conquer"
      type of sort, such as quick sort (allsorts.ex), might be better
      suited to LRU swapping. I think the answer was yes, but it was
      still pretty slow.

   4. For Euphoria Gurus Only: If you have lots of duplicate lines of
      input, you might try matching each incoming line against the lines
      you've already read in. Then you can do:
             buffer = append(buffer, buffer[i])
      instead of:
             buffer = append(buffer, line)
      whenever the input line and buffer[i] are the same. In this way
      Euphoria will only create one copy of that line, and will point to
      it from many elements of buffer, saving memory.

      You'll need a very fast check for duplicates, or this will be
      slow (one possible approach is sketched after this list).
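
To illustrate point 2, here is a rough sketch of what sorting "in place"
can look like: a shell sort that works directly on a file-level sequence
instead of returning a sorted copy, so only one copy of the data is ever
held in memory. This is just the idea, not the actual code of great_sort
in allsorts.ex, and the sample data is a stand-in for the lines you would
read from your file.

      sequence buffer
      buffer = {"pear\n", "apple\n", "banana\n"}
      -- stand-in for the lines read from your input file

      procedure sort_buffer_in_place()
      -- shell sort applied directly to the file-level variable buffer,
      -- so no second copy of the whole data set is ever created
          integer gap, j
          object key
          gap = floor(length(buffer) / 2)
          while gap >= 1 do
              for i = gap + 1 to length(buffer) do
                  key = buffer[i]
                  j = i - gap
                  while j >= 1 do
                      if compare(buffer[j], key) <= 0 then
                          exit
                      end if
                      buffer[j + gap] = buffer[j]
                      j = j - gap
                  end while
                  buffer[j + gap] = key
              end for
              gap = floor(gap / 2)
          end while
      end procedure

      sort_buffer_in_place()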

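And for point 4, one possible way to get a fast duplicate check is a
small hash table of indexes into buffer. The bucket count, hash function
and file name below are placeholders you would tune or replace; the key
line is buffer = append(buffer, buffer[idx]), which makes duplicate
lines share a single copy.

      include sort.e

      constant NBUCKETS = 1024        -- arbitrary; tune for your data

      sequence buffer, buckets
      object line
      integer fh, h, idx

      function hash_line(sequence s)
      -- crude hash: sum of the characters (illustrative only)
          integer total
          total = 0
          for i = 1 to length(s) do
              total = total + s[i]
          end for
          return remainder(total, NBUCKETS) + 1
      end function

      buffer = {}
      buckets = repeat({}, NBUCKETS)  -- each bucket holds indexes into buffer

      fh = open("input.txt", "r")     -- substitute your own file name
      if fh = -1 then
          puts(1, "couldn't open input.txt\n")
          abort(1)
      end if

      while 1 do
          line = gets(fh)
          if atom(line) then
              exit                    -- end of file
          end if
          h = hash_line(line)
          idx = 0
          for i = 1 to length(buckets[h]) do
              if equal(buffer[buckets[h][i]], line) then
                  idx = buckets[h][i]
                  exit
              end if
          end for
          if idx then
              -- duplicate: point at the copy we already have
              buffer = append(buffer, buffer[idx])
          else
              buffer = append(buffer, line)
              buckets[h] = append(buckets[h], length(buffer))
          end if
      end while
      close(fh)

      buffer = sort(buffer)           -- or sort in place as sketched above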

Good Luck,
   Rob Craig
   Rapid Deployment Software
