Out of memory error
- Posted by Robert Craig <robert_craig at COMPUSERVE.COM> Apr 25, 1997
- 561 views
Larry D Poos writes: > When I ran it in a DOS window with Win3.1 and it read the entire file > but the memory swapping during sort is unacceptable. The program ran > for 35 minutes (okay so I went to lunch during the run but I figured > it would finish and the prompt would be waiting on me when I got back) > before I killed it. > If I reduced the size of the input file to around 1.4M it takes > roughly 203 seconds to sort the data running from DOS. Haven't tried > this in Windows yet. The filesort.ex demo program (less than 20 lines of code) is pretty fast at sorting data up to a "reasonable" size. If you want to push the limits of your machine's memory you should consider the following: 1. Euphoria stores all integer values in 4 bytes. i.e. characters are stored in 4 bytes (not 1). This might change in the future, but for now it makes the implementation simple and fast. 2. filesort.ex makes a full copy of the input data, i.e. sorted_buffer = sort(buffer) creates a second sorted copy of the input data. If you were to use something like great_sort of allsorts.ex you could sort buffer in-place. 3. The disk swapping feature of the DOS extender has its limitations, as you've discovered. Some algorithms, such as Euphoria's standard shell sort (sort.e) are not well suited to a least-recently-used swapping algorithm.If your algorithm sweeps through entire large sequences, doing just a little bit of work on each element, you will not benefit from LRU. You'll be frequently accessing the disk and performance can drop by orders of magnitude. I once experimented a bit with this, figuring that a "divide and conquer" type of sort, such as quick sort (allsorts.ex) might be better suited to LRU swapping. I think the answer was yes, but it was still pretty slow. 4. For Euphoria Gurus Only: If you have lots of duplicate lines of input, you might try matching each incoming line against lines you've already read in. Then you can do: buffer = append(buffer, buffer[i]) instead of: buffer = append(buffer, line) whenever the input line and buffer[i] are the same. In this way Euphoria will only create one copy of that line, and will point to it from many elements of buffer, saving memory. You'll need a very fast check for duplicates, or this will be slow. Good Luck, Rob Craig Rapid Deployment Software