Out of memory error
Larry D Poos writes:
> When I ran it in a DOS window with Win3.1 and it read the entire file
> but the memory swapping during sort is unacceptable. The program ran
> for 35 minutes (okay so I went to lunch during the run but I figured
> it would finish and the prompt would be waiting on me when I got back)
> before I killed it.
> If I reduced the size of the input file to around 1.4M it takes
> roughly 203 seconds to sort the data running from DOS. Haven't tried
> this in Windows yet.
The filesort.ex demo program (less than 20 lines of code) is pretty fast
at sorting data up to a "reasonable" size. If you want to push the limits
of your machine's memory you should consider the following:
1. Euphoria stores all integer values in 4 bytes. i.e. characters
are stored in 4 bytes (not 1). This might change in the future,
but for now it makes the implementation simple and fast.
2. filesort.ex makes a full copy of the input data, i.e.
sorted_buffer = sort(buffer)
creates a second sorted copy of the input data. If
you were to use something like great_sort of allsorts.ex you
could sort buffer in-place.
3. The disk swapping feature of the DOS extender has its limitations,
as you've discovered. Some algorithms, such as Euphoria's standard
shell sort (sort.e) are not well suited to a least-recently-used swapping
algorithm.If your algorithm sweeps through entire large sequences, doing
just a little bit of work on each element, you will not benefit from LRU.
You'll be frequently accessing the disk and performance can drop by orders
of magnitude. I once experimented a bit with this, figuring that a
"divide and conquer" type of sort, such as quick sort (allsorts.ex) might
be better suited to LRU swapping. I think the answer was yes, but it was
still pretty slow.
4. For Euphoria Gurus Only: If you have lots of duplicate lines of input,
you might try matching each incoming line against lines you've already
read in. Then you can do:
buffer = append(buffer, buffer[i])
instead of:
buffer = append(buffer, line)
whenever the input line and buffer[i] are the same.
In this way Euphoria will only create one copy of that line,
and will point to it from many elements of buffer, saving memory.
You'll need a very fast check for duplicates, or this will
be slow.
Good Luck,
Rob Craig
Rapid Deployment Software
|
Not Categorized, Please Help
|
|