Re: EDS database speed questions
- Posted by AlanO Sep 01, 2008
- 1109 views
OK, here is the background:
There is a database of mainframe tapes. The tape names are all a fixed 6 characters long, but typically they fall into "ranges", where a range might have the first 2 chars alpha and the remaining 4 chars numeric. The tapes themselves can contain multiple files; each file has a filename, file number, write date, time, etc. (quite a few fields). When a mainframe writes to tape, it is allowed to write to any tape that's part of the allowed range, as long as that tape is in "scratch" status. Scratch status means that the tape no longer contains data that has to be retained. What has to be scratched, and when, is the responsibility of the tape management software, for which there are a few vendors (IBM, CA, BMC).
Now, when a file that's being written cannot fit onto the current tape, it overflows to another scratch tape. So any tape could potentially be multi-file and/or multi-tape overflowed (called multi-volume in IBM speak). The tape database should have pointers for the multi-volume files: if file 17 on, say, tape X00020 overflowed onto tapes X01056 and X00055, then each of these tapes should have pointers indicating what the previous and next tapes are. In theory, these pointers should be correct! But they are not, and the customer is not going to fix them just to make my program's life (and mine) easier.
My program generates mainframe code (called JCL) to copy these tape files. The user clicks on a few tapes, and my program generates the JCL to copy all the files on those tapes. My program has to follow the next/previous pointers (called chains) and must not allow a copy if there is a fault in the chain. Copying a faulty chain could be a disaster - imagine mixing two different years' payroll into a single output file - what happens when this gets read into an application? I also keep logs and generate updates for the tape database so that the user knows what was copied, and to where. The total number of tape files is 2.2 million.
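To illustrate the chain check described above, here is a minimal Python sketch (not the actual program). It assumes one multi-volume file whose pointers are keyed by tape name; `TapeLink`, `ChainError` and `follow_chain` are hypothetical names, and the real tape database will hold these pointers per file rather than per tape.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TapeLink:
    """Hypothetical chain pointers held by one tape for one multi-volume file."""
    tape: str
    prev_tape: Optional[str]  # None on the first volume
    next_tape: Optional[str]  # None on the last volume


class ChainError(Exception):
    """Raised when the previous/next pointers do not agree."""


def follow_chain(links: Dict[str, TapeLink], first: str) -> List[str]:
    """Walk the next-pointers from `first`, verifying every back-pointer."""
    chain: List[str] = []
    seen = set()
    prev: Optional[str] = None
    current: Optional[str] = first
    while current is not None:
        if current in seen:
            raise ChainError(f"loop in chain at {current}")
        link = links.get(current)
        if link is None:
            raise ChainError(f"tape {current} is not in the listing")
        if link.prev_tape != prev:
            raise ChainError(
                f"{current} points back to {link.prev_tape}, expected {prev}"
            )
        seen.add(current)
        chain.append(current)
        prev, current = current, link.next_tape
    return chain
```

With the example from the post, a correct set of pointers for X00020, X01056 and X00055 yields the three tapes in order, while any mismatched back-pointer, missing tape or loop raises `ChainError`, so no copy JCL would be generated for that chain.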
When I load the input file (it's a listing from the tape manager), I sort it first by tape name and file number. So finding the first file is easy... but because the next tape in the chain could be anywhere in the listing, it's a true random-access problem.
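One common way to handle this kind of lookup is to sort once and keep a small index from tape name to the position of that tape's first record. The Python sketch below only illustrates the idea; the `(tape, file_number)` record layout and the function names are assumptions, and the real records carry many more fields.

```python
from typing import Dict, List, Tuple

# Assumed minimal record: (tape name, file number).
Record = Tuple[str, int]


def sort_and_index(records: List[Record]) -> Tuple[List[Record], Dict[str, int]]:
    """Sort by tape name then file number, and index each tape's first record."""
    ordered = sorted(records)
    first_pos: Dict[str, int] = {}
    for pos, (tape, _file_no) in enumerate(ordered):
        # setdefault keeps the first (lowest) position seen for each tape
        first_pos.setdefault(tape, pos)
    return ordered, first_pos


def files_on_tape(
    ordered: List[Record], first_pos: Dict[str, int], tape: str
) -> List[Record]:
    """Return every record for `tape` without scanning the whole listing."""
    pos = first_pos.get(tape)
    if pos is None:
        return []
    found: List[Record] = []
    while pos < len(ordered) and ordered[pos][0] == tape:
        found.append(ordered[pos])
        pos += 1
    return found
```

Jumping to the next tape in a chain then costs one index lookup plus a short walk over that tape's files, wherever the tape sits in the sorted listing.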
BTW, the need for all this is IBM dropping support for old (small capacity) tapes. The customers need to roll up thousands of small files onto single new tapes. The new tapes can be 1 terabyte (1024 Gigabyte) capacity. They are also expensive, so the customer wants to fit as much data on a tape as he can. The support is being dropped in December so there is a time factor.
So far my program is 3000 lines exactly, excluding win32lib etc. A lot of that code is (user and data) error checking.
I initially had the input file as a big sequence, but at 2.2 million records of 67 bytes each, I ran out of memory! My program does work, but obviously I want to make it as fast as possible.
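A rough back-of-envelope calculation shows why the in-memory approach hurts. The record count and record size come from the figures above; the 4 bytes per stored character is an assumption about how a sequence-based language might hold each element, not a measured value.

```python
RECORDS = 2_200_000            # from the post
BYTES_PER_RECORD = 67          # from the post
ASSUMED_BYTES_PER_ELEMENT = 4  # assumption: each character stored as a 4-byte element

raw_bytes = RECORDS * BYTES_PER_RECORD
expanded_bytes = raw_bytes * ASSUMED_BYTES_PER_ELEMENT


def to_mib(n: int) -> float:
    """Convert a byte count to mebibytes."""
    return n / (1024 * 1024)


print(f"raw data:      {raw_bytes:,} bytes ({to_mib(raw_bytes):.1f} MiB)")
print(f"as 4-byte els: {expanded_bytes:,} bytes ({to_mib(expanded_bytes):.1f} MiB)")
```

The raw data alone is 147,400,000 bytes; under the 4-byte assumption it grows to 589,600,000 bytes before any per-record overhead is counted.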
If you are still awake, thanks ;))
Regards Alan