Re: internal storage
- Posted by Jim Hendricks <jim at bizcomputinginc.com> Sep 20, 2004
Michael Raley wrote:
>
> Jim Hendricks wrote:
> <snip>
> > My question is are string sequences then stored as 4 byte atoms or as
> > 1 byte atoms?
> <snip>
> Yes, string sequences as stated are really just numeric sequences.
> {65,66,67,3265} is internally the same as "ABC" & 3265
> You can sorta obscure plain text passwords in source files by writing
> them in sequence format
>
> i.e. pwd = {325,330,335}/5
>
> There are two workarounds that I tried because I had to split 24 megabyte
> revenue usage reports apart, creating a new subreport for each
> revenue department key found in the page headers. This would run very slow
> on the old win 95 machines it had to be run on, constantly chugging through
> virtual memory.
>
> The first was to try to internally pack three atoms into one integer
> (the library is in the archives)
> so incompress({65,66,67}) would return {656667}

Yes, that was suggested by someone else, but if I were to go to all the work of packing and unpacking, I would stuff 4 chars per atom, unless there's some limitation on using all 32 bits of an integer atom.

> which runs pretty slow too with really big sequences.
>
> The faster way was to create a 'bucket' routine to
> handle the file input, which allows me to set a limit on
> the amount of text held in memory, 100,000 lines or so.
>
> The splitter routine does not read the file directly,
> it reads from a buffer sequence until it reaches the end of the bucket,
> and calls the bucket fill procedure again.

Yes, this approach is used in many apps as buffered IO. It rightly assumes that IO is the performance bottleneck and that reading many bytes takes only slightly longer than reading 1 byte, since the performance hit is in clearing the channel, positioning the head, etc. This is where buffering on a cluster boundary gives the best performance kick, since a cluster is contiguous on the HDD and can therefore be read in 1 pass.
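
For anyone who wants to see the bucket idea spelled out, here is a rough sketch. It is written in Python rather than Euphoria, and it is not Michael's actual routine: the names `fill_bucket`, `bucket_reader` and `bucket_lines` are made up for illustration. The consumer pulls lines from an in-memory bucket, and the bucket is refilled from the file only when it runs dry, so no more than `bucket_lines` lines are held at once.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def fill_bucket(handle: TextIO, bucket_lines: int) -> List[str]:
    """Read up to bucket_lines lines from handle; an empty list means end of file."""
    return list(islice(handle, bucket_lines))


def bucket_reader(handle: TextIO, bucket_lines: int = 100_000) -> Iterator[str]:
    """Yield lines one at a time while holding at most bucket_lines in memory."""
    while True:
        bucket = fill_bucket(handle, bucket_lines)
        if not bucket:
            # The fill returned nothing, so the file is exhausted.
            return
        for line in bucket:
            # The caller never touches the file, only the current bucket.
            yield line


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large report file (hypothetical data).
    report = io.StringIO("header\nrow 1\nrow 2\nrow 3\n")
    for text in bucket_reader(report, bucket_lines=2):
        print(text.rstrip("\n"))
```

The splitter would loop over `bucket_reader(...)` instead of reading the file itself, which is the same separation Michael describes between the splitter and the bucket fill procedure.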
Of course, the same situation exists for writes, only worse, since a write is a more costly operation than a read.

As I stated in a previous post, my app stores all the data in memory, primarily because some of the information I need to properly process the data is not known until the whole file has been read. I may do well to go with a multipass process, whereby the first pass obtains the metadata necessary to process the file and the second pass does the actual processing.

Jim
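
A minimal sketch of the multipass idea, again in Python rather than Euphoria. The post does not say what metadata is needed, so the line count and longest line length used here are assumptions chosen only to make the example concrete, and `first_pass` and `second_pass` are hypothetical names. The first pass keeps nothing but the metadata; the second pass rewinds the file and processes it one line at a time.

```python
import io
from typing import Dict, Iterator, TextIO


def first_pass(handle: TextIO) -> Dict[str, int]:
    """Scan the whole file once, keeping only the metadata needed later."""
    line_count = 0
    longest = 0
    for line in handle:
        line_count += 1
        longest = max(longest, len(line.rstrip("\n")))
    return {"line_count": line_count, "longest": longest}


def second_pass(handle: TextIO, meta: Dict[str, int]) -> Iterator[str]:
    """Rewind and process each line using what the first pass learned."""
    handle.seek(0)
    width = meta["longest"]
    for line in handle:
        # Example processing step (an assumption): pad every line to the
        # width of the longest one, which is unknown until the file is read.
        yield line.rstrip("\n").ljust(width)


if __name__ == "__main__":
    data = io.StringIO("ab\nabcd\nc\n")  # hypothetical input
    meta = first_pass(data)
    for padded in second_pass(data, meta):
        print(repr(padded))
```

The trade-off is reading the file twice in exchange for never holding all of it in memory, which may or may not pay off depending on how expensive the IO is.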