OpenEuphoria: Forum: Re: internal storage



Posted by Jim Hendricks <jim at bizcomputinginc.com> Sep 20, 2004

Michael Raley wrote:
> 
> 
> Jim Hendricks wrote:
> <snip>
> > My question is are string sequences then stored as 4 byte atoms or as 
> > 1 byte atoms?
> <snip>
> Yes, string sequences as stated are really just numeric sequences. 
> {65,66,67,3265} is internally the same as "ABC" & 3265
> You can sorta obscure plain text passwords in source files by writing 
> them in sequence format 
> 
>  i.e. pwd = {325,330,335}/5 
> 
> There are two workarounds that I tried because I had to split 24 megabyte 
> revenue usage reports apart, creating a new subreport for each 
> revenue department key found in the page headers. This would run very slow
> on the old win 95 machines it had to be run on, constantly chugging through 
> virtual memory. 
> 
> The first was to try to internally pack three atoms into one integer 
> (the library is in the archives) 
> so incompress({65,66,67}) would return {656667}
Yes, that was suggested by someone else, but if I were to go with all the
work of packing/unpacking I would stuff 4 chars per atom unless there's some
limitation on use of all 32 bits of an integer atom.
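
For illustration, a minimal sketch of the 4-chars-per-atom idea, assuming
8-bit character codes. The names pack4 and unpack4 are hypothetical and are
not the library from the archives. One caveat on the "all 32 bits" question:
Euphoria's integer type is 31-bit, so a packed value that uses the top bits
is held as a floating-point atom rather than as an integer.

```euphoria
-- Pack up to four 8-bit character codes into one atom (base-256 digits).
-- The result can exceed the 31-bit integer range, so it is typed as atom.
function pack4(sequence chars)
    atom packed
    packed = 0
    for i = 1 to length(chars) do
        packed = packed * 256 + chars[i]
    end for
    return packed
end function

-- Recover 'count' character codes from a value produced by pack4().
function unpack4(atom packed, integer count)
    sequence chars
    chars = repeat(0, count)
    for i = count to 1 by -1 do
        chars[i] = remainder(packed, 256)
        packed = floor(packed / 256)
    end for
    return chars
end function

-- Round trip: packing and then unpacking gives back the original string.
? equal(unpack4(pack4("ABCD"), 4), "ABCD")   -- prints 1
```

The caller has to remember how many characters went into each packed value,
since a short final group would otherwise unpack with leading zeros.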
 
> which runs pretty slow too with really big sequences.
> 
> The faster way was to create a 'bucket' routine to 
> handle the file input, which allows me to set a limit on
> the amount of text held in memory, 100,000 lines or so.
> 
> The splitter routine does not read the file directly,
> it reads from a buffer sequence until it reaches the end of the bucket,
> and calls the bucket fill procedure again.
Yes, this approach is used in many apps as buffered IO.  It rightly 
assumes that IO is the performance bottleneck and that reading many bytes 
takes only slightly longer than reading 1 byte, since the performance hit
is in clearing the channel, positioning the head, etc.  This is where 
buffering on a cluster boundary gives the best performance kick, since
a cluster is contiguous on the HDD and can therefore be read in 1 pass. Of
course the same situation exists for writes, only worse, since a write is
a more costly operation than a read.
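
A rough sketch of the bucket idea described above, not Michael's actual
routine. The names fill_bucket and count_lines_bucketed are made up for
illustration, and the per-line work is reduced to a counter. The 100,000
figure is the limit mentioned in the quoted text.

```euphoria
constant BUCKET_LINES = 100000  -- limit on lines held in memory at once

-- Read up to BUCKET_LINES lines from an open file into a sequence.
-- Returns an empty sequence once the file is exhausted.
function fill_bucket(integer fn)
    sequence bucket
    object line
    bucket = {}
    while length(bucket) < BUCKET_LINES do
        line = gets(fn)
        if atom(line) then      -- gets() returns -1 at end of file
            exit
        end if
        bucket = append(bucket, line)
    end while
    return bucket
end function

-- Consumer: works through one bucket at a time, never the whole file.
-- Returns the number of lines seen, or -1 if the file cannot be opened.
function count_lines_bucketed(sequence path)
    integer fn, total
    sequence bucket
    total = 0
    fn = open(path, "r")
    if fn = -1 then
        return -1
    end if
    bucket = fill_bucket(fn)
    while length(bucket) > 0 do
        for i = 1 to length(bucket) do
            -- real per-line processing of bucket[i] would go here
            total = total + 1
        end for
        bucket = fill_bucket(fn)
    end while
    close(fn)
    return total
end function
```

Memory use is then bounded by the bucket size rather than by the size of
the file.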

As I stated in a previous post, my app stores all the data in memory, 
primarily because some of the information I need to properly process the
data is not known until the whole file has been read. I may do well to go
with a multipass process, whereby the first pass obtains the metadata
necessary to process the file and the second pass does the actual
processing.
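
The two-pass idea could look roughly like this. It is only a sketch under
assumptions: first_pass and second_pass are hypothetical names, and the
"metadata" here (line count and longest line length) is a placeholder for
whatever the real app needs to know before processing.

```euphoria
-- Pass 1: scan the file once and collect the metadata.
-- Returns {line count, longest line length}, or {} if the open fails.
function first_pass(sequence path)
    integer fn, count, longest
    object line
    count = 0
    longest = 0
    fn = open(path, "r")
    if fn = -1 then
        return {}
    end if
    while 1 do
        line = gets(fn)
        if atom(line) then      -- -1 signals end of file
            exit
        end if
        count = count + 1
        if length(line) > longest then
            longest = length(line)
        end if
    end while
    close(fn)
    return {count, longest}
end function

-- Pass 2: reread the file line by line, now that the metadata is known.
procedure second_pass(sequence path, sequence info)
    integer fn
    object line
    fn = open(path, "r")
    if fn = -1 then
        return
    end if
    while 1 do
        line = gets(fn)
        if atom(line) then
            exit
        end if
        -- process 'line' here, using info[1] and info[2] as needed
    end while
    close(fn)
end procedure
```

The file is read twice, but only one line needs to be in memory at a time.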

Jim
