RE: 20gigabyte string searching


Kat,

I am just replying in general to the things about your searching, no one 
particular post.

My idea/question is:

Could you use multiple text files for your lists? As you gather the
lists, put them into files such as a.txt, b.txt, ..., z.txt (or
whatever naming scheme works). Possibly(?) sort each file as you add
to it (you could keep file sizes down by breaking into smaller chunks
if needed, such as ..., mm.txt, mz.txt, ...). But even if you do not sort
each file, you at least have a rough sort by first letter, which makes
the size of an individual search smaller. The obvious advantage of sorting
the individual files is that you can use binary search algorithms.
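A rough sketch of the bucketing side of that idea, in Python. The file names and directory layout here (a.txt ... z.txt under a "buckets" directory, an "other.txt" catch-all for non-letter starts) are my assumptions, not anything from your setup:

```python
import os
from bisect import insort

def bucket_path(word, directory="buckets"):
    # Assumed naming scheme: one file per first letter, a.txt ... z.txt,
    # with non-alphabetic first characters falling into other.txt.
    first = word[0].lower()
    letter = first if first.isalpha() else "other"
    return os.path.join(directory, letter + ".txt")

def add_word(word, directory="buckets"):
    # Insert the word into its bucket file, keeping the file sorted,
    # so the sorting cost is paid incrementally as words arrive.
    os.makedirs(directory, exist_ok=True)
    path = bucket_path(word, directory)
    try:
        with open(path) as f:
            words = f.read().splitlines()
    except FileNotFoundError:
        words = []
    insort(words, word)  # binary-search insertion into the sorted list
    with open(path, "w") as f:
        f.write("\n".join(words) + "\n")
```

Rewriting the whole bucket file on every insert is obviously naive; batching the inserts and merging periodically would be the practical variant, but the sketch shows the shape of the idea.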

Let your search algorithm select the file to read. Thus, you do not
have to search through 20 GB of text, and some of your problems go away.
Further, much of the CPU work is done incrementally as the text is added
to the separate files.
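The lookup side would then only ever touch one small bucket file. A self-contained sketch, again assuming the hypothetical a.txt ... z.txt naming from above and sorted bucket contents:

```python
import os
from bisect import bisect_left

def search_word(word, directory="buckets"):
    # Pick the bucket from the first letter, then binary-search the
    # sorted lines in that one file; the other buckets are never read.
    first = word[0].lower()
    letter = first if first.isalpha() else "other"
    path = os.path.join(directory, letter + ".txt")
    try:
        with open(path) as f:
            words = f.read().splitlines()
    except FileNotFoundError:
        return False  # no bucket file yet means the word was never added
    i = bisect_left(words, word)
    return i < len(words) and words[i] == word
```

For very large buckets you would binary-search the file with seek() instead of reading it all into memory, but the selection-then-binary-search structure stays the same.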

I am sure that I don't fully grasp what you are doing, so I just offer
this idea in case it is of some use.

Terry Constant

