RE: 20 gigabyte string searching
- Posted by Terry Constant <EUforum at terryconstant.com> Oct 02, 2004
Kat, I am just replying in general to the things about your searching, not to any one particular post. My idea/question is: could you use multiple text files for your lists?

As you gather the lists, put them into files such as a.txt, b.txt, ..., z.txt (or whatever naming scheme works). Possibly(?) sort each file as you add to it; you could keep file sizes down by breaking them into smaller chunks if needed, such as ..., mm.txt, mz.txt, .... But even if you do not sort each file, you at least have a rough sort by first letter, which makes the size of an individual search smaller. The obvious advantage of sorting the individual files is that you can use binary search algorithms.

Let your search algorithm select the file to read. Thus, you do not have to search through 20 GB of text, and some of your problems go away. Further, much of the CPU work is done incrementally, as the text is added to the separate files (a rough sketch of this follows below).

I am sure that I don't fully grasp what you are doing, so I just offer this idea in case it is of some use.

Terry Constant
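For illustration, here is a minimal sketch of the bucketing-plus-binary-search idea above, written in Python since the post itself contains no code (the thread concerns Euphoria). The "buckets" directory and all helper names are hypothetical, and words are assumed to be non-empty, newline-separated strings.

    import bisect
    import os

    BUCKET_DIR = "buckets"  # hypothetical directory holding a.txt ... z.txt

    def bucket_path(word):
        # Choose the bucket by first letter; anything non-alphabetic
        # falls into a catch-all file.
        first = word[0].lower()
        name = first if "a" <= first <= "z" else "other"
        return os.path.join(BUCKET_DIR, name + ".txt")

    def add_word(word):
        # Insert into the right bucket, keeping that file sorted so a
        # binary search works later.  Rewriting the file on every add
        # is the simplest form of "sort each file as you add to it".
        os.makedirs(BUCKET_DIR, exist_ok=True)
        path = bucket_path(word)
        words = []
        if os.path.exists(path):
            with open(path) as f:
                words = f.read().split()
        pos = bisect.bisect_left(words, word)
        if pos == len(words) or words[pos] != word:  # skip duplicates
            words.insert(pos, word)
            with open(path, "w") as f:
                f.write("\n".join(words) + "\n")

    def contains(word):
        # Only one small bucket is read and binary-searched, never the
        # whole 20 GB collection.
        path = bucket_path(word)
        if not os.path.exists(path):
            return False
        with open(path) as f:
            words = f.read().split()
        pos = bisect.bisect_left(words, word)
        return pos < len(words) and words[pos] == word

    add_word("kat")
    add_word("euphoria")
    print(contains("euphoria"))  # True
    print(contains("zebra"))     # False

Rewriting a bucket on every add keeps it sorted but gets slow as buckets grow; the unsorted variant the post mentions would simply append new words and rely on the per-letter split alone, or re-sort each file in a periodic batch.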