Re: 20gigabyte string searching
- Posted by cklester <cklester at yahoo.com> Oct 02, 2004
People wrote:
> > > Serious question: what's the best way to search twenty gigabytes of text
> > > for a 100byte substring, using Euphoria?
> > Would it be easier to create an index (like a b-tree or something),
> > then use that for searching?
> Sure! Where?
> 20gig in a index in an Eu sequence is 80gigs.

I mean like this:

{ "This", "Is an example", "Might not be", "Accurate" }

Index would be:

{ {"A", {4}}, {"B", {}}, ... {"I", {2}}, ... {"M", {3}}, ... {"T", {1}} }

So if you search for "Might not be," it first searches the index for 'M', finds every item in the list that starts with 'M' (here, one at position 3), then just tests the rest of the string against those items.

Or, even an index like this:

{ "T", "IAE", "MNB", "A" }

So if I search for "is an example," it looks for the first-letter-of-each-word pattern "IAE" and finds it.

Either one of these methods will need less storage than a regular index of words, plus it will speed up the searches significantly. I think. I could be wrong. It's late. :)

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/
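
For anyone who wants to try the two index layouts described in the post, here is a minimal sketch. It is written in Python purely for illustration (the thread itself is about Euphoria), and every function name in it is made up for this example. It assumes the text has already been split into items that are matched as a whole, case-insensitively, and it uses 1-based positions to match the index shown in the post.

```python
from collections import defaultdict

# The example list from the post.
items = ["This", "Is an example", "Might not be", "Accurate"]


def build_first_letter_index(items):
    """Map each upper-cased first letter to the 1-based positions of the items starting with it."""
    index = defaultdict(list)
    for pos, item in enumerate(items, start=1):
        if item:  # skip empty items, which have no first letter
            index[item[0].upper()].append(pos)
    return index


def find_by_first_letter(items, index, query):
    """Use the first letter to pick candidates, then compare each candidate in full."""
    if not query:
        return []
    wanted = query.lower()
    candidates = index.get(query[0].upper(), [])
    return [pos for pos in candidates if items[pos - 1].lower() == wanted]


def initials(text):
    """Upper-cased first letter of each whitespace-separated word."""
    return "".join(word[0].upper() for word in text.split())


def build_initials_index(items):
    """One initials string per item, in the same order as the items."""
    return [initials(item) for item in items]


def find_by_initials(items, initials_index, query):
    """Filter by initials first, then confirm with a full comparison."""
    key = initials(query)
    wanted = query.lower()
    return [
        pos
        for pos, signature in enumerate(initials_index, start=1)
        if signature == key and items[pos - 1].lower() == wanted
    ]


letter_index = build_first_letter_index(items)
print(find_by_first_letter(items, letter_index, "Might not be"))  # [3]

initials_index = build_initials_index(items)
print(initials_index)  # ['T', 'IAE', 'MNB', 'A']
print(find_by_initials(items, initials_index, "is an example"))  # [2]
```

In both cases the index only narrows down the candidates. Different items can share a first letter or the same initials, so the full comparison at the end is still needed. Note also that this sketch matches whole items only; finding a substring that starts in the middle of an item would need a different kind of index.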