Re: 20gigabyte string searching
- Posted by "Kat" <gertie at visionsix.com> Oct 02, 2004
On 1 Oct 2004, at 19:15, cklester wrote:

> posted by: cklester <cklester at yahoo.com>
>
> People wrote:
> > > > Serious question: what's the best way to search twenty gigabytes of text
> > > > for a 100byte substring, using Euphoria?
> > > Would it be easier to create an index (like a b-tree or something),
> > > then use that for searching?
> > Sure! Where?
> > 20gig in a index in an Eu sequence is 80gigs.
>
> I mean like this:
>
> { "This", "Is an example", "Might not be", "Accurate" }
>
> Index would be:
>
> {
>   {"A", {4}},
>   {"B", {}},
>   ...
>   {"I", {2}},
>   ...
>   {"M", {3}},
>   ...
>   {"T", {1}}
> }
>
> So if you search for "Might not be," it searches the index first for 'M',
> finds all instances in the list where the searchable items start with 'M'
> and finds one at 3, then just test the rest.
>
> Or, even an index like this:
>
> { "T", "IAE", "MNB", "A" }
>
> So if I search for "is an example," it looks for the
> first-letter-of-each-word pattern of "IAE" and finds it.
>
> Either one of these methods will reduce the required storage for a
> regular index of words, plus it will speed up the searches significantly.
> I think. I could be wrong. It's late. :)

Ok, you got the index, where's the 20gig file you are indexing into?

Kat
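
For anyone following along, here is a rough sketch of the two index shapes cklester describes. It is written in Python rather than Euphoria, purely for illustration, and the function names are made up for this example. It assumes the searchable items already sit in memory as a list, which is exactly the part the question above is about, and it only finds items that match the query from their first character. It does not find a substring in the middle of an item.

```python
from collections import defaultdict

def build_first_letter_index(items):
    """Map each uppercase first letter to the 1-based positions of matching items."""
    index = defaultdict(list)
    for pos, item in enumerate(items, start=1):
        if item:  # skip empty strings, which have no first letter
            index[item[0].upper()].append(pos)
    return index

def build_initials_index(items):
    """Reduce each item to the uppercase first letter of each of its words."""
    return ["".join(word[0].upper() for word in item.split()) for item in items]

def search_first_letter(items, index, query):
    """Check only the items filed under the query's first letter; 0 means not found."""
    for pos in index.get(query[:1].upper(), []):
        if items[pos - 1].lower() == query.lower():
            return pos
    return 0

def search_initials(items, initials, query):
    """Compare initials patterns first, then confirm against the full item."""
    key = "".join(word[0].upper() for word in query.split())
    for pos, pattern in enumerate(initials, start=1):
        if pattern == key and items[pos - 1].lower() == query.lower():
            return pos
    return 0

items = ["This", "Is an example", "Might not be", "Accurate"]
index = build_first_letter_index(items)
initials = build_initials_index(items)

print(search_first_letter(items, index, "Might not be"))   # 3
print(search_initials(items, initials, "is an example"))   # 2
```

Both searches still have to read the original item to confirm a hit, so the index narrows down where to look but does not replace the text it points into.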