Re: 20gigabyte string searching


People wrote:

> > > Serious question: what's the best way to search twenty gigabytes of text
> > > for a 100-byte substring, using Euphoria?
> > Would it be easier to create an index (like a b-tree or something),
> > then use that for searching?
> Sure! Where? 
> 20gig in an index in an Eu sequence is 80gigs.

I mean something like this. Given these items:

{ "This", "Is an example", "Might not be", "Accurate" }

The index would be:

{
 {"A", {4}},
 {"B", {}},
...
 {"I", {2}},
...
 {"M", {3}},
...
 {"T", {1}}
}

So if you search for "Might not be", you first look up 'M' in the index,
which lists every item that starts with 'M'. It finds one, at position 3,
and then you only have to compare the rest of the string against that item.
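A rough sketch of that first-letter index, in case it helps. The names build_index() and lookup() are made up for illustration, not library routines. I'm assuming plain ASCII strings, a case-insensitive first letter, and an exact match for the final comparison.

```euphoria
-- Sketch only: build_index() and lookup() are hypothetical helpers.
-- Uses builtins only (length, repeat, append, equal); ASCII is assumed.

-- Return a 26-slot sequence: slot 1 is 'A', slot 26 is 'Z'.
-- Each slot holds the positions of the items starting with that letter.
function build_index(sequence items)
    sequence letter_index
    integer c
    letter_index = repeat({}, 26)
    for i = 1 to length(items) do
        if length(items[i]) > 0 then
            c = items[i][1]
            if c >= 'a' and c <= 'z' then
                c = c - 32              -- fold to upper case
            end if
            if c >= 'A' and c <= 'Z' then
                letter_index[c - 64] = append(letter_index[c - 64], i)
            end if
        end if
    end for
    return letter_index
end function

-- Return the position of target in items, or 0 if it is not there.
-- Only the items filed under target's first letter are compared.
function lookup(sequence items, sequence letter_index, sequence target)
    sequence candidates
    integer c
    if length(target) = 0 then
        return 0
    end if
    c = target[1]
    if c >= 'a' and c <= 'z' then
        c = c - 32
    end if
    if c < 'A' or c > 'Z' then
        return 0
    end if
    candidates = letter_index[c - 64]
    for k = 1 to length(candidates) do
        if equal(items[candidates[k]], target) then
            return candidates[k]
        end if
    end for
    return 0
end function

sequence items, letter_index
items = {"This", "Is an example", "Might not be", "Accurate"}
letter_index = build_index(items)
? letter_index[13]                          -- slot for 'M': prints {3}
? lookup(items, letter_index, "Might not be")   -- prints 3
```

With 20 gigs of text the items and the index would have to live on disk rather than in a sequence, but the lookup logic would be the same.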

Or, even an index like this:

{ "T", "IAE", "MNB", "A" }

So if I search for "is an example", it reduces the search string to the
first letter of each word, "IAE", looks for that pattern in the index, and
finds it at position 2.
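Something like this, maybe. initials() is a made-up helper name, and I'm assuming words are separated by spaces and the text is plain ASCII. Note that two different phrases can share the same initials, so a hit is only a candidate and still has to be checked against the full text.

```euphoria
-- Sketch only: initials() is a hypothetical helper, not a library routine.
-- Words are assumed to be separated by spaces; ASCII is assumed.

-- Return the upper-cased first letter of every word in text.
function initials(sequence text)
    sequence result
    integer c, at_word_start
    result = ""
    at_word_start = 1
    for i = 1 to length(text) do
        c = text[i]
        if c = ' ' then
            at_word_start = 1
        elsif at_word_start then
            if c >= 'a' and c <= 'z' then
                c = c - 32              -- fold to upper case
            end if
            result = append(result, c)
            at_word_start = 0
        end if
    end for
    return result
end function

sequence items, keys
integer hit

items = {"This", "Is an example", "Might not be", "Accurate"}

-- Build the index: one string of initials per item.
keys = {}
for i = 1 to length(items) do
    keys = append(keys, initials(items[i]))
end for
-- keys is now {"T", "IAE", "MNB", "A"}

-- Reduce the search string the same way and look it up.
hit = find(initials("is an example"), keys)
? hit                                   -- prints 2
```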

Either of these methods should need less storage than a regular index of
words, and it should speed up the searches significantly.
I think. I could be wrong. It's late. :)

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/
