Re: 20gigabyte string searching

On 1 Oct 2004, at 19:15, cklester wrote:

> 
> 
> posted by: cklester <cklester at yahoo.com>
> 
> People wrote:
> 
> > > > Serious question: what's the best way to search twenty gigabytes of text
> > > > for a 100-byte substring, using Euphoria?
> > > Would it be easier to create an index (like a b-tree or something),
> > > then use that for searching?
> > Sure! Where? 
> > 20 gigs in an index in an Eu sequence is 80 gigs (4 bytes per element).
> 
> I mean like this:
> 
> { "This", "Is an example", "Might not be", "Accurate" }
> 
> Index would be:
> 
> {
>  {"A", {4}},
>  {"B", {}},
> ...
>  {"I", {2}},
> ...
>  {"M", {3}},
> ...
>  {"T", {1}}
> }
> 
> So if you search for "Might not be", it first looks up 'M' in the index,
> finds every position in the list whose item starts with 'M' (just one,
> at 3), and then only has to test the rest of the string at that position.
> 
> Or, even an index like this:
> 
> { "T", "IAE", "MNB", "A" }
> 
> So if I search for "is an example", it looks for the
> first-letter-of-each-word pattern "IAE" and finds it.
> 
> Either of these methods will need less storage than a regular index of
> words, plus it should speed up the searches significantly.
> I think. I could be wrong. It's late. :)
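
For reference, here is a minimal sketch of the two indexes described in the quoted message. It is written in Python rather than Euphoria purely for illustration, and every function name in it is made up for this example. It assumes the list fits in memory and that matching is case-insensitive (the quoted search for "is an example" is expected to find "Is an example"); neither assumption is confirmed anywhere in the thread.

```python
from collections import defaultdict

# The example list from the quoted message.
items = ["This", "Is an example", "Might not be", "Accurate"]


def build_first_letter_index(entries):
    """Map each uppercase first letter to the 1-based positions of the
    entries that start with it (positions are 1-based, as in Euphoria)."""
    index = defaultdict(list)
    for pos, entry in enumerate(entries, start=1):
        if entry:  # skip empty strings, they have no first letter
            index[entry[0].upper()].append(pos)
    return index


def initials(text):
    """First letter of each word, uppercased: 'Is an example' -> 'IAE'."""
    return "".join(word[0].upper() for word in text.split())


def build_initials_index(entries):
    """One initials pattern per entry, in the same order as the list."""
    return [initials(entry) for entry in entries]


def search_first_letter(entries, index, query):
    """Look up the first letter, then test only the candidates it returns."""
    if not query:
        return []
    candidates = index.get(query[0].upper(), [])
    return [pos for pos in candidates
            if entries[pos - 1].lower() == query.lower()]


def search_initials(entries, patterns, query):
    """Compare initials patterns first, then confirm against the full entry."""
    key = initials(query)
    return [pos for pos, pattern in enumerate(patterns, start=1)
            if pattern == key and entries[pos - 1].lower() == query.lower()]


letter_index = build_first_letter_index(items)
initials_index = build_initials_index(items)
```

With the example list, `search_first_letter(items, letter_index, "Might not be")` returns `[3]` and `search_initials(items, initials_index, "is an example")` returns `[2]`. Both indexes only narrow down the candidates; the full text of each item is still needed to confirm a match.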

OK, you've got the index. Where's the 20-gig file you are indexing into?

Kat
