OpenEuphoria: Forum: Re: 20gigabyte string searching

Posted by "Kat" <gertie at visionsix.com> Oct 02, 2004

On 1 Oct 2004, at 19:15, cklester wrote:

> 
> 
> posted by: cklester <cklester at yahoo.com>
> 
> People wrote:
> 
> > > > Serious question: what's the best way to search twenty gigabytes of text
> > > > for a 100byte substring, using Euphoria?
> > > Would it be easier to create an index (like a b-tree or something),
> > > then use that for searching?
> > Sure! Where? 
> > 20gig in an index in an Eu sequence is 80gigs.
> 
> I mean like this:
> 
> { "This", "Is an example", "Might not be", "Accurate" }
> 
> Index would be:
> 
> {
>  {"A", {4}},
>  {"B", {}},
> ...
>  {"I", {2}},
> ...
>  {"M", {3}},
> ...
>  {"T", {1}}
> }
> 
> So if you search for "Might not be," it first looks up 'M' in the index,
> finds every item in the list that starts with 'M' (just one, at position 3),
> and then only has to test the rest of the string against that item.
> 
> Or, even an index like this:
> 
> { "T", "IAE", "MNB", "A" }
> 
> So if I search for "is an example," it looks for the
> first-letter-of-each-word pattern of "IAE" and finds it.
> 
> Either one of these methods will need less storage than a regular
> index of words, plus it will speed up the searches significantly.
> I think. I could be wrong. It's late. :)
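
For reference, the two index ideas quoted above can be sketched roughly as follows. This is in Python rather than Euphoria, the function names are made up for illustration, and it assumes a search means matching a whole list item, ignoring case. Positions are 1-based to match the example.

```python
from collections import defaultdict

def build_first_letter_index(items):
    """Map each uppercase first letter to the 1-based positions of the
    items that start with it (hypothetical helper, not from the thread)."""
    index = defaultdict(list)
    for pos, item in enumerate(items, start=1):
        if item:
            index[item[0].upper()].append(pos)
    return index

def build_initials_index(items):
    """Reduce each item to the uppercase first letters of its words."""
    return ["".join(word[0].upper() for word in item.split()) for item in items]

def search_first_letter(items, index, query):
    """Look up the query's first letter, then test only those candidates."""
    if not query:
        return []
    candidates = index.get(query[0].upper(), [])
    return [pos for pos in candidates
            if items[pos - 1].lower() == query.lower()]

def search_initials(items, initials, query):
    """Compare first-letter-of-each-word patterns, then confirm the full text."""
    key = "".join(word[0].upper() for word in query.split())
    return [pos for pos, pattern in enumerate(initials, start=1)
            if pattern == key and items[pos - 1].lower() == query.lower()]

items = ["This", "Is an example", "Might not be", "Accurate"]
index = build_first_letter_index(items)
initials = build_initials_index(items)
```

With the sample list, `search_first_letter(items, index, "Might not be")` gives `[3]` and `search_initials(items, initials, "is an example")` gives `[2]`. Both lookups still go back to `items` to confirm the match, so the original text has to be reachable somewhere.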

OK, you've got the index. Where's the 20gig file you are indexing into?

Kat
