20 gigabyte string searching

Serious question: what's the best way to search twenty gigabytes of text for 
a 100-byte substring, using Euphoria? Keep in mind Eu will blow up 20 gigs to 
80 in RAM (each character takes 4 bytes as a sequence element), and each copy 
made to munge it is another 80 gigs, so loading the file into a sequence isn't 
possible. I foresee tons of disk thrashing, as I don't have gigs of RAM lying 
around.

The twenty gigs can be formed like this:
{"string-1","string-2",...,"string-n"}
where each string-x is 20 to 120 chars long; most will be in the 100 
character neighborhood. Chances of them being sorted are low, as I don't see 
how Eu can be used to sort them in my lifetime.
Or they can be like this:
{string-1\nstring-2\nstring-3\n...string-n}

I have a list of 150 million string-xs lying here, and don't know the best way 
to put them together to solve the search problem. Flat sequence with 
separators, or nested sequence? Having a parallel sequence of what was 
found would be terribly nice, but it would be equally huge in count (although 
it shouldn't be as big in absolute size).
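
For what it's worth, the newline-separated layout could at least be searched 
without loading everything, by reading one line at a time so that only a single 
string-x is in memory. A rough sketch, assuming the file is plain text with one 
string-x per line. The name find_lines and the file name in the usage line are 
made up for illustration, and whether gets() copes with a file this large 
depends on the Euphoria version and the OS, so treat it as a starting point 
only:

```euphoria
-- Hypothetical helper: returns the line numbers of every line containing
-- needle, or -1 if the file cannot be opened.
function find_lines(sequence path, sequence needle)
    integer fn
    object line
    atom line_no
    sequence hits

    fn = open(path, "r")
    if fn = -1 then
        return -1                -- could not open the file
    end if

    hits = {}
    line_no = 0
    while 1 do
        line = gets(fn)          -- one string-x (plus its trailing '\n')
        if atom(line) then       -- gets() returns -1 at end of file
            exit
        end if
        line_no += 1
        if match(needle, line) then
            hits = append(hits, line_no)   -- record where it was found
        end if
    end while

    close(fn)
    return hits
end function
```

Something like `object hits = find_lines("strings.txt", "some 100 byte substring")` 
would then give a list of matching line numbers, which is far smaller than a 
full parallel sequence. It is still one full pass over the disk per search, so 
it does nothing for the sorting problem.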

Kat
