20-gigabyte string searching
- Posted by "Kat" <gertie at visionsix.com> Oct 02, 2004
- 491 views
Serious question: what's the best way to search twenty gigabytes of text for a 100-byte substring, using Euphoria? Keep in mind Eu will blow up 20 gigs to 80 in RAM, since each character in a sequence takes 4 bytes, and each copy made to munge it is another 80 gigs, so loading the file into a sequence isn't possible. I foresee tons of disk thrashing, as I don't have gigs of RAM lying around.

The 20 gigs can be formed like this:
{"string-1","string-2"...."string-n"}
where each string-x is 20 to 120 chars long; most will be in the 100-character neighborhood. Chances of them being sorted are low, as I don't see how Eu can be used to sort them in my lifetime.

Or they can be like this:
{string-1\nstring-2\nstring-3\n...string-n}

I have a list of 150 million string-xs lying here, and don't know the best way to put them together to solve the search problem. Flat sequence with separators, or nested sequence? Having a parallel sequence of what was found would be terribly nice, but would be equally huge in count (although it shouldn't be as big absolutely).

Kat
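
To make the newline-separated layout concrete, here is a minimal sketch of scanning such a file one line at a time, so that only a single string-x is ever held in memory. It is written in Python rather than Euphoria purely for illustration, and it is not a tuned solution. The function name `find_matching_lines` and its parameters are hypothetical. It assumes the needle never contains a newline, and the indexes it yields play the role of the "parallel sequence of what was found".

```python
from typing import Iterator


def find_matching_lines(path: str, needle: bytes) -> Iterator[int]:
    """Yield the 0-based index of every line that contains `needle`.

    Assumption: one string per line, and `needle` contains no newline,
    so a match can never straddle two lines.
    """
    # Binary mode avoids any text-decoding cost on a very large file.
    with open(path, "rb") as f:
        # Iterating over the file object reads one line at a time,
        # so memory use stays near the length of the longest line.
        for index, line in enumerate(f):
            if needle in line:
                yield index
```

Because the results are yielded lazily, the caller can write them straight to another file instead of collecting 150 million entries in memory. This is still a full linear pass over the data for every search; it avoids the RAM problem, not the disk reads.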