Re: 20gigabyte string searching
- Posted by "Unkmar" <L3Euphoria at bellsouth.net> Oct 03, 2004
----- Original Message -----
From: "Pete Lomax"
Sent: Sunday, October 03, 2004 5:38 AM
Subject: Re: 20gigabyte string searching

> On Sat, 2 Oct 2004 22:26:26 +0000, Mike <vulcan at win.co.nz> wrote:
>
> > Depending on the hash table quality there might be, say, 2 or 3
> > disk movements per URL search.
> >
> > Mike
> >
> > PS: This idea seems similar to the one Pete posted, but the index is
> > managed in RAM so performance should be much better (at the expense
> > of huge amounts of RAM).
>
> I thought about that: with a hash table of a million entries on disk
> and 15 million URLs, if the hash function does its job, the average
> bucket size will be 15. Either way, (15 or 2) local disk accesses will
> probably be faster than reading a 100-byte URL from the interweb.
>
> Pete

What would be the total size of the hash table? Do you have to create the
entire hash table in RAM, or can it be built to a file as you go along?

Kat, I once devised a sorting solution for such large lists. I've never
posted the process or exposed it to anyone. Shortly after I came up with
it, I assumed it was so simple in design that everyone must already know
how to do it. I will post the idea later; I have other things to tend
to. :/

unkmar
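For concreteness, here is a minimal sketch of the kind of on-disk table Pete describes: a file of a million fixed-size buckets, each holding short URL fingerprints, so one seek plus a scan of roughly 15 slots answers "have we seen this URL?". With 32 slots of 8 bytes per bucket, the whole file is 256 MB and is built directly on disk as you go, which also speaks to the RAM question above. The layout, slot counts, and names (`N_BUCKETS`, `SLOTS`, `fingerprint`) are illustrative assumptions, not anything from the original posts, and Python stands in for whatever language you would actually use.

```python
# Sketch of an on-disk hash table for URL deduplication (assumed layout,
# not from the original posts): N_BUCKETS fixed-size buckets in one file,
# each bucket a run of SLOTS 8-byte URL fingerprints.

import hashlib

N_BUCKETS = 1_000_000          # buckets on disk, as in Pete's example
SLOTS = 32                     # room for ~2x the expected 15 URLs per bucket
SLOT_SIZE = 8                  # store an 8-byte fingerprint, not the full URL
BUCKET_SIZE = SLOTS * SLOT_SIZE
EMPTY = b"\x00" * SLOT_SIZE    # an all-zero fingerprint would collide with
                               # this sentinel; vanishingly unlikely, but a
                               # real implementation would reserve it

def fingerprint(url):
    # 8-byte digest of the URL; collisions are possible but rare enough
    # for a first-pass duplicate filter.
    return hashlib.md5(url.encode()).digest()[:SLOT_SIZE]

def bucket_offset(fp):
    return (int.from_bytes(fp, "big") % N_BUCKETS) * BUCKET_SIZE

def create(path):
    # Pre-size the file; the OS fills it with zeros (sparse where
    # supported), so the table never has to live in RAM.
    with open(path, "wb") as f:
        f.truncate(N_BUCKETS * BUCKET_SIZE)

def add(f, url):
    # One seek + one bucket read + (if the URL is new) one small write.
    fp = fingerprint(url)
    f.seek(bucket_offset(fp))
    bucket = f.read(BUCKET_SIZE)
    for i in range(SLOTS):
        slot = bucket[i * SLOT_SIZE:(i + 1) * SLOT_SIZE]
        if slot == fp:
            return False       # already seen
        if slot == EMPTY:
            f.seek(bucket_offset(fp) + i * SLOT_SIZE)
            f.write(fp)
            return True        # inserted
    # A real table would chain or rehash; the sketch just gives up.
    raise RuntimeError("bucket overflow: grow SLOTS or N_BUCKETS")

create("urls.tbl")
with open("urls.tbl", "r+b") as f:
    print(add(f, "http://example.com/"))   # True: first sighting
    print(add(f, "http://example.com/"))   # False: duplicate
```

Storing fingerprints rather than full URLs is what keeps the slots fixed-size and the file bounded; the trade-off is a tiny false-duplicate risk, which a second pass over the raw data could resolve if it mattered.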