Re: 20gigabyte string searching
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Oct 03, 2004
On Sat, 2 Oct 2004 22:26:26 +0000, Mike <vulcan at win.co.nz> wrote:

>Depending on the hash table quality there might be, say, 2 or 3
>disk movements per URL search.
>
>Mike
>
>PS: This idea seems similar to the one Pete posted but the index is
>managed in RAM so performance should be much better (at the expense of
>huge amounts of RAM).

I thought about that. With a hash table of a million entries on disk and 15 million URLs, the average bucket size will be 15, provided the hash function does its job. Either way (15 or 2), local disk accesses will probably be faster than reading a 100-byte URL from the interweb.

Pete
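For anyone who wants to check the bucket arithmetic, here is a rough sketch in Python, scaled down 1000x (15,000 URLs into 1,000 buckets) so it runs quickly. The MD5-based hash, the made-up example.com URLs and the names bucket_of and bucket_sizes are assumptions for illustration only, not anything from the thread.

```python
import hashlib

def bucket_of(url: str, n_buckets: int) -> int:
    """Map a URL to a bucket number (MD5 is an assumed stand-in hash)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

def bucket_sizes(urls, n_buckets):
    """Count how many URLs land in each bucket."""
    sizes = [0] * n_buckets
    for url in urls:
        sizes[bucket_of(url, n_buckets)] += 1
    return sizes

# Scaled-down version of "a million entries, 15 million urls".
N_BUCKETS = 1_000
N_URLS = 15_000

# Hypothetical URLs, generated only to have something to hash.
urls = [f"http://example.com/page/{i}" for i in range(N_URLS)]

sizes = bucket_sizes(urls, N_BUCKETS)
average = sum(sizes) / N_BUCKETS  # always N_URLS / N_BUCKETS

print("average bucket size:", average)
print("smallest / largest bucket:", min(sizes), "/", max(sizes))
```

The average is fixed by the ratio of URLs to buckets whatever the hash does. What a decent hash function buys is that the largest bucket stays reasonably close to that average instead of a few buckets holding most of the URLs.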