Re: 20gigabyte string searching

----- Original Message ----- 
From: "Pete Lomax"
Sent: Sunday, October 03, 2004 5:38 AM
Subject: Re: 20gigabyte string searching


> 
> 
> On Sat,  2 Oct 2004 22:26:26 +0000, Mike <vulcan at win.co.nz> wrote:
> 
> >Depending on the hash table quality there might be, say, 2 or 3 
> >disk movements per URL search.
> >
> >Mike
> >
> >PS: This idea seems similar to the one Pete posted but the index is 
> >managed in RAM so performance should be much better (at the expense of 
> >huge amounts of RAM).
> 
> I thought about that: with a hash table of a million entries on disk
> and 15 million URLs, if the hash function does its job, the average
> bucket size will be 15. Either way (15 accesses or 2), local disk
> accesses will probably be faster than reading a 100-byte URL from the
> interweb.
> 
> Pete
> 

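To make the arithmetic above concrete: 15 million URLs spread over a
million buckets averages 15 per bucket, so a search is one seek into the
table file plus a RAM-speed scan of a small slab. The sketch below (Python,
one flat file of fixed 100-byte slots, 32 per bucket) is only an
illustrative layout and naming of my own, not Pete's actual format:

import hashlib

NUM_BUCKETS = 1_000_000   # Pete's table size
SLOTS       = 32          # bucket capacity; average load is 15M/1M = 15
RECORD      = 100         # ~100-byte URLs, NUL-padded to a fixed width
TABLE       = "urls.tbl"  # one flat file of NUM_BUCKETS * SLOTS * RECORD bytes

def bucket_of(key: bytes) -> int:
    # Any well-mixed hash will do; md5 is used here only as a stable example.
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % NUM_BUCKETS

def lookup(url: str) -> bool:
    key = url.encode()[:RECORD].ljust(RECORD, b"\0")
    with open(TABLE, "rb") as f:
        # One disk movement: seek to the bucket's slab and read it whole.
        f.seek(bucket_of(key) * SLOTS * RECORD)
        slab = f.read(SLOTS * RECORD)
    # Then a RAM-speed scan of at most SLOTS fixed-width records.
    return any(slab[i:i + RECORD] == key for i in range(0, len(slab), RECORD))

Reading the whole 3,200-byte slab in one go keeps each search to a single
disk movement, in line with Mike's 2-or-3 estimate.
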
What would be the total size of the hash table?
Do you have to create the entire hash table in RAM, or can it be built
out to a file as you go along?
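
On the size question, continuing the same illustrative layout (an
assumption of mine, not a tested design): the table is a fixed
1,000,000 x 32 x 100 bytes, about 3.2 GB on disk, and none of it has to
live in RAM. It can indeed be built straight to file as you go:
preallocate once, then write each URL into the first free slot of its
bucket. This sketch reuses the constants and bucket_of from above:

def create_table() -> None:
    # Preallocate the full table once; on most filesystems truncate()
    # makes a sparse file, so empty buckets cost little real disk space.
    with open(TABLE, "wb") as f:
        f.truncate(NUM_BUCKETS * SLOTS * RECORD)   # ~3.2 GB logical size

def insert(url: str) -> bool:
    key = url.encode()[:RECORD].ljust(RECORD, b"\0")
    base = bucket_of(key) * SLOTS * RECORD
    with open(TABLE, "r+b") as f:
        f.seek(base)
        slab = f.read(SLOTS * RECORD)
        for i in range(0, len(slab), RECORD):
            slot = slab[i:i + RECORD]
            if slot == key:                 # already present
                return True
            if slot == b"\0" * RECORD:      # first free slot: claim it
                f.seek(base + i)
                f.write(key)
                return True
    return False   # bucket overflow; a real design needs a spill strategy

The fixed 32-slot cap is the weak point: with an average load of 15, some
buckets will exceed it, so a real build needs an overflow strategy
(chaining into a spill file, or simply more slots per bucket).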

kat, I once devised a sorting solution for such large lists.  I've never
posted the process or exposed it to anyone.  Shortly after I came up with
it, I assumed it to be so simple in design that everyone must already know
how to do it.  I will post the idea later.  I have other things to tend to. :/

    unkmar
