Re: Question About Web Site Speed

Kat wrote:
> I add some 500+ new files per day with one app (averaging about 750K bytes 
> per day), and 5,800 files with another app (averaging 15 megabytes per day). 
> And there's other apps running too, but are comparatively slower adding 
> things. The files are munged before saving, and duplicates are not added. No 
> problem indexing more than the "posted by" field here, i index ALL the words 
> by filename and count and frequency/file and global frequency. And the neat 
> thing is, a search is instant, and i don't need to load even ONE megabyte 
> into memory.
> 
> Perhaps the problem with using Eu as a web server database isn't with that 
> per se, it's more with loading a 100megabyte db in Eu (making it 400megs 
> for lack of a string type) each time someone does a search can quickly run 
> out of memory on any reasonable (/ly priced) server shell. And doing gets() 
> on a brute force search of a 100 meg file (RobC said his program does that!?) 
> takes too long. But what do i know, i play with a 14 gigabyte database and 
> want "goto" and string types added to Eu.

For the record: 

I don't use a database. (EDS didn't exist when I started
writing this thing.)

I use gets() to read in each line, but I only keep one line 
at a time in memory. There's no need to load the whole 
100 MB into memory. I don't even store the "hit" messages in memory.
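
A rough sketch of that one-line-at-a-time scan, written in Python rather than
Euphoria purely for illustration. The function name count_hits and the sample
text are made up for this example; this is not the actual search program.

```python
import io

def count_hits(stream, word):
    """Scan a text stream line by line, holding only the current line in memory."""
    needle = word.lower()
    hits = 0
    for line in stream:  # a file object yields one line at a time
        if needle in line.lower():
            hits += 1  # count (or print) the hit instead of storing the line
    return hits

# Hypothetical stand-in for the message archive; a real run would pass open(path).
archive = io.StringIO("Euphoria is fast\nnothing here\nI like EUPHORIA\n")
print(count_hits(archive, "euphoria"))
```

Memory use stays at roughly one line no matter how large the archive is,
which is the point being made above.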

I do have an index of the 1000 most-recently searched-for words.
If it's one of those words, I don't have to read any messages
(except new messages that came in after that word was last searched).
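
One way such a recent-word index might look, sketched in Python for
illustration. Everything here is an assumption: the class name
RecentWordIndex, keeping message numbers as the cached hits, and holding the
messages in a list (the real archive is a file read line by line).

```python
from collections import OrderedDict

class RecentWordIndex:
    """Remembers hits for the most recently searched words (hypothetical layout)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        # word -> (message numbers that matched, how many messages were scanned)
        self.entries = OrderedDict()

    def search(self, word, messages):
        word = word.lower()
        # A cached word resumes where its last scan stopped; a new word starts at 0.
        hits, scanned = self.entries.pop(word, ([], 0))
        for number in range(scanned, len(messages)):
            if word in messages[number].lower():
                hits.append(number)
        # Re-inserting moves the word to the "most recent" end.
        self.entries[word] = (hits, len(messages))
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently searched word
        return list(hits)

index = RecentWordIndex()
messages = ["hello world", "goodbye"]
index.search("hello", messages)         # scans both messages
messages.append("Hello again")
print(index.search("hello", messages))  # scans only the new third message
```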

I also keep track of all 2-letter combinations in each message.
If a message doesn't have all the 2-letter combinations needed
by a search word, I don't search that message. I use seek() to skip ahead.
For example, if searching for EUPHORIA, a message must contain, somewhere,
EU UP PH HO OR RI and IA, or I won't look at it.
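
The 2-letter filter can be sketched like this (Python, for illustration only;
the helper names bigrams and may_contain are invented here, and storing the
combinations as a set is an assumption about the bookkeeping).

```python
def bigrams(text):
    """Return the set of all adjacent 2-character combinations in text."""
    text = text.upper()
    return {text[i:i + 2] for i in range(len(text) - 1)}

def may_contain(message_bigrams, word):
    """True if the message has every 2-letter combination the word needs."""
    return bigrams(word) <= message_bigrams  # subset test

print(sorted(bigrams("EUPHORIA")))
print(may_contain(bigrams("I use Euphoria daily"), "euphoria"))
print(may_contain(bigrams("Europe"), "euphoria"))  # no UP, so it can be skipped
```

The filter can only rule messages out. A message that passes may have the
combinations scattered across different words, so it still needs a real scan.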

The worst case (and this happens fairly often)
is when someone searches for words that aren't in
the 1000-word index and don't contain any unusual letter combinations.
For example, if you search for a number such as 123456 that isn't in the
1000-word index, I will have to read every line of every message.
These days that only takes a few seconds.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com
