Re: Question About Web Site Speed
Kat wrote:
> I add some 500+ new files per day with one app (averaging about 750K bytes
> per day), and 5,800 files with another app (averaging 15 megabytes per day).
> And there are other apps running too, but they add things comparatively
> slowly. The files are munged before saving, and duplicates are not added. No
> problem indexing more than the "posted by" field here, i index ALL the words
> by filename and count and frequency/file and global frequency. And the neat
> thing is, a search is instant, and i don't need to load even ONE megabyte
> into memory.
>
> Perhaps the problem with using Eu as a web server database isn't the
> database per se; it's more that loading a 100-megabyte db in Eu (making it
> 400 megs for lack of a string type) each time someone does a search can
> quickly run out of memory on any reasonably priced server shell. And doing
> gets() on a brute-force search of a 100 meg file (RobC said his program
> does that!?) takes too long. But what do i know, i play with a 14 gigabyte
> database and want "goto" and string types added to Eu.
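Kat's inverted index (every word mapped to the files it appears in, with a per-file count, a per-file frequency and a global frequency) can be sketched roughly as follows. This is a minimal Python illustration, not Kat's actual code: the class and method names are hypothetical, the tokenizing regex is an assumption, and the index is held in memory only for brevity, whereas Kat's index lives on disk.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")

class InvertedIndex:
    """Hypothetical sketch: word -> files containing it, with counts."""

    def __init__(self):
        self.postings = defaultdict(dict)  # word -> {filename: count}
        self.global_freq = Counter()       # word -> occurrences in all files
        self.file_len = {}                 # filename -> total words in file

    def add_file(self, filename, text):
        words = WORD_RE.findall(text.lower())
        self.file_len[filename] = len(words)
        for word, n in Counter(words).items():
            self.postings[word][filename] = n
            self.global_freq[word] += n

    def search(self, word):
        """Return (filename, count, frequency in file), highest frequency first."""
        hits = self.postings.get(word.lower(), {})  # .get does not create entries
        results = [(f, n, n / self.file_len[f]) for f, n in hits.items()]
        return sorted(results, key=lambda r: r[2], reverse=True)
```

A lookup is a single dictionary access per word, which is why a search on an index like this feels instant no matter how many files have been added.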
For the record:
I don't use a database. (EDS didn't exist when I started
writing this thing.)
I use gets() to read in each line, but I only keep one line
at a time in memory. There's no need to load the whole
100 MB into memory. I don't even store the "hit" messages in memory.
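A minimal Python sketch of that line-at-a-time scan, assuming a plain-text file and a case-insensitive substring match (both assumptions; `search_lines` is a hypothetical name). Because it is a generator, each hit is handed back as soon as it is found, and nothing but the current line is kept.

```python
def search_lines(path, word):
    """Yield 1-based line numbers of lines containing `word`.

    Only the current line is held in memory, and hits are yielded
    one at a time instead of being collected.
    """
    target = word.upper()  # assumed case-insensitive match
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):  # reads one line at a time
            if target in line.upper():
                yield lineno
```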
I do have an index of the 1000 most-recently searched-for words.
If it's one of those words, I don't have to read any messages
(except new messages that came in after that word was last searched).
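The cache of recently searched words can be sketched as an LRU map in which each entry also remembers how many messages had been scanned when it was last updated, so a repeat search only reads the newer messages. This is a hypothetical Python illustration, not Rob's code: the class name is made up, and messages are passed as an in-memory list only to keep it short.

```python
from collections import OrderedDict

class RecentWordCache:
    """Hypothetical sketch: results for the most recently searched words.

    Each entry stores the matching message numbers plus how many
    messages had been scanned, so a repeat search reads only newer ones.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.entries = OrderedDict()  # word -> (hit list, messages scanned)

    def search(self, word, messages):
        """`messages` is a list of message texts, oldest first."""
        word = word.upper()
        # Remove the entry (if any) so it can be re-inserted as most recent.
        hits, scanned = self.entries.pop(word, ([], 0))
        for i in range(scanned, len(messages)):  # only unscanned messages
            if word in messages[i].upper():
                hits.append(i)
        self.entries[word] = (hits, len(messages))
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently searched
        return list(hits)
```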
I also keep track of all 2-letter combinations in each message.
If a message doesn't have all the 2-letter combinations needed
by a search word, I don't search that message. I use seek() to skip ahead.
For example, when searching for EUPHORIA, a message must contain
EU, UP, PH, HO, OR, RI and IA somewhere, or I won't look at it.
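A rough Python sketch of the 2-letter prefilter combined with seek(), under simplifying assumptions that are not from the post: each message is one line of a binary file, matching is case-insensitive, and the per-message pair sets are kept in a list. All function names are hypothetical.

```python
def bigrams(text):
    """Set of all adjacent 2-character combinations, case-folded."""
    t = text.upper()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def build_index(f):
    """Scan a binary file of newline-separated messages once.

    Returns a list of (byte offset, byte length, bigram set) per message.
    """
    index = []
    offset = 0
    for raw in f:  # raw is one line of bytes, including its newline
        text = raw.decode("utf-8", errors="replace")
        index.append((offset, len(raw), bigrams(text)))
        offset += len(raw)
    return index

def search(f, index, word):
    """Yield offsets of messages containing `word`, reading only candidates."""
    needed = bigrams(word)
    target = word.upper()
    for offset, length, grams in index:
        if not needed <= grams:
            continue  # a required pair is missing: skip without reading
        f.seek(offset)  # jump straight to the candidate message
        text = f.read(length).decode("utf-8", errors="replace")
        if target in text.upper():
            yield offset
```

The pair test is a necessary condition, not a sufficient one: a message containing "EU UP PH HO OR RI IA" passes the filter without containing EUPHORIA, which is why each surviving message is still checked with a real substring match.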
The worst case (and this happens fairly often)
is when someone searches for words that aren't in
the 1000-word index, and don't have any unusual letter combinations.
For example, if you search for a number such as 123456 that isn't in the
1000-word cache, I have to read every line of every message.
These days that only takes a few seconds.
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com