Re: Question About Web Site Speed
- Posted by Robert Craig <rds at RapidEuphoria.com> Jul 14, 2005
Kat wrote:
> I add some 500+ new files per day with one app (averaging about 750K bytes
> per day), and 5,800 files with another app (averaging 15 megabytes per day).
> And there's other apps running too, but are comparatively slower adding
> things. The files are munged before saving, and duplicates are not added. No
> problem indexing more than the "posted by" field here, i index ALL the words
> by filename and count and frequency/file and global frequency. And the neat
> thing is, a search is instant, and i don't need to load even ONE megabyte
> into memory.
>
> Perhaps the problem with using Eu as a web server database isn't with that
> per se, it's more with loading a 100megabyte db in Eu (making it 400megs
> for lack of a string type) each time someone does a search can quickly run
> out of memory on any reasonable (/ly priced) server shell. And doing gets()
> on a brute force search of a 100 meg file (RobC said his program does that!?)
> takes too long. But what do i know, i play with a 14 gigabyte database and
> want "goto" and string types added to Eu.

For the record:

I don't use a database. (EDS didn't exist when I started writing this thing.)

I use gets() to read in each line, but I only keep one line at a time in memory. There's no need to load the whole 100Mb into memory. I don't even store the "hit" messages in memory.

I do have an index of the 1000 most-recently searched-for words. If it's one of those words, I don't have to read any messages (except new messages that came in after that word was last searched).

I also keep track of all 2-letter combinations in each message. If a message doesn't have all the 2-letter combinations needed by a search word, I don't search that message. I use seek() to skip ahead. e.g. if searching for EUPHORIA, a message must contain somewhere: EU UP PH HO OR RI and IA, or I won't look at it.
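
The 2-letter filter described above can be sketched roughly as follows. This is only an illustration in Python, not the Euphoria code that runs the site: the names `bigrams`, `may_contain` and `search` are hypothetical, and the messages are held in a list for brevity, whereas the real program reads the archive with gets() and uses seek() to skip messages.

```python
def bigrams(text):
    """Return the set of adjacent 2-letter combinations in text, uppercased."""
    t = text.upper()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def may_contain(message_bigrams, word):
    """True if the message has every 2-letter combination the word needs.

    A True result does not prove the word is present; it only means the
    message cannot be ruled out without reading it.
    """
    return bigrams(word) <= message_bigrams


def search(messages, word):
    """Return the indexes of messages that contain word.

    messages is a list of (text, precomputed_bigram_set) pairs
    (an assumption made for this sketch).
    """
    hits = []
    for i, (text, grams) in enumerate(messages):
        if not may_contain(grams, word):
            continue  # cheap rejection: the message is never scanned
        if word.upper() in text.upper():  # full scan only for survivors
            hits.append(i)
    return hits


# Build the per-message bigram sets once, when messages are stored.
texts = [
    "I like Euphoria a lot",
    "A search for something else",
    "eu up ph ho or ri ia",  # has every pair, but not the word itself
]
messages = [(t, bigrams(t)) for t in texts]
result = search(messages, "EUPHORIA")
```

The third message shows why the filter is only a pre-check: it passes the 2-letter test, so it still has to be read, and only then turns out not to be a hit.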
The worst case (and this happens fairly often) is when someone searches for words that aren't in the 1000-word index, and don't have any unusual letter combinations. e.g. if you search for a number (e.g. 123456) that isn't in the 1000-word cache, I will have to read every line of every message. These days that only takes a few seconds.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com