RE: Anyone want to write an "intelligent" mail filter?
- Posted by "Ricardo Forno" <rmforno at tutopia.com> Nov 05, 2003
Rob:

Is there a SPAM filter that not only identifies SPAM but also avoids downloading it from the server and deletes it there? I'm asking because I have a 56K connection, and SPAM consumes a large part of my connection time (which I have to pay for). A few days ago I had to resort to changing my e-mail address from rforno to rmforno because of the increasing SPAM.

Regards.

----- Original Message -----
From: Robert Craig <rds at RapidEuphoria.com>
To: <EUforum at topica.com>
Sent: Wednesday, November 05, 2003 12:11 AM
Subject: Re: Anyone want to write an "intelligent" mail filter?

> Irv Mullins wrote:
> > Every day I get more annoying SPAM e-mails. Currently it's running about 10
> > spams to every valid e-mail.
> >
> > I'm tired of wading thru them, and I'd rather not download them at all.
> > My e-mail client can filter the messages by sender or subject, but most
> > spams now are written to get around those filters.
> >
> > One thing I notice is that nearly 100% of the spams either contain the
> > word "lagos" or long strings of "dictionary" words to confuse the filters:
> >
> > "indecisive constitute dakar summitry ajax beaver descendent withal
> > circumlocution asocial voluble inquire convolution replete hitler
> > commendation segregate cognition abstract eject disgustful"
> >
> > But very few or none of the more common shorter words that would likely
> > appear in a valid e-mail: "a, and, or, if, you, we, I, to, for, the, this,
> > that....."
> >
> > We should be able to come up with a routine which would analyze a given
> > text string and rank it according to its likelihood of being a 'meaningful'
> > message. Then use that routine in an e-mail client to rank messages and
> > only download from the server those which appear to be 'real'.
> >
> > Ideas?
>
> For the past few months I've been using the e-mail
> client in Netscape 7.1. It has a "Bayesian" spam filter
> that adapts to the streams of spam and normal mail
> that you receive.
> It works pretty well.
>
> It keeps track of all the words in your incoming e-mail,
> and notes how often each word appears
> in spam vs normal mail. For example, the word "Euphoria"
> might have appeared in 1 of my spam messages and
> 99 of my normal messages, so if it sees "Euphoria" in a
> message, that would indicate a 99% probability that this
> is a normal message. But it doesn't just look at one word.
> I believe it looks at the 20 or so words in each message
> with the most extreme probabilities. It uses a formula from
> Bayesian statistics to combine the probability indicated
> by each word into a single overall probability, e.g.
> if you had a word that indicated "90% likely to be spam"
> and another that said "95% likely to be spam", the result of
> combining those two words might be 97% (or something).
> It will move a message out of your inbox into a spam folder
> if the probability of it being spam is quite high,
> something like 99%. Obviously you want to keep false
> positives (real mail tagged as spam) to an absolute minimum.
>
> In practice, over a long period of time,
> suppose I get 1000 messages of which 900 are spam. It will
> probably move about 800 of the spams and 1 or 2 of the
> non-spams into my spam folder.
>
> With each batch of incoming mail, I check the spam folder
> for non-spams, but usually I can quickly see from the
> subjects and senders that there aren't any non-spams, so I
> click a button to quickly delete all the spams in one
> operation.
>
> Whenever it tags a message incorrectly (usually spam
> that it missed), you can click a button to tell it so.
> This way it gradually learns and gets smarter.
>
> Also, e-mail from anyone in my address book is automatically
> considered non-spam, so the false positives are quite low.
>
> Being able to delete a whole bunch of spams in one
> operation saves time. It's also nice that it keeps
> my inbox largely clear of distracting spam clutter.
>
> Regards,
> Rob Craig
> Rapid Deployment Software
> http://www.RapidEuphoria.com
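On Ricardo's question of deleting spam on the server without downloading it: POP3 (RFC 1939) has a TOP command that returns a message's headers plus a chosen number of body lines, and a DELE command that marks a message for deletion, which the server carries out when the session ends with QUIT. Below is a minimal Python sketch using the standard `poplib` module, assuming the provider's server supports the optional TOP command. `triage_mailbox`, `looks_spam_by_headers` and the `ADDRESS_BOOK` whitelist are hypothetical names, and the "lagos" subject check only borrows Irv's observation; it is not a serious filter.

```python
import poplib
from email.message import Message
from email.parser import BytesHeaderParser
from email.utils import parseaddr
from typing import Callable

# Hypothetical whitelist, mirroring the "address book" rule Rob mentions.
ADDRESS_BOOK = {"friend@example.com"}


def looks_spam_by_headers(headers: Message) -> bool:
    """Toy predicate: trust whitelisted senders, flag 'lagos' in the subject."""
    _name, addr = parseaddr(str(headers.get("From", "")))
    if addr.lower() in ADDRESS_BOOK:
        return False
    return "lagos" in str(headers.get("Subject", "")).lower()


def triage_mailbox(conn: poplib.POP3,
                   is_spam: Callable[[Message], bool]) -> list[int]:
    """Mark spam for deletion on the server, downloading only the headers.

    `conn` is assumed to be an already logged-in poplib.POP3/POP3_SSL object.
    Returns the message numbers that were passed to DELE.
    """
    count, _mailbox_size = conn.stat()
    parser = BytesHeaderParser()
    deleted = []
    for num in range(1, count + 1):
        _resp, lines, _octets = conn.top(num, 0)   # headers, 0 body lines
        headers = parser.parsebytes(b"\r\n".join(lines))
        if is_spam(headers):
            conn.dele(num)        # only marked; removed when QUIT is sent
            deleted.append(num)
    return deleted
```

A session would log in with `conn = poplib.POP3_SSL(host)`, `conn.user(...)` and `conn.pass_(...)`, call `triage_mailbox(conn, looks_spam_by_headers)`, then `conn.quit()`; if the connection drops before QUIT, RFC 1939 says the server must not remove the marked messages. Because deleted messages are gone for good, any real predicate should err heavily toward keeping mail, the same false-positive concern Rob raises.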
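Irv's proposal can be tried with a very small scoring function: measure what fraction of a message's words are common short function words, since the word-salad spam he quotes contains almost none. A Python sketch, assuming a word list taken from his examples and a hypothetical 0.15 cutoff that would need tuning against real mail:

```python
import re

# Common short words from Irv's message; a real list would be longer.
COMMON_WORDS = frozenset("a and or if you we i to for the this that".split())


def meaningfulness(text: str) -> float:
    """Fraction of the words in `text` that are common function words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in COMMON_WORDS for w in words) / len(words)


def looks_meaningful(text: str, threshold: float = 0.15) -> bool:
    """True if the text has enough function words to read like real prose.

    The 0.15 cutoff is a hypothetical value, not a measured one.
    """
    return meaningfulness(text) >= threshold
```

Irv's own sentence "We should be able to come up with a routine which would analyze a given text string" scores 4/17 (about 0.24), while the word salad he quotes scores 0. One limitation is that a spammer can simply pad the text with "the" and "and"; a filter that learns from your own spam and normal mail, like the one Rob describes, is likely harder to game that way.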
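Rob's description of the Netscape filter (per-word spam probabilities, the 20 or so most extreme words, a Bayesian combining formula, a cutoff near 99%) resembles the naive-Bayes scheme popularized by Paul Graham's "A Plan for Spam". The Python sketch below implements a simplified version of that general scheme; it is not Netscape's actual code, and the 0.4 probability for never-seen words and the 0.01/0.99 clamps are assumed values in Graham's style. With equal numbers of spam and normal messages, a word seen in 1 spam and 99 normal messages gets a spam probability of 0.01, matching Rob's "Euphoria" example.

```python
import re
from collections import Counter
from math import prod


def tokenize(text: str) -> set[str]:
    """Lower-cased words; each word counts once per message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def combine(probs: list[float]) -> float:
    """Naive-Bayes combination of per-word spam probabilities (equal priors)."""
    if not probs:
        return 0.5
    p_spam = prod(probs)
    p_ham = prod(1.0 - p for p in probs)
    return p_spam / (p_spam + p_ham)


class WordFilter:
    """Toy adaptive spam filter in the style Rob describes (not Netscape's code)."""

    def __init__(self, max_words: int = 20, unknown: float = 0.4,
                 threshold: float = 0.99):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0
        self.max_words = max_words    # "the 20 or so words" per message
        self.unknown = unknown        # assumed probability for unseen words
        self.threshold = threshold    # "something like 99%"

    def train(self, text: str, is_spam: bool) -> None:
        """Record a message (also used when the user corrects a mistake)."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def word_prob(self, word: str) -> float:
        """P(spam | word), from the fraction of each corpus containing it."""
        s = self.spam_counts[word] / max(self.n_spam, 1)
        h = self.ham_counts[word] / max(self.n_ham, 1)
        if s + h == 0:
            return self.unknown
        return min(0.99, max(0.01, s / (s + h)))   # clamp: no word decides alone

    def spam_prob(self, text: str) -> float:
        probs = [self.word_prob(w) for w in tokenize(text)]
        # keep the words whose probabilities are furthest from a neutral 0.5
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        return combine(probs[:self.max_words])

    def is_spam(self, text: str) -> bool:
        return self.spam_prob(text) >= self.threshold
```

Combining Rob's two example words this way, `combine([0.90, 0.95])` gives about 0.994, so under this formula the result lands nearer 99% than his rough 97% guess. The "click a button to tell it so" feedback loop is just another `train` call on the corrected message. By the same arithmetic, Rob's figures (about 800 of 900 spams caught, 1 or 2 of 100 normal messages misfiled) correspond to roughly an 89% catch rate and a 1 to 2% false-positive rate.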