Re: Anyone want to write an "intelligent" mail filter?
- Posted by "Juergen Luethje" <j.lue at gmx.de> Nov 05, 2003
Irv wrote:

> Every day I get more annoying SPAM e-mails. Currently it's running about 10
> spams to every valid e-mail.
>
> I'm tired of wading thru them, and I'd rather not download them at all.
> My e-mail client can filter the messages by sender or subject, but most
> spams now are written to get around those filters.
>
> One thing I notice is that nearly 100% of the spams either contain the
> word "lagos" or long strings of "dictionary" words to confuse the filters:
>
> "indecisive constitute dakar summitry ajax beaver descendent withal
> circumlocution asocial voluble inquire convolution replete hitler
> commendation segregate cognition abstract eject disgustful"
>
> But very few or none of the more common shorter words that would likely
> appear in a valid e-mail: "a, and, or, if, you, we, I, to, for, the, this,
> that....."
>
> We should be able to come up with a routine which would analyze a given
> text string and rank it according to its likelihood of being a 'meaningful'
> message. Then use that routine in an e-mail client to rank messages and
> only download from the server those which appear to be 'real'.
>
> Ideas?

I'm working on a program that I call "MailFilter". It can modify
incoming and outgoing mail. The current version, 0.40, has the
following features:

o incoming mail:

  - spam detection
    The program has 6 different built-in mechanisms for detecting spam.
    At least for the mail that I personally get, these mechanisms work
    better than the spam filters of my e-mail provider.
    A little "research" in September 2003 gave:

    # my mail provider:
      - 67% of the spam was correctly recognized as spam
        (= sensitivity)
      - 96% of the non-spam was correctly recognized as non-spam
        (= specificity)
    # MailFilter:
      - 77% sensitivity
      - 98% specificity

    In addition to the built-in mechanisms (which I don't want to
    disclose in public at the moment, because I don't want spammers to
    know them), the user can put mail addresses of "good" senders on a
    whitelist and mail addresses of spammers on a blacklist (wildcards
    allowed). There is also a list where users can enter their own spam
    keywords. Handling these lists intelligently will increase
    sensitivity and specificity.

    If MailFilter has recognized a mail as spam, it can add a string of
    the user's choice to the beginning of the subject. Your mail client
    can then, for example, move such mails to a special folder.

  - web bugs
    If the user wants, MailFilter deletes so-called web bugs.

  - security
    If the user wants, MailFilter deletes active content in the mail
    (e.g. scripts).

  - HTML
    In "multipart/alternative" mail, MailFilter can delete unneeded HTML
    parts completely.

  - privacy
    MailFilter can delete the request for a return receipt.

  - readability
    # MailFilter deletes garbage at the end of the mail and of
      plain-text MIME parts (blank lines, and lines that contain only
      ' ', '\t', '>', or '.').
    # If the user specifies the beginnings of 2 lines, MailFilter
      searches for the last occurrence of these lines in the mail, then
      deletes the 2 lines and all lines between them. (I've specified
      "--^-,--^-" to delete the Topica footer.)
    # MailFilter can repair broken subjects, e.g.:
        "RE: Re[3]: This and That" ==> "Re: This and That"
    # MailFilter can change any header field of the incoming mail.
      For example, I use:

        ;-- Topica
        List-Help=
        List-Unsubscribe=
        X-Topica-Id=
        ;-- Ads
        X-HotPOP=
        ;-- Misc
        Errors-To=
        X-Accept-Language=
        X-Flags=
        X-MimeOLE=

      Because the right side of '=' is empty, the header fields
      concerned are deleted.
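
To illustrate the rule format above, here is a minimal Python sketch of
how such a list could be applied to a message. It is not MailFilter's
actual code: the names parse_rules and apply_rules are made up for this
example, and it assumes that a non-empty right side sets the field to
that value.

```python
from email import message_from_string

# Example rule list in the "Name=Value" format shown above.
RULES = """\
;-- Topica
List-Help=
X-Topica-Id=
;-- Misc
X-MimeOLE=
"""


def parse_rules(text):
    """Return {field_name: new_value}; lines starting with ';' are comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        rules[name.strip()] = value.strip()
    return rules


def apply_rules(msg, rules):
    """Delete each listed header field; re-add it if a value is given.

    ASSUMPTION: an empty value means "delete", a non-empty value means
    "set the field to this value".
    """
    for name, value in rules.items():
        del msg[name]  # removes all occurrences, no error if absent
        if value:
            msg[name] = value
    return msg


if __name__ == "__main__":
    raw = (
        "From: someone@example.com\n"
        "List-Help: <mailto:help@example.com>\n"
        "X-MimeOLE: Produced By Something\n"
        "Subject: Hi\n"
        "\n"
        "Body text\n"
    )
    msg = apply_rules(message_from_string(raw), parse_rules(RULES))
    print(msg.as_string())
```

Header field names are matched case-insensitively by Python's email
package, so "x-mimeole" in a mail would be deleted as well.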
      Deleting these fields makes the remaining header more readable
      and saves space on my hard disk.

  - safety
    MailFilter can make a backup copy of the original message, and also
    of the changed message.

o outgoing mail:

  - MailFilter can change any header field of the outgoing mail. This
    can be used for privacy, e.g. if you don't want the whole world to
    know which mail client you use. Or it can be used for special
    greetings (see the "X-Greetings" header field in my mails), or ...

  - backups
    MailFilter can make a backup copy of the original message, and also
    of the changed message.

In summary, MailFilter is something like a mixture of parts of KorrNews
<http://www.tglsoft.de/misc/hamtools_en.htm> and parts of Benign
<http://www.firetrust.com/products/benign/>.

It runs stably on my Windows 98 system.

Main disadvantages that I'm aware of:

- MailFilter can handle any mail file on your hard disk, e.g. by calling
      MailFilter -in this.eml
      MailFilter -out that.eml
  But in order to handle your incoming and outgoing mail automatically,
  you have to install a local mail server (such as MorVer
  <http://www.morver.de/>). The local server then calls MailFilter.
  This also means that MailFilter cannot handle mail on a remote mail
  server. Every mail must first be downloaded to the hard disk so that
  MailFilter can access it.
- Currently only partial support for mails that contain *nested* MIME
  parts.
- No version for Linux/FreeBSD at the moment.
- Some error messages are currently in German.

I'm already working on the documentation (in English). As soon as this
and the translation of the error messages are finished, I'll offer
MailFilter on my website as a free download.

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campaign  |  |\      _,,,---,,_
 \ /  against HTML in        |  /,`.-'`'    -.  ;-;;,_
  X   e-mail and news,       | |,4-  ) )-,_..;\ (  `'-'
 / \  and unneeded MIME      | '---''(_/--'  `-'\_)