Re: Anyone want to write an "intelligent" mail filter?


Irv wrote:

> Every day I get more annoying SPAM e-mails. Currently it's running about 10
> spams to every valid e-mail.
>
> I'm tired of wading thru them, and I'd rather not download them at all.
> My e-mail client can filter the messages by sender or subject, but most
> spams now are written to get around those filters.
>
> One thing I notice is that nearly 100% of the spams either contain the
> word "lagos" or long strings of "dictionary" words to confuse the filters:
>
> "indecisive constitute dakar summitry ajax beaver descendent withal
> circumlocution asocial voluble inquire convolution replete hitler
> commendation segregate cognition abstract eject disgustful"
>
> But very few or none of the more common shorter words that would likely
> appear in a valid e-mail: "a, and, or, if, you, we, I, to, for, the, this,
> that....."
>
> We should be able to come up with a routine which would analyze a given
> text string and rank it according to its likelihood of being a 'meaningful'
> message. Then use that routine in an e-mail client to rank messages and
> only download from the server those which appear to be 'real'.
>
> Ideas?
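
The ranking idea in the quoted message can be illustrated with a minimal
sketch. It assumes a tiny word list (just the examples Irv names) and a
hypothetical threshold of 0.15; neither comes from a real filter, and this
is not one of MailFilter's built-in mechanisms.

```python
import re

# Hypothetical word list: only the examples named in the quoted message.
COMMON_WORDS = {"a", "and", "or", "if", "you", "we", "i",
                "to", "for", "the", "this", "that"}


def common_word_ratio(text):
    """Return the fraction of words in text that are common short words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in COMMON_WORDS)
    return hits / len(words)


def looks_meaningful(text, threshold=0.15):
    """Guess whether text is a 'real' message (threshold is an assumption)."""
    return common_word_ratio(text) >= threshold
```

A run of "dictionary" words like the one quoted above contains none of the
common words and so scores 0.0, while an ordinary sentence scores well above
the threshold. A usable filter would need a much larger word list and a
threshold tuned on real mail.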

I'm working on a program that I currently call "MailFilter". :-)
It can modify incoming and outgoing mail. The current version, 0.40, has
the following features:

o incoming mail:
  - spam detection
    The program has 6 different built-in mechanisms for detecting spam.
    At least for the mail that I personally get, these mechanisms
    work better than the spam filters of my e-mail provider. :-)
    A little "research" in September 2003 gave:
    # my mail provider:
      - 67% of the spam was correctly recognized as spam (= sensitivity)
      - 96% of the non-spam was correctly recognized as non-spam
        (= specificity)
    # MailFilter:
      - 77% sensitivity
      - 98% specificity

    In addition to the built-in mechanisms (which I don't want to
    disclose in public at the moment, because I don't want
    spammers to know them), the user can put mail addresses of "good"
    senders in a whitelist, and mail addresses of spammers in a
    blacklist (wildcards allowed). There is also a list where the
    user can enter her/his own spam keywords.
    Intelligent use of these lists will increase sensitivity and
    specificity.
    If MailFilter has recognized a mail as spam, it can add a string of
    the user's choice to the beginning of the subject. Your mail
    client can then, for example, put such mails in a special folder.

  - web bugs
    If the user wants, MailFilter deletes so-called web bugs.

  - security
    If the user wants, MailFilter deletes active content of the mail
    (e.g. scripts).

  - HTML
    In "multipart/alternative" mail, MailFilter can delete unneeded HTML
    parts completely.

  - privacy
    MailFilter can delete the request for returning a receipt.

  - readability
    # MailFilter deletes garbage at the end of the mail, and of plain
      text MIME parts (blank lines, and lines that only contain ' ',
      '\t', '>', or '.').
    # If the user specifies the beginning of 2 lines, then MailFilter
      searches for the last occurrence of these lines in the mail.
      MailFilter deletes the 2 lines, and all lines between them.
      (I've specified "--^-,--^-" to delete the Topica footer.)
    # MailFilter can repair broken subjects, e.g.:
      "RE: Re[3]: This and That"  ==>  "Re: This and That"
    # MailFilter can change any header field of the incoming mail.
      For example I use:
         ;-- Topica
         List-Help=
         List-Unsubscribe=
         X-Topica-Id=
         ;-- Ads
         X-HotPOP=
         ;-- Misc
         Errors-To=
         X-Accept-Language=
         X-Flags=
         X-MimeOLE=
      Because the right side of '=' is empty, the corresponding header
      fields are deleted. This improves the readability of the remaining
      header, and saves space on my hard disk.

  - safety
    MailFilter can make a backup copy of the original message, and also
    of the changed message.
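
Two of the readability steps listed above (removing trailing garbage lines
and repairing reply prefixes in the subject) can be sketched as follows.
The function names and the regular expression are assumptions made for
illustration; MailFilter's actual rules may differ.

```python
import re

# Characters that may make up a "garbage" line, per the list above.
GARBAGE_CHARS = set(" \t>.")

# One or more reply prefixes such as "RE:", "Re:" or "Re[3]:" (assumed forms).
REPLY_PREFIX = re.compile(r"^(?:\s*re(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def strip_trailing_garbage(lines):
    """Drop trailing lines that are empty or hold only garbage characters.

    lines is a list of strings without line terminators.
    """
    end = len(lines)
    while end > 0 and set(lines[end - 1]) <= GARBAGE_CHARS:
        end -= 1
    return lines[:end]


def repair_subject(subject):
    """Collapse a chain of reply prefixes into a single "Re: "."""
    rest, count = REPLY_PREFIX.subn("", subject)
    return "Re: " + rest if count else subject
```

Only lines at the very end are removed, so quoted text or a lone '.' in the
middle of a message is left alone.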

o outgoing mail:
  - MailFilter can change any header field of the outgoing mail.
    This can be used for privacy, e.g. if you don't want the whole world
    to know what mail client you use. Or it can be used for special
    greetings (see the "X-Greetings" header field in my mails) :-), or ...

  - backups
    MailFilter can make a backup copy of the original message, and also
    of the changed message.
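
The "Name=value" header rules (with ';' comment lines and an empty right
side meaning "delete") could be applied along these lines. This is a sketch
using Python's standard email package, not MailFilter's code; parse_rules
and apply_rules are hypothetical names, and it is an assumption that a
non-empty value replaces the field or adds it when it is missing.

```python
from email import message_from_string


def parse_rules(text):
    """Parse 'Name=value' lines; lines starting with ';' are comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            rules[name.strip()] = value.strip()
    return rules


def apply_rules(msg, rules):
    """Delete or replace header fields of msg according to rules."""
    for name, value in rules.items():
        del msg[name]          # removes every occurrence; no error if absent
        if value:              # empty right side: the field stays deleted
            msg[name] = value
    return msg
```

For a file on disk, the text of the .eml file would be read, passed to
message_from_string, and the result of apply_rules written back with
msg.as_string().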

In summary, MailFilter is something like a mixture of parts of
KorrNews <http://www.tglsoft.de/misc/hamtools_en.htm> and parts of
Benign <http://www.firetrust.com/products/benign/>. It runs stably on
my Windows 98 system.

Main disadvantages that I'm aware of:
- MailFilter can handle any mail file on your hard disk, by calling e.g.
      MailFilter -in  this.eml
      MailFilter -out that.eml
  But in order to handle your incoming and outgoing mail automatically,
  you have to install a local mail server (such as
  MorVer <http://www.morver.de/>). The local server then calls
  MailFilter.
  This also means that MailFilter cannot handle the mails on a remote
  mail server. Any mail must be downloaded to the hard disk first, so
  that MailFilter can access it.
- Currently only partial support for mails that contain *nested* MIME
  parts.
- No version for Linux/FreeBSD at the moment.
- Some error messages currently in German.

I'm already working on the documentation (in English). As soon as this
and the translation of the error messages are finished, I'll offer
MailFilter on my website for free download.

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campaign |    |\      _,,,---,,_
 \ /  against HTML in       |    /,`.-'`'    -.  ;-;;,_
  X   e-mail and news,      |   |,4-  ) )-,_..;\ (  `'-'
 / \  and unneeded MIME     |  '---''(_/--'  `-'\_)
