1. Anyone want to write an "intelligent" mail filter?
Every day I get more annoying SPAM e-mails. Currently it's running about 10
spams to every valid e-mail.
I'm tired of wading thru them, and I'd rather not download them at all.
My e-mail client can filter the messages by sender or subject, but most
spams now are written to get around those filters.
One thing I notice is that nearly 100% of the spams either contain the
word "lagos" or long strings of "dictionary" words to confuse the filters:
"indecisive constitute dakar summitry ajax beaver descendent withal
circumlocution asocial voluble inquire convolution replete hitler
commendation segregate cognition abstract eject disgustful"
But very few or none of the more common shorter words that would likely
appear in a valid e-mail: "a, and, or, if, you, we, I, to, for, the, this,
that....."
We should be able to come up with a routine which would analyze a given
text string and rank it according to its likelihood of being a 'meaningful'
message. Then use that routine in an e-mail client to rank messages and
only download from the server those which appear to be 'real'.
Ideas?
Irv
--
Windows 98 is *NOT* a virus - viruses are small and efficient.
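A minimal sketch of the ranking routine described above, in Python. It scores a text by the share of its words that are common short words. The word list is just the twelve words named in the post, and the 0.15 threshold and the function names are assumptions for illustration, not tuned values.

```python
import re

# The common short words listed in the post; a real list would be longer.
COMMON_WORDS = {"a", "and", "or", "if", "you", "we", "i",
                "to", "for", "the", "this", "that"}

def common_word_ratio(text):
    """Return the fraction of words in `text` that are common short words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in COMMON_WORDS)
    return hits / len(words)

def looks_meaningful(text, threshold=0.15):
    """Guess whether `text` reads like a real message (threshold is assumed)."""
    return common_word_ratio(text) >= threshold
```

The "dictionary word" string quoted above contains none of the listed words, so it scores 0.0, while an ordinary sentence such as "If you want to send the file to me, I can test this for you." scores well above the threshold.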
2. Re: Anyone want to write an "intelligent" mail filter?
On Tue, 4 Nov 2003 17:37:17 -0500, Irv Mullins <irvm at ellijay.com>
wrote:
>SPAM
>We should be able to come up with a routine which would analyze a given
>text string and rank it according to its likelihood of being a 'meaningful'
>message. Then use that routine in an e-mail client to rank messages and
>only download from the server those which appear to be 'real'.
>
>Ideas?
>long strings of "dictionary" words to confuse the filters:
Hmm, I think so..
Actually such a blindingly trivial idea you may kick yourself.
But I need something which works on W98.
Deal?
Pete
PS Cut and paste me a couple of [complete] messages & I'll let you
know if my plan would work on the spam you get. I'm from up north,
[thick-skinned & proud of it] and on broadband, so don't worry about
sending me large or offensive content.
3. Re: Anyone want to write an "intelligent" mail filter?
Irv Mullins wrote:
>We should be able to come up with a routine which would analyze a given
>text string and rank it according to its likelihood of being a 'meaningful'
>message. Then use that routine in an e-mail client to rank messages and
>only download from the server those which appear to be 'real'.
>
>Ideas?
>
>
I think the best way to combat spam is to create a "white list" (as
opposed to a "black list").
Any mail received that's not on your white list will be rejected as
spam. The white list can simply be your address book.
Okay, so how to deal with mail from friends who get new email addresses
or whatever? Well, the email client will send a response email to all
non-whitelist email addresses. It will say something like this:
"Hi, I'm using Irv's Mail Control v1.0. It rocks! It has flagged you as
an unacceptable sender. However, if you're my buddy, simply respond to
this email and I'll let you in!"
Since most spam is auto generated, you won't get responses from most
spammed mail. Any bouncebacks will be handled gracefully by the email
client. Any responses you get at all will go into a special "check
these" folder. Any of those that you mark as spam will then forever be
known as spam. Any you mark as valid will be added to your whitelist.
This is the best way to combat spam, even though it might require a
little effort in adding addresses to your whitelist.
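A rough sketch of that challenge/response flow, in Python. The class and method names are hypothetical, and it assumes that any second message from a challenged address counts as a reply to the challenge; a real client would have to match replies more carefully.

```python
from enum import Enum

class Action(Enum):
    DELIVER = "deliver"      # sender is on the whitelist
    CHALLENGE = "challenge"  # send the "respond to this email" message
    REVIEW = "review"        # put in the "check these" folder
    REJECT = "reject"        # sender was marked as spam earlier

class ChallengeFilter:
    def __init__(self, whitelist):
        self.whitelist = {address.lower() for address in whitelist}
        self.known_spam = set()
        self.challenged = set()

    def on_incoming(self, sender):
        """Decide what to do with a message from `sender`."""
        sender = sender.lower()
        if sender in self.whitelist:
            return Action.DELIVER
        if sender in self.known_spam:
            return Action.REJECT
        if sender in self.challenged:
            return Action.REVIEW      # assumed to be a reply to our challenge
        self.challenged.add(sender)
        return Action.CHALLENGE

    def mark(self, sender, is_spam):
        """Record the user's verdict on a message in the review folder."""
        sender = sender.lower()
        self.challenged.discard(sender)
        if is_spam:
            self.known_spam.add(sender)
        else:
            self.whitelist.add(sender)
```

The initial whitelist can be seeded from the address book, as suggested above.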
4. Re: Anyone want to write an "intelligent" mail filter?
On Tue, 04 Nov 2003 18:47:22 -0600, "C. K. Lester"
<euphoric at cklester.com> wrote:
>Well, the email client will send a response email to all
>non-whitelist email addresses.
One of the problems with that is a lot of spam is sent simply to
confirm someone is reading the spammed address.
Reply and your email address is then sold on to 100 other spammers.
Just a thought,
Pete
5. Re: Anyone want to write an "intelligent" mail filter?
Pete Lomax wrote:
>On Tue, 04 Nov 2003 18:47:22 -0600, "C. K. Lester"
><euphoric at cklester.com> wrote:
>
>
>>Well, the email client will send a response email to all
>>non-whitelist email addresses.
>>
>>
>One of the problems with that is a lot of spam is sent simply to
>confirm someone is reading the spammed address.
>
>Reply and your email address is then sold on to 100 other spammers.
>
>
Yeah, I know. That's why I have graphics display turned off in my mail
client. I don't know exactly what to do about that.
Can an email client send a faked "bounceback" message, or does that
have to come from the server? Maybe an "SirvEu Mail Control
v1.0"-friendly ISP will let you manage your whitelist on their server. I
dunno. Any ideas? :)
6. Re: Anyone want to write an "intelligent" mail filter?
----- Original Message -----
From: "C. K. Lester" <euphoric at cklester.com>
To: <EUforum at topica.com>
Subject: Re: Anyone want to write an "intelligent" mail filter?
You need all three: a whitelist, a blacklist, and a contact list. This is
because not every whitelist entry will need to be in the contacts. The
algorithm should be along the lines of ...
if sender or originating machine is in whitelist then accept it.
if sender is in contacts then accept it.
if sender or originating machine is in blacklist then quarantine it.
This, in conjunction with a Bayesian filter thingy, will help a lot.
--
Derek
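The three rules above, in the order given, can be sketched in Python like this. The function name, the plain-set lists, and the "unknown" result for mail that matches no list are assumptions; "unknown" is where a content filter would take over.

```python
def classify(sender, origin_host, whitelist, contacts, blacklist):
    """Apply the list checks in order; the first matching rule wins."""
    if sender in whitelist or origin_host in whitelist:
        return "accept"
    if sender in contacts:
        return "accept"
    if sender in blacklist or origin_host in blacklist:
        return "quarantine"
    return "unknown"  # no list matched: hand over to a content filter
```

Because the whitelist is checked first, a whitelisted sender is accepted even if the originating machine is blacklisted.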
7. Re: Anyone want to write an "intelligent" mail filter?
Irv Mullins wrote:
> Every day I get more annoying SPAM e-mails. Currently it's running about 10
> spams to every valid e-mail.
>
> I'm tired of wading thru them, and I'd rather not download them at all.
> My e-mail client can filter the messages by sender or subject, but most
> spams now are written to get around those filters.
>
> One thing I notice is that nearly 100% of the spams either contain the
> word "lagos" or long strings of "dictionary" words to confuse the filters:
>
> "indecisive constitute dakar summitry ajax beaver descendent withal
> circumlocution asocial voluble inquire convolution replete hitler
> commendation segregate cognition abstract eject disgustful"
>
> But very few or none of the more common shorter words that would likely
> appear in a valid e-mail: "a, and, or, if, you, we, I, to, for, the, this,
> that....."
>
> We should be able to come up with a routine which would analyze a given
> text string and rank it according to its likelihood of being a 'meaningful'
> message. Then use that routine in an e-mail client to rank messages and
> only download from the server those which appear to be 'real'.
>
> Ideas?
For the past few months I've been using the e-mail
client in Netscape 7.1. It has a "Bayesian" spam filter
that adapts to the streams of spam and normal mail
that you receive. It works pretty well.
It keeps track of all the words in your incoming e-mail,
and notes how often each word appears
in spam vs normal mail. For example, the word "Euphoria"
might have appeared in 1 of my spam messages and
99 of my normal messages, so if it sees "Euphoria" in a
message, that would indicate a 99% probability that this
is a normal message. But it doesn't just look at one word.
I believe it looks at the 20 or so words in each message
with the most extreme probabilities. It uses a formula from
Bayesian statistics to combine the probability indicated
by each word into a single overall probability. e.g.
if you had a word that indicated "90% likely to be spam"
and another that said "95% likely to be spam", the result of
combining those two words might be 97% (or something).
It will move a message out of your inbox into a spam folder
if the probability of it being spam is quite high,
something like 99%. Obviously you want to keep false
positives (real mail tagged as spam) to an absolute minimum.
In practice, over a long period of time,
suppose I get 1000 messages of which 900 are spam. It will
probably move about 800 of the spams and 1 or 2 of the
non-spams into my spam folder.
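To put those figures in the terms used elsewhere in this thread (sensitivity and specificity), here is a short Python calculation. The function names are illustrative, and it takes the worse case of 2 misfiled non-spams.

```python
def sensitivity(spam_caught, spam_total):
    """Share of spam that was correctly recognized as spam."""
    return spam_caught / spam_total

def specificity(ham_kept, ham_total):
    """Share of non-spam that was correctly left in the inbox."""
    return ham_kept / ham_total

# 1000 messages: 900 spam (800 caught), 100 non-spam (2 misfiled).
spam_rate = sensitivity(800, 900)        # 800 / 900
ham_rate = specificity(100 - 2, 100)     # 98 / 100
```

That works out to roughly 89% sensitivity and 98% specificity.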
With each batch of incoming mail, I check the spam folder
for non-spams, but usually I can quickly see from the
subjects and senders that there aren't any non-spams, so I
click a button to quickly delete all the spams in one
operation.
Whenever it tags a message incorrectly (usually spam
that it missed), you can click a button to tell it so.
This way it gradually learns and gets smarter.
Also, e-mail from anyone in my address book is automatically
considered non-spam, so the false positives are quite low.
Being able to delete a whole bunch of spams in one
operation saves time. It's also nice that it keeps
my inbox largely clear of distracting spam clutter.
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
8. Re: Anyone want to write an "intelligent" mail filter?
I've personally found that filtering on keywords like those found here:
http://itis.net/msbl/msbl-current.txt
and blocking certain IPs like those listed here:
ftp://ftp.h2osoft.homeip.net/filters.rec
along with not having an open relay and keeping a global postmaster
account for unknown users of my email server, seems to
work effectively. The above links will help in building
a pretty good filter...
Euman
----- Original Message -----
From: "Derek Parnell" <ddparnell at bigpond.com>
To: <EUforum at topica.com>
Sent: Tuesday, November 04, 2003 9:31 PM
Subject: Re: Anyone want to write an "intelligent" mail filter?
>
>
> ----- Original Message -----
> From: "C. K. Lester" <euphoric at cklester.com>
> To: <EUforum at topica.com>
> Sent: Wednesday, November 05, 2003 1:27 PM
> Subject: Re: Anyone want to write an "intelligent" mail filter?
>
>
> You need all three: a whitelist, a blacklist, and a contact list. This is
> because not every whitelist entry will need to be in the contacts. The
> algorithm should be along the lines of ...
>
> if sender or originating machine is in whitelist then accept it.
> if sender is in contacts then accept it.
> if sender or originating machine is in blacklist then quarantine it.
>
>
> This, in conjunction with a Bayesian filter thingy, will help a lot.
>
> --
> Derek
9. Re: Anyone want to write an "intelligent" mail filter?
if you want to design a "smart" filter based on the body of an email, you'd
need at least the following:
1. ability to recognize and strip HTML
2. a dictionary of words with certain attributes of each word (noun,
verb, adjective, preposition, article)
3. ability to parse for "meaningful" sentences or sentence structures. at
this point you'd be writing a grammar checker similar to that found in MS
Word. You'd look for parts of a sentence and/or groups of words that match
no given pattern.
for example:
"i have a whole sentence in quotes"
"i have" = pronoun + verb = simple sentence
"a whole sentence" = article + adjective + noun = sentence object
"in quotes" = preposition + noun (plural) = prepositional phrase
"indecisive constitute dakar summitry ajax beaver"
"indecisive constitute" = adjective + verb = not a sentence (should
be adverb + verb or adjective + noun)
"dakar summitry" = noun + noun = not a sentence (makes no sense)
"ajax beaver" = noun + noun = not a sentence (makes no sense)
as you can see there are specific patterns to the english language. anyone
with an extensive knowledge of the structure of the language, and a few
english books (such as myself) could write such a filter. i, however, do not
have the time, as my current projects take up most of it. i am
willing to lend my assistance to anyone who may need it. i have taken quite
a few english classes and 'advanced composition' classes.
~Greg
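A toy Python sketch of the pattern idea above: tag each word with a part of speech, then count how many neighbouring tag pairs fit a plausible English pattern. The mini-lexicon and the list of plausible pairs are invented for these two example strings only (with "i" tagged as a pronoun); a real filter would need a full dictionary and far more patterns.

```python
# Hypothetical mini-lexicon covering only the two example strings.
LEXICON = {
    "i": "pronoun", "have": "verb", "a": "article", "whole": "adjective",
    "sentence": "noun", "in": "preposition", "quotes": "noun",
    "indecisive": "adjective", "constitute": "verb", "dakar": "noun",
    "summitry": "noun", "ajax": "noun", "beaver": "noun",
}

# Assumed set of tag pairs that may follow one another in a sentence.
PLAUSIBLE_PAIRS = {
    ("pronoun", "verb"), ("noun", "verb"), ("verb", "article"),
    ("article", "adjective"), ("article", "noun"), ("adjective", "noun"),
    ("noun", "preposition"), ("preposition", "noun"),
    ("preposition", "article"), ("adverb", "verb"),
}

def plausibility(text):
    """Return the fraction of adjacent word pairs that match a known pattern."""
    tags = [LEXICON.get(word, "unknown") for word in text.lower().split()]
    pairs = list(zip(tags, tags[1:]))
    if not pairs:
        return 0.0
    good = sum(1 for pair in pairs if pair in PLAUSIBLE_PAIRS)
    return good / len(pairs)
```

On the first example every adjacent pair matches, so the score is 1.0; on the string of dictionary words no pair matches, so the score is 0.0.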
10. Re: Anyone want to write an "intelligent" mail filter?
Irv wrote:
> Every day I get more annoying SPAM e-mails. Currently it's running about 10
> spams to every valid e-mail.
>
> I'm tired of wading thru them, and I'd rather not download them at all.
> My e-mail client can filter the messages by sender or subject, but most
> spams now are written to get around those filters.
>
> One thing I notice is that nearly 100% of the spams either contain the
> word "lagos" or long strings of "dictionary" words to confuse the filters:
>
> "indecisive constitute dakar summitry ajax beaver descendent withal
> circumlocution asocial voluble inquire convolution replete hitler
> commendation segregate cognition abstract eject disgustful"
>
> But very few or none of the more common shorter words that would likely
> appear in a valid e-mail: "a, and, or, if, you, we, I, to, for, the, this,
> that....."
>
> We should be able to come up with a routine which would analyze a given
> text string and rank it according to its likelihood of being a 'meaningful'
> message. Then use that routine in an e-mail client to rank messages and
> only download from the server those which appear to be 'real'.
>
> Ideas?
I'm working on a program that I actually call "MailFilter".
It can modify incoming and outgoing mail. The current version, 0.40, has
the following features:
o incoming mail:
- spam detection
The program has 6 different built-in mechanisms for detecting spam.
At least concerning the mails that I personally get, these mechanisms
are better than the spam filters of my e-mail provider.
A little "research" in September 2003 gave:
# my mail provider:
- 67% of the spam was correctly recognized as spam (= sensitivity)
- 96% of the non-spam was correctly recognized as non-spam
(= specificity)
# MailFilter:
- 77% sensitivity
- 98% specificity
In addition to the built-in mechanisms (which I don't want to
disclose in public at the moment, because I don't want
spammers to know them), the user can put mail addresses of "good"
senders in a whitelist, and mail addresses of spammers in a
blacklist (wildcards allowed). There is also a list where the
user can enter her/his own spam keywords.
Intelligent handling of these lists will increase sensitivity and
specificity.
If MailFilter has recognized a mail as spam, it can add a string of
the user's choice to the beginning of the subject. Then your mail
client e.g. can put such mails in a special folder.
- web bugs
If the user wants, MailFilter deletes so-called web bugs.
- security
If the user wants, MailFilter deletes active contents of the mail
(e.g. scripts).
- HTML
In "multipart/alternative" mail, MailFilter can delete unneeded HTML
parts completely.
- privacy
MailFilter can delete the request for returning a receipt.
- readability
# MailFilter deletes garbage at the end of the mail, and of plain
text MIME parts (blank lines, and lines that only contain ' ',
'\t', '>', or '.').
# If the user specifies the beginning of 2 lines, then MailFilter
searches for the last occurrence of these lines in the mail.
MailFilter deletes the 2 lines, and all lines between them.
(I've specified "--^-,--^-" to delete the Topica footer).
# MailFilter can repair broken subjects, e.g.:
"RE: Re[3]: This and That" ==> "Re: This and That"
# MailFilter can change any header field of the incoming mail.
For example I use:
;-- Topica
List-Help=
List-Unsubscribe=
X-Topica-Id=
;-- Ads
X-HotPOP=
;-- Misc
Errors-To=
X-Accept-Language=
X-Flags=
X-MimeOLE=
Because the right side of '=' is empty, the corresponding header
fields are deleted. This improves the readability of the remaining
header, and saves space on my hard disk.
- safety
MailFilter can make a backup copy of the original message, and also
of the changed message.
o outgoing mail:
- MailFilter can change any header field of the outgoing mail.
This can be used for privacy, e.g. if you don't want the whole world
to know what mail client you use. Or it can be used for special
greetings (see the "X-Greetings" header field in my mails), or ...
- backups
MailFilter can make a backup copy of the original message, and also
of the changed message.
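A few of the features listed above are easy to sketch with Python's standard library: the "Name=value" header rules, the subject repair, and the removal of trailing garbage lines. This is not MailFilter's code, which has not been published. The function names are hypothetical, and treating a non-empty right-hand side as "set the header to this value" is an assumption.

```python
import re
from email import message_from_string

def parse_rules(text):
    """Read 'Name=value' lines; lines starting with ';' are comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        name, _, value = line.partition("=")
        rules[name.strip()] = value.strip()
    return rules

def apply_rules(msg, rules):
    """Delete each listed header, then set it again if a value was given."""
    for name, value in rules.items():
        del msg[name]          # removes all occurrences; no error if absent
        if value:
            msg[name] = value  # assumed meaning of a non-empty right side
    return msg

def repair_subject(subject):
    """Collapse any run of 'RE:', 'Re[3]:' etc. into a single 'Re: '."""
    return re.sub(r"^(?:\s*re(?:\[\d+\])?\s*:\s*)+", "Re: ",
                  subject, flags=re.IGNORECASE)

def strip_trailing_garbage(body):
    """Drop trailing lines that hold only spaces, tabs, '>' or '.'."""
    lines = body.splitlines()
    while lines and not lines[-1].strip(" \t>."):
        lines.pop()
    return "\n".join(lines)
```

A message would be parsed with message_from_string(), passed through apply_rules() with the parsed rule list, and written back out with msg.as_string().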
In summary, MailFilter is something like a mixture of parts of
KorrNews <http://www.tglsoft.de/misc/hamtools_en.htm>, and parts of
Benign <http://www.firetrust.com/products/benign/>. It runs stably on
my Windows 98 system.
Main disadvantages that I'm aware of:
- MailFilter can handle any mail file on your hard disk, by calling e.g.
MailFilter -in this.eml
MailFilter -out that.eml
But in order to handle your incoming and outgoing mail automatically,
you have to install a local mail server (such as
MorVer <http://www.morver.de/>). The local server then calls
MailFilter.
This also means that MailFilter cannot handle the mails on a remote
mail server. Any mail must be downloaded to the hard disk first, so
that MailFilter can access it.
- Currently only partial support for mails that contain *nested* MIME
parts.
- No version for Linux/FreeBSD at the moment.
- Some error messages currently in German.
I'm already working on the documentation (in English). As soon as this
and the translation of the error messages are finished, I'll offer
MailFilter on my website for free download.
Regards,
Juergen
--
/"\ ASCII ribbon campain | |\ _,,,---,,_
\ / against HTML in | /,`.-'`' -. ;-;;,_
X e-mail and news, | |,4- ) )-,_..;\ ( `'-'
/ \ and unneeded MIME | '---''(_/--' `-'\_)
11. Re: Anyone want to write an "intelligent" mail filter?
On Tuesday 04 November 2003 07:47 pm, C.K. wrote:
> I think the best way to combat spam is to create a "white list" (as
> opposed to a "black list").
>
> Any mail received that's not on your white list will be rejected as
> spam. The white list can simply be your address book.
You're absolutely right about the 'whitelist'. That would be the first step -
or rule - to automatically let emails from friends and clients on your
whitelist thru.
> Okay, so how to deal with mail from friends who get new email addresses
> or whatever? Well, the email client will send a response email to all
> non-whitelist email addresses. It will say something like this:
>
> "Hi, I'm using Irv's Mail Control v1.0. It rocks! It has flagged you as
> an unacceptable sender. However, if you're my buddy, simply respond to
> this email and I'll let you in!"
But if this idea becomes widely used, then it will just verify your address
as valid for the spammer's next list.
> Since most spam is auto generated, you won't get responses from most
> spammed mail. Any bouncebacks will be handled gracefully by the email
> client. Any responses you get at all will go into a special "check
> these" folder. Any of those that you mark as spam will then forever be
> known as spam. Any you mark as valid will be added to your whitelist.
OK, but the spammers almost never use the same address more than once,
and it's almost always forged. And, during the most recent virus attacks,
some of the forged addresses were addresses which would have appeared
in my whitelist. So I'm not so sure how useful that would be.
> This is the best way to combat spam, even though it might require a
> little effort in adding addresses to your whitelist.
Maybe the *best* way would be to host a few Texas 'necktie parties' for
the spammers. I understand that there are only 2 or 3 hundred notorious
spammers who do 90% of the damage. If they aren't stopped, then e-mail
will become useless. It's too good a tool to lose to low-lifes.
Irv
12. Re: Anyone want to write an "intelligent" mail filter?
On Tuesday 04 November 2003 07:09 pm, Pete wrote:
> PS Cut and paste me a couple of [complete] messages & I'll let you
> know if my plan would work on the spam you get. I'm from up north,
> [thick skinned &proud of it] and on broadband, so don't worry about
> sending me large or offensive content
Good lord! You actually need someone to send you spam?
How have you avoided the 30 or so offers each day to lengthen your
erm.. attributes, the 50 for fake (rhymes with Niagara) and the 5 or 6
requests for help in exporting cash from Nigeria?
Why, just this year, I've already been offered more than twice the net
worth of the entire country, just for my help in exporting various
funds from that region. Must be tough living in a country with so much
unclaimed cash lying around.
Irv
13. Re: Anyone want to write an "intelligent" mail filter?
I'm all for those Texas necktie parties for the spammers.
Lucius L. Hilley III
----- Original Message -----
From: "Irv Mullins" <irvm at ellijay.com>
To: <EUforum at topica.com>
Subject: Re: Anyone want to write an "intelligent" mail filter?
> Maybe the *best* way would be to host a few Texas 'necktie parties' for
> the spammers. I understand that there are only 2 or 3 hundred notorious
> spammers who do 90% of the damage. If they aren't stopped, then e-mail
> will become useless. It's too good a tool to lose to low-lifes.
>
> Irv
>
14. Re: Anyone want to write an "intelligent" mail filter?
You have forgotten about the breast enhancements, prescription drugs
through the net, college Grant or Loan, Be your own boss, Mortgages,
debt consolidation, Refinancing, Stock tips, links to adult activities,
Platinum, Gold cards, Store discount notices, Weight loss programs
and pills, Health insurance, Life insurance, Car insurance, Dental
insurance, Cable descrambler plans, Boost your PC speed, Free
vacation scams, Paypal and eBay account scams, Beauty products,
Find your mate or date, Live longer and younger, Free Lottery,
Sweepstakes, Jackpot entry, Airline discounts, Travel discounts,
CD and DVD creators, HGH, stop spam, Anti-virus sellers,
*NEW* (perfect sideburns), HP, Compaq, Dell computer sales,
School study guides, Top selling books, Printer ink discounts,
Lucius L. Hilley III
----- Original Message -----
From: "Irv Mullins" <irvm at ellijay.com>
To: <EUforum at topica.com>
Sent: Wednesday, November 05, 2003 07:47 AM
Subject: Re: Anyone want to write an "intelligent" mail filter?
On Tuesday 04 November 2003 07:09 pm, Pete wrote:
> PS Cut and paste me a couple of [complete] messages & I'll let you
> know if my plan would work on the spam you get. I'm from up north,
> [thick skinned &proud of it] and on broadband, so don't worry about
> sending me large or offensive content
Good lord! You actually need someone to send you spam?
How have you avoided the 30 or so offers each day to lengthen your
erm.. attributes, the 50 for fake (rhymes with Niagara) and the 5 or 6
requests for help in exporting cash from Nigeria?
Why, just this year, I've already been offered more than twice the net
worth of the entire country, just for my help in exporting various
funds from that region. Must be tough living in a country with so much
unclaimed cash lying around.
Irv
15. Re: Anyone want to write an "intelligent" mail filter?
Hi Ricardo,
I had the same problem and I always used http://www.mail2web.com/ and
it worked right. It's a free service to check your pop account and delete
spam mails before downloading them.
Hope this helps,
Guillermo Bonvehi
Ricardo Forno wrote:
>
> Rob:
> Is there some SPAM filter that not only identifies SPAM, but also avoids
> downloading them from the server and deletes them in the server?
> I'm asking this because I have a 56K connection, and SPAM consumes a big
> part of connection time (I have to pay for it).
> A few days ago, I had to resort to change my e-mail address from rforno to
> rmforno, in view of increasing SPAM.
> Regards.
16. Re: Anyone want to write an "intelligent" mail filter?
On Wednesday 05 November 2003 11:57 am, Ricardo Forno wrote:
> Rob:
> Is there some SPAM filter that not only identifies SPAM, but also avoids
> downloading them from the server and deletes them in the server?
> I'm asking this because I have a 56K connection, and SPAM consumes a big
> part of connection time (I have to pay for it).
> A few days ago, I had to resort to change my e-mail address from rforno
> to rmforno, in view of increasing SPAM.
I've just downloaded Save My Modem which seems to do exactly what we've been
talking about. It identified the first batch of e-mail (6 messages, 4 of them
spam) 100% correctly, and offered to delete them from the server and/or
send "mailbox not found" bounces. It uses SpamAssassin.
It's at http://savemymodem.sourceforge.net
Looks good.
Irv
--
Windows 98 is *NOT* a virus - viruses are small and efficient.
17. Re: Anyone want to write an "intelligent" mail filter?
----- Original Message -----
From: "C. K. Lester" <euphoric at cklester.com>
To: <EUforum at topica.com>
Subject: Re: Anyone want to write an "intelligent" mail filter?
> If possible, can an email client send a faked "bounceback" message? or
> does that have to come from the server? Maybe an "SirvEu Mail Control
> v1.0"-friendly ISP will let you manage your whitelist on their server. I
> dunno. Any ideas? :)
Hello, C. K.
KMail (the "official" email client for KDE), has a "bounce" option which
does just that. I've also planned to put that option into a little email
client that I am writing atm. Unfortunately, the bounce has to appear
exactly as though it came from your ISP, **not** from your email address.
That is something I haven't tackled yet, as I haven't gotten to that
point.
Travis W. Beaty
Osage, Iowa.
18. Re: Anyone want to write an "intelligent" mail filter?
On Wed, 5 Nov 2003 07:47:39 -0500, Irv Mullins <irvm at ellijay.com>
wrote:
>Good lord! You actually need someone to send you spam?
Ekkk! LOL
I meant spam you thought might be difficult to trap/detect as spam.
But after what has since been said, encouraging your ISP to do this
work for you is a much better idea.
Pete
19. Re: Anyone want to write an "intelligent" mail filter?
On Wed, 5 Nov 2003 09:04:21 -0500, Lucius Hilley
<l3euphoria at bellsouth.net> wrote:
>You have forgotten about the breast enhancements, prescription
>drugs
<snip>
I've applied that important Microsoft Internet Patch about fifteen
times now and ...
20. Re: Anyone want to write an "intelligent" mail filter?
On Wednesday 05 November 2003 03:13 pm, Ricardo wrote:
> Many thanks. I'll download it.
> But, does it avoid downloading spam? Not clear by what you wrote.
> Anyway, up to the moment my new account hasn't received any spam, so I will
> not be able to test this or other anti-spam software until I start getting
> spam again...
Yes, it first downloads and displays the subject and sender, checks the
first few lines (12 perhaps) of the message body, and applies the filters. It
marks suspected spams in red, and offers to delete or bounce them all with
one click. Once that's done, you can use your normal e-mail client to
download the remaining good messages.
So far, it has correctly identified all spams, and not misidentified any valid
e-mails. I'm satisfied.
Regards,
Irv
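For anyone who wants to try the same approach by hand, here is a minimal Python sketch of "look at the headers and the first few lines, then delete on the server". It is not how Save My Modem is implemented. It relies on the POP3 TOP command, which is optional and not offered by every server, and the function name and the looks_like_spam callback are assumptions.

```python
import poplib

def purge_spam(conn, looks_like_spam, preview_lines=12):
    """Mark suspected spam for deletion without downloading full messages.

    `conn` is a logged-in poplib.POP3 (or POP3_SSL) connection and
    `looks_like_spam` is any function that takes the preview text
    (headers plus the first `preview_lines` body lines) and returns a bool.
    """
    deleted = []
    _, listing, _ = conn.list()              # entries look like b"1 1234"
    for entry in listing:
        number = int(entry.split()[0])
        _, lines, _ = conn.top(number, preview_lines)
        preview = b"\n".join(lines).decode("latin-1")
        if looks_like_spam(preview):
            conn.dele(number)                # removed when the session ends
            deleted.append(number)
    return deleted

# Typical use (host, user name and password are placeholders):
#   conn = poplib.POP3("mail.example.com")
#   conn.user("me"); conn.pass_("secret")
#   purge_spam(conn, lambda text: "lagos" in text.lower())
#   conn.quit()   # deletions only take effect on a clean quit
```

After that, the normal e-mail client only has the remaining messages to download.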