Mail Filtering - Blackmilter

SMTP Phase	pre-DATA
CPU Use	low
Memory Use	medium
False Positives	low
Maintenance	medium
Effectiveness	high

Blackmilter

Blackmilter is a flexible and efficient tool for blocking mail senders by IP address. It could be used to implement an almost unlimited variety of blocking policies, depending on how the IP blocklists and allowlists are generated, how entries expire, and so on.

Blackmilter is my second line of defense after sendmail itself. It blocks about 90% of the incoming mail while using very little CPU time.

The way I use it is to run some scripts every hour which look through the system's mail log file. These scripts check for a variety of anti-social behaviors, extract the IP addresses doing the bad things, and add them to a set of local short-term blocklists.

You have to be careful with IP-based blocklists because many people change their IP addresses fairly often (via DHCP). The risk is that someone doing a bad thing could get blocked and give up the IP address, and then someone else could come along, get the same address, and still be blocked. I avoid this risk in two ways. One, my blocklists are short-term: addresses automatically expire out of them in a day or so. Two, I use the "-graylist" option on blackmilter to have it return temporary failure codes instead of permanent rejection codes. Because of this, any real mail that happens to get sent from a blocked IP address will get queued on the sending site, and a few days later it will get through.

Below are explanations of some of the blocklist-generating scripts. Sendmail's log format is fairly twisted, so these scripts are a little complicated. Not too bad. Some of them use ipizer, a little lex program, as a pre-processing filter.

Using the scripts has five steps:

1. Rotate the sendmail log file every hour, via newsyslog.
2. In a cron job that runs a couple of minutes past the hour, rotate a set of IP-address files for each script.
3. Create the hour's new IP-address files by running the last hour's sendmail log file through each script.
4. Concatenate all the IP-address files together to form a blocklist file.
5. Get blackmilter to re-load its blocklist - if you use the -autoupdate flag, this happens automatically.
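
The steps in the cron job could be tied together roughly like this. It is only a sketch, not the actual setup: the function names, the directory layout, the file naming (NAME.0 for the newest hour) and the 24 generations kept per script (chosen to match the "day or so" expiry) are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the hourly blocklist job. All paths and names are hypothetical.

GENERATIONS=24    # hourly files kept per script; assumed, gives ~1 day expiry

# Step 2: shift NAME.0 -> NAME.1 -> ... and drop the oldest generation.
rotate_ip_files() {
    dir=$1 name=$2
    i=$((GENERATIONS - 1))
    rm -f "$dir/$name.$i"
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -f "$dir/$name.$prev" ]; then
            mv "$dir/$name.$prev" "$dir/$name.$i"
        fi
        i=$prev
    done
}

# Step 4: merge every generation of every script into one blocklist.
# The list is written to a temporary file and then renamed, so a reader
# never sees a half-written file.
build_blocklist() {
    dir=$1 out=$2
    cat "$dir"/*.[0-9]* 2>/dev/null | sort -u > "$out.new"
    mv "$out.new" "$out"
}

# Steps 2-4 for one hour's log. Each script in $scripts reads the log on
# stdin and prints one IP address per line.
run_hourly() {
    log=$1 scripts=$2 dir=$3 out=$4
    for script in "$scripts"/*; do
        name=$(basename "$script")
        rotate_ip_files "$dir" "$name"
        sh "$script" < "$log" > "$dir/$name.0"
    done
    build_blocklist "$dir" "$out"
}

# Step 5 is not shown: with -autoupdate, blackmilter re-loads the file itself.
# Example cron invocation (hypothetical paths; assumes an uncompressed log):
# run_hourly /var/log/maillog.0 /usr/local/etc/blockscripts \
#     /var/db/blocklists /var/db/blocklist
```

Expiry falls out of the rotation: an address that stops misbehaving simply ages out once its last hourly file is dropped.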

Wormy

This one leverages ClamAV, a late & expensive but very accurate filter, into an early & cheap blocklist that is still accurate. The script looks in the mail log file for entries generated by ClamAV. Here's a sample:

Jun  4 17:11:58 gate sm-mta[46365]: j550BpZM046365: Milter: data, reject=554 5.7.1 virus Worm.Mytob.BT detected by ClamAV - http://www.clamav.net

The script figures out the IP address associated with each of those entries. All of those IP addresses get added to the wormy blocklist, and they remain on the list for a day or so. After that, the address gets another chance; if it is still infected, it goes right back onto the list.

The script looks like this:

ipizer |
  egrep ': Milter: data, reject=554 5\.7\.1 .* detected by ClamAV .* \[' |
  sed -e 's/.* detected by ClamAV .* \[//' -e 's/\].*//' |
  sort -u

This blocklist also includes IP addresses found by the Non-ClamAV viruses filter, which are collected by a separate script.

Pregreeter

This one looks for the log messages generated by the greet_pause sendmail config option. Here's a sample log entry:

Jan 18 00:25:42 gate sm-mta[94688]: j0I8PgRj094688: rejecting commands from host34-34.pool8256.interbusiness.it [82.56.34.34] due to pre-greeting traffic

Since the IP address is included in the log entry, the script to collect them is pretty simple. The threshold for getting added to the list is low, currently 5 pre-greetings per hour.

While the greet_pause check is fairly cheap, it does involve a multi-second delay, which means a sendmail process sitting around taking up memory for those seconds. By putting the misbehaving IP addresses into a blocklist, we can reject them immediately and save on memory.

The script looks like this:

egrep 'rejecting commands from .* due to pre-greeting traffic' |
  sed -e 's/.*\[//' -e 's/\] due to pre-greeting traffic.*//' |
  sort |
  uniq -c |
  awk '{ if ( $1 >= 5 ) print $2; }'
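
Before wiring a script like this into cron, it can be checked against a few hand-made log lines. The sketch below wraps the same pipeline in a function and feeds it a fake hour of log. The function names and the second host (mail.example.org, 192.0.2.7) are invented for the test; grep -E is used as the equivalent of egrep.

```shell
# The Pregreeter pipeline wrapped in a function so it can be fed test input.
pregreeters() {
    grep -E 'rejecting commands from .* due to pre-greeting traffic' |
      sed -e 's/.*\[//' -e 's/\] due to pre-greeting traffic.*//' |
      sort |
      uniq -c |
      awk '{ if ( $1 >= 5 ) print $2; }'
}

# Fake hour of log: five hits from one address, a single hit from another.
sample_log() {
    n=0
    while [ "$n" -lt 5 ]; do
        echo 'Jan 18 00:25:42 gate sm-mta[94688]: j0I8PgRj094688: rejecting commands from host34-34.pool8256.interbusiness.it [82.56.34.34] due to pre-greeting traffic'
        n=$((n + 1))
    done
    # Made-up entry that stays under the threshold.
    echo 'Jan 18 00:30:00 gate sm-mta[94700]: j0I8U0Rj094700: rejecting commands from mail.example.org [192.0.2.7] due to pre-greeting traffic'
}

sample_log | pregreeters    # lists only the address with five hits
```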

DNI

This one stands for "Did Not Issue". It looks for log entries like this:

Dec  8 17:24:32 gate sm-mta[24812]: iB91OWf7024812: ameranth.com [216.70.241.138] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA

I have no idea what causes these but they are obviously not legitimate SMTP connections. They come from only a couple hundred different IP addresses, each of which makes many thousands of connections. It's possible this is from spammers probing to see if I'm running a mail server. It's also possible it's some sort of lame attempt at a denial of service attack. Whatever, they're easy enough to block. The threshold for getting added to the list is fairly high, currently 30 DNIs per hour, so there's not much risk of blocking a site which is just having some network problems.

The script looks like this:

egrep 'did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA' |
  sed \
    -e 's/.*\[//' \
    -e 's,\].* did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA.*,,' |
  sort |
  uniq -c |
  awk '{ if ( $1 >= 30 ) print $2; }'
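
Several of these scripts end with the same sort, uniq -c, awk tail and differ only in the threshold. One possible refactoring, shown purely as a sketch, pulls that tail into a helper that takes the threshold as an argument. The names over_threshold and dni are made up here and do not come from the original scripts.

```shell
# Print each distinct input line that occurs at least $1 times.
# Input: one IP address per line, in any order.
over_threshold() {
    sort | uniq -c | awk -v min="$1" '$1 >= min { print $2 }'
}

# The DNI extraction step followed by the shared tail (30 per hour).
dni() {
    grep -E 'did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA' |
      sed \
        -e 's/.*\[//' \
        -e 's,\].* did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA.*,,' |
      over_threshold 30
}
```

With the helper in place, tuning a threshold means changing one number per script, and the counting logic only has to be right once.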

Prober

This script looks for addresses attempting to send to lots of users that don't exist. These are spammers doing a "dictionary attack" to try and find new victims. The log entries for this look like:

May 17 10:21:18 gate sm-mta[20161]: i4HHKtBE020161: ... User unknown

Because the IP address does not appear in the log entries, we use ipizer to add it. The threshold for inclusion is currently 20 bogus users per IP address in one hour.

The script looks like this:

ipizer |
  egrep '\.\.\. User unknown \[' |
  sed -e 's/.*\.\.\. User unknown \[//' -e 's/\].*//' |
  sort |
  uniq -c |
  awk '{ if ( $1 >= 20 ) print $2; }'

Too Many

This one looks for addresses that just send too many darned messages. The threshold for inclusion is high, currently 60 messages per hour from the same IP address. If you have multiple users on a high-traffic mailing list, you'll probably exceed this limit.

The script looks like this:

egrep 'NOQUEUE: connect from ' |
  sed -e 's/.* NOQUEUE: connect from .*\[//' -e 's/\].*//' |
  egrep -v '^127\.0\.0\.1$' |
  sort |
  uniq -c |
  awk '{ if ( $1 >= 60 ) print $2; }'
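
If a known-good sender such as a mailing list host keeps tripping this limit, one option is to exempt it the same way 127.0.0.1 is exempted. The variant below is a sketch under assumptions: the too_many function, the exemption file and the optional threshold argument are all hypothetical additions, not part of the original script.

```shell
# Too Many with an exemption file instead of one hard-coded address.
# $1: file of exempt IP addresses, one per line (hypothetical; it should
#     list at least 127.0.0.1 to match the original script).
# $2: minimum connection count (defaults to 60).
too_many() {
    grep -E 'NOQUEUE: connect from ' |
      sed -e 's/.* NOQUEUE: connect from .*\[//' -e 's/\].*//' |
      grep -v -x -F -f "$1" |
      sort |
      uniq -c |
      awk -v min="${2:-60}" '$1 >= min { print $2 }'
}
```

The -F and -x flags make grep treat each exempt entry as a literal, whole-line match, so the dots in an address are not read as regular-expression wildcards and 10.0.0.1 does not accidentally exempt 10.0.0.10.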

Spammer

This blocklist keys off of the later, more expensive Bayesian filters. Any time the Bayesian layer finds an "egregious" spam message that scores the maximum, it logs the IP address. Those addresses are then gathered up and added to this list. The threshold for inclusion is low, currently only 3 messages per hour.

Persistent

This one is sort of a meta-list. It looks for addresses that are already blocked by the blocklist, but still keep trying to send. It has a fairly high threshold, currently 40 messages per hour.

The script looks like this:

egrep 'blackmilter: blocklist ' |
  sed -e 's/.*blackmilter: blocklist [^\[]* \[//' -e 's/\].*//' |
  sort |
  uniq -c |
  awk '{ if ( $1 >= 40 ) print $2; }'
