Mail Filtering

Or, how to block a few million spams per day without breaking a sweat.

© 2005 by Jef Poskanzer.

Introduction

In November 2004, Microsoft's second-in-command Steve Ballmer made some headlines by mentioning that Chairman Bill Gates was getting four million spams per day. At the time, I was dealing with a little spam problem of my own: I was getting around a million spams per day. I found it a little comforting that my problem wasn't quite as bad as Bill's. However, a couple of weeks later Ballmer corrected himself, saying he had misremembered the stat and Gates actually gets four million per year.

This means I was getting one hundred times as much spam as Bill Gates.

Nevertheless, after filtering we both get about the same amount: around ten spams per day in our inboxes. Ballmer says that Microsoft has an entire department dedicated to protecting their mailboxes from spam. At ACME Labs there's just one guy, one server, and a T1 line. And yet my filters are a hundred times as effective as Microsoft's. How do I do it?

These pages will show you how, and help you deploy similar filters on your own system.


Goals

What am I trying to do here?



Results

For those who like to read the end of a novel first, here are some overall stats showing how the filters are performing.



Environment

This is all based on a Unix system running sendmail. If you're not using Unix, or you're using a different Unix-based mail system, most of the specific advice here will not help you. You may still find some value in the general ideas.