June 3, 2005
SciAm on Spam
This Scientific American article on antispam tactics is worth reading, if only because it discusses the failures of successive waves of "machine learning" techniques, such as naive Bayesian and fingerprinting approaches, all of which (wrongly) focus on content rather than on whether the message was solicited. Having admitted the failure of machine learning and content analysis, the authors move on to yet another set of losing propositions: challenge/response systems, "captchas", and the mythical but exceedingly persistent idea of "charging spammers", whether by requiring excessive processor cycles or by somehow expecting thieves to "pay" for their abuse. Finally, they admit that laws are not the answer. Yet for some reason they are optimistic in the end, saying "we have little doubt that the combination of the current and next-generation techniques will eventually stop most spam".
Sadly, such optimism is completely unwarranted when you look at the obvious and continued failures of content analysis, the abusive and annoying practice of challenge/response, and the stupidity of assuming that spammers would ever pay for what they can steal. Content analysis requires that the message be accepted and analyzed before being flagged (with notable exceptions such as Milter-based systems for sendmail), which defeats the real point: we don't even want to accept spam, much less accept it and then store it and waste cycles analyzing its content. More to the point, content that is perfectly acceptable to some recipients, and even solicited by them, is unacceptable to others. Challenge/response is a fatally flawed system: because the sender's authenticity cannot be verified, it produces a storm of useless and annoying bounces to the (forged) senders, who may or may not even exist. I won't dignify payment schemes by discussing them here, as the idea seems to me, on its face, an unworkable fantasy.
Our project relies far more on three things: analysis of the source of a message; "spamsign", such as the use of certain spam software with clear and rather pathetically obvious signatures; and other characteristics of spam-sending software, such as its "rudeness" or outright attempts at deception. It may seem surprising to these no doubt well-intentioned researchers, but we've found that worrying about the content of a spam message is practically useless compared to header analysis, enforcing protocols, and the like, which regularly block 98% or more of the spam delivery attempts made here on our servers. On average, we accept one spam message for every five hundred or so we reject on a given day, and most of those are caught by 419/phish content filters. The numbers are perhaps misleading, as some accounts have less restrictive antispam policies; without those accounts, the final number would be even more impressive.
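To put the one-in-five-hundred figure next to the 98% figure, here is a quick back-of-the-envelope calculation. The function name is just something made up for illustration, and the counts are the rough daily ratio quoted above, not actual log data.

```python
def block_rate(accepted: int, rejected: int) -> float:
    """Return the fraction of spam delivery attempts that were rejected."""
    total = accepted + rejected
    if total == 0:
        raise ValueError("no delivery attempts recorded")
    return rejected / total


# Rough ratio from the post: one spam accepted for every ~500 rejected.
rate = block_rate(accepted=1, rejected=500)
print(f"{rate:.1%}")  # prints 99.8%
```

So the ratio quoted is comfortably inside the "98% or more" claim.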
It may be objected that solving the zombie problem by tracking unlikely sources of legit mail using pattern-matching will be defeated if and when the zombies are reconfigured to relay through a legitimate mail server, or that focusing on malware characteristics rather than content is likely to fail as those programs are updated and replaced. We'll worry about that when it happens. In the meantime, it's our experience that such malware represents a significant investment for the spammers, who are loath to upgrade until their response rates drop off.
The malware used by 419/advanced fee fraud scammers, for example, has remained essentially unchanged for the past two years, and by continuing to block mail on its signature, we've rejected over fifteen hundred messages in the past year - messages that would otherwise have been accepted, filtered, stored, transmitted via POP, reported, etc. We prefer to reject them before that waste of time, disk space, processor cycles, and bandwidth happens.
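For the curious, blocking on a software signature can be sketched roughly like this. This is not our actual filter: the header names and patterns below are invented placeholders (the real signatures aren't published here), and `matches_signature` and `smtp_reply` are hypothetical helper names. It only uses Python's standard `email` and `re` modules, and it assumes the check runs while the SMTP session is still open so the message can be refused rather than stored.

```python
import re
from email import message_from_string

# Hypothetical signatures, for illustration only: (header name, pattern).
SIGNATURES = [
    ("X-Mailer", re.compile(r"ExampleBulkMailer", re.IGNORECASE)),
    ("Message-ID", re.compile(r"^<[0-9A-F]{12}@localhost>$")),
]


def matches_signature(raw_headers: str):
    """Return the name of the first header matching a signature, else None."""
    msg = message_from_string(raw_headers)
    for name, pattern in SIGNATURES:
        # get_all returns every occurrence of the header, or [] if absent.
        for value in msg.get_all(name, []):
            if pattern.search(value.strip()):
                return name
    return None


def smtp_reply(raw_headers: str) -> str:
    """Pick an SMTP reply: refuse on a signature hit, accept otherwise."""
    hit = matches_signature(raw_headers)
    if hit is None:
        return "250 OK"
    return f"550 5.7.1 Message rejected ({hit} matches a known spamware signature)"


sample = "From: a@example.com\nX-Mailer: ExampleBulkMailer 2.0\nSubject: hi\n\n"
print(smtp_reply(sample))
```

The point of answering with a 550 during the session is that the message never gets stored, filtered, or downloaded via POP in the first place.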
Posted by schampeo at June 3, 2005 2:09 PM