new pats posted - 20090701 (maintenance pats release)

39819 patterns, 11412 right anchor strings, 172127 test IPs.

Contribs from yesterday, plus more from a CBL list.txt I recently
resolved down to PTRs. This release matches 99.995% of the PTRs in
that CBL zone.

Was asked to start tracking couplets (pattern class and tech, taken
together as a sort of meta-identifier); there are no new couplets in
this release.

Download them here:

sendmail:
http://enemieslist.com/downloads/sendmail_access_db
http://enemieslist.com/downloads/rightanchors

postfix:
http://enemieslist.com/downloads/postfix_regexp_table
http://enemieslist.com/downloads/postfix_regexp_table-20090701

exim:
http://enemieslist.com/downloads/exim_hosts
http://enemieslist.com/downloads/exim_hosts-20090701

The Trifecta, or, tweaking your way to glory

We have our own home-grown sendmail antispam filters here, which use a fairly broad brush to score incoming mail, but which have been remarkably effective for us for over six years.

One of the data points we check is of course whether the sending host has a generic PTR, via the enemieslist DNSBL. But we also find it useful to check the TCP fingerprint of the sending host, to see if the box on the other end is running some form of Windows - particularly certain highly vulnerable releases and patchlevels, like Windows XP Service Pack 1. We also check to see whether the message in question is in multipart/alternative format, or "HTML email", because in our experience it's rare to see spam that is in plain text format.

Each of these conditions (HTML, Windows, generic PTR) scores a fairly low spam score, because of course it's perfectly normal for mail to be in HTML format, and there are many Windows boxes running MSExchange and other legitimate Windows-based mail server software. And of course, there are many small businesses with generic addressing on their static netspace. The problem is when we see all three together.

As a default, all of our local accounts here have a spam score threshold of 4, which is sufficient to keep out the vast majority of the inbound spam - especially if the local scoring has been tweaked to give high scores to generic HELOs and low to generic PTRs - and which lets almost all normal mail traffic through. For historical reasons, the scoring is all done in integers, so we don't have the fine-tuning capabilities available in SpamAssassin, for example, where an HTML message might get a 1.7 just for containing HTML and no text part. Here, by default, HTML email scores a point, any Windows system scores a point, and any other issue is usually enough to dump it into the quarantine. A static generic PTR gets 2 points. So, the Trifecta is 4 points, enough to reject on for most accounts.
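For the curious, the integer scoring described above can be sketched in a few lines of Python. This is only an illustration: our real filters live in sendmail, and the names here (WEIGHTS, spam_score, verdict) are made up for the example. Only the weights and the default threshold come from the text.

```python
# Hypothetical names; only the weights and the threshold of 4 come from the post.
WEIGHTS = {
    "html": 1,                # multipart/alternative or HTML-only message
    "windows": 1,             # TCP fingerprint says the sender runs Windows
    "static_generic_ptr": 2,  # sending host has a static generic PTR
}
DEFAULT_THRESHOLD = 4


def spam_score(flags):
    """Sum the integer weights of every condition that fired."""
    return sum(WEIGHTS[name] for name in flags)


def verdict(flags, threshold=DEFAULT_THRESHOLD):
    """Reject once the score reaches the threshold (a 4/4 is a rejection)."""
    score = spam_score(flags)
    return ("reject" if score >= threshold else "accept", score)


print(verdict({"html", "windows", "static_generic_ptr"}))  # ('reject', 4)
print(verdict({"html", "windows"}))                        # ('accept', 2)
```

With integers there is no room to shave a tenth of a point off any one test, so the only knobs are the per-test weights and the per-account threshold.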

Pretty much the only time we ever have to whitelist anyone here is when the sender has hit the Trifecta outlined above. HTML-only email, sending from a Windows box, with a generic (almost always static) PTR. What's sad about this isn't that we have to make up for their IT consultants' failure to bother to request a custom PTR, or that some people run MTA software that spits out HTML-only email. No, that's pretty much par for the course in any industry without a need for a full-time IT person or team. Lawyers, galleries, non-profits, small businesses of many kinds are subject to the pressure to conform - and to pay lots of money for Exchange (when they could use free, high-performance Unix-based mail server software). And for the skills needed to install it (poorly), maintain it (poorly) and patch and upgrade it (rarely). OK, enough Unix bigotry. For now.

Some will complain that we shouldn't be blocking (or even scoring discriminately) on known "statics". The problem is that there are a lot more statically assigned IPs out there that have unfiltered access to the rest of the Internet, and are vulnerable to infection by the botnets, than there are legitimate mail servers with generic PTRs.

For example, yesterday we blocked 349 messages sent from static generics out of 8810 total rejected messages, or 4% of our total rejections, with one false positive (the message that spurred on this post). Of those, 117 were from .com or .net hosts, with the rest coming from ccTLDs we rarely have legitimate traffic from, so we can't just accept from static generics with .com or .net TLDs.

To effectively work around the infected statics problem while avoiding the occasional Trifecta-as-FP problem will take some more analysis, or some more widespread clue among Windows IT consultants. And we're not going to reduce our overall filter effectiveness by 4% daily just because of a once-a-quarter FP due to a lack of care on the part of someone else. So we need to tweak, and tune, our policies on this end without compromising our perimeter defenses, or adding to my quarantine watch workload.

Our system usually generates what, to our biased minds, are perfectly useful and informative error messages, especially in response to particular problems. The problem with the Trifecta is that we're blocking based on a score, not a specific set of problems, so the error looks like this:

554 5.7.1 HISCORE Contact postmaster@hesketh.net if this is in error, but your message was rejected as spam; it simply failed too many tests. (threshold: 4; score: 4)

There's a token (for our stats), immediately followed by a contact email address that is more or less unfiltered, a rationale, and a score/threshold. The problem is that many Exchange servers either truncate the error message, rendering it less useful, or explain that the remote system did not provide a reason - often including the complete error message beneath! - which most people don't bother to read. So we get phone calls to the effect that our system is blocking their mail. Which it is, and in many cases these are actual false positives. So we whitelist their IP address, and they can send again. (Incidentally, of the 349 messages we rejected, six had a 4/4 threshold/score; one of those was the false positive. Two had a 4/5, two had a 4/6, three had a 4/7. So, one way to deal with this is to raise our default threshold to 5, thereby letting in 7 more spams a day in order to prevent a quarterly FP. This on a system where userbase-wide we see about 3 or 4 spams/day make it through the filters, and maybe a couple 419 scams and phishing scams. So, a difficult choice - how tolerant do we become, and how low do we sink in order to accommodate these arguably at-fault systems?)

What's even more annoying is that once we've whitelisted the sending IP address of one of these poor victims, they'll go home and try to send from Outlook Web Access, which many IT consultants set up on yet another IP address, also with a generic static PTR. So we go through the whole rigamarole again, only this time with their OWA IP address.

The real problem here is two-fold: the failure of IT consultants to have even the most basic understanding of the nature of deliverability and its relationship to the generic PTR question, and the continuing acceptance of such a low standard of compliance with email community norms. (And yes, there's a third factor, namely, my reluctance to raise the default spam score threshold just to accommodate these edge cases.)

So let me close with a plea to any IT consultant tasked with setting up a Windows-based mail system: please, for the love of all that is good and holy, ask your customers' ISPs for custom reverse DNS for any system legitimately sending mail. We'll tolerate your HTML-only email, and your choice of Windows, if you'll do your part and signal to us with a custom PTR that this is a system that is intended to send mail, rather than an infected end-user system or NAT or insecure LAN.

new pats posted - 20090630 (maintenance pats release)

39710 patterns, 11430 right anchor strings, 171622 test IPs.

Contribs from yesterday, plus more from a CBL list.txt I recently
resolved down to PTRs. This release matches 99.995% of the PTRs in
that CBL zone.

Was asked to start tracking couplets (pattern class and tech, taken
together as a sort of meta-identifier); there are no new couplets in
this release.

Download them here:

sendmail:
http://enemieslist.com/downloads/sendmail_access_db
http://enemieslist.com/downloads/rightanchors

postfix:
http://enemieslist.com/downloads/postfix_regexp_table
http://enemieslist.com/downloads/postfix_regexp_table-20090630

exim:
http://enemieslist.com/downloads/exim_hosts
http://enemieslist.com/downloads/exim_hosts-20090630

new pats posted - 20090629 (maintenance pats release)

39669 patterns, 11429 right anchor strings, 171310 test IPs.

Contribs from the past few days, plus more from a CBL list.txt I
recently resolved down to PTRs. This release matches 99.995% of the PTRs
in that CBL zone.

Was asked to start tracking couplets (pattern class and tech, taken
together as a sort of meta-identifier); there are no new couplets in
this release.

Download them here:

sendmail:
http://enemieslist.com/downloads/sendmail_access_db
http://enemieslist.com/downloads/rightanchors

postfix:
http://enemieslist.com/downloads/postfix_regexp_table
http://enemieslist.com/downloads/postfix_regexp_table-20090629

exim:
http://enemieslist.com/downloads/exim_hosts
http://enemieslist.com/downloads/exim_hosts-20090629

Links Roundup

Poor choices in automatic / registration-based naming

One of my favorite memories from the past few years of scanning networks' PTRs is the day I ran into some spam from uncg.edu, the University of North Carolina at Greensboro. Seems the naming scheme for their dynamic networks was something along the lines of:

notceperryuncgedu.uncg.edu [152.13.115.41]

I saw a few more like this (munged to protect the poor folks), all with hostnames ending in uncgedu, and gave my usual sigh of resignation, but then I saw one that banished all doubt from my mind as to what was really going on:

notpantha91aolcom.uncg.edu [152.13.116.78]

Yes, that's right, they were using the end user's email address, stripping the dots and the @, and making that the hostname. Now, none of these example IPs resolves to anything, and I'm pretty confident they're no longer doing that. But it really freaked me out that anyone could be so lackadaisical with their end users' personal information, especially their email addresses.
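To show just how little work that scheme involves, here is a guess at it in Python. The function names are hypothetical, and the input addresses are simply the munged hostnames above run backwards, so treat them as illustrations rather than real accounts.

```python
def email_to_label(address):
    """Strip the dots and the @ from an email address, as described above."""
    return address.replace(".", "").replace("@", "")


def leaked_hostname(address, zone="uncg.edu"):
    """Build the PTR-style hostname from an address (hypothetical helper)."""
    return f"{email_to_label(address)}.{zone}"


print(leaked_hostname("notceperry@uncg.edu"))  # notceperryuncgedu.uncg.edu
print(leaked_hostname("notpantha91@aol.com"))  # notpantha91aolcom.uncg.edu
```

The transformation throws away the punctuation but nothing else, so anyone who recognizes a trailing "aolcom" or "uncgedu" can put the address back together.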

Principles

If I could sum up one lesson that I'd like for anyone who reads these pages to take away it would be this:

The act of naming is personal, but with communal aftereffects.

Another way of thinking of this, if you don't mind a foray into religious studies (and if you do, skip to the next paragraph, it's okay), is that naming is perhaps the most sacred act an individual may perform, being an act of participation in a divinity and community of language. But bear in mind that religions, by definition, aren't mystical - the very term "religion" defines a community of belief and a shared language for discussing those beliefs - so naming participates in both a very personal act of recognizing and externalizing your perceptions, and a very social act because the name becomes available for others to use, interpret, and acknowledge.

While it may be perfectly fine for me to name my child "Beeblebrox", because that's what I always think of him as, the name will be used by other people, too. That's one reason why despite the fact that a parent may always have a nickname for a child, they still give the child a respectable name (unless the parent in question is Frank Zappa, of course). That's just a metaphor for CNAME and PTR, BTW. And may well have nothing to do with what's actually in /etc/hostname, either - the PTR is for external recognition, CNAME for alternate use. Oh, and you're not Frank Zappa. He could get away with it. You can't.

Names, once given, convey information, and do so beyond the local context. Just because I call a server "skynyrd" because (personal reason) it had one incident where it crashed unexpectedly doesn't mean I shouldn't name it so that (public, community-oriented reason) it may be recognized for what it is beyond my network, in this case a database and Web server. In the case of an IP dynamically assigned to residential cable users via DHCP, don't assume that just because your whois SWIP for that netblock has a memo or note to that effect, the PTRs don't need to say so as well - why not keep the name itself as the locus of such information? You're more likely to change the PTRs than you are to remember to change the note in a whois record, once those IPs are reallocated as statically assigned commercial DSL. In other words:

Maximize the information associated with a name, and keep it closest to the individual unit to which it is associated.

Think of the concept of identification. In its most basic of definitions, identification is where one thing is the same as another - in this case, a name refers to an object, so the name may be used in place of the object - they are the same thing for the purposes of the particular context. In slightly more scary contexts, you might be traveling and have someone demand to see your identification, in other words, the papers that certify your name and enable strangers to confirm that the name (and perhaps picture, fingerprints, and other biometrics) matches the body.

When naming servers (or dialup ranges, or NAT pools, etc.) think of the context in which the names will be evaluated - it will be a stranger, probably not thrilled to be evaluating your host(s) at all, probably considering them somewhat of a threat, and so forth. Worst case, it will be a stranger's leave-behind rules for evaluating the same, and there will be all the personality of an automatic teller machine involved. When naming, put your best, most formal foot forward, and don't crack jokes in line at the bomb screening.

Names are detachable containers of information; don't assume the local context and assumptions and codes will survive translation to a new context.

Finally, names differentiate one object from another. If I call one name server itchy and another scratchy, I can tell the difference between the two. If I name every last one of my end user PTRs tm.net.my, or beamcablesystem.in, the names fail the differentiation test. Put another way, there is an inherent distrust in empty, generic labels - think of "Agent Smith" from the Matrix movies - he was not real, he was merely an expression or avatar of the Matrix, and could appear as one or many, and the more he was, the more threatening. Generic name, multiple copies translates as a threat, or at least diminishes trust.

Names should be unique and informative, not generic, if their referents are to provide important services you want strangers to trust.

Okay, that's enough for now, time to go get some hot dogs and a little beach time.

A couple more: kudos, and mixed kudos/gripe

Another ISP with the right idea: floodcity.net. Clearly delineated both assignment type (in the right MST position) and technology:

64-186-88-109.dialup.dynamic.floodcity.net [64.186.88.109]

One which has sort of the right idea: musfiber.com

110.113.dynamic.musfiber.com [63.246.246.110]

Unfortunately, they've also done this to delineate their pools:

116.ippool#1.musfiber.com [63.246.240.116]

In the original RFC that defines the limitations on DNS labels, RFC 1035, names are limited to alphanumeric ASCII characters and a hyphen:

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

The 63-character limit on a label still stands. What has been eased is the leading-letter rule, which RFC 1123 relaxed so that a label may begin with a digit. Some movement towards i18n has also been made, with the introduction of punycode, RFC 3492, but even punycode limits itself to alphanumeric ASCII and the hyphen while encoding the other characters. So, as with underscores, octothorpes (aka "hash marks", aka "sharps", aka "pound signs", etc.) are non grata in DNS labels.

And yet, they show up occasionally, as in one of my favorite abuses of the DNS ever (blissfully now repurposed as a Windstream dynamic IP with sensible PTR naming):

spam.complaints.(888)292-3827.alltel.senior.support.ticket#2-940727661 [216.96.36.58]

On the bright side, they did give you a phone number to call. To report network abuse. Excelsior!
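The label rules quoted above from RFC 1035 are easy to check mechanically. The sketch below is my own, not part of the pats, and it makes one assumption worth stating loudly: it allows a label to start with a digit, per the RFC 1123 relaxation, since otherwise nearly every IP-derived PTR would fail. The names LABEL and bad_labels are invented for the example.

```python
import re

# One label: letters, digits and hyphens only, no hyphen at either end,
# at most 63 characters. A leading digit is allowed (RFC 1123 relaxation,
# an assumption of this sketch; strict RFC 1035 wants a leading letter).
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def bad_labels(hostname):
    """Return the labels in hostname that break the rules above."""
    labels = hostname.rstrip(".").split(".")
    return [label for label in labels if not LABEL.fullmatch(label)]


print(bad_labels("64-186-88-109.dialup.dynamic.floodcity.net"))  # []
print(bad_labels("116.ippool#1.musfiber.com"))                   # ['ippool#1']
```

Run against the Alltel name above, it flags both the phone number label and the ticket label, which is about what you'd expect.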

A rare kudo: cgocable.net

I'd be remiss in my role as goad if I didn't occasionally reward those with the right ideas about PTR naming conventions, so let me just take a moment to say that the folks at COGECO Inc., a Canadian cable television and Internet service provider, have the right idea:

d141-145-153.home.cgocable.net [24.141.145.153]
d38-5-18.commercial1.cgocable.net [72.38.5.18]
d38-84-122.wifi.cgocable.net [72.38.84.122]

They've clearly delineated between their commercial, residential, and wireless services, and though they don't explicitly state whether each is statically or dynamically assigned, we still have enough to go on with respect to setting policies. If they want to be even more explicit than that in the future, then great!
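What makes this convention pleasant to work with is that the service class sits in its own dot-delimited label, so no regular expression is needed to get at it. A rough Python sketch, where service_class is a hypothetical helper and the only data is the three PTRs above:

```python
def service_class(ptr, domain="cgocable.net"):
    """Return the label immediately left of the provider's domain, or None."""
    name = ptr.rstrip(".").lower()
    suffix = "." + domain
    if not name.endswith(suffix):
        return None
    labels = name[: -len(suffix)].split(".")
    # Expect at least host + class, e.g. "d141-145-153" + "home".
    return labels[-1] if len(labels) >= 2 else None


for ptr in (
    "d141-145-153.home.cgocable.net",
    "d38-5-18.commercial1.cgocable.net",
    "d38-84-122.wifi.cgocable.net",
):
    print(ptr, "->", service_class(ptr))
```

From there a policy table keyed on "home", "commercial1" and "wifi" is a simple lookup; what score each class deserves is a local decision.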

Today's DNS Spotlight: Eircom

Ever wonder exactly where some infected computer is, when you find it is hitting your inbounds with repetitive requests to send mail to nonexistent accounts, as in where in the world, down to ICBM coordinates? Yeah, me, too. Unfortunately, despite the best efforts of groups like the Prefix Whois Project, who provide eerily precise longitude and latitude for any given IP, geolocation is still an infant science. Usually, the best that can be done is to provide the ICBM coordinates of the company providing the service, which while satisfying to an owner of fantasy desktop nuclear weapons, isn't quite as satisfying as taking out the actual infected computer while leaving all else around it standing. Oh, well. Maybe someday.

On the other hand, some networks do make an effort to name their systems so that they could be found by, say, firefighters. Take, for example, this wifi node in a McDonald's in Dublin, Ireland:

83.70.120.247-dynamic.wlan-ce1.mcdonalds-50-lower-oconnell-st.cust.eircom.net [83.70.120.247]

Now, I've never been to Lower O'Connell Street, nor Dublin, nor Ireland for that matter (though my sainted Grandmother Betty was swept away as a war bride during the Second World War, from her ancient homeland in Fintona, County Tyrone). And the pwhois coordinates for this IP are a good seven minutes' drive on the south side of the Liffey from Lower O'Connell Street. But still. One imagines a well-launched predator trained on the wifi node, or maybe the more imaginative can picture a Terminator preparing to tap a customer replete with laptop and large fries, and a more satisfying ending (perhaps involving a Taser).

What's sad, as anyone who's been reading will instantly recognize, is that the "dynamic" keyword is a far cry from the Most Significant Token, and is not itself actually tokenizable by the weak, dot-delimited (and limited) MTAs we're blessed with, so in order to recognize this is a dynamically assigned IP you must use a regular expression. You may be able to use a substring based on "wlan-ce1", but still there's this business of a complete street address and business name to contend with. Do we need to keep track of the wifi node at Govinda's, the vegetarian place around the corner, too?

I don't know if Govinda's is an Eircom customer, but the Citywest Hotel is:

213.94.167.154-dynamic.wlan-ce1.citywest1-hotel-dublin.cust.eircom.net [213.94.167.154]

Now, we could, I suppose, just use "cust.eircom.net", but that is less than satisfying (and they also use "customer.eircom.net", or used to). And it highlights another problem - just because an ISP or telco assigns an IP statically to a customer doesn't mean that customer can't then go on to re-allocate it dynamically to Big Mac-eating, laptop-wielding customers of theirs.

(A brief scan of the PTRs in 83.70.120.0/24 informs us of the depressing reality: there are several McDonald's on O'Connell Street alone... In fact, the lower /25 of that block seems dedicated to the McDonald's restaurants of Ireland.) On the other hand, they all seem to have wireless, so, make of that what you will.

So where does that leave us? We know the IP is dynamic; we can use a regular expression to capture many similar instances, provided that Eircom sticks with its naming convention for building out wireless LANs; we can even get within a few miles of the actual location if we wanted to launch an imaginary missile. But it would be so much easier if Eircom just used "dyn.eircom.net", or even "dyn-wlan.cust.eircom.net"; they already know it's going to be handed out dynamically via wifi. Why not just say it and group all similar IPs under the same top-level token?
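For what it's worth, here is roughly what such a regular expression might look like, written in Python. It is inferred from the two sample PTRs in this post and nothing more, so the assumption that every Eircom wireless node follows the same shape is mine; it is not the pattern shipped in the pats, and the names are invented for the example.

```python
import re

# Shape assumed from the two samples: <ip>-dynamic.wlan-<node>.<site>.cust.eircom.net
EIRCOM_WLAN = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})-dynamic"  # dotted quad, then the keyword
    r"\.wlan-(?P<node>[a-z0-9]+)"               # e.g. wlan-ce1
    r"\.(?P<site>[a-z0-9-]+)"                   # business name and street
    r"\.cust\.eircom\.net"
)


def eircom_wlan_site(ptr):
    """Return the site label if ptr looks like an Eircom wifi node, else None."""
    match = EIRCOM_WLAN.fullmatch(ptr.rstrip(".").lower())
    return match.group("site") if match else None


print(eircom_wlan_site(
    "83.70.120.247-dynamic.wlan-ce1.mcdonalds-50-lower-oconnell-st.cust.eircom.net"
))  # mcdonalds-50-lower-oconnell-st
```

Note how much of the pattern is spent skipping over the street address. With a name like "dyn-wlan.cust.eircom.net" a plain right-anchored suffix match would do the same job.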

new pats posted - 20090624-01 (maintenance pats release)

39299 patterns, 11333 right anchor strings, 170516 test IPs.

More from a CBL list.txt I recently resolved down to PTRs. This release
matches 99.995% of the PTRs in that CBL zone.

Was asked to start tracking couplets (pattern class and tech, taken
together as a sort of meta-identifier); there are no new couplets in
this release.

Download them here:

sendmail:
http://enemieslist.com/downloads/sendmail_access_db
http://enemieslist.com/downloads/rightanchors

postfix:
http://enemieslist.com/downloads/postfix_regexp_table
http://enemieslist.com/downloads/postfix_regexp_table-20090624-01

exim:
http://enemieslist.com/downloads/exim_hosts
http://enemieslist.com/downloads/exim_hosts-20090624-01

Today's DNS Superstars: Entel Chile

Take a look at the PTRs in 200.72.157.0/24 for a stellar example of why there should be a basic intelligence requirement for anyone touching the IN-ADDR.ARPA zones for any given ISP.

200.72.157.1: pcsistema1.eiser.local
200.72.157.4: pcproyectista1.eiser.local
200.72.157.5: pcsistema3.eiser.local
200.72.157.6: christian.christian
200.72.157.7: pcsecretaria.eiser.local
200.72.157.8: pcdibujante2.eiser.local
200.72.157.9: pcproyectista1.eiser.local
200.72.157.11: pcsecretaria.eiser.local
200.72.157.12: pcsistema1.eiser.local
200.72.157.13: regiones-serena.previred.lan
200.72.157.14: pcproyectista2.eiser.local
200.72.157.15: pcsistema1.eiser.local
[...]
200.72.157.247: pcdibujante2.eiser.local
200.72.157.248: pcsistema4.eiser.local
200.72.157.249: servidordsn.datasoftnet.local
200.72.157.250: pcdibujante1.eiser.local
200.72.157.251: pcproyectista1.eiser.local
200.72.157.252: regiones-serena.previred.lan
200.72.157.253: pcsistema4.eiser.local
200.72.157.254: pcdibujante2.eiser.local

I know, .local is actually a legitimate suffix in some contexts: multicast DNS discovery protocols such as Bonjour use it, and many Microsoft Windows Active Directory domains are set up under it, on the understanding that the host is never to be visible from the public Internet. The presence of a few dozen of these very hosts in the CBL zones, however, suggests that something is wrong here, and they are actually capable of reaching the Internet directly. Nice work, guys.
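Spotting this sort of leak in a scan is not hard. A small sketch in Python, assuming the scan output is in the "IP: name" form shown above and that the only private suffixes of interest are the two that appear in this block (both the set and the function name are my own inventions):

```python
# Assumption: just the two private suffixes seen in the listing above.
PRIVATE_SUFFIXES = {"local", "lan"}


def leaked_private_names(scan_lines):
    """Return (ip, name) pairs whose rightmost label is a private suffix."""
    hits = []
    for line in scan_lines:
        ip, sep, name = line.strip().partition(": ")
        if not sep:
            continue  # skip "[...]" and anything else that is not "ip: name"
        if name.rstrip(".").rsplit(".", 1)[-1].lower() in PRIVATE_SUFFIXES:
            hits.append((ip, name))
    return hits


sample = [
    "200.72.157.6: christian.christian",
    "200.72.157.13: regiones-serena.previred.lan",
    "[...]",
    "200.72.157.249: servidordsn.datasoftnet.local",
]
print(leaked_private_names(sample))
```

A name like christian.christian slips past this check; catching it would take a list of real TLDs to compare against, which I've left out.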

I really should work up a badge for these sorts of things, maybe a bear trap containing a keyboard with hands caught in it.