March 29, 2013
Dear Network Administrators of the world
Please stop using the token 'pool' in PTRs you then assign statically. It's counterproductive and stupid. Thanks!
May 12, 2011
Oh, come on now.
January 18, 2011
A Modest Proposal: The ? Top Level Domain
Back in the Day, there were the Big Six Top Level Domains: .com, .net, .org, .gov, .mil, and .edu (also known as Categories). There were also Country Code Top Level Domains (ccTLDs), using the ISO 3166 two-letter codes for countries. This made sense as far as it went, though of course the usual confusion reigned. RFC 2606 even reserved a few useful TLDs (.example, .invalid, .localhost, .test), and .localdomain has been used as well, though it's not really part of the DNS so much as a hack that lets you refer to localhost as though it had a domain part. Then came .nato, quickly followed by .int (which made .nato unnecessary). Then in 2000 ICANN introduced .aero, .biz, .coop, .info, .museum, .name and .pro, each of which either ended up a wasteland of spammer "burner" domains or was relegated to the usual practice of large brands buying their trademark in that TLD as well as all the others.
With the kerfuffle still ongoing about .xxx and .adult, and rumors of ICANN planning to introduce hundreds of new domains, thereby recreating the Usenet News hierarchy from the wrong end, I think it's time to put in a bid for what I consider to be the most obvious, useful, and necessary new TLD: ?
The ? TLD would work just as all the others do, but with the convenient optimization that it need not be dot-terminated or dot-separated (that would be assumed, so "foo.?." would simply be transcribed as "foo?"). This would allow grizzled security researchers to register quiscustodietipsoscustodes? (and variations on that theme), and children's book authors to register whereswaldo? and whereintheworldiscarmensandiego? and put up Web sites to promote their wares and services without wondering whether .com or .org or .info would be more appropriate. Fans of Roman history can gather at ettubrute? and nutjob Birthers can create thousands more Web sites asking variations on whereisyourbirthcertificateobama? Yiddish speakers can congregate around nu? and devotees of the classic Gallic shrug can chat at eh? (though the Canadians may want to register it first, eh?) I imagine a thriving shrine to Sly and the Family Stone at queserasera? and an online archive of MAD magazines at whatmeworry? Clint Eastwood fans would flock to doyoufeelluckytoday? and students of famed African-American and former slave Sojourner Truth could find extensive archives at andaintiawoman? Objectivists could obsess over whoisjohngalt? The current practice among certain media talking heads of asking questions as though they weren't trying to provoke a somehow incriminating denial could be accommodated handily.
The bottom line? The ? TLD answers a very real need for humans to ask questions of their environment and themselves and others; the hierarchical, noun-based Internet of hosts and labels is simply insufficient to reflect the yearning for answers we all feel to some extent or another.
The possibilities are endless. And the time is now. Please lobby your ICANN representative to consider this important idea.
November 6, 2010
$ host 18.104.22.168
22.214.171.124.in-addr.arpa domain name pointer \(argos3.cable.cbizz.nl.
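Part of what makes that PTR absurd is that it isn't even a legal hostname; `host` appears to be printing a literal parenthesis in the first label as the presentation-format escape `\(`. A minimal syntax check, assuming the letters-digits-hyphen rule from RFC 952/1123 is the standard a PTR target should meet, might look like this (the function name is hypothetical):

```python
import re

# One label under the letters-digits-hyphen (LDH) rule: 1-63 characters,
# letters or digits at both ends, hyphens allowed only in the middle.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_ptr_hostname(name: str) -> bool:
    """Return True if `name` is a syntactically valid hostname."""
    name = name.rstrip(".")  # tolerate the trailing root dot from PTR output
    if not name or len(name) > 253:
        return False
    # Every dot-separated label must be LDH; empty labels ("a..b") fail too.
    return all(LABEL.fullmatch(label) for label in name.split("."))


print(is_valid_ptr_hostname("argos3.cable.cbizz.nl."))   # True
print(is_valid_ptr_hostname("(argos3.cable.cbizz.nl."))  # False
```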
November 5, 2010
November 3, 2010
Oh, sorry. I forgot .DE.
$ whois ikomsys.de
% DB time is 2010-11-03T21:20:00+01:00
How very helpful.
$ whois wi.gov
% DOTGOV WHOIS Server ready
Domain Name: wi.gov
Please be advised that this whois server only contains information pertaining to the .GOV domain. For information for other domains please use the whois server at RS.INTERNIC.NET.
Really? It's only been several years since rs.internic.net answered connections.
September 28, 2010
July 27, 2010
Well, that's just lazy.
Received: from [126.96.36.199] (PTR [188.8.131.52] (may be forged))
June 2, 2010
Admittedly, I'm Jaded, But...
This has to be the stupidest thing I've seen in a whois lookup result in a while:
$ whois ko.te.ua
% This is the Ukrainian Whois query server #F.
% Rights restricted by copyright.
%
% The object shown below is NOT in the UANIC database.
% It has been obtained by querying a remote server:
% (pseudo.whois.com.ua) at port 43.
% REDIRECT BEGIN
% This is the Ukrainian Whois query server #O.
% Rights restricted by copyright.
remark: Domain exists.
remark: For more information contact firstname.lastname@example.org
March 25, 2010
Hey, now that's a good idea!
inetnum:        184.108.40.206 - 220.127.116.11
netname:        FI-PHNET-BAANA-DSL-POOL
descr:          PHNet Internet Services
descr:          Ilmarisentie 20, 15200 LAHTI
country:        FI
remarks:        Dynamic range: 18.104.22.168 - 22.214.171.124
remarks:        Static range: 126.96.36.199 - 188.8.131.52
admin-c:        PNO2-RIPE
tech-c:         PNO2-RIPE
status:         ASSIGNED PA
mnt-by:         PHNET-NOC-MNT
source:         RIPE # Filtered
Too bad they don't do it on all of their blocks.
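Remarks like these are machine-friendly enough to act on directly. As a sketch, assuming every block used the same `remarks: Dynamic range: A - B` wording (an assumption; free-text remarks aren't standardized), a classifier could parse them with Python's `ipaddress` module. The sample record is hypothetical and uses documentation addresses, since the real ones above are mangled:

```python
import ipaddress
import re

# Matches free-text remarks such as "Dynamic range: A - B" or "Static range: A - B".
RANGE_RE = re.compile(
    r"^remarks:\s*(Dynamic|Static) range:\s*(\S+)\s*-\s*(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical record using RFC 5737 documentation addresses.
SAMPLE = """\
inetnum:        192.0.2.0 - 192.0.2.255
remarks:        Dynamic range: 192.0.2.0 - 192.0.2.191
remarks:        Static range: 192.0.2.192 - 192.0.2.255
"""


def parse_ranges(whois_text):
    """Return (kind, first, last) tuples found in the remarks lines."""
    return [
        (kind.lower(), ipaddress.ip_address(first), ipaddress.ip_address(last))
        for kind, first, last in RANGE_RE.findall(whois_text)
    ]


def classify(ip, ranges):
    """Return 'dynamic' or 'static', or None if no remark covers the IP."""
    addr = ipaddress.ip_address(ip)
    for kind, first, last in ranges:
        if first <= addr <= last:
            return kind
    return None


ranges = parse_ranges(SAMPLE)
print(classify("192.0.2.10", ranges))   # dynamic
print(classify("192.0.2.200", ranges))  # static
```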
February 25, 2010
Why Netprospex Data is Worthless
So, in this down economy, I've seen the amount of spam mail received here spike - and most of it is obviously being sent to addresses from purchased lists. Here are some sample Subject lines from some of this stuff:
Sept. 2, 2009:
- The Rich Media Marketer
- Act Now and Save $100 on a Job Posting (Monster.com)
- Patient Condition* (419/phish)
- Change-Growth-Strategic Refocusing of Executive Talent (Barron/Carlington)
- Are you prepared? (from my former colo, but they bought the address)
- Linked-In Essentials for Executives - teleseminar - Two Days Left
- Your keyword banners advertising demo - Place billboard on top of Google
- Information Systems Opportunities (NETSHARE invite)
- Conference call request (Thoughbot)
- IT Consultants Onsite (hispeedstaffing.com)
- Hi Steve I have a question for you (SEO Experts Inc.)
- Evaluation of CRM solution (birlasoft.com)
- 2010 budget planning (openmetrik.com)
- Business meeting invitation with Hesketh.com (Oxagile)
- Complimentary Subscription Notification (businesswatchnetwork.com)
- Reminder about your invitation from Epro Tech (via LinkedIn)
- Current and Future EMC Celerra Customers (Graham Magnetics)
- Limited Time Offer - Buy Two Job Postings and Save $135 (Monster.com)
- FREE Webcast Invitation - "Thought Leadership Roundtable Webcast - Data Leakage" - CPE Credits - Register Now! (WhiteHatWorld.com)
- The Post Office Now Brings Cake for the Holidays (postcake.com)
This month has included Victoria's Secret Valentine's Day spam, energy tax credits for window replacement, lots of recruiter/staffing companies, lead management software, network services, the usual Indian "request for meeting" outsourcing crap, diploma mill stuff, capital advisors, business phone solutions, WAN management software demos, wireless telecom services, Liberty Direct insurance, medical billing degrees, a "Green IT Economic Summit" invite, Six Sigma workshops at Dartmouth, Sharepoint developer outsourcing, cheap auto insurance, a free (actually "complementary", >snicker<) cruise, Prudential Financial insurance, Blue Star Infotech, "e-billboards" SEO spam, Singlesnet, satellite TV, Ziff Davis Enterprise spam, Dish Network, new cars, home warranties, term life insurance, more Dish Network, more auto insurance, and tax refund services.
Several of these even contained references to the address of our office from 2000-2005 (we've moved twice since then).
How do I know the lists are purchased? Because the address that has seen nearly a fifth of the spam received here this month:
- has never been used to send mail
- has never been used to sign up for any mailing list
- is an obvious e-pended creation from names and initials
- has been seen in prospecting email from people who admitted they bought it
- is actually used by ESPs to vet imported lists because of these facts
- is associated with my informal name and ostensible title
- is the target of mail containing the address of our old offices (from which we moved in March 2005)
I'd seen a fair bit of mail to that address, and when I asked the senders where they got it, a couple of them admitted to buying the list (both from Lead411). So I was already aware that there are services out there that:
- make up addresses
- make up some information to go along with them
- don't verify the stuff that might actually be true
- and then sell them to idiots
I just didn't know the scope of the problem until I read an article recently mentioning Netprospex. The usual silly marketer mentality was in full evidence (read: "of course it's okay to email people you don't know, just don't do it in bulk!") and I figured I'd better check out whether or not Netprospex was any "better" than Lead411.
There's a way to search for businesses by name, so I searched. I got three results, two of which were applicable to my company. The first, rated at "94% accuracy" and supposedly "verified" (that's the CEO's tagline for Netprospex, remember) on October 23, 2009, was the name of our former CFO, who left the company altogether in early 2007. The second, rated at "61% accuracy", was for me, ostensibly the "Chief Technology Officer" of our tiny Web dev shop, and claimed to have been "verified" August 24, 2009. I didn't feel like spending $1.50 to download my own contact information, especially since they have us classified as part of the "Packaged Software Industry" (we build Web applications for not-for-profits, FWIW; you be the judge), but I suspect the address they have is incorrect.
You can click through any contact, however, so I did - and got a popup showing our area code, the first digits of our former office's street address (we moved in June of 2008). Laughing uncontrollably at the train wreck that is Netprospex now, I clicked through the "94% accurate" and recently "verified" contact for our CFO - yes, they have the two-offices-ago address for her.
Oh, and despite all the obvious effort they've gone through to collect and maintain this information about our company's officers, they don't have a record for our CEO and founder, after whom the company is named.
In other words, Netprospex data is basically nothing more than comically embellished bullshit. Which you can pay for!
January 11, 2010
A New Metric for Gauging the Uselessness of a Given WHOIS Server
I've decided that from now on I'm going to rank WHOIS servers by the percentage of each query response that is actual useful data, as opposed to useless boilerplate cruft. Right now, the .name WHOIS server is ranking near dead last.
Nota bene: CIRA, by returning little useful information for many queries, but keeping its boilerplate/disclaimerese to a minimum, is screwing with my "rank by percentage" approach, but still.
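For what it's worth, the metric is easy to approximate. This sketch makes one big assumption: that boilerplate is whatever sits on comment lines starting with `%` or `#`, which is how many (not all) WHOIS servers mark their disclaimers. It counts characters rather than lines, so a long legal paragraph costs a server more than a one-line notice:

```python
def useful_percentage(whois_text, boilerplate_prefixes=("%", "#")):
    """Percentage of non-blank characters that are not on comment lines."""
    useful = boilerplate = 0
    for line in whois_text.splitlines():
        text = line.strip()
        if not text:
            continue
        if text.startswith(boilerplate_prefixes):
            boilerplate += len(text)
        else:
            useful += len(text)
    total = useful + boilerplate
    return 100.0 * useful / total if total else 0.0


def rank_servers(responses):
    """Sort server names from most to least useful, given {server: response}."""
    return sorted(responses, key=lambda s: useful_percentage(responses[s]), reverse=True)
```

The CIRA problem shows up right away: a terse response with almost no data and almost no disclaimer still scores well, because a percentage says nothing about how much useful data there is in absolute terms.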
November 6, 2009
Why I Harbor a Strong Dislike for .PL
This is the regex I have to use to match the second level domain part of domains in .pl (in order to figure out what the domain part of a hostname is):
(aid|agro|atm|auto|biz|com|edu|gmina|gsm|info|mail|miasta|media|mil|net|nieruchomosci|nom|org|pc|powiat|priv|realestate|rel|sex|shop|sklep|sos|szkola|targi|tm|tourism|travel|turystyka|augustow|babia-gora|bedzin|beskidy|bialowieza|bialystok|bielawa|bieszczady|boleslawiec|bydgoszcz|bytom|cieszyn|czeladz|czest|dlugoleka|elblag|elk|glogow|gniezno|gorlice|grajewo|ilawa|jaworzno|jelenia-gora|jgora|kalisz|kazimierz-dolny|karpacz|kartuzy|kaszuby|katowice|kepno|ketrzyn|klodzko|kobierzyce|kolobrzeg|konin|konskowola|kutno|lapy|lebork|legnica|lezajsk|limanowa|lomza|lowicz|lubin|lukow|malbork|malopolska|mazowsze|mazury|mielec|mielno|mragowo|naklo|nowaruda|nysa|olawa|olecko|olkusz|olsztyn|opoczno|opole|ostroda|ostroleka|ostrowiec|ostrowwlkp|pila|pisz|podhale|podlasie|polkowice|pomorze|pomorskie|prochowice|pruszkow|przeworsk|pulawy|radom|rawa-maz|rybnik|rzeszow|sanok|sejny|slask|slupsk|sosnowiec|stalowa-wola|skoczow|starachowice|stargard|suwalki|swidnica|swiebodzin|swinoujscie|szczecin|szczytno|tarnobrzeg|tgory|turek|tychy|ustka|walbrzych|warmia|warszawa|waw|wegrow|wielun|wlocl|wloclawek|wodzislaw|wolomin|wroclaw|zachpomor|zagan|zarow|zgora|zgorzelec)
Yes, really. It's like the shining example of the .us TLD, but they also allow domains in the .pl TLD as well. Oh, wait. So does .us now.
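In use, that alternation sits in an optional slot between the registered name and `.pl`. Here is a rough Python sketch of finding the registrable domain of a hostname; for brevity it uses only a dozen labels from the list above (the full alternation would drop in unchanged), and the function name is made up:

```python
import re

# Abbreviated subset of the .pl second-level labels listed above.
PL_SECOND_LEVEL = r"(?:com|net|org|edu|info|biz|mil|waw|warszawa|wroclaw|szczecin|katowice)"

# A whole label, then an optional known second-level label, then .pl at the end.
PL_DOMAIN_RE = re.compile(
    r"(?:^|\.)([a-z0-9-]+\.(?:" + PL_SECOND_LEVEL + r"\.)?pl)\.?$",
    re.IGNORECASE,
)


def pl_registrable_domain(hostname):
    """Return e.g. 'krawarkon.pl' or 'example.waw.pl', or None if not in .pl."""
    match = PL_DOMAIN_RE.search(hostname)
    return match.group(1).lower() if match else None
```

Because `search` returns the leftmost match and each candidate must start at a label boundary, extra host labels in front of the domain are skipped rather than swallowed.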
October 22, 2009
Will the stupidity ever end?
Today, I found this in our quarantine.
Return-Path: email@example.com
Received: from q3email.securesites.net (q3email.securesites.net [184.108.40.206])
    by tabasco.hesketh.com (8.14.1/8.14.1/20080606) with ESMTP id n9MDXAFr005510
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
    for <ELIDED>; Thu, 22 Oct 2009 09:33:28 -0400
Received: from Q3GN0038 (del-static-174-68-7-210.direct.net.in [220.127.116.11] (may be forged))
    by q3email.securesites.net (18.104.22.16860614/8.13.6) with SMTP id n9MDDISh055841
    for <ELIDED>; Thu, 22 Oct 2009 13:33:10 GMT
From: Feroz Zaidi <firstname.lastname@example.org>
So, an entity calling itself "securesites.net", located in Colorado, accepted mail from a host in India, probably New Delhi, that HELO'd with an unqualified hostname (something we've refused to accept since early 2004, when doing so easily stopped MyDoom), from a host whose name does not resolve to its IP, without any evidence of SMTP authentication, sending from a customer's domain (check out the whois for q3tech.net and q3tech.com) to a third party (us).
The envelope sender (MAIL FROM), which is built from the date, is obviously designed to let them throw the address away once the spam run is over, or to distinguish bounces from one spam run from those of the next.
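Two of those checks are mechanical enough to sketch. In this hedged Python version, a HELO counts as unqualified if it has no dot and isn't an address literal, and forward-confirmed reverse DNS means at least one PTR name resolves back to the connecting IP. The resolvers are injectable (an assumption for testability) and default to the standard library's `socket.gethostbyaddr` and `socket.gethostbyname_ex`; the latter returns IPv4 addresses only:

```python
import socket


def is_unqualified_helo(helo):
    """True if the HELO/EHLO argument has no dot and isn't an address literal."""
    helo = helo.strip().rstrip(".")
    if helo.startswith("[") and helo.endswith("]"):
        return False  # address literal such as [192.0.2.1]
    return "." not in helo


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if some PTR name for `ip` resolves forward to `ip` again."""
    try:
        primary, aliases, _ = reverse(ip)
    except OSError:
        return False  # no PTR at all
    for name in [primary, *aliases]:
        try:
            _, _, addresses = forward(name)
        except OSError:
            continue  # this name doesn't resolve; try the next one
        if ip in addresses:
            return True
    return False
```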
Now, this could be that Q3 Technologies (if you're at the Javits Center in NYC for Global Sourcing Forum + Expo, and want to buttonhole them about why they're spamming CEOs of Web development companies, be my guest) hired someone in India to send this invite on their behalf (the clickthrough tracking links are all pointing to 22.214.171.124:81, on a non-standard HTTP port, on an IP in New Delhi), and "securesites.net" has its mail servers set up to allow relaying for anyone sending from an address in a certain set of domains. But is this at all wise? I don't think so.
The clincher here is that Q3 Technologies is a customer, making "securesites.net" complicit in the spam run, never mind their poor security otherwise. Whether it is CAN-SPAM compliant or not, these jokers are allowing the insecure relay of UCE via their mail systems, and in obvious ignorance of longstanding practice.
What makes this even more suspicious is that while they allowed relaying for <email@example.com> from that IP in India, they disallow it from my server:
$ telnet 126.96.36.199 25
Trying 188.8.131.52...
Connected to 184.108.40.206.
Escape character is '^]'.
220 q3email.securesites.net ESMTP Sendmail 220.127.116.1160614/8.13.6; Thu, 22 Oct 2009 16:18:18 GMT
HELO tabasco.hesketh.com
250 q3email.securesites.net Hello smtp.hesketh.com [18.104.22.168], pleased to meet you
MAIL FROM: <firstname.lastname@example.org>
250 2.1.0 <email@example.com>... Sender ok
RCPT TO: <firstname.lastname@example.org>
550 5.7.1 <email@example.com>... Relaying denied. Proper authentication required.
QUIT
221 2.0.0 q3email.securesites.net closing connection
Connection closed by foreign host.
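The same probe can be scripted. This is a minimal sketch using the standard library's `smtplib`; the host and addresses in the commented usage lines are placeholders. It stops after `RCPT TO`, resets the transaction, and lets the context manager send `QUIT`, so no message is ever transmitted:

```python
import smtplib


def rcpt_reply(host, helo_name, mail_from, rcpt_to, port=25, timeout=30):
    """Run HELO / MAIL FROM / RCPT TO and return the (code, text) RCPT reply."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:  # sends QUIT on exit
        smtp.helo(helo_name)
        smtp.mail(mail_from)
        code, message = smtp.rcpt(rcpt_to)
        smtp.rset()  # abandon the transaction; no DATA is ever sent
    return code, message.decode("utf-8", errors="replace")


def would_relay(code):
    """A 2xx reply to RCPT TO for a third-party recipient suggests an open relay."""
    return 200 <= code < 300


# Placeholder values; the session above got 550 5.7.1 "Relaying denied".
# code, text = rcpt_reply("mx.example.net", "tabasco.hesketh.com",
#                         "sender@example.org", "recipient@example.com")
# print(code, would_relay(code))
```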
It's possible that they've seen so many complaints already that they've shut down this spammer, or tightened up their configuration. More likely, they've whitelisted mail from the IP in India for the express purposes of letting them send mail via their systems. One day, I hope I don't see any more of this nonsense.
September 21, 2009
getpaidsolutions.com weighs in on the "is Pluto a planet" question
mercury.getpaidsolutions.com [22.214.171.124]
venus.getpaidsolutions.com [126.96.36.199]
mars.getpaidsolutions.com [188.8.131.52]
jupiter.getpaidsolutions.com [184.108.40.206]
saturn.getpaidsolutions.com [220.127.116.11]
neptune.getpaidsolutions.com [18.104.22.168]
uranus.getpaidsolutions.com [22.214.171.124]
Host pluto.getpaidsolutions.com not found: 3(NXDOMAIN)
September 18, 2009
DHCP doesn't necessarily mean dynamic
Any systems or network administrator who has worked with DHCP knows that it can be used to dole out fixed/static IP addresses to clients, but the far more common use in residential Internet service is for it to dole out dynamic IPs from a pool. The occasional exception arises now and again, and here's one:
A quick check of WHOIS shows this host is in a /22 block marked as "PrairieWave Static Host Assignment". Oddly, the IP in question is listed in the Spamhaus PBL, which is notorious for avoiding listing static generic IPs. So, who's right here? The ISP? Spamhaus?
I try a few more IPs, all marked as generic by Enemieslist some years ago, but containing the DHCP token, and they all fall into blocks marked in WHOIS as static. OK, then, let's try one marked explicitly as static in a token:
All WHOIS says is that these are "PrairieWave DSL DHCP". Well, let's try another. This one also has "static" in the name, and it's marked as static in WHOIS as well. Unfortunately, the IP whose name I'm trying to add a pattern for now also contains the "static" token, but the WHOIS output is mum as to its type (DSL?), so no way to tell that part; it looks like they offer cable, DSL, wireless, dialup and colocation as well. Their Support FAQ says that static IPs are available, but doesn't say anything about custom PTRs.
Probably the best bet here is to assume that, because they're using DHCP but presumably want to reduce support costs, they're mixing their static and dynamic IPs within the same blocks (as we saw with the "DSL DHCP" remark in the WHOIS output for the host with the static token in the name). But since static IPs are clearly not the default, our default is to assume dynamic, especially because they are known to use "static" tokens in the names of statically assigned hosts.
A quick scan of a few subnets suggests that they will provide custom PTRs for customers requesting statics, so we're sticking with "dynamic" for the DHCP hosts, WHOIS notwithstanding. In this case, though, we do it knowing we may well be wrong. The worst-case scenario is that the IPs with those "DHCP" names are actually a mix of static and dynamic.
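The call we ended up making reduces to a short rule. Sketched in Python, with hypothetical token patterns that require the token to be delimited by dots, hyphens or the ends of the string (real patterns would be tied to the provider's exact naming convention and domain):

```python
import re

# Hypothetical delimited tokens; real patterns would be fully anchored per provider.
STATIC_TOKEN = re.compile(r"(?:^|[.-])static(?:[.-]|$)", re.IGNORECASE)
DHCP_TOKEN = re.compile(r"(?:^|[.-])dhcp(?:[.-]|$)", re.IGNORECASE)


def classify_ptr(ptr):
    """Return (class, reason) for a generic PTR from a mixed static/dynamic ISP."""
    if STATIC_TOKEN.search(ptr):
        return "static", "explicit static token"
    if DHCP_TOKEN.search(ptr):
        # Even in blocks WHOIS labels static: statics aren't the default service.
        return "dynamic", "dhcp token, WHOIS notwithstanding"
    return "dynamic", "default: statics are not the default"
```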
Now this is just silly (WHOIS FAIL)
So, today I was trying to research a domain I had added a pattern for (one of its hosts was reportedly sending spam to traps, it was generically named, and so into EL's pats database it goes). The domain? krawarkon.pl. The host itself had no tokens or indication of what it might be, as is often the case.
OK, I know from cruel experience that Polish WHOIS service (via the whois protocol) is useless, generally giving only creation dates, a contact handle and some information about how to contact the registrar by postal mail or phone. And sure enough, I was right.
$ whois krawarkon.pl
registrant's handle: nta90646 (CORPORATE)
nameservers: dns.tkb.pl. [126.96.36.199]
created: 2006.03.31 17:15:35
last modified: 2009.03.12 11:37:07
+48.801 33 22 33
+48.12 297 88 10
+48.12 297 88 08
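For reference, the port-43 exchange behind that `whois` command (RFC 3912) is about as simple as protocols get: open a TCP connection, send the query followed by CRLF, and read until the server closes the connection. A minimal Python sketch; decoding as UTF-8 is an assumption, since the protocol specifies no character set and many ccTLD servers use legacy encodings:

```python
import socket


def whois(query, server, port=43, timeout=15):
    """Send one query over the WHOIS protocol and return the full reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it's done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Usage would be something like `whois("krawarkon.pl", "whois.dns.pl")`, assuming `whois.dns.pl` is still the .pl registry's server.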
OK, so now I know the domain is registered. Woo-hoo! If I wanted to call Poland, or drop them a postcard to ask what sort of company krawarkon.pl is, I'd be all set. Or I could bug the admin contact via email, if s/he spoke English or I translated a question into Polish. Sometimes their Web-based WHOIS service is better (though not usually; often enough you go through the painful process of guessing whether the CAPTCHA they use is case-sensitive, only to find that a lookup of foo.pl returns the useful information that it belongs to "FOO Sp. z o.o."). So I gave that a try, and lo! Once I'd entered the CAPTCHA correctly, I got back the following:
REGISTRANT:
company:        PPUH "KRAWARKON" Spółka z o.o.
street:         Burgaska / Warszawa 2-4
city:           02-758 Warszawa
location:       pl
last modified:  2006.03.31
Well, that's more than I had. Now I know they are a Polish company, in Warsaw, that their full name is PPUH "KRAWARKON" Spółka z o.o. ("Sp. z o.o." being roughly the Polish equivalent of a limited liability company), and that they're only three or so years old. Well, that doesn't give me an indication of what
inetnum:        188.8.131.52 - 184.108.40.206
netname:        KRAWARKON
descr:          PPH Krawarkon Sp. z o.o.
descr:          ul. Burgaska 2/4
descr:          02-758 Warszawa
country:        PL
admin-c:        MW1321-RIPE
tech-c:         PB4904-RIPE
status:         ASSIGNED PA
mnt-by:         TPNET
source:         RIPE # Filtered
Same information, more or less, if slightly abbreviated. No comments or remarks to the effect that you'd be able to tell what sort of service they have. It's only a /24, which suggests a small company or one of several blocks assigned to a larger one; a quick query to whois.ripe.net for "Krawarkon" later, and I know they have several netblocks, nine /24s to be exact, so I figure it's a pretty small company, but it's probably not simply corporate if it has nine /24s for two locations. OK, then so now I know it's probably an ISP of some kind. (It's probably not a telephone company, because there aren't that many in Poland, per this Wikipedia article, and what few there are tend to have "tel" or "evdo" or "gsm" in their names, TPNET and Netia notwithstanding).
By now, I'm getting annoyed, though honestly I shouldn't complain much - at least the WHOIS records contained the company name (they often don't in some other countries, like Russia), and the name wasn't immediately followed by the name of some engineer or other (which confused me for the first couple of years I did this). But all I really wanted to know was this: is the host dynamically assigned, and what sort of service is it providing?
As a last resort, I finally go to the Web site, and learn immediately from the title of the home page that we're dealing with a triple-play Internet/Television/Telephone concern (from the Polish "Internet Telewizja Telefon"). But the home page is in Flash, so I skip the intro and get to the next page, whose title reads "Telewizja Kablowa Krawarkon", or "Krawarkon Cable Television" (even I can read that much Polish). Great. But the text on the rest of the page is in Polish, too - so I jump over to Google Translate and see what I can learn about their service. Unfortunately, with the exception of the text "about us", it's all
Once again, the table containing the prices, while obviously showing that we're dealing with broadband cable (and television, which almost always denotes residential service), is also a graphic and is not translated. Besides the indignity of learning that if I were in a former Communist Bloc country I could be getting 10Mb/s for 122 zlotys (less than the $45/month I'm paying at home for 6Mb/s from Road Runner), I learn nothing about dynamic or static assignment or NATs. Assuming residential cable customers probably get dynamics, I could stop here - but I notice in the image that one "a la carte" line reads "dodatkowy adres IP", so I jump over and translate that: "Additional IP address". Pricey at another 61 zlotys. But that suggests static assignment, so I have to be satisfied with the possibility that there may be static IPs in the mix as well as dynamics. Fortunately, there's a link to "Rules", their acceptable use policy or terms and conditions, so I check it quickly to see whether it says anything about customers running mail servers on their cable Internet service. If there's anything more obtuse than legalese, it's legalese auto-translated from Polish. Nothing in there prohibits someone from running a mail server.
I then check to see if the IP is listed in Spamhaus' PBL; nope. Not that this means it's definitely not dynamic; PBL doesn't list a lot of IPs we have patterns for, even dynamic patterns. A quick lookup of the entire range should help me make a final determination. Nope, no custom reverse DNS at all (which if seen might indicate at least some statics). Try all of the ranges we know about. Just a few obvious static hosts at the beginning of those ranges with generic PTRs like the one we first found in the trap feed. A couple of mail servers in the krawarkon.pl domain (poczta being Polish for "mail"). No custom reverse DNS in most of the ranges, beyond gateways and the few mail servers already mentioned.
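A range sweep like that is easy to script. The sketch below reverse-resolves every host address in a network and separates names that spell out their own IP (the classic generic pattern) from everything else, which still needs a human look. It's only a rough proxy: Krawarkon's token-free generic names would land in the "needs a look" pile, and zero-padded or hex encodings of the IP aren't detected. The resolver is injectable and defaults to `socket.gethostbyaddr`:

```python
import ipaddress
import re
import socket


def ip_embedded(ip, name):
    """True if `name` spells out the IPv4 octets, forward or reversed."""
    octets = str(ipaddress.ip_address(ip)).split(".")
    for seq in (octets, octets[::-1]):
        for sep in "-._":
            # Digit lookarounds stop 192-0-2-1 from matching inside 192-0-2-10.
            pattern = r"(?<![0-9])" + re.escape(sep.join(seq)) + r"(?![0-9])"
            if re.search(pattern, name):
                return True
    return False


def sweep(network, reverse=socket.gethostbyaddr):
    """Map each host IP in `network` to its PTR (or None), and also return
    the subset whose names don't embed their own IP."""
    ptrs = {}
    for addr in ipaddress.ip_network(network).hosts():
        try:
            ptrs[str(addr)] = reverse(str(addr))[0]
        except OSError:
            ptrs[str(addr)] = None  # no PTR at all
    needs_a_look = {ip: n for ip, n in ptrs.items() if n and not ip_embedded(ip, n)}
    return ptrs, needs_a_look
```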
Maybe the customer service link in the navbar will help - sometimes these contain screenshots (usually of Windows Networking control panel dialogs) showing how to configure DHCP. Nope, just hours of operation and phone numbers.
One other thing I can try: rsync the CBL list.txt file and check for listing density. As the ranges assigned to Krawarkon are all /24s, I can just grep for the first three octets of each. As I'm waiting for the rsync download to finish, I check the Enemieslist callback logs, which indicate reasons for acceptance or refusal of connections by certain users of the sendmail package. Lots of localhost and unqualified HELOs, all rejected by the reporting mail servers. Now that the CBL is done rsyncing, I check the densities and find:
220.127.116.11/24: 3
18.104.22.168/24: 1
22.214.171.124/24: 1
126.96.36.199/24: 0
188.8.131.52/24: 1
184.108.40.206/24: 0
220.127.116.11/24: 2
18.104.22.168/24: 4
22.214.171.124/24: 7
Few enough, especially for a Polish cable provider, but still looks like there are several bot-infected hosts in there. Few enough that I can check them manually using the CBL Lookup form; all bots: rustock, mega-d, grum, bagle-cb, cutwail. All of those in the 80.51.73/24 block are Rustock, suggesting that it may be the same botted host, getting new IPs. I make the final call - these are dynamic hosts.
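The density check is essentially a grep by prefix. Here's a sketch in Python, assuming the list file holds one IP address per line (an assumption about its format; comment and malformed lines are skipped). Matching with `ipaddress` rather than a bare string prefix avoids a prefix like `192.0.2` also catching `192.0.21.x`:

```python
import ipaddress
from collections import Counter


def density_per_24(list_lines, prefixes):
    """Count listed IPs inside each of the given networks (e.g. /24s)."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    counts = Counter({str(net): 0 for net in nets})  # keep zero counts visible
    for line in list_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            addr = ipaddress.ip_address(line)
        except ValueError:
            continue  # not a bare IP address; skip it
        for net in nets:
            if addr in net:
                counts[str(net)] += 1
    return counts
```

Passing an open file object as `list_lines` works as well as a list, since both iterate line by line.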
Obviously, I don't always go through all of these steps when classifying hosts; often the class is obvious from tokens in the name or comments in the whois lookup. My point in writing this is to illustrate how perniciously content-free a typical, truly generic hostname is, and how little the traditional resources we might once have expected to rely on actually tell us about such a host. If Krawarkon's hostmaster had simply used "cable.dyn" or similar in the naming convention, all this pain could have been avoided, along with all of the bot spam.
September 16, 2009
If your PTR contains or ends with IN-ADDR.ARPA, you're Doing It Wrong. Go read a book. That is all.
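The mistake is easy to catch mechanically. A small sketch using Python's `ipaddress` `reverse_pointer` attribute, which yields the owner name for an address: a PTR target equal to that owner name, or containing `in-addr.arpa` anywhere, gets flagged. The function name is hypothetical:

```python
import ipaddress


def ptr_problem(ip, ptr_target):
    """Return a description of what's wrong with the PTR target, or None."""
    owner = ipaddress.ip_address(ip).reverse_pointer  # e.g. 1.2.0.192.in-addr.arpa
    target = ptr_target.strip().rstrip(".").lower()
    if target == owner:
        return "PTR target repeats its own owner name"
    if "in-addr.arpa" in target:
        return "PTR target contains in-addr.arpa"
    return None


print(ptr_problem("192.0.2.1", "1.2.0.192.IN-ADDR.ARPA."))  # PTR target repeats its own owner name
```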
September 11, 2009
August 27, 2009
Oh, Ukraine, with your insane politics and beautiful politicians and inscrutable ways; I can't even begin to imagine what you were thinking when you decided upon this particular naming convention:
down.voucher.volia.net [126.96.36.199]
downfallless.pants.volia.net [188.8.131.52]
dozenness.falsetto.volia.net [184.108.40.206]
alien.insinuate.volia.net [220.127.116.11]
drafting-literature.volia.net [18.104.22.168]
drapesly-commuter.volia.net [22.214.171.124]
drawbridgeless-prospect.volia.net [126.96.36.199]
drawing.copybook.volia.net [188.8.131.52]
dressless-debate.volia.net [184.108.40.206]
drill.abacus.volia.net [220.127.116.11]
drilling-stupidity.volia.net [18.104.22.168]
drinking.lament.volia.net [22.214.171.124]
drugless-increment.volia.net [126.96.36.199]
drunk-ruling.volia.net [188.8.131.52]
alien.surprising.volia.net [184.108.40.206]
dry-seeker.volia.net [220.127.116.11]
dubing-stub.volia.net [18.104.22.168]
dutchness-reentry.volia.net [22.214.171.124]
dutchness.birthrate.volia.net [126.96.36.199]
dwelling-bookends.volia.net [188.8.131.52]
dwindling-poultice.volia.net [184.108.40.206]
Yes, in order to name their end user residential cable and DSL modem pool hosts,
The problem, for us, is that when we try to come up with a pattern for this sort of naming, we have to use things like:
To match the "contemporary" host above, we have to just match:
The real problem is that pattern also matches other hosts that are most likely part of their infrastructure, or unassigned:
cmts.volia.net [220.127.116.11] metro.volia.net [18.104.22.168] unknown.volia.net [22.214.171.124]
And we don't want
I'm sure someone thought this was pretty clever. I'd love to hear the story behind it, actually. But it's a real pain in the neck for antispam folks. Please, just block outbound port 25 and
August 14, 2009
DNS laughs, continued
August 13, 2009
DNS laughs, yet another episode in the ongoing series
I have no words.
July 30, 2009
Why I have a headache today
Enemieslist started out as a loosely defined effort to stop spammers using botnets on hosts with generic names. This was sort of an innovation at the time, because most postmasters were only worried about "dynamics". Blocking mail from "generic" hosts is a fairly straightforward concept, however, and back in the dark ages of 2003, it was inspired by two related issues.
Things had gotten to the point where we got so much spam that reporting it was a massive waste of time. This was especially true when dealing with entities whose customer bases were large enough to include a correspondingly large number of botted PCs. I'd received enough auto-acks by then, and was tired of policing ISPs' networks for them. It's one thing to report an open relay to postmaster on a single MTA; it's quite another to report all of the spam you get from, say, BellSouth customers. Every day. For years.
The other thing that contributed to my overall concept of "generic" at the time was that a spammer named Brian Westby had used a forged address in our domain as the sender for a massive spam run, lasting several weeks and only ending when the FTC cut him off. We got many tens of thousands of what would come to be called "outscatter" messages, many including the headers, if not the body, of original messages that the receiving servers had accepted and only later decided to bounce "back" to the purported sender. It was obvious that the original messages had been sent directly from end user nodes with generic names; back then, a fair number of them were dialups, though that has changed with the broadband rollout worldwide. And they all had generic, provider-assigned names. Often, they included some form of their IP address, or a token indicating what they were: dialup, DSL, cable, and other typical end-user market services (contrasted with leased lines and the like).
Reasoning that if we didn't want the outscatter from spam that had been received from these hosts, we probably didn't want messages directly from them, either, and drawing on a background that included experience with regular expressions, I started building a database of patterns that matched the names I'd seen in the headers of the outscatter messages. At first, I didn't classify anything - simply being generic was enough - but over time I started to distinguish between dynamic and static and various subcategories such as NATs and proxies and webhosts and resnets. As the data set grew, I found myself tracking other subsets, such as "outmx", simply to prevent myself from making a mistake classifying some of the more weirdly named legitimate mail sources. And with that, the definition of "generic" was stretched, perhaps beyond redemption.
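A database of that kind boils down to an ordered list of fully anchored patterns, each mapped to a class, where the first match wins. The entries below are hypothetical examples, not actual Enemieslist patterns; putting the "outmx" entry first illustrates how tracking legitimate mail sources keeps a broader generic pattern from swallowing them:

```python
import re

# Hypothetical, fully anchored patterns; order matters, first match wins.
PATTERNS = [
    (r"^smtp\d+\.cable\.example\.net$", "outmx"),  # legitimate outbound MTAs
    (r"^[a-z0-9-]+\.cable\.example\.net$", "dynamic"),
    (r"^static-\d+-\d+-\d+-\d+\.dsl\.example\.net$", "static"),
    (r"^resnet-\d+\.dorms\.example\.edu$", "resnet"),
]
COMPILED = [(re.compile(p, re.IGNORECASE), cls) for p, cls in PATTERNS]


def classify(hostname):
    """Return the class of the first pattern matching `hostname`, or None."""
    name = hostname.rstrip(".")
    for regex, cls in COMPILED:
        if regex.fullmatch(name):
            return cls
    return None
```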
A recent experience we had serves as an illustration of how weird this can get. We got some spam from a host with a dyndns.org HELO. As DynDNS is a service that allows people to set up static DNS labels pointing to (probably) dynamic IPs, it's a pretty clear case of "this IP is probably dynamic", so we classified everything under dyndns.org as "dynamic". We quickly realized that there is significantly more clue allocated to the sorts of folks who know to get a dyndns.org hostname, and it's likely they're the sorts who like to run personal mail servers and so forth. Disregarding the issue of whether their ISP forbids such practices, or charges more for static IPs and custom PTRs, the bottom line is that the
So, how should Enemieslist classify such a host?
The question is what my old college professor would have called "transgressive"; it isn't easily answered in the simpler context in which it is asked, and in fact tends to disrupt the categories and framework of concepts that the question relies on. Because DynDNS allows you to choose any label you wish for your hostname, it's not technically "generic" - defined as "relating to or common to or descriptive of all the members of a genus (or set)". You can make the argument that all of the members of that set have a common characteristic, namely, they're masks or aliases for other names, but that's pretty weak.
I've already talked a bit about why the concept of "dynamic" versus "static" is wiggly, namely, it's merely a matter of intention rather than of duration, but this goes further. It's actually a problem we have with several of our subclasses, such as NATs and resnets, and it boils down to this: in the context of reputation, names are only truly attached to their IPs, but not necessarily to the hosts that occupy those IPs, or use those IPs as injection points for their traffic, at any given time. This is of course true in a sense for all botted hosts, whose activity is controlled from afar via a "command and control" or C&C host, after all.
So when we judge a naming convention, we're making assumptions about the hosts that will be assigned the names' corresponding IPs. And even then, we're making assumptions about the traffic that will come out of those IPs - in the case of an insecure NAT, that traffic comes from a host on the LAN or VPN behind that interface, for example. Resnet IPs may be statically assigned to drops in dorm rooms, but if the computers using those drops change every semester, the reputation of those hosts may change as well.
So we are forced to make judgements on hosts as a set, or class, based on what we know about their names and the netblocks that have hosts with those names. Usually, the host part of the label is sufficient to judge dynamicity, and we don't often need to examine the domain name itself (though all of our patterns are fully anchored, and sometimes the domain provides clues about technology). "Genericity" is determined through observation (such as when the IP is part of the name) or by having multiple examples fitting the same pattern. But in the case of DynDNS,
The final resolution to this question, in the case of DynDNS, is unclear. One saving grace of it is that any host that HELOs with a dyndns.org hostname is also likely to have a dynamic, or at least generic, PTR, or no PTR at all, so all is not lost. But it may well be that we will need to create a new class, as we have had to do to describe "cloud computing" services, and assign it to all of the domains used by DynDNS and other such services. Another option is to simply avoid classifying it at all. But neither of these is satisfying, given that our purpose here is to help evaluate the risk that a specific host may be part of a botnet, on a sliding scale with dynamics at the "bad" end.
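One of the observation-based tests mentioned above, whether the IP is part of the name, is easy to sketch. The Python below is a minimal, hypothetical helper (not Enemieslist's actual code) that checks a few common encodings of an IPv4 address: dotted or dashed octets in forward or reversed order, plain or zero-padded, plus run-together zero-padded octets. Real naming schemes use many more encodings (hex, partial octets, and so on), so treat this as an illustration under those assumptions.

```python
import ipaddress
import re


def ip_in_hostname(ip: str, hostname: str) -> bool:
    """Heuristic: does this IPv4 address appear, in a common encoding, in the hostname?

    Hypothetical helper; it covers only dotted/dashed octets (plain or
    zero-padded, forward or reversed) and run-together zero-padded octets.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    padded = [o.zfill(3) for o in octets]
    candidates = set()
    for parts in (octets, octets[::-1], padded, padded[::-1]):
        for sep in ("-", "."):
            candidates.add(sep.join(parts))
    # Run-together digits are only unambiguous when every octet is padded to 3.
    candidates.add("".join(padded))
    candidates.add("".join(padded[::-1]))

    name = hostname.lower().rstrip(".")
    for cand in candidates:
        # Don't let 192-0-2-10 match inside 192-0-2-100: no digit may touch either end.
        if re.search(r"(?<!\d)" + re.escape(cand) + r"(?!\d)", name):
            return True
    return False


print(ip_in_hostname("192.0.2.10", "192-0-2-10.dsl.example.net"))    # True
print(ip_in_hostname("192.0.2.10", "host-192-0-2-100.example.net"))  # False
```

A match like this says "generic", not "dynamic"; deciding which kind of generic still takes the rest of the name and the netblock context.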
So, a headache.
July 13, 2009
Why we treat hosts with generic static PTRs as suspect
A perennial debate has just arisen again, twice in the same week, and no, it's not "vi vs. emacs", it is the question of whether static IPs with generic names are sufficiently high-risk for us to block or score mail sent from them. Of course, everyone's mail flows are different, as are their tolerances for spam and other abuse. So you'll need to frame your own views of the matter in terms of your local policies, how your tools work, your aims in looking at PTR and HELO, etc. But let me lay out the history, outline my basic argument, and back it up with some data from a recent CBL zone.
Historically speaking, the concept of "dynamics" in the context of blocking spam arose out of the use of throwaway dialup accounts to send spam; early ankle-biter spammers, derided as "chickenboners" (in reference to their imagined mobile home littered with chicken bones) were assumed to be low-rent halfwit losers who'd fallen for some get-rich-quick-by-email scheme or similar. As it is a basic tenet that mail servers themselves should have relatively static addresses (because if they act as MX for their domain, you don't want to give up your IP to some other dynamic host that might then start receiving mail already queued up to be sent to your old, dynamically assigned, IP), the rough distinction between "dynamics" and "statics" was born.
Administrators started blocking based on whether an IP was believed to be dynamic, using various mechanisms. Early attempts included simple (and overly simplistic) regular expressions, which treated any host with substrings common to dynamics as suspect: "dial", "dyn", "ppp", etc. Blacklists such as SORBS DUL and Dynablock started up with the sole intent of listing such IPs. And Enemieslist started collecting fully-qualified regular expressions for "generics", classifying them as dynamic or static as best we could; our sendmail antispam package treated generic statics as less risky than dynamics in most contexts, but still scored them as somewhat risky.
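To illustrate the difference between the early substring approach and fully anchored patterns, here is a small Python sketch. The tokens, patterns, and domains are hypothetical examples, not entries from the Enemieslist database.

```python
import re
from typing import Optional

# Early, overly simplistic approach: any "dynamic-looking" substring is suspect.
NAIVE_TOKENS = ("dial", "dyn", "ppp")


def naive_is_dynamic(hostname: str) -> bool:
    name = hostname.lower()
    return any(token in name for token in NAIVE_TOKENS)


# Fully anchored approach: each pattern describes one provider's entire naming
# convention, domain included, and carries its own classification.
# Hypothetical patterns for illustration only.
ANCHORED_PATTERNS = [
    (re.compile(r"ppp-\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dialup\.example\.net"), "dynamic"),
    (re.compile(r"static-\d{1,3}\.biz\.example\.net"), "static"),
]


def anchored_classify(hostname: str) -> Optional[str]:
    name = hostname.lower().rstrip(".")
    for pattern, label in ANCHORED_PATTERNS:
        if pattern.fullmatch(name):  # fullmatch: the whole name must match
            return label
    return None  # no pattern matched: unclassified


# The naive check flags an innocent custom PTR; the anchored one does not.
print(naive_is_dynamic("mail.dynamics-consulting.example.com"))   # True
print(anchored_classify("mail.dynamics-consulting.example.com"))  # None
print(anchored_classify("ppp-192-0-2-7.dialup.example.net"))      # dynamic
```

The cost of anchoring is obvious: one pattern per naming convention, which is exactly the database-building slog described above.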
(As an aside, I fondly recall the days when you could get spammed via an obvious dialup account and respond by sending them the "Ping of Death" - a ping with a custom payload such that when the remote host's network stack echoed it back, it sent the Hayes modem command for "hang up now" (+++ATH0), which the modem promptly obeyed. Heh. Good times.)
Anyway, with the rapid rise of so-called "broadband", or high speed residential Internet connections (cable, DSL, fiber optics, WiMax, and more) the distinction has become blurred - for example, I had the same "dynamic" IP assigned to the cable modem at my house for three years, during which time we renumbered our connection to the office three times, using three different providers. (Fortunately, we had hosted our servers in a colo throughout, so it wasn't an issue from the perspective of dealing with email). What was once "dynamic" in practice as well as in definition has become more static in reality. So, the definition broadened, or morphed, to denote IPs that were assigned, however dynamically or statically, to "end users", whose PCs were presumed to get infected more than hosts on commercial networks with dedicated IT staff to watch over them. And this may well be the case. We certainly see more spam traffic from hosts we've classified as dynamic than we do from those we've classified as static.
Nonetheless, while it may be true, and effective, to argue that you don't want to accept mail from "dynamics" because the risk of that traffic being spam is so high in this million-host-botnet day and age, it doesn't follow that a host with a statically assigned address and a generic name is one you shouldn't also filter. (I'm leaving aside for now the question of whether it makes sense to run a small home mail server from a dynamically assigned IP, using dyndns or other mechanism. I have my opinions, but they're not particularly germane to the central point I'm trying to make about statics.) Why should we treat any host with a generic PTR as suspect, regardless of assignment type?
The simplest way to think of this is in terms of Venn diagrams. You can divide the Internet into three basic classes of IP: dynamically assigned "leaf nodes", or end users, statically assigned leaf/end user nodes, and statically assigned infrastructure nodes (over which the intermediate traffic flows). You can also include a slice for unassigned or unused IP space, and further subdivide them into still more categories, but those are the basics as far as we're concerned. Hosts with the first type of address are what most people think of when they think of botnets - your grandmother's home PC, connected directly to the Internet via high-speed DSL, infected with perhaps several bots, and spewing spam just under the rate-limiting threshold. But when people think of the second kind of hosts, for some reason they don't think exactly the same thing.
Statistically speaking, this is odd because even if you take out the "infrastructure" static IPs mentioned above, most of the remaining "leaf node" statics will not be mail servers. At our tiny office, we have a /27 at our disposal - 32 static addresses, five burned as network infrastructure (network, broadcast, cable modem, and two NAT/VPN hosts); of the rest, only two send any mail at all, and one of those is a spamtrap server and by definition only forwards known spam to feed remote trap sinks. That's one, possibly two if you count the trap server, of the remaining 27 statics approved to send mail to remote servers. Many businesses are in a similar position, or use hosted offsite mail solutions for their MXen. Even universities with large allocations of public static LAN IPs (typically /16s) fall under similar ratios - the larger the network, the more likely they will have a dedicated subnet or subnets for mail infrastructure, often with custom PTRs. So the remainder of hosts with static assignments will likely have generic names and not be marked as legitimate mail sources. (Note that this excludes mail sent within an organization, which in modern times is usually sent via authenticated connections such as SMTP AUTH.)
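The arithmetic for our /27 is simple, but worth spelling out. The sketch below uses the RFC 5737 documentation prefix 192.0.2.0/27 as a stand-in for our real block; the counts come from the paragraph above.

```python
import ipaddress

# Documentation prefix standing in for the office /27 (not our real block).
office = ipaddress.ip_network("192.0.2.0/27")

total = office.num_addresses         # 32 addresses in a /27
infrastructure = 5                   # network, broadcast, cable modem, two NAT/VPN hosts
remaining = total - infrastructure   # statics left for ordinary hosts

for senders in (1, 2):               # one real mail server, or two counting the trap server
    print(f"{senders} of {remaining} statics send mail: {senders / remaining:.1%}")
```

Either way, well over nine in ten of our usable statics have no business sending mail to anyone else.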
A couple of special cases to consider are Web hosting providers, which are typically statically assigned (though with the rise of "cloud" computing even this is becoming less common), and NAT and/or PAT firewalls, which although static may have multiple, even dynamic, hosts sending traffic through them. Enemieslist classifies these as well, because of the heightened risk associated with them; the former are commonly compromised and used in phishing attacks, and the latter are often not secured properly against unauthorized outbound port 25 traffic.
But back to statics - I said I'd share some data. I resolved a whole CBL zone back in May, and once I'd stripped out the IPs without any PTR at all, had a set of 4169150 unique hosts. Enemieslist has patterns that match all but 19082 of them, a match rate of 99.54%. They are classified as follows (bear in mind that many of those simply classed as "generic" are probably dynamic, static, or even a mix of both; we're trying to reduce the number of "generic" patterns, but it's a long slog and may be hopeless in most cases due to lack of information).
|19082||no enemieslist classification|
Okay, looking at the data, obviously there are many more dynamics in that list than anything else; roughly eleven times the number of statics, and roughly twenty times the number of generics. But notice that 328227 of the hosts are static - roughly 8% of the total. Throw in NATs, webhosts, resnets, and mixed, and we're looking at 9%, and if you assume that generics are static (because not obviously dynamic) we're talking about more than 1 in 8.
Still think it's not worth your while to block, or at least score, static IPs with generic names, as probable sources of spam? Admittedly, the risks are slightly higher - there are of course going to be more actual legitimate mail servers in amongst those statics, and the word about custom PTRs for mail servers hasn't quite gotten out to many admins in small businesses and the like. But still, an impressive number of statics are infected and spewing on any given day, and if you look at the numbers above you'll see that even the hosts Enemieslist doesn't match (which are, for the most part, one-off mail servers with custom PTRs that we haven't bothered to make patterns for) amount to less than six percent of the number of statics.
So, the next time you start to argue that blocking mail from static generically named hosts isn't worth the risk, ask yourself whether it's actually worth the risk of false positives to let in traffic from the 94% of the infected, known static hosts above.
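For anyone who wants to check the percentages quoted in this post, here is the arithmetic in Python, using only the counts given above.

```python
# Counts from the resolved CBL zone described above.
hosts_with_ptr = 4_169_150
unmatched = 19_082   # no Enemieslist classification
static = 328_227     # classified as static

match_rate = 1 - unmatched / hosts_with_ptr    # share of hosts some pattern matched
static_share = static / hosts_with_ptr         # statics as a share of all hosts
unmatched_vs_static = unmatched / static       # unmatched hosts relative to statics

print(f"matched:                {match_rate:.2%}")
print(f"static share:           {static_share:.1%}")
print(f"unmatched vs. statics:  {unmatched_vs_static:.1%}")
print(f"1 - unmatched/statics:  {1 - unmatched_vs_static:.0%}")
```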
My new favorite PTR
I thought I'd seen it all. I hadn't.
July 10, 2009
A passionate cry out to Web hosting administrators
In my kick-off post in this series, I said:
Web hosting and colo providers should already be forcing low-end customers' mail through their own carefully monitored smarthosts, to reduce the amount of spam and other abuse coming from oft-compromised hosting control panel platforms such as cPanel. I used to be disgusted by the folks who named their webhosting PTRs things like "hosted.by.example.net" or "2gbamonth.for.just.7.95.example.net", but now I love them. They're just the most perfect indicators of super cheap mass virtual Web hosting, from whom I almost never want any mail unless sent through a smarthost.
I'd like to take some time to expand on that a bit. As always, this is my viewpoint as formed in the crucible of trying to create regular expressions to help classify hosts by type, so take it with that caveat.
In short, mass virtual Web hosting sucks. It's a game where only the very largest, with huge economies of scale, can hope to win, and where much of that potential profit is driven by incompetent affiliates and resellers. The race to the bottom is made clear by the sheer existence of Web hosting companies that will host your basic Web site for under ten bucks a month, or even as low as two bucks a month including free domain registration. It's the IT equivalent of selling health club memberships, but the problem is that if your host is compromised with a virus at a health club, the effects tend to be fairly localized. If your host at a Web hosting provider is compromised with a rockphish kit, you're potentially affecting the entire Internet, and enabling criminal fraud to occur through your server, so it's serious stuff we're talking about here.
That said, it's wise for us to know where the mass virtual Web hosting service providers (WSPs) are, so that we can assess the risks of accepting traffic from them. But whereas with end-user dynamic space, this tends to be easy because the hostnames are so often generated (or $GENERATEd), with webhosts we face a slightly different problem: administrator creativity.
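Generated names are easy to cover with a single pattern precisely because they are generated. As an illustration, BIND's $GENERATE directive can emit a whole block of PTRs from one line such as "$GENERATE 1-254 $ PTR dhcp-$.example.net." in a reverse zone. The Python below is a simplified, hypothetical expansion of that line (it handles only the bare "$" substitution, not BIND's ${offset,width,base} modifiers), followed by the one anchored pattern that covers every resulting name.

```python
import re
from typing import List


def expand_generate(start: int, stop: int, owner: str, target: str) -> List[str]:
    """Simplified $GENERATE-style expansion: replace each bare '$' with the counter.

    BIND also supports ${offset,width,base} modifiers and escaping; those are
    intentionally not implemented here.
    """
    return [
        f"{owner.replace('$', str(i))}\tPTR\t{target.replace('$', str(i))}"
        for i in range(start, stop + 1)
    ]


records = expand_generate(1, 254, "$", "dhcp-$.example.net.")

# One fully anchored pattern covers all 254 generated names.
dhcp_pattern = re.compile(r"dhcp-\d{1,3}\.example\.net\.")
targets = [record.split("\t")[2] for record in records]
print(records[0])
print(all(dhcp_pattern.fullmatch(t) for t in targets))  # True
```

A creatively named webhosting farm offers no such shortcut; every name has to be collected by hand.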
For example, Pair is one of the oldest WSPs that I know of that is still around. We've used them for hosting our own clients, they are reasonably priced, and they have their own homegrown control panel software (as opposed to the ubiquitous, and often unpatched and vulnerable, cPanel, Ensim, Sphera, Plesk, and so on). And their servers have some of the weirdest names I can think of. We had a client on one server named kodh, and another served from a box called inyo. The former, as it turns out, is a city in India. The latter is a National Forest in the wilds of California's Sierra Mountains. Neither is immediately identifiable to the uninitiated as a mass virtual Web hosting server.
Pair wins high marks for creativity, especially compared to the pedestrian efforts of others who choose simpler themes - cities and geographical locations (pair.com, dattaweb.com, hostforweb.net, server4free.de, serverenred.com, worldispnetwork.com) easily trump the use of elements (webfusion.co.uk), deities (lunarservers.com, plusserver.de), colors (fastwebserver.de), cars (websitewelcome.com), short nouns (securenet-server.net), famous ships and spaceships (acetech-inc.net), the NATO phonetic alphabet (server4you.de), astronomical entities (lunarservers.com, websiteactive.com), peppers (my personal favorite, as well as that used by mein-webperoni.de), and, of course, the "hosted-by" or "powered-by" so popular in the Netherlands and Luxembourg. Though I will confess it's difficult to imagine the qualitative difference between being "hosted-by" and "powered-by", one can easily imagine the quantitative.
Then there's first names (asmallorange.com, mchost.ru, zonepro-serveurs.net), trees (e3linux.com), possibly U.S. Presidents (genwebhost.com), and even one host who uses Latin (not merely Roman, but Latin) numbers (eCircle AG). nocdirect.com seems to have trouble making its mind up between Lord of the Rings characters, stars and places. mschosting.com seems to be using Polynesian or South Asian names. liquidweb.com appears to favor comic book superheroes, while lunarmania.com seems to like cool-sounding names, whether from Welsh royalty, Greek mythology, or that word they use for a whole-face tattoo. Hostasaurus.com appears to let its customers choose their name, though it's still in the hostasaurus.com domain, so it's a perfect muddle of randomness as far as we're concerned. Dotcomhost? Why, Dilbert and Simpsons characters, of course. Dreamhost? Again, not sure - possibly all customer chosen.
My favorite "advertising" PTR, though, has to be this one from saol.com:
If the late Billy Mays had been a web hosting admin, he couldn't have done it better. (Though he probably would have thrown in a few more PTRs just to ensure blanket, or slanket, coverage.)
So, why do we care that some admins are creative, while others seem stuck in a rut, when it comes to naming all these mass virtual Web hosting servers? Frankly, we don't, at least not much. It's a pain in the neck for us to collect all of these new names, but not that big a deal if you like long regular expressions composed of pipes. The only real issue with that choice of naming convention for EL is that it doesn't "look generic" to some, but it obviously is.
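Building those long piped regular expressions is mechanical once you have the list of names. A minimal sketch, assuming a hypothetical hosting provider that names its boxes after peppers under hosting.example.net (the names and domain are invented for illustration):

```python
import re


def theme_pattern(names, domain):
    """Anchored pattern for '<themename><optional digits>.<domain>'."""
    # Escape each name and sort for a stable, readable alternation.
    alternation = "|".join(re.escape(name.lower()) for name in sorted(set(names)))
    return re.compile(rf"(?:{alternation})\d*\.{re.escape(domain)}")


peppers = theme_pattern(["habanero", "jalapeno", "poblano", "serrano"], "hosting.example.net")

for host in ("habanero.hosting.example.net", "serrano2.hosting.example.net", "mail.hosting.example.net"):
    print(host, bool(peppers.fullmatch(host)))
```

Note that "mail" is deliberately not part of the theme, so a provider that keeps its relay under a distinct name stays out of the generic bucket.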
That said, the vast majority of WSPs use sensible, direct naming that indicates the servers' status and role as a virtual server, some even using the control panel package (cpanel, plesk) and/or platform (win, linux, mac) in the name. It's just that the more we look, the more we find compromised WSP servers hosting, and spewing, phishing scams. And this makes us wish that it wasn't so easy for someone paying ten bucks a month to fail utterly to prevent it from happening again, and again. It would make our job easier if they all had names like "vps24" or "plesk16" or "7.win.cpanel", but that's not likely to happen soon, if ever.
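Names like those are useful precisely because a stranger's filter can read them mechanically. A rough sketch of that reading, with a hypothetical token vocabulary (real WSPs vary far too much for a fixed list like this to be complete):

```python
import re
from typing import List

# Hypothetical token vocabulary, not a complete or authoritative list.
ROLE_TOKENS = {
    "vps": "virtual server",
    "plesk": "control panel: Plesk",
    "cpanel": "control panel: cPanel",
    "win": "platform: Windows",
    "linux": "platform: Linux",
}


def role_hints(hostname: str) -> List[str]:
    """Collect role/platform hints from a hostname's labels, ignoring digits and hyphens."""
    hints = []
    for label in hostname.lower().rstrip(".").split("."):
        for piece in re.split(r"[-\d]+", label):
            if piece in ROLE_TOKENS:
                hints.append(ROLE_TOKENS[piece])
    return hints


for name in ("vps24.example.net", "plesk16.example.net", "7.win.cpanel.example.net", "kodh.example.com"):
    print(name, role_hints(name))
```

The last example returns nothing, which is the whole problem with names like kodh.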
So, what's the takeaway? If you're starting up a new WSP service, and heaven help you if you are, please try to put as much significance as you can in the names you choose for your servers. Distinguish between mass virtual hosting and dedicated colocation boxes. Please try to indicate whether a server is running as a nameserver, webhosting box, or mail relay, because we've seen many that cram all of these into one, and then give it the least useful name possible in our opinion (one guess: yep, it's just a nameserver - that just happens to also be their outbound relay and a mass virtual hosting box). If you do indicate how much per month you're charging, make sure you also indicate the currency, so we can calculate how likely it is that you have a functioning abuse desk. If you give default names to your PTRs and allow customers to choose custom ones as soon as they've been assigned a new IP, please distinguish between unassigned and assigned default hostnames (we're looking at you, thePlanet). If you give your servers names in the root zone of your domain, use a subdomain for mail or smarthosts or name service, so we can block the "generic" web hosting boxes. Of course, it's better if you put those under their own subdomain, too, but hey.
Of course, if wishes were horses, there'd be flies everywhere, as the old saying goes, so I may as well throw a few more gadfly requests in while I'm warmed up. Ideally, control panel software should disable - via packet level filters - outbound port 25 by default. Failing that, if blocking isn't possible or desirable, it should route outbound traffic through a rate limiter or rate monitoring system in order to detect compromises quickly, before they get to the children. Non-rate-limited boxes should have their outbound SMTP traffic monitored for spikes, with administrative notification enabled. At the very least, default configurations should use a specified outbound smarthost, so that the reputation of the responsible party (the WSP) is affected by their failure to keep their customers' machines properly secured. If the customer wants to pay to have their own dedicated IPs and servers, and therefore their own reputation, give them their own domain name or at the very least their own delegated DNS subdomain. Don't let them hide in and around the generic default naming, or everyone's reputation will suffer.
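Rate monitoring of the sort described above doesn't have to be elaborate. Here is a minimal sliding-window sketch in Python; the class name, the limits, and the idea of calling it once per outbound SMTP message are all assumptions for illustration, not features of any particular control panel.

```python
import time
from collections import deque
from typing import Deque, Optional


class OutboundRateMonitor:
    """Flags a host that sends more than max_messages outbound messages per window."""

    def __init__(self, max_messages: int, window_seconds: float) -> None:
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._events: Deque[float] = deque()

    def record(self, now: Optional[float] = None) -> bool:
        """Record one outbound message; return True if the host is over its limit."""
        now = time.monotonic() if now is None else now
        self._events.append(now)
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.max_messages


monitor = OutboundRateMonitor(max_messages=3, window_seconds=60)
alerts = [monitor.record(t) for t in (0, 1, 2, 3)]
print(alerts)  # [False, False, False, True]: the fourth message in a minute trips it
```

In practice you would keep one monitor per customer IP and notify an administrator (or hold the queue) whenever record() returns True.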
And, for Pete's sake, pick something better than "exotic peppers" for your clever server naming scheme.
June 30, 2009
The Trifecta, or, tweaking your way to glory
We have our own home-grown sendmail antispam filters here, which use a fairly broad brush to score incoming mail, but which have been remarkably effective for us for over six years.
One of the data points we check is of course whether the sending host has a generic PTR, via the enemieslist DNSBL. But we also find it useful to check the TCP fingerprint of the sending host, to see if the box on the other end is running some form of Windows - particularly certain highly vulnerable releases and patchlevels, like Windows XP Service Pack 1. We also check to see whether the message in question is in multipart/alternative format, or "HTML email", because in our experience it's rare to see spam that is in plain text format.
Each of these conditions (HTML, Windows, generic PTR) scores a fairly low spam score, because of course it's perfectly normal for mail to be in HTML format, and there are many Windows boxes running MSExchange and other legitimate Windows-based mail server software. And of course, there are many small businesses with generic addressing on their static netspace. The problem is when we see all three together.
As a default, all of our local accounts here have a spam score threshold of 4, which is sufficient to keep out the vast majority of the inbound spam - especially if the local scoring has been tweaked to give high scores to generic HELOs and low to generic PTRs - and which lets almost all normal mail traffic through. For historical reasons, the scoring is all done in integers, so we don't have the fine-tuning capabilities available in SpamAssassin, for example, where an HTML message might get a 1.7 just for containing HTML and no text part. Here, by default, HTML email scores a point, any Windows system scores a point, and any other issue is usually enough to dump it into the quarantine. A static generic PTR gets 2 points. So, the Trifecta is 4 points, enough to reject on for most accounts.
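The integer scoring above is simple enough to sketch directly. This is a hypothetical Python rendering, not our actual sendmail filter; the weights are the ones given in the text, and the rule that a message is rejected once its score reaches the threshold is inferred from the Trifecta's 4 points being enough to reject at a threshold of 4.

```python
from dataclasses import dataclass


@dataclass
class Message:
    html: bool                # HTML (multipart/alternative) rather than plain text
    windows_sender: bool      # TCP fingerprint says the sender runs Windows
    static_generic_ptr: bool  # sender's PTR matches a static generic pattern


# Integer weights as described above; other tests would add their own points.
WEIGHTS = {"html": 1, "windows_sender": 1, "static_generic_ptr": 2}
DEFAULT_THRESHOLD = 4


def score(msg: Message) -> int:
    return sum(weight for field, weight in WEIGHTS.items() if getattr(msg, field))


def rejected(msg: Message, threshold: int = DEFAULT_THRESHOLD) -> bool:
    # Assumed rule: reject when the score reaches the threshold.
    return score(msg) >= threshold


trifecta = Message(html=True, windows_sender=True, static_generic_ptr=True)
print(score(trifecta), rejected(trifecta))  # 4 True
```

Any two of the three conditions alone stay under the default threshold; it takes all three to hit it.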
Pretty much the only time we ever have to whitelist anyone here is when the sender has hit the Trifecta outlined above. HTML-only email, sending from a Windows box, with a generic (almost always static) PTR. What's sad about this isn't that we have to make up for their IT consultants' failure to bother to request a custom PTR, or that some people run MTA software that spits out HTML-only email. No, that's pretty much par for the course in any industry without a need for a full-time IT person or team. Lawyers, galleries, non-profits, small businesses of many kinds are subject to the pressure to conform - and to pay lots of money for Exchange (when they could use free, high-performance Unix-based mail server software). And for the skills needed to install it (poorly), maintain it (poorly) and patch and upgrade it (rarely). OK, enough Unix bigotry. For now.
Some will complain that we shouldn't be blocking (or even scoring discriminately) on known "statics". The problem is that there are a lot more statically assigned IPs out there that have unfiltered access to the rest of the Internet, and are vulnerable to infection by the botnets, than there are legitimate mail servers with generic PTRs.
For example, yesterday we blocked 349 messages sent from static generics out of 8810 total rejected messages, or 4% of our total rejections, with one false positive (the message that spurred on this post). Of those, 117 were from .com or .net hosts, with the rest coming from ccTLDs we rarely have legitimate traffic from, so we can't just accept from static generics with .com or .net TLDs.
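The percentages in that paragraph work out as follows (plain arithmetic on the counts given; nothing else assumed). About a third of the static-generic rejections came from .com or .net names, so exempting those TLDs would have let roughly that many spams through.

```python
rejected_total = 8810   # all rejections that day
static_generic = 349    # rejections of mail from static generics
com_or_net = 117        # of those, from .com or .net hosts

print(f"static generics: {static_generic / rejected_total:.1%} of rejections")
print(f".com/.net share: {com_or_net / static_generic:.0%} of the static generics")
print(f"from elsewhere:  {static_generic - com_or_net} messages")
```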
To effectively work around the infected statics problem while avoiding the occasional Trifecta-as-FP problem will take some more analysis, or some more widespread clue among Windows IT consultants. And we're not going to reduce our overall filter effectiveness by 4% daily just because of a once-a-quarter FP due to a lack of care on the part of someone else. So we need to tweak, and tune, our policies on this end without compromising our perimeter defenses, or adding to my quarantine watch workload.
Our system usually generates what, to our biased minds, are perfectly useful and informative error messages, especially in response to particular problems. The problem with the Trifecta is that we're blocking based on a score, not a specific set of problems, so the error looks like this:
554 5.7.1 HISCORE Contact firstname.lastname@example.org if this is in error, but your message was rejected as spam; it simply failed too many tests. (threshold: 4; score: 4)
There's a token (for our stats), immediately followed by a contact email address that is more or less unfiltered, a rationale, and a score/threshold. The problem is that many Exchange servers either truncate the error message, rendering it less useful, or explain that the remote system did not provide a reason - often including the complete error message beneath! - which most people don't bother to read. So we get phone calls to the effect that our system is blocking their mail. Which it is, and in many cases these are actual false positives. So we whitelist their IP address, and they can send again. (Incidentally, of the 349 messages we rejected, six had a 4/4 threshold/score; one of those was the false positive. Two had a 4/5, two had a 4/6, three had a 4/7. So, one way to deal with this is to raise our default threshold to 5, thereby letting in five more spams a day in order to prevent a quarterly FP. This on a system where userbase-wide we see about 3 or 4 spams/day make it through the filters, and maybe a couple 419 scams and phishing scams. So, a difficult choice - how tolerant do we become, and how low do we sink in order to accommodate these arguably at-fault systems?)
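A quick tally of the near-threshold rejections listed in that paragraph shows what each candidate threshold would let through. The sketch assumes, as elsewhere in this post, that a message is rejected when its score is at least the threshold; the counts and the single false positive at 4/4 come from the text.

```python
from typing import Tuple

# Near-threshold rejections from the day described above: score -> message count.
NEAR_THRESHOLD = {4: 6, 5: 2, 6: 2, 7: 3}
FALSE_POSITIVES = {4: 1}  # the one legitimate message scored exactly 4


def passed_at(threshold: int) -> Tuple[int, int]:
    """(spams let in, false positives avoided) if rejection requires score >= threshold."""
    let_in = sum(n for s, n in NEAR_THRESHOLD.items() if s < threshold)
    saved = sum(n for s, n in FALSE_POSITIVES.items() if s < threshold)
    return let_in - saved, saved


for threshold in range(4, 9):
    spams, fps = passed_at(threshold)
    print(f"threshold {threshold}: {spams} more spams accepted, {fps} FP avoided")
```

Every step up the threshold buys nothing beyond the first; the one FP is already rescued at 5, and each further point only lets in more spam.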
What's even more annoying is that once we've whitelisted the sending IP address of one of these poor victims, they'll go home and try to send from Outlook Web Access, which many IT consultants set up on yet another IP address, also with a generic static PTR. So we go through the whole rigamarole again, only this time with their OWA IP address.
The real problem here is two-fold: the failure of IT consultants to have even the most basic understanding of the nature of deliverability and its relationship to the generic PTR question, and the continuing acceptance of such a low standard of compliance with email community norms. (And yes, there's a third factor, namely, my reluctance to raise the default spam score threshold just to accommodate these edge cases.)
So let me close with a plea to any IT consultant tasked with setting up a Windows-based mail system: please, for the love of all that is good and holy, ask your customers' ISPs for custom reverse DNS for any system legitimately sending mail. We'll tolerate your HTML-only email, and your choice of Windows, if you'll do your part and signal to us with a custom PTR that this is a system that is intended to send mail, rather than an infected end-user system or NAT or insecure LAN.
June 26, 2009
Poor choices in automatic / registration-based naming
One of my favorite memories from the past few years of scanning networks' PTRs is the day I ran into some spam from uncg.edu, the University of North Carolina at Greensboro. Seems the naming scheme for their dynamic networks was something along the lines of e.g.:
I saw a few more like this (munged to protect the poor folks), all with hostnames ending in uncgedu, and gave my usual sigh of resignation, but then I saw one that banished all doubt from my mind as to what was really going on:
Yes, that's right, they were using the end user's email address, stripping the dots and the @, and making that the hostname. Now, none of these example IPs resolves to anything, and I'm pretty confident they're no longer doing that. But it really freaked me out that anyone could be so lackadaisical with their end users' personal information, especially their email addresses.
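To make the problem concrete, here is a hypothetical Python reconstruction of that naming scheme next to an opaque, IP-derived alternative. The addresses and domains are invented; the point is that the first function publishes a barely disguised copy of the user's email address in the DNS, while the second publishes nothing a stranger couldn't already learn from the IP itself.

```python
import ipaddress


def email_derived_hostname(address: str, domain: str) -> str:
    """The scheme described above (reconstructed): strip '.' and '@' from the address."""
    label = address.lower().replace(".", "").replace("@", "")
    return f"{label}.{domain}"


def ip_derived_hostname(ip: str, domain: str) -> str:
    """A privacy-neutral alternative: name the host after its dynamic IP."""
    dashed = str(ipaddress.IPv4Address(ip)).replace(".", "-")
    return f"dyn-{dashed}.{domain}"


print(email_derived_hostname("jane.doe@example.edu", "resnet.example.edu"))
print(ip_derived_hostname("192.0.2.7", "resnet.example.edu"))
```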
If I could sum up one lesson that I'd like for anyone who reads these pages to take away it would be this:
The act of naming is personal, but with communal aftereffects.
Another way of thinking of this, if you don't mind a foray into religious studies (and if you do, skip to the next paragraph, it's okay), is that naming is perhaps the most sacred act an individual may perform, being an act of participation in a divinity and community of language. But bear in mind that religions, by definition, aren't mystical - the very term "religion" defines a community of belief and a shared language for discussing those beliefs - so naming participates in both a very personal act of recognizing and externalizing your perceptions, and a very social act because the name becomes available for others to use, interpret, and acknowledge.
While it may be perfectly fine for me to name my child "Beeblebrox", because that's what I always think of him as, the name will be used by other people, too. That's one reason why despite the fact that a parent may always have a nickname for a child, they still give the child a respectable name (unless the parent in question is Frank Zappa, of course). That's just a metaphor for CNAME and PTR, BTW. And may well have nothing to do with what's actually in /etc/hostname, either - the PTR is for external recognition, CNAME for alternate use. Oh, and you're not Frank Zappa. He could get away with it. You can't.
Names, once given, convey information, and do so beyond the local context. Just because I call a server "skynyrd" because (personal reason) it had one incident where it crashed unexpectedly doesn't mean I shouldn't name it so that (public, community-oriented reason) it may be recognized for what it is beyond my network, in this case a database and Web server. In the case of an IP dynamically assigned to residential cable users via DHCP, don't assume that just because the whois SWIP for that netblock carries a memo or note to that effect, the PTRs don't need to say so too - why not keep the name itself as the locus of such information? You're more likely to change the PTRs than you are to remember to change the note in a whois record, once those IPs are reallocated as statically assigned commercial DSL. In other words:
Maximize the information associated with a name, and keep it closest to the individual unit to which it is associated.
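As a small illustration of keeping information in the name itself: if a PTR spells out its assignment type and access technology, a stranger's filter can read both straight off the name, with no whois lookup. The naming convention and token lists below are hypothetical.

```python
import re
from typing import Dict

# Hypothetical vocabularies; real conventions vary by ISP.
ASSIGNMENT = {"dyn": "dynamic", "dynamic": "dynamic", "sta": "static", "static": "static"}
TECHNOLOGY = {"dsl", "cable", "fiber", "wimax", "dialup"}


def describe(hostname: str) -> Dict[str, str]:
    """Read assignment type and access technology from a PTR's own tokens."""
    info: Dict[str, str] = {}
    for token in re.split(r"[.-]", hostname.lower().rstrip(".")):
        if token in ASSIGNMENT and "assignment" not in info:
            info["assignment"] = ASSIGNMENT[token]
        elif token in TECHNOLOGY and "technology" not in info:
            info["technology"] = token
    return info


print(describe("192-0-2-7.dyn.cable.example.net"))  # {'assignment': 'dynamic', 'technology': 'cable'}
print(describe("mail.example.net"))                  # {}
```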
Think of the concept of identification. In its most basic of definitions, identification is where one thing is the same as another - in this case, a name refers to an object, so the name may be used in place of the object - they are the same thing for the purposes of the particular context. In slightly more scary contexts, you might be traveling and have someone demand to see your identification, in other words, the papers that certify your name and enable strangers to confirm that the name (and perhaps picture, fingerprints, and other biometrics) matches the body.
When naming servers (or dialup ranges, or NAT pools, etc.) think of the context in which the names will be evaluated - it will be a stranger, probably not thrilled to be evaluating your host(s) at all, probably considering them somewhat of a threat, and so forth. Worst case, it will be a stranger's leave-behind rules for evaluating the same, and there will be all the personality of an automatic teller machine involved. When naming, put your best, most formal foot forward, and don't crack jokes in line at the bomb screening.
Names are detachable containers of information; don't assume the local context and assumptions and codes will survive translation to a new context.
Finally, names differentiate one object from another. If I call one name server itchy and another scratchy, I can tell the difference between the two. If I name every last one of my end user PTRs tm.net.my, or beamcablesystem.in, the names fail the differentiation test. Put another way, there is an inherent distrust in empty, generic labels - think of "Agent Smith" from the Matrix movies - he was not real, he was merely an expression or avatar of the Matrix, and could appear as one or many, and the more he was, the more threatening. Generic name, multiple copies translates as a threat, or at least diminishes trust.
Names should be unique and informative, not generic, if their referents are to provide important services you want strangers to trust.
Okay, that's enough for now, time to go get some hot dogs and a little beach time.
June 25, 2009
A couple more: kudos, and mixed kudos/gripe
Another ISP with the right idea: floodcity.net. Clearly delineated both assignment type (in the right MST position) and technology:
One which has sort of the right idea: musfiber.com
Unfortunately, they've also done this to delineate their pools:
In the original RFC that defines the limitations on DNS labels, RFC 1035, names are limited to alphanumeric ASCII characters and a hyphen:
The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.
RFC 1123 has since relaxed the requirement that a label start with a letter (a leading digit is now allowed), though the 63-character label limit still stands, and some movement towards i18n has been made with the introduction of punycode, RFC 3492. But even punycode limits itself to alphanumeric ASCII and the hyphen while encoding the other characters. So, as with underscores, octothorpes (aka "hash marks", AKA "sharps", AKA "pound sign", etc.) are persona non grata in DNS labels.
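For anyone who wants to catch these before they go into a zone, here is a minimal Python sketch of a host-name syntax check. It assumes the RFC 1035 preferred syntax as relaxed by RFC 1123 (leading digit allowed), 63 characters per label, and the commonly cited 253-character ceiling for the whole name written without its trailing dot. The function name is my own invention, and it checks host-name syntax only; it is not a full DNS validator.

```python
import re

# One label: 1-63 characters, letters/digits/hyphens, no leading or
# trailing hyphen. A leading digit is allowed (RFC 1123 relaxation).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """True if every label in `name` follows the letter-digit-hyphen rule."""
    name = name[:-1] if name.endswith(".") else name  # allow one trailing dot
    if not name or len(name) > 253:
        return False
    # Underscores, octothorpes, empty labels, etc. all fail fullmatch.
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))
```

Punycode names such as `xn--bcher-kva.example` pass, since they are plain letters, digits and hyphens on the wire.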
And yet, they show up occasionally, as in one of my favorite abuses of the DNS ever (blissfully now repurposed as a Windstream dynamic IP with sensible PTR naming):
On the bright side, they did give you a phone number to call. To report network abuse. Excelsior!
Today's DNS Spotlight: Eircom
Ever wonder exactly where some infected computer is, when you find it is hitting your inbounds with repetitive requests to send mail to nonexistent accounts, as in where in the world, down to ICBM coordinates? Yeah, me, too. Unfortunately, despite the best efforts of groups like the Prefix Whois Project, who provide eerily precise longitude and latitude for any given IP, geolocation is still an infant science. Usually, the best that can be done is to provide the ICBM coordinates of the company providing the service, which, while satisfying to an owner of fantasy desktop nuclear weapons, isn't quite as satisfying as taking out the actual infected computer while leaving all else around it standing. Oh, well. Maybe someday.
On the other hand, some networks do make an effort to name their systems so that they could be found by, say, firefighters. Take, for example, this wifi node in a McDonald's in Dublin, Ireland:
Now, I've never been to Lower O'Connell Street, nor Dublin, nor Ireland for that matter (though my sainted Grandmother Betty was swept away as a war bride during the second World War, from her ancient homeland in Fintona, County Tyrone). And the pwhois coordinates for this IP are a good seven minutes drive on the south side of the Liffey from Lower O'Connell Street. But still. One imagines a well-launched predator trained on the wifi node, or maybe the more imaginative can picture a Terminator preparing to tap a customer replete with laptop and large fries, and a more satisfying ending (perhaps involving a Taser).
What's sad, as anyone who's been reading will instantly recognize, is that the "dynamic" keyword is a far cry from the Most Significant Token, and is not itself actually tokenizable by the weak, dot-delimited (and limited) MTAs we're blessed with, so in order to recognize this is a dynamically assigned IP you must use a regular expression. You may be able to use a substring based on "wlan-ce1", but still there's this business of a complete street address and business name to contend with. Do we need to keep track of the wifi node at Govinda's, the vegetarian place around the corner, too?
I don't know if Govinda's is an Eircom customer, but the Citywest Hotel is:
Now, we could, I suppose, just use "cust.eircom.net", but that is less than satisfying (and they also use "customer.eircom.net", or used to). And it highlights another problem - just because an ISP or telco assigns an IP statically to a customer doesn't mean that customer can't then go on to re-allocate it dynamically to Big Mac-eating, laptop-wielding customers of theirs.
(A brief scan of the PTRs in 126.96.36.199/24 informs us of the depressing reality: there are several McDonald's on O'Connell Street alone... In fact, the lower /25 of that block seems dedicated to the McDonald's restaurants of Ireland.) On the other hand, they all seem to have wireless, so, make of that what you will.
So where does that leave us? We know the IP is dynamic; we can use a regular expression to capture many similar instances, provided that Eircom sticks with its naming convention for building out wireless LANs; we can even get within a few miles of the actual location if we wanted to launch an imaginary missile. But it would be so much easier if Eircom just used "dyn.eircom.net", or even "dyn-wlan.cust.eircom.net"; they already know it's going to be handed out dynamically via wifi. Why not just say it and group all similar IPs under the same top-level token?
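To make the difference concrete, here's a rough Python sketch of the two kinds of matching, using made-up example.net names rather than Eircom's real ones. The `suffix_lookup` function loosely imitates the parent-domain behavior of a sendmail access.db or a Postfix hash table (try the whole name, then strip one label off the left at a time); the real MTA lookups have more to them (tags, flags, ordering), so treat this as an illustration, not a reimplementation. One `dyn.example.net` key catches every host under it, while a token buried inside a label needs the regex fallback.

```python
import re
from typing import Optional

# Hypothetical access-map entries, keyed by name or parent domain.
ACCESS_MAP = {
    "dyn.example.net": "REJECT",
    "static.example.net": "OK",
}

# Regex fallback for names that bury "dynamic" inside a label, where no
# dot-delimited key can reach it.
DYNAMIC_IN_LABEL = re.compile(r"(?:^|[.-])dynamic-[^.]*\.example\.net$")


def suffix_lookup(hostname: str) -> Optional[str]:
    """Try the full name, then strip one label at a time from the left."""
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        key = ".".join(labels[i:])
        if key in ACCESS_MAP:
            return ACCESS_MAP[key]
    return None


def lookup(hostname: str) -> str:
    action = suffix_lookup(hostname)
    if action is not None:
        return action
    if DYNAMIC_IN_LABEL.search(hostname.lower().rstrip(".")):
        return "REJECT (regex)"
    return "no match"
```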
June 24, 2009
Today's DNS Superstars: Entel Chile
Take a look at the PTRs in 188.8.131.52/24 for a stellar example of why there should be a basic intelligence requirement for anyone touching the IN-ADDR.ARPA zones for any given ISP.
I know, .local is actually a valid DNS suffix in the context of zero-configuration networking (multicast DNS discovery, as used by Bonjour), and it is also widely used as an internal domain for Microsoft Windows Active Directory, provided the host is never to be visible from the public Internet. The presence of a few dozen of these very hosts in the CBL zones, however, suggests that something is wrong here, and that they are actually capable of reaching the Internet directly. Nice work, guys.
I really should work up a badge for these sorts of things, maybe a bear trap containing a keyboard with hands caught in it.
June 22, 2009
Oddities and observations
When you've looked at as many PTR names as I have over the past few years, you notice some things; trends and weirdnesses, likelihoods and rarities. Here are a few of the things I've noticed, with commentary where appropriate.
- Bulgarian companies tend to use "BG" as part of domain names they register in non-.bg TLDs. Maybe their ccTLD registrar is difficult to use?
- Finns are apparently amused by the use of Roman numerals in PTRs.
- Brazilian network engineers are overwhelmingly drawn to the N-N-N-N.$domain PTR naming format, or, alternately, host-N-N-N-N.$domain.
- Swedish housing associations like to register domains in .ac (Ascension Island) for some reason. Also, there seem to be a lot of Swedish housing associations with their own Web sites. (Oddly, German firms don't register domains in .ag, Danes don't register domains in .as, and Swedes haven't pushed for the creation of .ab)
- A rather high proportion of network service providers stick their webmail access page on the host named with their domain, so www.foo.net is their main Web site, but foo.net is their webmail.
- The vast majority of Cooperative Telephone companies are in the Western US and Canada and, for some reason, Argentina.
- Eastern European ISPs are very, very proud of their optical networks.
- Polish domains, and their Web sites, are extremely likely to be focused on a single city or region, and to express a strong regional identity.
- Also, Polish domain registration data is likely to contain both the organization name and the names of the primary contacts, but in the Organization field; if you do not speak or read Polish, you might be led to believe that half of the companies in Poland end in what looks like someone's name.
- Canada, as you'd expect, has the most organized and informative whois output of any nation in the world, often with complete descriptions of the range of activities a company is involved in, or the types of degrees granted by a university. O, Canada!
- There is a staggeringly large number of nations for whom there is no whois service whatever; and an embarrassingly large number of ccTLDs who only offer whois service via a Web site. Many of the latter are behind CAPTCHAs of dubious quality (like Poland's, which apparently randomly decides whether to require case sensitivity).
- I don't know how anyone can read Arabic at the size it is usually presented at on Web sites. (Maybe it's just that my only Arabic fonts are really small, I don't know.)
- Underscore is not, and has never been, legal in PTR records, and yet 007mundo.com, 11abg.net, 3s.pl, 8x.com.br, acsplus.com, adnc.com, am.poznan.pl, amnetsal.com, andrzejow.net, astral.ro, asu.edu, atenco.net, bardstown.com, bell.net.mt, bordercomm.com, btc-bci.com, btc-net.bg, cc.la.us, cebridge.net, certelnet.com.br, net-uno.net, clnet.cz, cogentco.com, colomsat.net.co, compnet.com.pl, compower.pl, ctbcnetsuper.com.br, ctcis.net, cvalley.net, datacomsa.pl, datastream.com.mt, detecsa.net, digicelbroadband.com, directservices.com, doggy.com.au, dom_bci_prod.cl, e-awa.net, e-kolay.net, easynet.net.au, edb.com, emcali.net.co, emtelco.com, ert.com.co, etb.net.co, europronet.ba, foremosttech.net, fretel.com, fsu.edu, fullcom.cl, g-node.com.au, g-service.ru, gemytel.net, gilat.com.pe, gotonets.com, grm.net, gvt-uol.com.br, harvard.edu, homei.net.ua, koc.net, hpmg.com, hyperband.com, idial.com.br, impsat.net.ve, in-addr.arpa, infosys.ru, infotelecom.es, inode.at, inter-c.pl, inter.net.il, interdom.net.pl, intnet.bj, ip-network.net, iptelsl.net, irishbroadband.ie, iusacell.net, ivnet.ru, k12.ne.us, k12.pa.us, kevenet.hu, kh.edu.tw, knu.ac.kr, krosoft.pl, ktkadan.cz, kubangsm.ru, kutztown.edu, lakesnet.com, lauderdalemarinecenter.com, linktel.net, marpin.dm, melitaonline.net, misc, n4a.pl, nbis.net, ndemand.com, neobee.net, netasia.net, neterra.net, netvox.ru, newskies.net, nsysu.edu.tw, omni-tech.net, onvol.net, optus.net.au, orbisnet.cz, phonewave.net, plugon.com.br, port33.ru, porttelecom.ru, preciscom.com, pro-lan.net, psv.ru, revealsystems.net, ridgewoodcable.com, rsadvnet.it, sarenet.es, satnet.net, seltecnet.ca, serdi.ru, simonet.com.br, sinor.ru, sintmaarten.net, sky.com.mt, skypipeline.com, slcomm.net, smartbro.net, smrw.lodz.pl, solbroadband.com, solunet.com.ar, spartan-net.net, stv.ru, suonline.net, superweb.com.ua, surnet.cl, susice-net.cz, suttk.ru, t-mobile.sk, tauzen.net, teklinks.com, telekom.ru, teleline.hu, telenet.ge, telkom.net.id, telusmobility.com, tktelekom.pl, tmxnet.pl, tnn.net, ttn.ne.jp, tv-naruto.ne.jp, twt.it, uni-duesseldorf.de, uni-lj.si, unlp.edu.ar, uraic.ru, urbe.edu, urjc.es, vsnl.net.in, wcg.net, websurfer.com.np, xcelco.on.ca, xcnetworks.com, zapp.ch, and zappway.ch all use them.
- An absolutely astonishing number of domains have zones so broken that there is no dot (but often an underscore or dash) between the supposed "host" part and the presumed "domain" part.
June 15, 2009
Dead chicken magic and the DNS
Figuring out whether a given hostname naming convention reflects dynamic or static assignment can be considered roughly akin to what an old programmer colleague of mine calls "dead chicken magic". At times, it's quite easy, because of a consistently applied and transparent use of certain tokens (such as "dyn" or "dialup" or "static" or "cable"); at others, you might as well shine up the crystal ball. For those cases where the naming convention isn't transparent, we're fortunate in having several useful tools at our disposal.
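The easy, transparent case can be sketched in a few lines of Python: split the name on dots and hyphens and look for known tokens. The token table below is hypothetical and tiny, and it mixes assignment-type and technology hints; a real one would be far larger and would still miss the crystal-ball cases.

```python
import re

# Hypothetical token table; values mix assignment-type and technology hints.
TOKEN_HINTS = {
    "dyn": "assignment=dynamic",
    "dynamic": "assignment=dynamic",
    "static": "assignment=static",
    "dialup": "technology=dialup",
    "cable": "technology=cable",
}


def tokens(hostname: str) -> list[str]:
    """Split a hostname on dots and hyphens, dropping empty and all-digit parts."""
    parts = re.split(r"[.-]", hostname.lower().rstrip("."))
    return [p for p in parts if p and not p.isdigit()]


def hints(hostname: str) -> set[str]:
    """Return every hint whose token appears anywhere in the hostname."""
    return {TOKEN_HINTS[t] for t in tokens(hostname) if t in TOKEN_HINTS}
```

A name like `1-2-3-4.example.com.br` comes back with no hints at all, which is exactly where the other tools come in.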
Perhaps the most useful of all - when it's available - is rwhois. Given an IP or set of IPs, it is theoretically possible to find out what kind of netblock you're dealing with. Many admins put notes into the registry indicating pools, static commercial allocations, and what sort of service uses the IPs in question (DSL, dialup, wireless, etc.) Most do not, however, and so if there are no clues whatsoever in the whois output, you're left with a hunt on your hands.
ISP and telco Web sites are often a good source for such information as well, especially smaller local ISPs and telcos, who tend to focus on a smaller subset of technologies (such as cable TV over fiber optic, or wireless, or metro ethernet). Price lists are another great place to find out whether a given provider offers static IP addresses, because they often charge the customer more for them. This can be somewhat less reliable (for us, anyway) when the Web site is in a language we don't speak, but Google Translate does a pretty good job in most cases (still waiting for Azerbaijani to English, FWIW). The big problem is when a given ISP offers many different kinds of services, but doesn't indicate which are which in their naming, or indicates the speeds available on their price lists without indicating the underlying technology.
Wikipedia is a surprisingly good source for information that companies often fail to provide on their own Web sites - such as the services and technologies they offer, the corporate relationships between branded services, and so forth.
One thing we've noticed is that countries that are part of the former Soviet bloc, or those that were once referred to as part of the "developing world", are usually deploying fiber and metro ethernet, whereas countries with older telephony infrastructure deploy dialup and DSL. And oddly enough, there is a strong correlation between a given region and the odds that they will dole out static IPs, or in some cases, put all of their customers on private addresses and route all traffic out through a subset of NATs with public IPs.
Another thing worth noting is that the widespread use of certain tokens to identify dynamics can be a source of massive amounts of false positives. For example, those of us who grew up with dialup might be forgiven for thinking that PPP is a strong indicator of a dialup modem. Unfortunately for us, while that may have been true ten years ago, modern technologies like fiber optics and metro ethernet often use PPP (PPPoE in the latter case) to assign static IPs. This is common in Eastern Europe, for example. The same thing can be said for DHCP (Dynamic Host Configuration Protocol), which explicitly implies "dynamic" but which may be used to dole out static IPs through the use of scopes and permanent leases. The classifier of naming conventions ignores this at their peril.
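One way to keep PPP and DHCP from generating false positives is to treat them as weak evidence that never decides the question on its own. A minimal sketch, assuming hypothetical token sets and outcome labels of my own choosing:

```python
import re

# Hypothetical token sets.
STRONG_DYNAMIC = {"dyn", "dynamic"}
STRONG_STATIC = {"static"}
# Tokens that sound dynamic but are also used for static assignment:
# PPPoE over fiber / metro ethernet, DHCP with permanent leases.
WEAK_DYNAMIC = {"ppp", "pppoe", "dhcp"}


def assignment_guess(hostname: str) -> str:
    toks = set(re.split(r"[.-]", hostname.lower().rstrip(".")))
    is_static = bool(toks & STRONG_STATIC)
    is_dynamic = bool(toks & STRONG_DYNAMIC)
    if is_static and is_dynamic:
        return "conflicting"  # send it to a human
    if is_static:
        return "static"
    if is_dynamic:
        return "dynamic"
    if toks & WEAK_DYNAMIC:
        return "unknown (weak dynamic hint)"  # not enough on its own
    return "unknown"
```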
Given a large enough data set, dynamicity can be assumed across a set of hosts if you can observe different hosts acting in similar ways over time but never concurrently. So, a bot on one PC that is a client of a given ISP might be smart enough to probe DNSBLs in order to discover whether a given dynamic IP is listed, and drop that lease to fetch a different IP. Enemieslist's regular expression approach is most useful in this instance, as once we have a pattern for an ISP's naming convention, it eliminates the issues with the delay between when a new, reputation-clean IP is assigned to a bot and when DNSBLs list it as a spam source. It also avoids the issues with staleness - where an IP is listed by a DNSBL but has since been swapped for another by a smart bot. In any event, it's difficult to obtain such large data sets with sufficient transparency into the traffic itself, so it's a last resort.
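The "similar behavior, never concurrently" test can be sketched like this, assuming you already have sightings of one behavior signature as (IP, start, end) tuples; how you fingerprint a bot in the first place is the harder problem and is not shown. The function name and the `min_ips` threshold are hypothetical. It also simplifies by collapsing each IP's sightings into a single span, so an IP that comes back around in the pool later will look like an overlap.

```python
def looks_like_ip_hopping(sightings, min_ips=2):
    """Heuristic: did one behavior signature move between IPs without overlap?

    `sightings` is an iterable of (ip, start, end) tuples for ONE signature,
    with start <= end. Returns True when the behavior was seen on at least
    `min_ips` distinct IPs and no two different IPs were active at once.
    """
    spans = {}
    for ip, start, end in sightings:
        # Simplification: collapse all sightings of an IP into one span.
        lo, hi = spans.get(ip, (start, end))
        spans[ip] = (min(lo, start), max(hi, end))
    if len(spans) < min_ips:
        return False
    ordered = sorted(spans.values())
    # Sorted by start time, any overlap must show up between neighbors.
    return all(prev_end < next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))
```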
Once all the methods above have been tried, the real dead chicken magic comes in - and we have to rely on local knowledge and custom. Fortunately, we've been doing this for so long that we've amassed a sufficiently broad data set that we can often make defensible assumptions on the basis of prior knowledge. The best example of this is knowing an insider who can help, of course. Such as the former admin and blacklist operator who filled me in on which ISPs in the Netherlands didn't offer static IPs (turns out it's quite rare). Or the friend who clarified the often confusing abbreviations used by various Japanese telcos, so we could tell dynamic from static and fiber from DSL.
In any case, it can be a lot of fun. And at other times, it can be a mind-numbingly annoying and Sisyphean task. Either way, it would be a lot easier, and a lot more fun, if more admins (and more RIRs) put more information into their naming, their whois, and their Web sites.
June 14, 2009
Basic principles of DNS and their discontents, continued
OK, I found some time today away from visiting friends and relatives to track down a few more bad examples before I get to the good examples.
I should be able to block (or score as slightly suspicious due to generic, provider-assigned PTR naming) mail from all of these hosts with a single "static.tcisl.net.in" entry in my access.db. Instead, I have to use a pattern. And I have to make it even more generic than I ordinarily would because heaven forfend they add service to Hyderabad.
Anyway, enough bashing of the folks whose naming choices don't reflect my druthers; let's talk a bit about some examples that do. One ISP that does use naming that enables remote systems to make smart policy decisions is vsi.ru.
For their dynamic dialups, vsi.ru uses the following naming:
This has the obvious benefit of informing the remote admin that this is a dialup. While it is not that uncommon, especially in some "developing" nations, for dialups to be statically assigned, typically such connections are not used for incoming mail service. The naming they use for their dynamic DSL is as follows:
In addition, they explicitly call out residential users:
This allows us to distinguish between residential and dynamic users on the one hand, and commercial users with fixed / static IPs on the other:
Granted, there is a somewhat wide variety of namings applied to what we would consider the same subset of user IPs, but all in all their approach meets all of our basic suggestions:
- dynamics and statics are explicitly named
- the tokens used are in the Most Significant Token spot
- they distinguish between different technologies (dialup, DSL)
- they distinguish between residential and commercial users
- they even combine assignment type and technology into single tokens
- they avoid the "local knowledge" requirement; no marketing bullshytte tokens are involved at all
Anyone who is responsible for naming their client and end user nodes should take their excellent example to heart.
Tomorrow, we'll dig into some of the peculiarities of regions and types of services; it's amazing how certain areas deviate from practices common to other regions.
June 11, 2009
Basic principles of DNS and their discontents, continued
In addition to the humorous, and sometimes blatantly idiotic, practices discussed in our last post, we now turn our attention to the core set of end user node naming practices (for name-based filtering of abusive traffic, anyway). Perhaps first and foremost of these is what I like to call the Most Significant Token principle. Similar in concept to the idea of a most significant bit in programming, this is an approach to naming that expects the token to the immediate left of the domain to be where we find the most significance (for our purposes).
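A minimal sketch of picking out the Most Significant Token, assuming you already know where the provider's domain boundary is (the `domain` argument; deciding that is its own problem). With good naming you get a short, reusable token; with the dynamic token jammed into the leftmost label, you get a different "token" for every host.

```python
from typing import Optional


def most_significant_token(hostname: str, domain: str) -> Optional[str]:
    """Return the label immediately left of `domain`, or None if absent."""
    host = hostname.lower().rstrip(".")
    suffix = "." + domain.lower().rstrip(".")
    if not host.endswith(suffix):
        return None  # not under this domain, or it *is* the domain
    rest = host[: -len(suffix)]
    # The MST is the rightmost label of whatever precedes the domain.
    return rest.rsplit(".", 1)[-1] or None
```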
Before I get into the obvious "Right Way" to do it, I'd like to take a minute to highlight some of the "Very Wrong Ways" to do it, using real-life examples.
This host name makes two common "mistakes" - the dynamic token is at the far left, and the dotted IP is host-octet-right (rather than net-octet, which would cover a wider set of hosts as a right-anchored substring).
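To illustrate the octet-order point, here's a small Python sketch using documentation addresses and a hypothetical dyn.example.net zone. With the network octets nearest the domain (the IP reversed, as in-addr.arpa names are), one right-anchored suffix covers a whole /24 or /16; with the host octet nearest the domain, no suffix short of the zone itself covers more than one address. The `covers` check is label-aware on purpose: a plain string `endswith` would wrongly match `12.0.192...` against `2.0.192...`.

```python
import ipaddress


def ptr_name(ip: str, zone: str, net_octet_right: bool) -> str:
    """Build a dotted-IP PTR name in either octet order."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    if net_octet_right:
        octets.reverse()  # network octets end up nearest the zone
    return ".".join(octets) + "." + zone


def suffix_for_prefix(network: str, zone: str) -> str:
    """Right-anchored suffix covering an octet-aligned prefix (net-octet-right)."""
    net = ipaddress.IPv4Network(network)
    if net.prefixlen not in (8, 16, 24, 32):
        raise ValueError("only octet-aligned prefixes map onto whole labels")
    net_octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(net_octets)) + "." + zone


def covers(name: str, suffix: str) -> bool:
    """Label-aware suffix test; plain endswith() would also match partial labels."""
    return name == suffix or name.endswith("." + suffix)
```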
This host name comes closer, but because the dynamic token isn't dot-separated, sendmail and other MTAs cannot do a substring match on it using access.db alone; regular expressions are required.
Not necessarily bad on its own - the static token is the most significant token - this is here to illustrate that simplistic pattern matching on things like /^cpe/ is a great way to make dangerous assumptions about what constitutes a "dynamic" hostname. (It is still obviously generic, but not dynamic in the way that concerns us.)
Here we have a static ADSL line (it's SWIP'd to an insurance company). Even though much ADSL in the US is dynamic/residential, and even though pacbell.net uses 'ded' for some of its dedicated/static lines, the most significant token here is the geographic locale, of which there are literally dozens. Many of the former Baby Bells' naming conventions are similar.
Here's another example from a former RBOC. The minor mistake made here is that the locale token is considered more important than the 'sip', signifying "static". So in order to use them without regular expressions, you have to collect a few dozen substrings.
One of the sad things is when an ISP acquires, and then loses, "clue". One example of this is below (as of late 2003):
The admin in charge of naming was eager to be in compliance with the Most Significant Token principle, so all of their dynamics could be filtered using the single substring "dyn.personainc.net". In later years, however, new allocations looked like this:
No indication of dynamic/static, though they do make the "cable" aspect clear. And as their rwhois server is non-functional (for me, anyway) there's no way to tell whether this is a corporate customer hosting a mail server, or a bot on a residential leaf node. Subsequent naming uses "$HEXADECIMAL.cpe.persona.ca", again with no indication of staticity.
Next week, we'll discuss some good examples. Unless I come across some even more egregious bad examples and can't help myself, that is.
Basic principles of DNS and their discontents
Though there are certain edge cases and complex situations in which DNS name assignment requires some skill and forethought, for those hosts we are most concerned about here - mostly dynamic end user nodes, or "leaf nodes" - the problem is really quite simple. Once you've made the decision to actually provide reverse DNS (PTR) records for your leaf nodes, you have to decide how to dole out responsibility for zones, what information to provide as part of the name(s), who your "audience" is for that quasi-encoded information, how security affects those decisions, and what operational concerns come into play.
As with everything else in human endeavor, the distribution of that rare element known euphemistically as "clue" is uneven at best. Worse, even those blessed with an abundance of "clue" may not be similarly blessed with a broad scope and understanding of the effects their decisions may have on contexts completely out of their control. What may make perfect sense to a NOC netadmin in one organization may be impenetrable gibberish to another with a different skillset or application or context. And what may make an overworked and undervalued mail server administrator happy may well be considered a breach of corporate security by someone else. And some people, well, let's just say their sense of humor or what they consider clever is pretty much guaranteed to annoy someone else.
Let's use a few examples. My all-time favorite PTR naming convention, by far, has to be this gem from the folks at wobline.de:
They do a similar thing for generic dynamic pools. What I love about this is that it so clearly illustrates the abdication of responsibility on the part of the administrator for any and all abuse coming from their network. Responsibility for abuse they could control, through port blocking, walled gardens, and various other tricks, has been shunted to the recipient - who is expected to ignore it, because after all, the IP has been dynamically allocated and everyone knows those are IPs used by the end user, the customer, and we all know they're the Untouchables of the Internet.
Oh, and instead of simply using dynamic.wobline.de, which one can stick into a sendmail access.db or a postfix check_client_access hash table, they stuck the last two octets of the IP on the right hand side of the hostname, requiring regular expressions in order to block them wholesale. Thanks, guys!
Another fine example of the use of the DNS for protest, this time of a different sort, comes from alameda.net:
Yes, for some reason, obviously not that admin's fault, ARIN hadn't updated their records, or someone hadn't properly managed the in-addr.arpa allocation, for several years. Obviously, someone at alameda.net received complaints about IP space they no longer controlled, and wanted to make a statement to that effect. Fortunately, it seems this has finally been fixed. The sample IP I had with that PTR no longer resolves, and whois says it currently belongs to Level3.
But more important than protest and humor in the DNS is an understanding of the basic concepts. A DNS label should point to the resource or resources it identifies, and vice versa, so that an IP can resolve to a PTR, the corresponding A record for which hopefully resolves back to the IP in question. This tells you which host corresponds to which IP, which matters in many contexts, though our concern here is email abuse. When an ISP chooses to allow very stupid things to happen via their DNS zones, it reflects on their sense of responsibility for the traffic that is emitted from nodes on their networks.
For example, Beam Cable Systems, an Indian firm in Hyderabad, has chosen to assign a single PTR to all, or most, of their end user nodes:
When you look up the IP for that host, you expect to get a big bucket of cold fail. To your surprise, you actually get the IP 184.108.40.206. The problem with this is that when every other IP in your netblocks also has the PTR beamcablesystem.in, that means that only one IP will not be considered forged by mail systems around the world. Now, one could argue that by deliberately providing recipients of abuse from such systems such clear evidence of the shadiness and unreliability of the sending IPs, you're doing them all a favor. But I think that's the wrong way to read this.
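Here's what a single shared PTR does to forward-confirmed reverse DNS, sketched in Python with hypothetical in-memory tables and documentation addresses standing in for real lookups (a live check would go through a resolver, for example `socket.gethostbyaddr` and `socket.getaddrinfo`). Three addresses share one name, the name's A record points back at only one of them, so only one passes.

```python
# Hypothetical in-memory stand-ins for DNS answers (documentation addresses).
PTR = {
    "192.0.2.10": "customers.example.in",
    "192.0.2.11": "customers.example.in",
    "192.0.2.12": "customers.example.in",
}
A = {"customers.example.in": ["192.0.2.10"]}


def fcrdns_ok(ip: str, ptr: dict = PTR, a: dict = A) -> bool:
    """Forward-confirmed reverse DNS: IP -> PTR name -> A records -> same IP?"""
    name = ptr.get(ip)
    if name is None:
        return False  # no PTR at all
    return ip in a.get(name, [])
```

Scale that up to a /16 and every address but one looks forged.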
As of mid-May, there were some two thousand IPs listed by the CBL with the PTR beamcablesystem.in. Admittedly, that is less than four percent of the addresses in their /16, and it counts only hosts considered actively infected over a period of a few days, or however long the CBL keeps IP data before expiring it. And if they are dynamically assigned (they also provide services to businesses, with static IPs, though it's not clear whether they provide custom PTRs for those static IPs), it's possible that some of them are being rapidly cycled through by smart bots, so the actual number of hosts infected is impossible to say with certainty.
The problem comes in when someone with an interest in blocking traffic from such infected hosts at the edge of their network, or early in their spam filtering solution, wants to block them efficiently, using their name rather than a possibly stale list of IP addresses in a DNSBL somewhere. By refusing to provide a differentiator between any of their customers, they effectively force us to deny service to all of their customers or accept abuse from the subset of their customers who are infected. Neither is an acceptable solution.
tm.net.my did the exact same thing for a long time, though I'm pleased to say that they seem to have come to their senses. Hopefully, Beam Cable Systems will as well.
June 10, 2009
Another PTR mystery - this time in Portugal
For some reason, which I have been unable to explicate - even with the assistance of the Portuguese-speaking wife of a colleague - there is a tendency for schools in Portugal to have hostnames such as these:
Just to grab some examples that showed up in a recent CBL list.txt...
The general idea seems to be that every school in Portugal should have Internet access, which is laudable. It is apparent that these schools are connected via a statically assigned ADSL link. And as they're being listed in the CBL, it's also apparent that they have issues with infected hosts on their LANs spewing spam and abusive traffic out of their networks. We hope they can address the issues involved and stay out of the blocklists, but that's not why I call attention to them now.
Enemieslist is an attempt to classify Internet hosts by their PTRs. We've found it very useful to do so by way of regular expressions. But the hosts named above don't readily lend themselves to this sort of thing, because every hostname is what we'd call a "singleton", so a pattern for each is overkill. A pattern for the whole lot of them might not be a good idea, either, because while it's probable that they are all what we would classify as "static/adsl", some, or even all, of them might well be NATs ("natproxy/unknown") or VPNs ("natproxy/vpn"). The classification applied to a set of hosts matters, because it is used to calculate the risk of accepting mail from any member of the set. NATs are a much more likely source of spew from bots inside the networks than the average static ADSL node, so I am reluctant to just call them static ADSL nodes, and this presents a problem.
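One way to spot the singleton problem mechanically is to collapse digit runs into a placeholder and count how many hosts share each resulting shape. This is a hypothetical heuristic for illustration, not how Enemieslist actually does it, and the hostnames in the check below are invented.

```python
import re
from collections import Counter


def shape(hostname: str) -> str:
    """Collapse each run of digits to '#' so numbered hosts share a shape."""
    return re.sub(r"\d+", "#", hostname.lower().rstrip("."))


def split_shapes(hostnames, min_members=2):
    """Return (shared shapes, singleton shapes) for a batch of PTR names."""
    counts = Counter(shape(h) for h in hostnames)
    shared = {s for s, n in counts.items() if n >= min_members}
    singletons = {s for s, n in counts.items() if n < min_members}
    return shared, singletons
```

Shared shapes are candidates for a single pattern; singletons are the ones that would each need their own.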
Such are the mysteries that occupy my days.
For now, I will continue to add individual "patterns" for each of them, until I can confirm that they're all NATs or perhaps something entirely different. But if the administrator(s) in charge of the hosts in question had reflected on my needs and concerns (heh) they might have indicated more precisely whether these were to be used as NATs. I am reasonably sure, from checking the occasional MX record, that they don't handle their own inbound mail, which is routed through the Portuguese National Computing center. But beyond that, the mystery remains.
June 9, 2009
A few thoughts on reverse DNS / PTR naming
Over the years, I've collected nearly forty thousand PTR naming conventions, creating regular expressions from them for the sake of compactness and power. I've classified them in terms of whatever best guess I can make as to their assignment type and the technology that connects those hosts to the Internet. This started as an effective antispam tactic and response to the rise of botnets, and has grown into more than that the longer I keep at it. It's been a rewarding and instructive trip, and there's always a new problem to solve, whether it's figuring out the meaning behind some new and cryptic naming, or trying to read ISP and telco Web sites in translation, or guessing how people say "mail" in a hundred languages.
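In miniature, a pattern collection like that might look like the following: an ordered list of compiled regular expressions, each tagged with an assignment/technology class, with the first full match winning. The patterns, the example.net names, and the exact class labels are hypothetical; this is a sketch of the general idea, not the actual Enemieslist data or code.

```python
import re
from typing import Optional

# Hypothetical pattern table: (compiled regex, "assignment/technology").
# Order matters: the first pattern that matches the whole name wins.
PATTERNS = [
    (re.compile(r"\d+-\d+-\d+-\d+\.dsl\.dyn\.example\.net"), "dynamic/dsl"),
    (re.compile(r"dialup-\d+\.example\.net"), "dynamic/dialup"),
    (re.compile(r"[a-z0-9-]+\.static\.example\.net"), "static/unknown"),
]


def classify_ptr(hostname: str) -> Optional[str]:
    name = hostname.lower().rstrip(".")
    for pattern, label in PATTERNS:
        if pattern.fullmatch(name):
            return label
    return None  # unclassified: time for the dead chicken
```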
However, I will always have a special circle set aside in my own personal Hell for the following list of domains and regions (in no particular order).
- assigns static IPs in 'lnk.telstra.net', any name goes
- ditto, only without a subdomain
- incapable of deciding on a single or small set of conventions, nearing a hundred at the time of this writing
- like tenet.odessa.ua, telstra
- assign statics with "dynamic-ip" tokens, dynamics with "static-ip" tokens
- cheerfully mixes geographical, technical, IP-related tokens for statics and dynamics, all under rdsnet.ro
- like tenet.odessa.ua, no subdomains at all
- ditto, for the most part
- much of Poland
- a mix of top-level, geographical, municipal and various other naming (and their whois? fugeddaboudit)
- nearly all of Brazil
- it seems obvious that one or two technicians set up the DNS infrastructure for the entire country: most of it is n-n-n-n.domain.com.br
- assigned a single PTR ('tm.net.my') to their entire end user network
- Viet Nam
- has a strange predilection for naming their hosts "localhost" or "adsl-xxx"
- apparently, subdomains are for losers
I'm sure I'll add more to this list; these are just a few that came to mind.
Now, granted, I'm going to be coming at these names and practices with a particular purpose in mind - namely, trying to classify them by their names and assess the risk of accepting certain types of traffic from them - so it may seem unreasonable to some why I'd find certain practices distasteful. It's only recently that there have even been Internet Drafts to suggest best current practices for naming pools and customer blocks, so I shouldn't be so surprised to find such a gulf between what I consider "useful" naming and what I consider evidence of incompetence or insanity at worst. (I mean, come on, who names a public IP "localhost"...?)
I can appreciate the special madness that systems and network administrators have for the act of naming - whether it is the sort that names servers after characters from comic strips like Bloom County; or the sort that incorporates interface names, rack locations, IP addresses in hexadecimal, into their conventions; or even the sort that uses Roman Numerals (as do three Finnish ISPs and one German) or Latin numbers (one German webhost). I learned about the act while watching my first real netadmin name new Sun hardware after seas (which got confusing when Sun launched Java[tm] because java the server didn't have anything to do with it). I named my Web hosting servers after hot peppers for a long time (and currently have a "tabasco" and a "jalapeno", having retired "habanero" and "serrano"). I took on the habit of naming my laptops after my then-current favorite musicians, so I'm writing this on a several-year-old "tupelo" and had "fugazi", "waits", "radiohead" (which I repurposed for a wireless router) and so forth. I've had several boxes named after bourbons. I understand.
You want to be able to give some personality to cold and impersonal hardware, hardware you will be working around and on and under all the time. You want to be able to distinguish the server that always has a disk crash on holidays from the one whose ethernet card freaks out and throws the box into a kernel debugger.
Sometimes, you just want to express disgust or admiration or disrespect. I once had a home-built Pentium 133 that ran Windows NT that I called "stepchild" as a joke (we had to reboot it at least once a week, and all it was doing was running proxy software for our internal LAN). My brother the Luddite, when roped into hacking Perl for us one year while he was between overseas teaching jobs, named his laptop "powerloom" as a constant reminder that he may as well be manacled to it like a child during the early Industrial Revolution.
But with spam levels hovering at around 9 in every 10 messages, and million-host botnets capable of pretty much anything their "owners" (or renters, of course) decide they want to do with them, desperate administrators everywhere are using every trick they can to discriminate more accurately and more rapidly, as close to the edge of their networks as possible. And if you look at the numbers, you'll see that the boxes most likely to get infected by a bot are the very boxes whose actual owners wouldn't know a hostname from a horseshoe.
So while the urge to be cute or clever remains, a little seriousness is called for when coming up with your PTR records. Don't be like the Ukrainian DSL provider who chose two random words for each hostname in their customers' dynamic pools. Don't be like the Australian, Dutch, Romanian, Russian, Polish, or Bulgarian ISPs and telcos who think that every customer should have their own custom name as an RR in their top-level domain. Don't be like the Indian ISP who seems to create variations on their naming conventions like the special Indian genius created gods for every village. Don't be like the imagined Brazilian network admin who decided that 90% of all the hosts in 200/8 should be a generic name based on the IP, with no assignment type or technology tokens at all.
So what should you do? In the opinion of someone who has spent the better part of the last six years tracking down and classifying networks' PTR naming conventions, this is what you should do. Feedback, argument, speculation, clue, etc. all welcome - this is an evolving document, and I don't expect to get everything right at first.
At the very least, unallocated/unrouted IPs should be named as such - so it's easy to tell when they've been hijacked. And, for the love of all that is good and holy, name them something else when you finally do allocate them.
Dynamically assigned IPs should say that they are dynamic - and that should be the very first token to the left of the domain. Those who use lists of substrings to block spam from dynamics don't want to have to collect one for every town in every state your mega-telco or ISP holding company happens to be capable of serving, if you decide that 'dynamic.raleigh.nc' makes more sense than 'raleigh.nc.dynamic' (think about it). And if possible, please distinguish between low-bandwidth (dialup, ISDN, frame relay, wifi) and high-bandwidth (DSL, fiber - whether catv or ftth/p, metro ethernet, wimax, etc.). Why? Because a bot on a multiple megabit fiber link can spew a lot more spam, or DDoS packets, or ssh scans, or dictionary attacks on ftp servers, than can one on a dialup - and the owner is less likely to notice, so they're likely to remain infected longer.
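To see why the order matters, here is a minimal Python sketch of a suffix-based filter of the kind described above. The domain example.net, the town names, and the function is_listed are hypothetical; real filters may use substring lists or regular expressions instead, but the bookkeeping problem is the same.

```python
def is_listed(ptr: str, suffixes: list[str]) -> bool:
    """Return True if the PTR name ends with any listed suffix on a label boundary."""
    name = ptr.lower().rstrip(".")
    for suffix in suffixes:
        suffix = suffix.lower().strip(".")
        if name == suffix or name.endswith("." + suffix):
            return True
    return False


# Recommended convention: "dynamic" sits just left of the provider's domain,
# so a single entry covers every town the provider serves.
good_rules = ["dynamic.example.net"]

# Discouraged convention: the town sits between "dynamic" and the domain,
# so the list needs one entry per town and silently misses new ones.
bad_rules = ["dynamic.raleigh.nc.example.net", "dynamic.durham.nc.example.net"]

hosts_good = ["host1.raleigh.nc.dynamic.example.net", "host2.boise.id.dynamic.example.net"]
hosts_bad = ["host1.dynamic.raleigh.nc.example.net", "host2.dynamic.boise.id.example.net"]

if __name__ == "__main__":
    print([is_listed(h, good_rules) for h in hosts_good])  # [True, True]
    print([is_listed(h, bad_rules) for h in hosts_bad])    # [True, False]
```

The second list misses Boise entirely, which is exactly the per-town collection chore the paragraph above complains about.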
Statically assigned IPs should say they are static, or be custom and associated with the customer's domain, not the provider's. Particularly for mail servers, NAT/PAT boxes, and the like.
Honestly, the problem is so bad that many are simply treating a "generic" PTR as grounds for rejecting mail from unknown hosts. And it works. If you're running a mail server on a host with a generic PTR, good luck getting your mail delivered at all two years from now. If your mail server's PTR is in your ISP's domain, you really need to look into getting a custom PTR yesterday.
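For illustration, a rough Python heuristic for spotting a "generic" PTR, meaning one that simply encodes the host's own IPv4 address, might look like this. The function name and the two patterns it checks (decimal octets in either order, or the whole address as eight hex digits) are assumptions for the sketch; real classifiers track many more variants.

```python
import ipaddress
import re


def _contains_run(seq: list[str], run: list[str]) -> bool:
    """True if `run` appears as a contiguous slice of `seq`."""
    n = len(run)
    return any(seq[i:i + n] == run for i in range(len(seq) - n + 1))


def looks_generic(ip: str, ptr: str) -> bool:
    """Heuristic: does the PTR name merely encode its own IPv4 address?"""
    addr = ipaddress.IPv4Address(ip)
    octets = str(addr).split(".")
    name = ptr.lower().rstrip(".")

    # Decimal octets in forward or reverse order, e.g. "192-0-2-45" or "45.2.0.192";
    # int() normalizes zero padding such as "192-000-002-045".
    numbers = [str(int(n)) for n in re.findall(r"\d+", name)]
    if _contains_run(numbers, octets) or _contains_run(numbers, octets[::-1]):
        return True

    # The whole address as eight hex digits, e.g. "c000022d" for 192.0.2.45.
    return f"{int(addr):08x}" in name


if __name__ == "__main__":
    print(looks_generic("192.0.2.45", "192-0-2-45.dsl.example.net"))  # True
    print(looks_generic("192.0.2.45", "45.2.0.192.dyn.example.net"))  # True
    print(looks_generic("192.0.2.45", "c000022d.example.net"))        # True
    print(looks_generic("192.0.2.45", "mail.example.com"))            # False
```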
Web hosting and colo providers should already be forcing low-end customers' mail through their own carefully monitored smarthosts, to reduce the amount of spam and other abuse coming from oft-compromised hosting control panel platforms such as cPanel. I used to be disgusted by the folks who named their webhosting PTRs things like "hosted.by.example.net" or "2gbamonth.for.just.7.95.example.net", but now I love them. They're just the most perfect indicators of super cheap mass virtual Web hosting, from whom I almost never want any mail unless sent through a smarthost.
University residential networks, or 'resnets', should contain the token resnet (or dorm, popular in Europe and Asia). Yes, I know, it's nice to be able to name a node after the building it is in, but without a lot of work and research it is difficult to tell whether a given building is the school of economics or a co-ed dorm. Don't count on local knowledge being global. And tokens like 'housing' or 'student' are ambiguous; they might refer to "the housing department" or "mail servers for students", not to student residential networks.
Cable TV and VOIP and triple-play providers - if you provide both residential service (which we assume by default for Internet access over the same fiber as cable television) and commercial (which is becoming more popular with the rise of VOIP) - please, say so in the names. Road Runner has "res.rr.com" and "biz.rr.com" (though they named a mail server mail.biz.rr.com, much to my initial consternation). Charter and Comcast do not (though the latter has 'comcastbusiness.net' for business customers). It matters.
Telcos providing DSL: please indicate whether your ADSL is dynamic (as it usually is in the US but not in the Netherlands, for example) and whether it is residential or commercial. Do not rely on us to know that "maxpro" is commercial and "fastweb" is residential; we do not care to know the branding exercises of every ISP/telco in the world. And if we're counting on ADSL to be residential, tell us it's ADSL, not just generic DSL or "broadband".
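As a sketch of how a receiving filter might read names built along these lines, the following Python maps hostname tokens to tags. The token table is a hypothetical excerpt drawn from the recommendations above (dynamic/static, resnet/dorm, res/biz, ADSL versus generic DSL), and classify is an illustrative name; note that 'housing' and 'student' are reported as ambiguous rather than guessed at.

```python
import re

# Hypothetical token table based on the recommendations above; real-world
# conventions vary far more than this.
TOKEN_TAGS = {
    "dynamic": "assignment:dynamic",
    "static": "assignment:static",
    "resnet": "network:resnet",
    "dorm": "network:resnet",
    "res": "customer:residential",
    "biz": "customer:business",
    "adsl": "tech:adsl",
    "dsl": "tech:dsl",  # DSL of unknown flavor
}

# Tokens that could mean a department or student mail servers, not a resnet.
AMBIGUOUS = {"housing", "student"}


def classify(ptr: str) -> tuple[set[str], set[str]]:
    """Return (tags, ambiguous_tokens) for a PTR name."""
    labels = re.split(r"[.\-]", ptr.lower().rstrip("."))
    # Drop trailing digits so "dorm3" or "adsl12" still match their token.
    tokens = {re.sub(r"\d+$", "", label) for label in labels}
    tags = {TOKEN_TAGS[t] for t in tokens if t in TOKEN_TAGS}
    return tags, tokens & AMBIGUOUS


if __name__ == "__main__":
    for name in ("adsl-45.dynamic.example.net",
                 "room101.dorm3.example.edu",
                 "pc12.housing.example.edu"):
        tags, unclear = classify(name)
        print(name, sorted(tags), sorted(unclear))
```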
Corporate network admins? Distinguish between dedicated NATs and PATs and those that have mail servers directly behind them; if you haven't secured your NAT against unauthorized outbound port 25 traffic, it's not up to us to determine whether refusing all mail coming from your NAT will also drop legitimate mail on the floor of the server room. The odds are very, very good that mail from a NAT is bot-originated these days. Don't make us think too hard here.
If your employees and locations are lucky enough to have their own LAN-side public IPs, please monitor your gateways for sudden upticks in outbound mail from end user LAN nodes if you can't simply block that traffic altogether, or re-route it through your mail servers.
It's appalling how many ISPs and telcos and cable TV providers don't actually ever say what their "super fast Internet speeds" are going to traverse, technology-wise. I'd say that on half of the Web sites I visit, in vain attempts to classify a new naming convention or add some detail to older patterns, it is simply impossible to know whether they sell DSL, fiber, ethernet, and so on. Oddly, wireless providers are quite clear about the fact that they're selling wireless, whether broadband or not. And the proud new owners of Eastern European networks will usually brag to the visitor about how fast their all-fiber net has grown, and how fast it will continue to grow.
Finally, name your mail servers "mail" or "smtp" or "mx" or something that indicates that they're legitimate sources of mail. (Granted, some i18n considerations come into play here, but the words for mail aren't that different in the Romance languages, or in the Slavic; we can stand to learn "correo" and "poczta", just don't make us learn ten thousand languages' forms of "mail".)
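On the receiving side, recognizing such names can be as simple as checking the leftmost label. This Python sketch assumes a short hypothetical word list (English plus the Spanish and Polish words above) and accepts forms like "mx2" or "smtp-out"; it rejects longer words such as "mailbox", which may or may not be what you want.

```python
import re

# Hypothetical word list: common English tokens plus the Spanish and Polish
# words mentioned above. Extend as needed.
MAIL_WORDS = ("mail", "smtp", "mx", "correo", "poczta")

# The leftmost label must be a mail word, optionally followed by digits and/or
# a hyphenated suffix: "mail", "mx2", "smtp-out", "correo1".
MAIL_LABEL = re.compile(r"^(?:%s)\d*(?:-[a-z0-9-]+)?$" % "|".join(MAIL_WORDS))


def named_like_mail_server(ptr: str) -> bool:
    """True if the PTR's leftmost label looks like a deliberate mail host name."""
    leftmost = ptr.lower().rstrip(".").split(".", 1)[0]
    return MAIL_LABEL.match(leftmost) is not None


if __name__ == "__main__":
    for name in ("mail.example.com", "smtp-out.example.org", "adsl-45.dynamic.example.net"):
        print(name, named_like_mail_server(name))
```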
Feedback is welcome: email@example.com. If there's any real interest in a public debate I'll open up comments here.
September 19, 2008
I've been in touch with Greg Aaron from Afilias regarding my recent gripe about the poor quality of address vetting at various registrars, in which Afilias was singled out for particular abuse. He's asked me to post the following, which outlines what Afilias is actually doing to fight domain abuse.
Allow me to tell you some of the things we've been doing at Afilias. Far from doing nothing as you stated, we actually have one of the most innovative registry anti-abuse programs around.
This year so far, Afilias has been responsible for proactively getting more than 80,000 .info domains suspended for spam and related WHOIS accuracy violations. And it's working. For example, the number of .info domains on the Outblaze blacklist (ob.surbl.org) has dropped 85% over the last few months due to those suspensions, and .info's ranking on URIBL has dropped also. Interestingly, the domain you used as an example (IDOWEDDINGBOOK.INFO) is not on the URIBL or SURBL spam blacklists. If they missed it, I don't feel so bad.
In addition to the spam domains, we've actively rooted out and dealt with significant numbers of phishing, malware distribution, and fast-flux domains. Spam and these abuses are often related, and so we've been making contributions to the community to help deal with them. Examples include my work on the Anti-Phishing Working Group (such as this important white paper), work on ICANN's Fast-Flux Working Group, and our CTO's work on ICANN's Security and Stability Committee. Also FYI, check out the new .info anti-abuse policy that Afilias sent through the ICANN process this summer. We thought it important to make our position on domain name abuses clear for all.
Afilias' general approach is to work with the registrars cooperatively on domain take-downs, for several reasons:
- The registrants are the registrars' customers, and they have the direct relationship. The registrars have responsibility for the registrants, and have contractual terms of service that are suited to dealing with abuse.
- Once they are informed of a potential problem, the registrars usually have more data to go on than the registry does. The registrars are taking the credit card information, they can see what domains their registrants have registered in other TLDs, and they know what domains belong to which of their resellers. Those are important clues to fraud and abuse, and registries such as us have no visibility into those things.
- Our anti-abuse reporting to the registrars almost always leads to positive actions. Most registrars want to know about problem registrants, and prefer not to keep them as customers. The positive result is that the registrars suspend domains and sometimes entire registrant portfolios. Registrars have told me that our reporting has also led them to drop abuse-prone resellers.
It is sometimes frustrating that the process takes time, but diligence is important. For example, I see a lot of inadvertently incorrect WHOIS records. No one wants to turn off someone's domain by mistake.
So it's unrealistic to expect anyone to catch and address every batch of problem domains, and we can't guarantee we will. But Afilias is always happy to receive well-sourced, documented tips to go on. I contacted AB Domains about the IDOWEDDINGBOOK.INFO case, and I will let you know how it turns out. Thanks, and I look forward to keeping in touch with you.
With best wishes,
Director, Domain Security
I'll have more commentary and a response soon.
September 16, 2008
Hey, Afilias, wake up - you're supporting spammers!
I don't often take time out from posting links and project updates to point out things I think are deeply wrong with Internet governance, the responsibility of registrars to police their customers, and so forth. That's what CircleID is for, after all. But today I'm going to shine a light on Afilias, the .info registry, because they're falling down on the job.
There is a spammer or spam organization (I suspect it is "topshoppingcart", a fairly standard "make erection fast" pharma spammer - if you've seen the "Reorder reminder" spam, you've seen them) who is in the practice of registering .info domains with names that are usually words strung together (such as idoweddingbook.info, the example we'll use here). This in itself isn't unusual; spammers have gotten fairly creative when it comes to naming their domains, by necessity, because of course they use and discard them by the hundreds of thousands.
What is unusual here is that this particular spammer forges their whois information in predictable, and maddeningly obvious, ways. From the whois entry for the example domain we quoted above:
Domain ID:D25810743-LRMS
Domain Name:IDOWEDDINGBOOK.INFO
Created On:08-Aug-2008 14:33:41 UTC
Last Updated On:20-Aug-2008 11:47:05 UTC
Expiration Date:08-Aug-2009 14:33:41 UTC
Sponsoring Registrar:AB Systems Inc. (R394-LRMS)
Status:TRANSFER PROHIBITED
Registrant ID:YLK9V2O12N4KXRX4
Registrant Name:Xavier Jimenez
Registrant Organization:Safeway Inc.
Registrant Street1:364 Trumann Str.
Registrant Street2:
Registrant Street3:
Registrant City:Marana
Registrant State/Province:AZ
Registrant Postal Code:92371
Registrant Country:US
Registrant Phone:+1.4807936923
Registrant Phone Ext.:
Registrant FAX:+1.4807936923
Registrant FAX Ext.:
Registrant Email:firstname.lastname@example.org
There are several "tells" in the information above, and the forgery is easily seen from the use of "Str." as an abbreviation - which is common in Germany and Eastern Europe, but not in the United States. In order of obviousness:
- There is no Trumann Street in Marana, Arizona; the street name is forged, in this case as in every other one I've seen, from the name of a town.
- The only town named Trumann in the United States is in Arkansas.
- The 92371 ZIP code is in California, not Arizona
- There is no 793 exchange in the 480 area code
- Why would Safeway Inc. be registering random .info domains? And why would they be using a Wal-Mart "wmconnect.com" address?
In a nutshell, if Afilias did any vetting at all on address information for new domain registrations, they could stop this spammer (who is probably using stolen credit card numbers anyway). So the question becomes: why aren't they?
I'm not trying to suggest that they're somehow in on it with the spammers; you'd have to be a frothing loon to think that they don't want to stop this sort of registration, especially if they aren't being paid, as is often the case. It may be that this spammer is actually paying, though, as the domain was registered over six weeks ago and the whois entry is still in effect, as is the DNS.
For forty dollars US, which is even less than forty dollars Canadian these days, they could get a commercial US ZIP code database, or if they wanted to go the open source route, they could even grab an incomplete ZIP database from SourceForge for the cost of a half-megabyte download. For US$250, they could buy a commercial NPA NXX database listing all of the US area codes and their exchanges. I'm sure other mechanisms exist for licensing such databases for use in other software, and there are commercial and open services (such as geocoder.us or geonames) that provide similar information.
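A minimal Python sketch of the ZIP and area-code consistency checks described above follows. The tables are tiny, approximate, hand-picked excerpts (three-digit ZIP prefix ranges for Arizona and California, and area code 480 for Arizona), and vet_address and parse_nanp are hypothetical names; a real check would load the full databases and also verify exchanges against an NPA NXX list.

```python
import re
from typing import Optional

# Tiny, approximate excerpts for illustration only. A real vetting step would
# load the ZIP and NPA NXX databases mentioned above.
ZIP3_RANGES = {"AZ": (850, 865), "CA": (900, 961)}  # three-digit ZIP prefixes
NPA_STATE = {"480": "AZ"}                            # area code -> state


def parse_nanp(phone: str) -> Optional[tuple[str, str, str]]:
    """Split a WHOIS-style number such as '+1.4807936923' into (NPA, NXX, line)."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the NANP country code
    if len(digits) != 10:
        return None
    return digits[:3], digits[3:6], digits[6:]


def vet_address(state: str, zip_code: str, phone: str) -> list[str]:
    """Return a list of inconsistencies between state, ZIP code, and phone."""
    problems = []

    zip_range = ZIP3_RANGES.get(state)
    if zip_range and zip_code[:3].isdigit():
        low, high = zip_range
        if not low <= int(zip_code[:3]) <= high:
            problems.append(f"ZIP {zip_code} is not in {state}")

    parsed = parse_nanp(phone)
    if parsed is None:
        problems.append(f"unparseable phone number {phone!r}")
    else:
        npa_state = NPA_STATE.get(parsed[0])
        if npa_state and npa_state != state:
            problems.append(f"area code {parsed[0]} is in {npa_state}, not {state}")

    return problems


if __name__ == "__main__":
    # Values from the IDOWEDDINGBOOK.INFO record quoted above.
    print(vet_address("AZ", "92371", "+1.4807936923"))  # ['ZIP 92371 is not in AZ']
```

Run against that record, the sketch flags only the ZIP mismatch; catching the nonexistent 793 exchange would need the NPA NXX data.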
Back in April, I was reporting these domains as I found them to a consultant working with Afilias. He's since moved on, but the practice continues. Surely it can't be difficult to identify this practice and put a stop to it with some basic vetting of registrant address data.
Wake up, Afilias. And to all the registrars out there who accept "555.555.5555" as a fax number, and "vet" it by adding "+1" to the front, or allow "null" to be the city in an address, you're hereby on notice, too.
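Catching the placeholder junk mentioned above is even cheaper. This Python sketch uses a hypothetical list of filler strings and, assuming the registrant claims a North American number, treats a phone number as bogus if, after dropping a leading country code 1, it is not ten digits long or is a single digit repeated (as "+1.555.555.5555" is).

```python
import re

# Hypothetical set of filler strings seen in sloppy WHOIS data.
PLACEHOLDER_TEXT = {"", "-", "null", "none", "n/a"}


def is_placeholder_text(value: str) -> bool:
    """True for empty or obviously filler values such as a city of 'null'."""
    return value.strip().lower() in PLACEHOLDER_TEXT


def is_placeholder_phone(phone: str) -> bool:
    """True if the number is not ten NANP digits or is one digit repeated."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # e.g. a registrar-added "+1" prefix
    return len(digits) != 10 or len(set(digits)) == 1


if __name__ == "__main__":
    print(is_placeholder_phone("+1.555.555.5555"))  # True
    print(is_placeholder_text("null"))              # True
```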
UPDATE: made a couple of minor edits; just wanted to make it clear that I do distinguish between Afilias the registry and the various registrars.
October 4, 2007
Why is it that whois returns the same thing for all Taiwanese universities?
Ministry of Education Computer Center
12th Fl, 106, Hoping E. Road, Sec 2.
Taiwan Republic of China, R.O.C
Are they afraid that someone might find out they have more than one university? Or is this a reflection of some sense that everything must be centrally controlled and so all requests for information must go through a single point? Or is it just incompetence or lack of care?
Why is it that so many Polish whois entries conflate the contact's name with the contact's organization, even when using a registrar that allows for separate fields for them? For example, kaskada.net contains:
KASKADA Artur Lagodzinski email@example.com +48.627479977
KASKADA Artur Lagodzinski
ul. Kilinskiego 35
Just curious how this practice got started. It makes it a lot more painful to try to determine what organization a given Polish domain belongs to.