June 15, 2009
Dead chicken magic and the DNS
Figuring out whether a given hostname naming convention reflects dynamic or static assignment can be considered roughly akin to what an old programmer colleague of mine calls "dead chicken magic". At times, it's quite easy, because of a consistently applied and transparent use of certain tokens (such as "dyn" or "dialup" or "static" or "cable"); at others, you might as well shine up the crystal ball. For those cases where the naming convention isn't transparent, we're fortunate in having several useful tools at our disposal.
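For the easy case, where the tokens are transparent, a minimal sketch of the lookup might run along these lines. The token tables and the function name `classify_hostname` are illustrative assumptions, not Enemieslist's actual patterns; only the four tokens named above are used, and "cable" is treated as a technology hint rather than an assignment hint.

```python
import re

# Hypothetical token tables, built only from the examples in the post.
ASSIGNMENT_TOKENS = {"dyn": "dynamic", "dialup": "dynamic", "static": "static"}
TECH_TOKENS = {"dialup": "dialup", "cable": "cable"}


def classify_hostname(hostname):
    """Guess assignment type and technology from tokens in a hostname."""
    # Split the name into alphabetic runs, so "dyn-192-0-2-10" yields "dyn".
    words = set(re.findall(r"[a-z]+", hostname.lower()))
    assignment = {ASSIGNMENT_TOKENS[w] for w in words if w in ASSIGNMENT_TOKENS}
    technology = sorted(TECH_TOKENS[w] for w in words if w in TECH_TOKENS)
    return {
        # Conflicting or missing tokens mean we cannot tell: crystal ball time.
        "assignment": assignment.pop() if len(assignment) == 1 else "unknown",
        "technology": technology,
    }
```

Anything that comes back "unknown" is where the other tools described below come in.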
Perhaps the most useful of all - when it's available - is rwhois. Given an IP or set of IPs, it is theoretically possible to find out what kind of netblock you're dealing with. Many admins put notes into the registry indicating pools, static commercial allocations, and what sort of service uses the IPs in question (DSL, dialup, wireless, etc.). Most do not, however, so if there are no clues whatsoever in the whois output, you're left with a hunt on your hands.
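Once the whois or rwhois text is in hand, hunting for those notes can be partly automated. The sketch below assumes the output has already been fetched as a string and that lines starting with "%" or "#" are registry comments; the keyword list and the name `whois_hints` are assumptions made for illustration.

```python
import re

# Hypothetical keyword list, taken from the kinds of notes mentioned above.
HINTS = ("pool", "static", "dsl", "dialup", "wireless")


def whois_hints(whois_text):
    """Map each hint keyword to the whois lines that mention it."""
    found = {}
    for line in whois_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and registry comment lines.
        if not stripped or stripped.startswith(("%", "#")):
            continue
        for word in set(re.findall(r"[a-z]+", stripped.lower())):
            if word in HINTS:
                found.setdefault(word, []).append(stripped)
    return found
```

An empty result is the common case, and it means only that the admin left no notes, not that the block is static or dynamic.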
ISP and telco Web sites are often a good source for such information as well, especially smaller local ISPs and telcos, who tend to focus on a smaller subset of technologies (such as cable TV over fiber optic, or wireless, or metro ethernet). Price lists are another great place to find out whether a given provider offers static IP addresses, because they often charge the customer more for them. This can be somewhat less reliable (for us, anyway) when the Web site is in a language we don't speak, but Google Translate does a pretty good job in most cases (still waiting for Azerbaijani to English, FWIW). The big problem is when a given ISP offers many different kinds of services, but doesn't indicate which are which in their naming, or indicates the speeds available on their price lists without indicating the underlying technology.
Wikipedia is a surprisingly good source for information that companies often fail to provide on their own Web sites - such as the services and technologies they offer, the corporate relationships between branded services, and so forth.
One thing we've noticed is that countries that are part of the former Soviet bloc, or those that were once referred to as part of the "developing world", are usually deploying fiber and metro ethernet, whereas countries with older telephony infrastructure deploy dialup and DSL. And oddly enough, there is a strong correlation between a given region and the odds that they will dole out static IPs, or in some cases, put all of their customers on private addresses and route all traffic out through a subset of NATs with public IPs.
Another thing worth noting is that the widespread use of certain tokens to identify dynamics can be a source of massive amounts of false positives. For example, those of us who grew up with dialup might be forgiven for thinking that PPP is a strong indicator of a dialup modem. Unfortunately for us, while that may have been true ten years ago, modern technologies like fiber optics and metro ethernet often use PPP (PPPoE in the latter case) to assign static IPs. This is common in Eastern Europe, for example. The same thing can be said for DHCP (Dynamic Host Configuration Protocol), whose name explicitly implies "dynamic" but which may be used to dole out static IPs through the use of scopes and permanent leases. Anyone classifying naming conventions ignores this at their peril.
Given a large enough data set, dynamicity can be assumed across a set of hosts if you can observe different hosts acting in similar ways over time but never concurrently. So, a bot on one PC that is a client of a given ISP might be smart enough to probe DNSBLs in order to discover whether a given dynamic IP is listed, and drop that lease to fetch a different IP. Enemieslist's regular expression approach is most useful in this instance, as once we have a pattern for an ISP's naming convention, it eliminates the issues with the delay between when a new, reputation-clean IP is assigned to a bot and when DNSBLs list it as a spam source. It also avoids the issues with staleness - where an IP is listed by a DNSBL but has since been swapped for another by a smart bot. In any event, it's difficult to obtain such large data sets with sufficient transparency into the traffic itself, so it's a last resort.
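The "similar behavior over time, but never concurrently" test can be sketched roughly as follows. This is a crude illustration under stated assumptions, not how Enemieslist does it: the sightings are assumed to be (timestamp, ip) pairs already grouped by one behavioral fingerprint, and each IP's activity is reduced to a single first-seen to last-seen span, so an address handed back to the same host later would read as an overlap. The function name is hypothetical.

```python
def looks_like_one_roaming_host(sightings):
    """Return True if several IPs show the behavior, but never at the same time.

    sightings: iterable of (timestamp, ip) pairs for ONE behavior fingerprint.
    """
    spans = {}
    for ts, ip in sightings:
        first, last = spans.get(ip, (ts, ts))
        spans[ip] = (min(first, ts), max(last, ts))
    if len(spans) < 2:
        return False  # a single IP tells us nothing about reassignment
    # Sort spans by start time; each must end before the next one begins.
    ordered = sorted(spans.values())
    return all(
        prev_end < start
        for (_, prev_end), (start, _) in zip(ordered, ordered[1:])
    )
```

Sightings at times 1 and 5 on one address, then 7 and 9 on a second, pass the test; a sighting on the second address in the middle of the first one's span fails it.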
Once all the methods above have been tried, the real dead chicken magic comes in - and we have to rely on local knowledge and custom. Fortunately, we've been doing this for so long that we've amassed a sufficiently broad data set that we can often make defensible assumptions on the basis of prior knowledge. The best source of local knowledge, of course, is an insider who can help: the former admin and blacklist operator who filled me in on which ISPs in the Netherlands didn't offer static IPs (turns out it's quite rare), or the friend who clarified the often confusing abbreviations used by various Japanese telcos, so we could tell dynamic from static and fiber from DSL.
In any case, it can be a lot of fun. And at other times, it can be a mind-numbingly annoying and Sisyphean task. Either way, it would be a lot easier, and a lot more fun, if more admins (and more RIRs) put more information into their naming, their whois, and their Web sites.
Posted by schampeo at June 15, 2009 5:20 PM