Big Data

There are hundreds of millions of domains registered on the Internet’s authoritative DNS name servers, and hundreds of thousands of new or modified registrations occur every day. Some of these are legitimate, but many are for malicious purposes. The security community flags only a tiny fraction of these existing, new and modified domain registrations as bad.

OpenDNS handles recursive DNS resolution for about 2% of the Internet’s users. Every day, we receive DNS queries for hundreds of millions of these same domains. A simple plot of query frequency quickly reveals the “head” formed by users attempting to connect to the most popular domains (e.g., google.com, facebook.com).

The same plot shows the “long tail” formed by very few users attempting to connect to the much larger proportion of relatively unknown domains (e.g., asdasadf232ds.hosting.ru). Most of these queries are for the aforementioned registered domains, but some are for non-existent domains.
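A minimal sketch of the frequency count behind such a plot, in Python. The query log and the `head_size` cutoff are made-up illustrations, not OpenDNS data or tooling.

```python
from collections import Counter


def rank_domains(queries):
    """Return (domain, count) pairs sorted from most to least queried."""
    return Counter(queries).most_common()


def split_head_tail(ranked, head_size):
    """Split a ranked list into the popular head and the long tail."""
    return ranked[:head_size], ranked[head_size:]


# Illustrative query log (invented for this example).
sample_log = (
    ["google.com"] * 5
    + ["facebook.com"] * 3
    + ["asdasadf232ds.hosting.ru", "example.org"]
)

ranked = rank_domains(sample_log)
head, tail = split_head_tail(ranked, head_size=2)
```

Plotting the counts in `ranked` by rank gives the head-and-tail shape described above. At Internet scale the tail holds far more distinct domains than the head.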

Collection & Characterization

Other security vendors have analyzed this long tail for threats. However, many have much smaller data sets to work with as their technology is limited to a few threat vectors such as email or Web traffic. Many lack real-time, cloud-based data collection systems. Almost all lack an Internet-wide routing network to help analyze the context of the data collected.

OpenDNS’s technology platform enables us to collect data on this long tail across every threat vector, in real-time, in the cloud using our Internet-wide Anycast routing network. However, it’s difficult for researchers to observe malicious linkages between seemingly unrelated objects using an Internet-sized data set. Visualizing big sets of data by different attributes enables patterns to be more easily characterized. However, more sophisticated techniques than frequency counts of DNS queries are required to identify domains as bad or suspicious.

One such technique is to track the Autonomous System Number (ASN) associated with every domain name our recursive DNS servers have resolved. We derive it from the IP addresses listed in the domain’s DNS records and the IP routing prefixes that we can associate with them using our Anycast network. We can then reverse the process to link domain names that share an ASN.
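The mapping and its reversal could look roughly like the following Python sketch. The prefix table, the ASNs (taken from the ranges reserved for documentation) and the function names are assumptions made for illustration. A real system would build the table from routing data.

```python
import ipaddress
from collections import defaultdict

# Hypothetical routing table: prefix -> origin ASN (documentation ranges).
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def asn_for_ip(ip):
    """Look up an IP address by longest-prefix match; None if unrouted."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_TO_ASN if addr in net]
    if not matches:
        return None
    return PREFIX_TO_ASN[max(matches, key=lambda net: net.prefixlen)]


def domains_by_asn(domain_to_ips):
    """Invert a domain -> IPs mapping into ASN -> set of domains."""
    index = defaultdict(set)
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            asn = asn_for_ip(ip)
            if asn is not None:
                index[asn].add(domain)
    return dict(index)
```

Given resolved records such as `{"a.example": ["192.0.2.10"], "b.example": ["192.0.2.77"]}`, `domains_by_asn` places both domains under ASN 64496, which is the kind of link the reversed lookup is meant to expose.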

[Image: domain IP]

Visualizing Threat Patterns

In our previous blog post, we observed the increasing use of algorithmically generated domain names (a.k.a. “DGA” domains) by malicious software to phone home to botnet controllers. Several obvious characteristics of the names themselves set them apart from legitimate domains (e.g., character length, randomness).

OpenDNS security researchers have developed heuristics to detect such characteristics using lexical analysis and Shannon entropy analysis from information theory, coupled with big data manipulation. However, this alone can only identify a domain as suspicious. Therefore, our researchers built or customized several visualization engines that correlate additional domain attributes. These engines enable us to observe clusters of domains that share other characteristics. We’ll share three case examples below.
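As a rough illustration of the entropy part of such a heuristic, the Python sketch below computes the Shannon entropy of a domain’s first label, H = -Σ p(c) log2 p(c) over its characters, and combines it with a length check. The function names and both thresholds are arbitrary assumptions, not the values OpenDNS uses.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(domain, min_length=12, min_entropy=3.5):
    """Crude lexical check on the first label (thresholds are assumptions)."""
    label = domain.split(".")[0].lower()
    return len(label) >= min_length and shannon_entropy(label) >= min_entropy
```

A label of 16 distinct characters has an entropy of 4 bits per character, while a short dictionary-like label scores much lower. Short or repetitive random-looking labels can also score low, which is one reason a check like this can only mark a domain as suspicious.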

CASE #1

[Image: domain list]

  • Several hundred suspicious domain names were clustered together since they all mapped to the same IP address, which by itself is still nothing more than suspicious. The following screenshot shows a few of these.
  • All the suspicious domains were created on the same date (01 Aug 2012).
  • While “whois” information indicates they’re all registered by different people, a closer look shows that the registrants’ email addresses share the pattern facbani[digits]@hotmail.com. The following screenshot shows this across three different domain registrations.

[Image: whois]

  • This IP address shares the same ASN as a large number of other malicious domain names registered in the past and present.

[Image: casemap]

  • Purposely faked registrant details, registrations that all occurred on the same date, and a connection to past and present malicious domains are together enough to confidently flag these domains as malicious and protect OpenDNS customers.
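The correlation in this case can be sketched in Python as grouping registrations by IP address and creation date, then checking whether every registrant email in a group matches the shared pattern. The record layout, the `min_size` cutoff and the function name are assumptions for illustration. Only the email pattern comes from the case itself.

```python
import re
from collections import defaultdict

# Pattern from the case above: facbani[digits]@hotmail.com
EMAIL_PATTERN = re.compile(r"^facbani\d+@hotmail\.com$", re.IGNORECASE)


def suspicious_groups(records, min_size=3):
    """Group records by (ip, created) and keep groups in which every
    registrant email matches the shared pattern."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["ip"], rec["created"])].append(rec)
    return {
        key: [r["domain"] for r in recs]
        for key, recs in groups.items()
        if len(recs) >= min_size
        and all(EMAIL_PATTERN.match(r["email"]) for r in recs)
    }
```

Each record here is a plain dictionary with `domain`, `ip`, `created` and `email` keys. Checking the ASN’s history, as in the fourth bullet, would be a separate lookup.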

CASE #2

  • 47 suspicious domains were clustered together because they triggered our DGA heuristics via a very restrictive threshold setting.
  • DNS query trend analysis shows a strikingly uniform time series pattern over a week-long window.

[Image: trends]

  • None of the domain names had DNS records, meaning they could not be resolved to an IP address.
  • We’re able to flag these as “likely bad” since our hypothesis is that these are DGA domains that the cybercriminal has not yet registered. We will soon be releasing a new category for OpenDNS customers to block such domains that we believe, but are not certain, are malicious.
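One possible way to test whether a set of domains shares a query pattern is to compare their per-day query counts pairwise. The Python sketch below uses the Pearson correlation for that. The choice of measure, the 0.9 threshold and the function names are assumptions, and the post does not say which method was actually used.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def share_query_pattern(series_by_domain, threshold=0.9):
    """True if every pair of domains has strongly correlated query counts."""
    pairs = list(combinations(series_by_domain.values(), 2))
    if not pairs:
        return False
    return all(pearson(a, b) >= threshold for a, b in pairs)
```

Domains that pass this check and also return no DNS records would match the profile described in this case.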

CASE #3

  • Several suspicious domains were grouped together because they are densely clustered by IP address, and therefore by ASN (6539, 15418), and by name servers (e.g., skyhi.mobi, tnsdns.net).

[Image: casemap2]

  • Visual inspection of a few of these domains showed them to be parked domains. They could possibly be URL forwarding services for profiteers, but not cybercriminals.
  • These domains will remain flagged merely as suspicious.

Managing Security Risks

As the number of correlated domain attributes characterized as malicious or suspicious increases, so does the confidence with which our researchers and automated systems can flag a domain name. This is a continuous, proactive system: as new data is collected and patterns are detected, the extra intelligence is used to update the domain’s flag accordingly.
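The idea that confidence grows with the number of correlated signals can be illustrated with a small Python sketch. The signal names, the equal weighting and the thresholds are hypothetical. Only the flag labels echo the ones used in this post.

```python
def flag_for(signals, suspicious_at=1, likely_bad_at=2, malicious_at=3):
    """Map the number of signals that fired to a flag.

    `signals` maps a signal name to True/False; thresholds are assumptions.
    """
    hits = sum(1 for fired in signals.values() if fired)
    if hits >= malicious_at:
        return "malicious"
    if hits >= likely_bad_at:
        return "likely bad"
    if hits >= suspicious_at:
        return "suspicious"
    return "unflagged"
```

Re-running `flag_for` whenever a new signal arrives for a domain mirrors the continuous updating described above: a domain can move from “suspicious” to “malicious” as more attributes correlate.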

Today, our security settings include “Malware”, “Botnets”, “Phishing” and “Suspicious Response”. The first three indicate that we’re very confident that the domains in these categories are indeed bad. Soon, however, we will provide customers with one or two additional categories for domains that we believe are likely to be bad in one or more ways. Customers may block or allow these categories to manage their security risk for different networks, users or devices as needed.
