Authoritative DNS Overview
Each day, OpenDNS handles an average of 40 billion recursive DNS queries, which are directed to our 13 data centers worldwide. Each data center hosts tens of DNS resolvers. When a resolver receives a recursive DNS query, it first checks whether it has an answer in its cache and, if so, replies with that answer. If there is no answer in the cache, or if the cached answer has expired, the resolver issues an upstream DNS query to the authoritative name servers and passes the response back to the client. In other words, a recursive resolver performs two main operations: it replies from its cache, or it issues a query up the chain of authoritative name servers.
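The two operations of a recursive resolver can be sketched in a few lines of Python. This is only an illustration and not our resolver code: the `resolve` function, the cache layout, and the `query_upstream` callback (which stands in for the walk up the authoritative name server chain) are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layout: (name, type) -> (expiry timestamp, answers).
Cache = Dict[Tuple[str, str], Tuple[float, List[str]]]


def resolve(qname: str, qtype: str, cache: Cache,
            query_upstream: Callable[[str, str], Tuple[int, List[str]]],
            now: Callable[[], float] = time.time) -> Tuple[List[str], bool]:
    """Return (answers, served_from_cache) for a recursive query."""
    key = (qname.lower(), qtype)
    entry = cache.get(key)
    if entry is not None and entry[0] > now():
        # Operation 1: a fresh answer is in the cache, reply with it.
        return entry[1], True
    # Operation 2: no answer, or the answer has expired. Ask upstream
    # (stubbed by query_upstream, which returns a TTL and the answers).
    ttl, answers = query_upstream(qname, qtype)
    cache[key] = (now() + ttl, answers)
    return answers, False
```

Only the second branch produces authoritative DNS traffic, which is the traffic the rest of this post is concerned with.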
In this blog post, we discuss a new, simple yet effective method of mining for new malicious domains. The technique is based on the responses of authoritative name servers that are hosted on IPs tied to known suspicious or malicious domains.
In our authoritative DNS logs, each entry consists of the IP of the name server that sent the reply, followed by the raw DNS message. A sample authoritative DNS message is displayed below. The IP address of the name server that sent back the message is 184.108.40.206. Since the Authoritative Answer (AA) bit is set, we know that this server is one of the authoritative name servers for www.google.com. (A domain can have multiple authoritative name servers, and many do, since this provides redundancy.) Notice that in this packet there are no Authority or Additional sections.
flags QR AA
www.google.com. IN A
www.google.com. 300 IN A 18.104.22.168
www.google.com. 300 IN A 22.214.171.124
www.google.com. 300 IN A 126.96.36.199
www.google.com. 300 IN A 188.8.131.52
www.google.com. 300 IN A 184.108.40.206
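To make the layout of the Answer section concrete, here is a minimal Python sketch that parses one record line of the form shown above. It assumes the textual "name TTL class type address" layout of the sample. The `ARecord` type and the `parse_a_record` function are names invented for this example and do not describe our actual log format or tooling.

```python
from typing import NamedTuple, Optional


class ARecord(NamedTuple):
    name: str
    ttl: int
    address: str


def parse_a_record(line: str) -> Optional[ARecord]:
    """Parse 'name ttl IN A address'; return None for any other line."""
    fields = line.split()
    if len(fields) != 5 or fields[2] != "IN" or fields[3] != "A":
        return None  # flags line, question line, or another record type
    name, ttl, _, _, address = fields
    if not ttl.isdigit():
        return None
    # Normalize the name: drop the trailing root dot and lowercase it.
    return ARecord(name.rstrip(".").lower(), int(ttl), address)
```

Lines such as the flags line or the question line (which has no TTL or address) return None, so only the answer records survive.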
The different observed types of authoritative answers we record in our logs are depicted in the diagram below.
Over the past few years, it has become common for cybercriminals to register domains through free registrars, or through registrars with lax rules regarding registrant identification or abuse reports. These domains are then used for all kinds of malicious purposes: phishing or scam sites; malware hosting, distribution, or drop sites; or rendezvous points where botnets receive payloads and directives.
Through the domain registration process, entries for one or more name servers authoritative for the new domain are added to the zone of the domain higher up in the DNS hierarchy. For instance, if we register the domain example.com and provide ns1.example.com and ns2.example.com as its name servers, then new NS records for example.com, pointing to ns1.example.com and ns2.example.com, are added to the .com zone. Because these name servers sit inside the domain they serve, their IP addresses are also added to the .com zone as glue records.
The registrant could also use name servers provided by their hosting provider if they do not want or need to manage their own DNS name servers.
Then, at the authoritative name server level (ns1.example.com and ns2.example.com in this example), A records are added for the new domain (if the new domain only needs to map to IP addresses), pointing the domain name to one or more IP addresses.
These IPs belong to the machines that host the new domain's content, or to proxies that forward traffic to another layer of domains or IPs.
In the case of domains registered for malicious purposes, the hosting machines for the domain can be picked from general-purpose hosting providers or infected machines of unsuspecting users.
New malicious domains discovery method
One of the intuitions behind this method comes from the observation that malicious domains and their respective name servers are often hosted on the same IP, or on a narrow range of IPs that are recycled. Legitimate sites do this too, of course: acquiring a new IP range is not free, so hosting usage of existing IPs is maximized, as in the case of virtual private servers. Nevertheless, the practice is especially prevalent among malicious domains, and it provides the grounds for the method we explain below.
We parse the authoritative DNS logs and, for each authoritative DNS answer, check whether the IP of the name server that issued the response exists in our database of malicious IPs. We extract all DNS messages that were issued by name servers whose IPs have been active recently and that have a high number of malicious domains mapping to them. Then, we mine the Answer section of each message and extract the (domain, IP) pair from every A record. The resulting list of (domain, IP) pairs constitutes the candidates for further investigation to decide whether the domains are indeed malicious.
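The extraction step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our production pipeline. The log is assumed to be already parsed into (name server IP, answer records) tuples. `malicious_domain_count` is a hypothetical lookup table holding, for each recently active malicious IP, the number of known malicious domains mapping to it. The `min_domains` threshold of 10 is an arbitrary value chosen for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical parsed log entry:
# (name server IP, answer-section records as (name, record type, rdata)).
LogEntry = Tuple[str, List[Tuple[str, str, str]]]


def candidate_pairs(log: Iterable[LogEntry],
                    malicious_domain_count: Dict[str, int],
                    min_domains: int = 10) -> Set[Tuple[str, str]]:
    """Collect (domain, IP) pairs answered by suspicious name servers."""
    pairs: Set[Tuple[str, str]] = set()
    for ns_ip, answers in log:
        # Keep only messages sent by name servers whose IP hosts many
        # known malicious domains (the threshold is an assumption).
        if malicious_domain_count.get(ns_ip, 0) < min_domains:
            continue
        for name, rtype, rdata in answers:
            if rtype == "A":  # only A records yield (domain, IP) pairs
                pairs.add((name.rstrip(".").lower(), rdata))
    return pairs
```

Returning a set removes the duplicates that arise when the same answer is logged many times within the sampled period.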
We first exclude discovered domains that appear in our domain whitelist, as well as domains that are already known to be malicious. Since our goal is to discover new malicious domains, the remaining list of domains is then checked with several classification heuristics: for example, keeping only those domains that are part of dense clusters of malicious domains and IPs, domains whose names have high lexical perplexity and entropy (see our blog posts “How Likely is a domain to be malicious” of January 8th and “The role of country code top-level domains (ccTLDs) in malware classification” of January 18th), or domains that have been registered very recently and are parked pages.
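As a rough illustration of this filtering stage, the Python sketch below removes whitelisted and already-known domains and then applies a single heuristic, the character-level Shannon entropy of the domain's first label. This is a deliberately simplified stand-in for the heuristics listed above, not our classifier. The function names and the `min_entropy` threshold of 3.5 bits are assumptions made for this example, and scoring only the first label is a simplification.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def shannon_entropy(label: str) -> float:
    """Character-level Shannon entropy of a string, in bits."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def filter_candidates(domains: Iterable[str],
                      whitelist: Set[str],
                      known_malicious: Set[str],
                      min_entropy: float = 3.5) -> List[str]:
    """Drop known domains, then keep names with high-entropy first labels."""
    kept = []
    for domain in domains:
        if domain in whitelist or domain in known_malicious:
            continue  # not a new discovery
        first_label = domain.split(".")[0]
        if shannon_entropy(first_label) >= min_entropy:
            kept.append(domain)
    return kept
```

A label such as "aaaa" scores 0 bits, while a label made of 12 distinct characters scores about 3.58 bits, so randomly generated names tend to pass the threshold and dictionary-like names tend not to.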
As an illustrative example, we take a sample of one hour's worth of authoritative DNS logs from one resolver in our London data center. That represents 5,316,930 DNS messages. After applying our domain discovery method, we identify several hundred new suspicious or malicious domains.
A sample of newly discovered malicious domains is presented below:
The main goal of the Umbrella security labs is to experiment with and apply a wide variety of techniques and algorithms for discovering malicious domains by mining our Big Data platform, which consists of both recursive and authoritative DNS traffic as well as other intelligence sources. In the end, we retain the methods that are the most efficient and effective at adding value in protecting our customers.