One of the main responsibilities of the OpenDNS Labs team is identifying new malicious infrastructure. In this post, I’ll discuss how we discovered new malicious domains belonging to a well-known malware family.
Many domain generation algorithms (DGAs) work by feeding a date into a mathematical function to generate a string of characters. Typically, a TLD is then appended to the end of the string, forming a domain name. The malware then contacts this domain for instructions. If the domain name does not resolve to an IP address, or the domain does not respond with instructions, the process is repeated. This is a common method of obscuring the command and control servers that malware uses.
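To make the idea concrete, here is a minimal Python sketch of a date-seeded DGA. It is a toy written for this post, not the algorithm of newGOZ, Ramnit, or any other real family; the function name `toy_date_dga`, the use of SHA-256, the 12-character label length, and the `.com` TLD are all assumptions chosen for illustration.

```python
import hashlib
from datetime import date


def toy_date_dga(day, count=3, tld=".com"):
    """Hypothetical date-seeded DGA; not any real malware family's algorithm."""
    domains = []
    for index in range(count):
        # The date (plus a counter) is the only input, so every client
        # running this code on the same day derives the same domains.
        material = f"{day.isoformat()}-{index}".encode("ascii")
        digest = hashlib.sha256(material).digest()
        # Map the first 12 digest bytes onto lowercase letters.
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_date_dga(date.today()):
        print(name)
```

A client would try each returned name in turn until one resolves and answers with instructions. Because the date drives the output, the candidate list changes every day.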
Dhia Mahjoub, Steve Mckinney, and I recently presented our findings from tracking the new Gameover Zeus botnet at ISOI. The newGOZ implants used this DGA technique and introduced salts (a.k.a. magic numbers) to the function for added complexity. Two known salts were found in newGOZ binaries, and Steve, a security researcher at Cisco, suggested the idea of brute forcing the salt space in an attempt to identify additional salts.
Domain generation algorithms aren’t a new concept. Neither is the Ramnit family of malware. Recently, Johannes Bader published the function Ramnit uses to generate its command and control domains. An interesting characteristic of Ramnit’s algorithm is that it does not take a date or timestamp as input. This means that, unlike many other malware families that make use of DGAs, Ramnit does not generate a new set of domains depending on the date. In contrast to the newGOZ DGA, Ramnit’s domain generation pattern is not periodic. Below is a picture of the DNS query volume we saw for one of the newGOZ command and control domains:
The newGOZ algorithm uses the current date as input to its DGA. This causes newGOZ to generate a new set of domains each day. Each domain in the set of domains generated for a particular day has a similar query volume pattern to the above graph. Below is a picture of the query volumes OpenDNS has seen for a Ramnit command and control domain:
Math Fights Math
Taking the algorithm implementation from Bader’s blog, the following steps were taken:
- The number of domains to generate was statically set to one
- A Python generator was added to loop over the seed space (from 0x00000000 through 0xFFFFFFFF)
- The first domain Ramnit would contact for a seed is calculated
- The domain from step three was queried against OpenDNS’s resolver logs at a random hour from a random recent day
- This determines if OpenDNS has received queries for this domain
- If no queries have been seen the domain is ignored and the next seed from step two is used in step three
- If we have seen queries for the domain name the seed from step two is set aside for further processing
- Once a batch of possible seeds is identified, we calculate the first 500 domains the DGA would produce for each seed
- We observe the query volumes for those 500 domains over the last week
- This step validates the findings by using client queries
- This step identifies potential false positives (the Ramnit DGA does collide with legitimate domain names)
- This step determines the size of the set of domains for each seed (different seeds do, in fact, generate different domain set sizes)
- Each seed and its count of domains to generate is recorded
- These steps are continued until the seed space in step two is exhausted
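The steps above can be sketched in Python roughly as follows. This is an illustration only. `toy_seed_dga` is a stand-in linear congruential generator, not the Ramnit function from Bader’s post. `seen_domains` is an in-memory set standing in for a lookup against resolver logs. `domain_set_size` uses a simple heuristic (the position of the last queried domain), which is an assumption rather than the exact validation we ran.

```python
def toy_seed_dga(seed, count):
    """Stand-in seed-only DGA (a plain LCG); NOT Ramnit's real algorithm."""
    state = seed & 0xFFFFFFFF
    for _ in range(count):
        letters = []
        for _ in range(10):
            state = (state * 1103515245 + 12345) & 0xFFFFFFFF
            letters.append(chr(ord("a") + (state >> 16) % 26))
        yield "".join(letters) + ".com"


def seed_space(start=0x00000000, stop=0xFFFFFFFF):
    """Generator over the seed space, inclusive of both ends."""
    yield from range(start, stop + 1)


def candidate_seeds(seeds, seen_domains):
    """Yield seeds whose first generated domain shows up in the log sample."""
    for seed in seeds:
        first_domain = next(toy_seed_dga(seed, 1))
        if first_domain in seen_domains:
            yield seed


def domain_set_size(seed, seen_domains, limit=500):
    """Heuristic: 1-based position of the last queried domain in the first `limit`."""
    size = 0
    for position, domain in enumerate(toy_seed_dga(seed, limit), start=1):
        if domain in seen_domains:
            size = position
    return size


if __name__ == "__main__":
    # Hypothetical log sample: pretend clients queried the first 5 domains of seed 42.
    seen = set(toy_seed_dga(42, 5))
    for seed in candidate_seeds(seed_space(0, 1000), seen):
        print(hex(seed), domain_set_size(seed, seen))
```

In a real run, the set membership test would be replaced by a query against the resolver logs, and seeds whose domains collide with legitimate names would be reviewed before being recorded.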
Because the initial log check samples only a single random hour for the first domain generated from each seed, this method can produce false negatives. It does, however, establish a lower bound on the number of seeds in use by Ramnit binaries. Unfortunately, our current system still needs optimization: of the approximately 4 billion possible seeds, we’ve generated and inspected only about three percent. Even so, it has identified a few thousand Ramnit command and control domains we were not previously blocking.
Clients Querying These Domains
One interesting observation about the client queries for the Ramnit command and control domains identified this way is that many of the client IP addresses are geographically concentrated in only a few countries (GB, AU, IE, and US). Many of these IP addresses also query for domains generated by multiple seeds. Possible explanations for this pattern include:
- a single Ramnit implant is using multiple seeds
- multiple Ramnit infections behind a single public IP address are using different seeds
- malware sandboxes detonating Ramnit samples are using OpenDNS’s resolvers
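One way to surface clients that query domains from more than one seed is to group log records by client IP address. The sketch below assumes a hypothetical log format of `(client_ip, domain)` pairs and a `domain_to_seed` mapping built from the validated seeds. Both are illustrative assumptions, and the addresses and names in the usage example are made up (the IPs come from the documentation range).

```python
from collections import defaultdict


def seeds_per_client(query_log, domain_to_seed):
    """Return clients that queried domains belonging to more than one seed.

    query_log:      iterable of (client_ip, domain) pairs (hypothetical format)
    domain_to_seed: dict mapping a generated domain to the seed that produced it
    """
    clients = defaultdict(set)
    for client_ip, domain in query_log:
        seed = domain_to_seed.get(domain)
        if seed is not None:  # ignore queries for unrelated domains
            clients[client_ip].add(seed)
    return {ip: seeds for ip, seeds in clients.items() if len(seeds) > 1}


if __name__ == "__main__":
    # Made-up example data for illustration only.
    domain_to_seed = {"aaaa.com": 0x1, "bbbb.com": 0x2}
    log = [
        ("192.0.2.10", "aaaa.com"),
        ("192.0.2.10", "bbbb.com"),
        ("192.0.2.20", "aaaa.com"),
        ("192.0.2.20", "example.org"),
    ]
    print(seeds_per_client(log, domain_to_seed))
```

Grouping alone cannot tell the three explanations apart; it only identifies which IP addresses deserve a closer look.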
Future work for this research includes parallelization to speed up the brute forcing of seed space, generalizing the system for use with other malware families’ DGAs, and further exploring the behavior of compromised clients.
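As a rough sketch of what the parallelization might look like, the seed space can be split into contiguous, non-overlapping ranges that independent workers process separately. The helper name `shard_seed_space` and the choice of equal-sized shards are assumptions for illustration, not a description of our system.

```python
def shard_seed_space(shards, start=0x00000000, stop=0xFFFFFFFF):
    """Split the inclusive range [start, stop] into contiguous ranges."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    total = stop - start + 1
    base, extra = divmod(total, shards)
    ranges = []
    low = start
    for index in range(shards):
        # Spread any remainder over the first `extra` shards.
        size = base + (1 if index < extra else 0)
        if size == 0:
            break
        ranges.append(range(low, low + size))
        low += size
    return ranges


if __name__ == "__main__":
    for shard in shard_seed_space(8):
        print(hex(shard.start), hex(shard.stop - 1))
```

Each range could then be handed to a separate process or machine, for example through `concurrent.futures.ProcessPoolExecutor`, since the work for one seed does not depend on any other seed.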