You may not know it, but for the past several months OpenDNS has been building an incredible community of security enthusiasts. We recently received an interesting site submission from someone in our security community that presented a challenge in obfuscation research.
Before I explain, here’s a bit of background. The first thing a malware author does before releasing malicious code is to check that popular antivirus products and content filters will not detect it. Evasion techniques are numerous and constantly evolving, and Web malware is especially hard to detect. Web browsers have become extremely complex, and with that complexity comes a large attack surface, and a wealth of possible obfuscation techniques.
Early on, Web filters simply scanned Web content for malicious links. As expected, malware authors quickly worked around these filters by exploiting the fact that different parsers can interpret the same content in totally different ways.
So, should obfuscated code systematically be flagged as suspicious by content filters? Unfortunately, distinguishing legitimate web application code from malware code is far from trivial.
Modern web applications typically use compilers, compressors, DOM attributes, lazy/deferred loading and optimization techniques that make it difficult to distinguish obfuscated from unobfuscated code without raising an excessive number of false positives.
OpenDNS takes a different approach. We analyze billions of DNS queries every day in order to detect suspicious behavior. This allows us to spot domains serving malware very quickly. Our research team then analyzes the code itself, in order to proactively block other domain names the malware is going to spread to.
Let’s take a look at some real-world, obfuscated malware code we stumbled upon.
As expected, this is barely readable. However, it’s easy to see that a lot of the code is just a distraction: extra parentheses and numbers expressed in different ways are everywhere, but may serve no actual purpose.
Compressing and reformatting the code makes it easier to understand how it works.
There is a setup phase where variables are constructed by combining many previously defined variables, each initially containing just a few characters. At some point, this forms code that is ready to be executed. That is where we want to start tracing.
In order to do so, we copied the code to a sandbox, and wrapped it in an object:
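A minimal sketch of that wrapping, with illustrative names of our own rather than the sample’s actual code, might look like this:

```javascript
// Sketch: pasting the captured code into an object method gives it
// a scope we control instead of the global (window) scope.
var sandbox = {
  run: function () {
    // ... the captured malicious code would be pasted here ...
    // Inside a method, `this` is the sandbox object rather than `window`,
    // so a call like this.eval(...) throws a TypeError unless we define
    // an eval property on the sandbox ourselves.
    return this.eval("1 + 1");
  }
};
```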
Running the malicious code in a dedicated scope makes it easy to override properties and functions in order to observe its behavior.
In this context, the code fails because “eval” hasn’t been defined, whereas it would have been if executed in the global (window) scope:
We learn that:
eval() is being used to run code.
eval() is looked up in the current context (presumably through an explicit reference to “this”), not in the global context.
Thus, we can easily override it to inspect what code is being evaluated, and drop into the debugger whenever we find something that looks worth tracing. The same trick can also be used to simulate different web browsers.
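One way to set up such an override (a sketch under our own naming, not the exact tooling used) is to give the sandbox its own eval that logs and inspects every string before forwarding it:

```javascript
// Sketch: an eval override that observes what the malware evaluates.
var sandbox = {
  eval: function (code) {
    console.log("eval called with:", code);  // log every evaluated string
    if (/location|script|http/.test(code)) {
      debugger;                              // pause on anything interesting
    }
    return eval(code);                       // forward to the real eval
  },
  run: function () {
    // ... pasted malicious code, which calls this.eval(...) ...
  }
};
```

The forwarded call still evaluates the code, so the malware keeps running while we watch it.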
There are numerous ways to load code from a remote server and to replace the current page with third-party content: image+canvas, inserting script tags, loading external stylesheets, XHR, frames…
But most of the functions required to do so can be overridden, including DOM manipulation functions.
As a starting point, we are going to override XMLHttpRequest() and eval().
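A generic helper makes this kind of interception easy to apply to any entry point; the following is a sketch (names are illustrative, not the original tooling):

```javascript
// Sketch: wrap a function on any object and report its calls.
function hook(obj, name, onCall) {
  var original = obj[name];
  obj[name] = function () {
    onCall(name, Array.prototype.slice.call(arguments));
    return original.apply(this, arguments);  // forward to the real function
  };
  return function unhook() { obj[name] = original; };
}

// In a browser, this could be pointed at the interesting entry points, e.g.:
//   hook(XMLHttpRequest.prototype, "open", function (name, args) {
//     console.log("XHR open:", args[0], args[1]);  // method and URL
//   });
//   hook(window, "eval", function (name, args) {
//     console.log("eval:", args[0]);
//   });
```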
This lets us see what dynamically generated code is being evaluated, and what XHR queries are attempted.
It’s interesting to see that the XHR requests are actually used to detect and work around debuggers (Firebug) and deobfuscation tools. Also note that “GE” is used instead of “GET”. This works fine in all major browsers, yet it is something that content filters may not be aware of.
Running the code after these two XHR requests makes the browser immediately load a different page hosting different malicious code.
Since we diverted eval() calls, something else must have been used in order to achieve this result.
Let’s find out what. For starters, we can scan the global scope for variables whose content is likely to be related to an external resource, such as “http” or “location”. A simple piece of code, run manually or as a watched expression, can do the trick:
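A sketch of such a scan (the function name and regex are ours, for illustration) could be:

```javascript
// Sketch: list variables in a scope whose string value looks like it
// references an external resource.
function findSuspicious(scope) {
  var hits = [];
  for (var name in scope) {
    try {
      var value = scope[name];
      if (typeof value === "string" && /https?:|location/.test(value)) {
        hits.push(name + " = " + value);
      }
    } catch (e) { /* some host properties throw on access; skip them */ }
  }
  return hits;
}

// In a browser, run as: findSuspicious(window);
```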
The “MxG” variable matches, and contains an obvious reference to malicious third-party content. We now have to track any function call involving this variable, or anything composed from its content.
But there are no more XHR calls. No DOM manipulation. No more eval() calls. How does this code get executed?
Using a debugger, we can keep tracing the code. It can be reconstructed by replacing each variable with its content.
Here is the reconstructed, unobfuscated, commented code:
eval() can thus be rewritten as:
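The reconstructed snippet itself isn’t reproduced here, but as an illustration of the general idea (not necessarily the exact construct this sample used), a well-known eval-free pattern reaches the Function constructor through an array literal, so neither “eval” nor “Function” ever appears in the source:

```javascript
// Illustrative only: []["constructor"] is Array, and Array's own
// constructor is Function. Calling Function with a string compiles
// and runs arbitrary code, just like eval().
var result = []["constructor"]["constructor"]("return 1 + 1")();
// result is 2
```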
If you’ve made it this far into the post, you can see that it is no longer enough to simply look for specific patterns in the code. Given the advent of such sophisticated obfuscation techniques, detecting malware takes a keen eye and behavioral analysis. This is just one way OpenDNS approaches malware research in order to evolve ahead of threats. Stay tuned in the coming weeks as the security research team shares more details on how we look at the 40+ billion DNS queries we see each day.