Imagine that after a nice, relaxing long weekend, you come in to work Monday morning at your job at the bank. While waking up with a cup of coffee, you begin checking email. Among the usual messages, there’s one about a security update, and you click it. Security updates are so common these days that another email about one seems perfectly normal. What you don’t know is that your system has just been infected, setting off a long chain of events behind one of the biggest thefts in cyber history.

This scenario is similar to how the first successful, large-scale computer bank robbery was launched in January 2013 by a group labeled Carbanak. First described in a recent Kaspersky report, the Carbanak group carried out a series of cyber-espionage attacks targeting banks and financial institutions for at least two years. Before this report, in December 2014, Fox-IT Security published information about a banking Trojan called Anunak, which may have been a precursor to these attacks. The attacks resulted in losses of over one billion dollars across a number of banks in countries such as Ukraine, China, Russia, the U.S., and Germany.

In these attacks, the malicious actors used spear phishing to install a backdoor on an employee’s computer, giving them remote access to the system to exfiltrate data. They then moved laterally across systems and gained access to administrative accounts, which they used to make fraudulent money transfers and control ATMs.

As reported by Fox-IT and Kaspersky, these attacks were conducted by an advanced persistent threat (APT) group. OpenDNS Security Labs builds predictive models to track these types of adversarial groups and block domains related to their activities, in order to keep our customers safe. To create these models, we mine our large DNS data infrastructure for attack data and uncover the patterns within it. Looking at the data related to these attacks, we found that the domains in this particular Carbanak data set exhibited patterns similar to domains associated with DarkHotel and other APT data sets. Additionally, we were able to collaborate with Michael Sandee from Fox-IT Security to gain access to data from the Anunak attacks, which overlapped with the Kaspersky report. Let’s take a look at some of the features we were able to extract from the data sets.

When comparing these domains to the DarkHotel data set and other APT domains, we observed that they were constructed in a similar lexical fashion. One frequently used spoofing technique is impersonating a legitimate software or tech company in an email claiming that a software update is required. Some examples from the different sets:

DarkHotel:

  • adobeupdates[.]com
  • adobeplugs[.]net
  • adoberegister[.]flashserv[.]net
  • microsoft-xpupdate[.]com

Carbanak:

  • update-java[.]net
  • adobe-update[.]net

Examples of APT Domains:

  • gmailboxes[.]com
  • microsoft-update-info[.]com
  • firefoxupdata[.]com

Essentially, we are defining a “malicious language” within the lexical nature of DNS traffic and applying sentiment analysis to FQDNs. To construct this language, we have created a corpus of domains that exhibit a common pattern: adversaries merge certain dictionary words with tech company strings. Here are some examples from our corpus:

  • facebooklogin-facebook[.]com
  • security-paypal-center[.]com
  • securitycheck-paypal[.]com
  • billingupdate-paypal[.]com
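The brand-plus-dictionary-word pattern can be illustrated with a small tokenizer. This is a minimal sketch, not the corpus-building method itself: the vocabularies `BRANDS` and `WORDS` and the functions `segment`, `tokenize`, and `brand_plus_words` are hypothetical, and a real vocabulary would be far larger. Each label is split on hyphens, then segmented into known words by trying the longest word first and backtracking when the remainder cannot be segmented.

```python
import re
from functools import lru_cache
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical, deliberately tiny vocabularies for illustration only.
BRANDS = frozenset({"paypal", "facebook", "google", "gmail", "adobe", "java", "microsoft"})
WORDS = frozenset({"security", "check", "center", "billing", "update", "login", "info"})


def segment(label: str, vocab: FrozenSet[str]) -> Optional[List[str]]:
    """Split a label such as 'securitycheck' into vocabulary words, or return None."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(label):
            return ()
        for j in range(len(label), i, -1):  # try the longest candidate word first
            if label[i:j] in vocab:
                rest = split_from(j)
                if rest is not None:  # backtrack if the remainder can't be split
                    return (label[i:j],) + rest
        return None

    words = split_from(0)
    return list(words) if words is not None else None


def tokenize(domain: str) -> Optional[List[str]]:
    """Drop the TLD, split on hyphens/dots, and segment each piece into known words."""
    name = domain.replace("[.]", ".").lower().rsplit(".", 1)[0]
    tokens: List[str] = []
    for piece in re.split(r"[-.]", name):
        words = segment(piece, BRANDS | WORDS)
        if words is None:
            return None  # some piece is not built from known words
        tokens.extend(words)
    return tokens


def brand_plus_words(domain: str) -> bool:
    """True if the domain consists of a brand string plus dictionary words."""
    tokens = tokenize(domain) or []
    return any(t in BRANDS for t in tokens) and any(t in WORDS for t in tokens)


for d in ["securitycheck-paypal[.]com", "billingupdate-paypal[.]com", "google[.]com"]:
    print(d, tokenize(d), brand_plus_words(d))
```

With these toy vocabularies, `securitycheck-paypal[.]com` tokenizes to `security`, `check`, `paypal` and is flagged, while `google[.]com` contains only a brand string and is not.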

We also observed patterns in the WHOIS information for some of the Anunak/Carbanak domains, many of which are registered with Bizcn.com, Inc. For example:

 

[Screenshots: traffic and registrar details for update-java[.]net]

During our investigations with OpenDNS Investigate, we found multiple suspicious-looking domains advertising “java updates”. They all exist on the same infrastructure, are lexically similar, and exhibit similar interesting patterns:

[Screenshot: the related “java update” domains in OpenDNS Investigate]

OpenDNS Security Labs specializes in developing new threat detection models to identify different types of attacks. One of the newest additions to our arsenal is NLPRank. Using natural language processing (NLP), this predictive model identifies potentially malicious typo-squatting and targeted phishing domains. APT groups often use spear phishing and legitimate-domain spoofing as obfuscation techniques to carry out their criminal campaigns. NLPRank is designed to detect these fraudulent branded domains, which often serve as C2 domains for targeted attacks. Our system uses heuristics such as NLP, ASN mappings and weightings, WHOIS data patterns, and HTML tag analysis to classify these types of attack domains. NLPRank uses a minimum edit distance on substrings to measure the word distance between legitimate and typo-squatting domains (e.g., malware.com vs. rnalware.com, linkedin.com vs. 1inkedin.net).
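A minimal sketch of an edit distance computed on substrings, assuming the distance is taken between a brand name and the best-matching substring of a domain’s first label. `WATCHLIST`, `substring_edit_distance`, `closest_brand`, `flag_typosquat`, and the default threshold of 2 edits are hypothetical choices; the post does not specify how NLPRank selects substrings or sets thresholds. The matching itself is the standard approximate-substring variant of the edit-distance table, in which a match may start and end anywhere in the label.

```python
from typing import Tuple

# Hypothetical watch list, taken from the examples above.
WATCHLIST = ["google", "paypal", "linkedin", "malware"]


def substring_edit_distance(pattern: str, text: str) -> int:
    """Fewest edits turning `pattern` into some substring of `text`.

    The first row is all zeros so a match may start anywhere in `text`,
    and taking the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, 1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, 1):
            cur[j] = min(prev[j] + 1,             # delete p
                         cur[j - 1] + 1,          # insert t
                         prev[j - 1] + (p != t))  # substitute, or match for free
        prev = cur
    return min(prev)


def closest_brand(domain: str) -> Tuple[str, int]:
    """Watch-list brand whose name is closest to a substring of the first label."""
    label = domain.replace("[.]", ".").lower().split(".")[0]
    return min(((b, substring_edit_distance(b, label)) for b in WATCHLIST),
               key=lambda pair: pair[1])


def flag_typosquat(domain: str, max_edits: int = 2) -> bool:
    """Near miss of a brand (1..max_edits edits); exact brand strings are not flagged here."""
    _, dist = closest_brand(domain)
    return 0 < dist <= max_edits


for d in ["rnalware.com", "1inkedin.net", "g00gle.com", "google.com"]:
    print(d, closest_brand(d), flag_typosquat(d))
```

Under these assumptions rnalware.com is one edit from `malware` (its substring `nalware` needs a single substitution), 1inkedin.net is one edit from `linkedin`, and g00gle.com is two edits from `google`. A fixed threshold is crude, since two edits is a large fraction of a short brand name; a length-relative cutoff may work better.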

Let’s step back and discuss, at a high level, how the edit-distance algorithm works. Minimum edit distance is a shortest-path, dynamic-programming algorithm that measures the similarity between two strings. The minimum edit distance between two strings is defined as the minimum number of edits (insertions, deletions, substitutions) it takes to turn string A into string B. Every edit incurs a penalty, so we are searching for the least-cost path (sequence of edits) from our initial string to our goal string.

  • Initial state: string we’re transforming
  • Operations: insert, delete, substitution
  • Goal state: final string
  • Path cost: minimized # of edits
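The definitions above map directly onto the textbook dynamic-programming table. Below is a minimal sketch with unit cost for every insertion, deletion, and substitution; `min_edit_distance` is a name chosen for illustration. Cell `d[i][j]` holds the cheapest way to turn the first `i` characters of the initial string into the first `j` characters of the goal string, and the answer sits in the bottom-right cell.

```python
def min_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: fewest unit-cost insertions, deletions and
    substitutions needed to turn `source` into `target`."""
    n, m = len(source), len(target)
    # d[i][j] = cost of turning source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (free on a match)
    return d[n][m]


print(min_edit_distance("inception", "execution"))    # 5
print(min_edit_distance("g00gle.com", "google.com"))  # 2
```

The table costs O(n·m) time and space; keeping only the previous row reduces space to O(m), which is plenty for domain-length strings.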

Word Example:

Initial String:

i n c e _ p t i o n

Goal String:

_ e x e c u t i o n

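For this example, there are 5 edits: 3 substitutions, 1 deletion, and 1 insertion, making the penalty 5. (An underscore marks a gap: a character deleted from the initial string or inserted from the goal string.)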
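The penalty for a given alignment can be checked column by column: a gap in the goal row is a deletion, a gap in the initial row is an insertion, and two different letters form a substitution. A small sketch applied to the alignment above (spaces removed); `alignment_ops` and `penalty` are hypothetical helper names. Note that minimal alignments are not always unique: substituting the first five letters of “inception” directly also costs 5.

```python
from collections import Counter


def alignment_ops(top: str, bottom: str, gap: str = "_") -> Counter:
    """Classify each column of a gapped alignment of two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    ops = Counter()
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        elif b == gap:
            ops["deletion"] += 1      # character removed from the initial string
        elif a == gap:
            ops["insertion"] += 1     # character added from the goal string
        elif a != b:
            ops["substitution"] += 1
        else:
            ops["match"] += 1         # free
    return ops


def penalty(ops: Counter) -> int:
    """Unit-cost penalty: every non-match column costs 1."""
    return ops["substitution"] + ops["deletion"] + ops["insertion"]


ops = alignment_ops("ince_ption", "_execution")
print(dict(ops))
print(penalty(ops))  # 5
```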

Domain example:

Initial Domain:

g00gle.com

Goal Domain:

google.com

For this example, there are 2 edits, both substitutions (each 0 becomes an o), making the penalty 2.
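For short strings such as domain names, the same penalty can also be computed straight from the recursive definition with memoization. This is a sketch with a hypothetical function name, `edit_distance`; it relies on the standard fact that, with unit costs, matching equal leading characters is always optimal, so only mismatches branch into the three edit operations.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance from the recursive definition, memoized."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) = edits needed to turn a[i:] into b[j:]
        if i == len(a):
            return len(b) - j          # insert the rest of b
        if j == len(b):
            return len(a) - i          # delete the rest of a
        if a[i] == b[j]:
            return d(i + 1, j + 1)     # matching characters cost nothing
        return 1 + min(d(i + 1, j),      # delete a[i]
                       d(i, j + 1),      # insert b[j]
                       d(i + 1, j + 1))  # substitute a[i] with b[j]

    return d(0, 0)


print(edit_distance("g00gle.com", "google.com"))  # 2
print(edit_distance("rnalware", "malware"))       # 2
```

Python’s recursion limit makes this top-down form suitable only for short inputs; the bottom-up table is the safer choice for long strings.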

Real-world applications of the edit-distance algorithm include spell-checking, information retrieval, machine translation, speech recognition, aligning nucleotide sequences in computational biology, and now information security. The intuition behind using this algorithm is that we’re trying to define the language used by malicious domains, as opposed to the language of benign domains, in DNS traffic.

Another way NLPRank detects fraudulent domain behavior is by observing domains hosted on ASNs that are unassociated with the company they’re spoofing. We leveraged OpenDNS SecurityGraph’s vast amount of ASN data to investigate different types of attacks in our users’ DNS data, and used it to build an ASN map of all legitimate domains to their appropriate ASNs. For example, you would expect an Adobe domain advertising an update to be associated with an Adobe ASN (e.g., 14365, 44786), or a Java update to be associated with an Oracle ASN (e.g., 41900, 1215). Yet both of the Carbanak domains mentioned above that use those company names as substrings were hosted on ASN 44050, PIN-AS Petersberg Internet Network LLC, in Russia.
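The ASN check can be sketched as a lookup from brand strings to the ASNs a legitimate domain would be expected to use. This is a simplified illustration, not NLPRank’s mapping or weighting scheme: `EXPECTED_ASNS` holds only the example ASNs quoted above, `asn_mismatch` is a hypothetical helper, and the hosting ASN is assumed to have been resolved elsewhere.

```python
from typing import List

# Example ASNs quoted in the text above; illustrative, not exhaustive.
EXPECTED_ASNS = {
    "adobe": {14365, 44786},
    "java": {41900, 1215},  # Java updates are expected from Oracle-owned ASNs
}


def asn_mismatch(domain: str, hosting_asn: int) -> List[str]:
    """Brand strings found in `domain` whose expected ASNs exclude `hosting_asn`."""
    name = domain.replace("[.]", ".").lower()
    return [brand for brand, asns in EXPECTED_ASNS.items()
            if brand in name and hosting_asn not in asns]


print(asn_mismatch("update-java[.]net", 44050))   # ['java']
print(asn_mismatch("adobe-update[.]net", 44050))  # ['adobe']
```

Plain substring matching on brand names is noisy (`java` also occurs in `javascript`), which is one reason to combine this signal with the lexical and WHOIS features rather than rely on it alone.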

NLPRank, which also detected some of these Anunak/Carbanak domains, was recently used to identify a cluster of advanced Paypal phishing attacks that we detailed on this blog. It also identified many similar phishing attacks spoofing major companies besides Paypal, including Google/Gmail, Wells Fargo, Facebook, Dropbox, Apple/iTunes, and many more. We also found that attackers use kits and tools such as HTTrack to copy legitimate sites. Here is a snippet of code we found in the HTML of one of these phishing sites, identifying the tool used to copy the site:
[Screenshot: HTTrack snippet in the phishing page’s HTML]
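Pages copied with HTTrack can often be recognized from the comment the tool inserts into mirrored HTML. A minimal sketch, assuming the stamp takes the common form `<!-- Mirrored from <url> by HTTrack Website Copier/... -->`; `MIRRORED_RE`, `mirrored_from`, and the sample HTML are hypothetical. Attackers can trivially strip the comment, so its absence proves nothing.

```python
import re
from typing import Optional

# Assumed form of the HTTrack stamp; other copying tools leave different traces.
MIRRORED_RE = re.compile(
    r"<!--\s*Mirrored from\s+(\S+)\s+by\s+HTTrack", re.IGNORECASE)


def mirrored_from(html: str) -> Optional[str]:
    """Return the source URL named in an HTTrack stamp, or None if there is none."""
    match = MIRRORED_RE.search(html)
    return match.group(1) if match else None


# Hypothetical sample page.
sample = ("<html><!-- Mirrored from www.example.com/ by HTTrack "
          "Website Copier/3.x --><body></body></html>")
print(mirrored_from(sample))  # www.example.com/
```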

One interesting finding was that certain tags were copied directly from the legitimate site. For example, some of the jobs links on spoofed Paypal pages pointed directly to the jobs page on the legitimate Paypal site.
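Links copied verbatim from the legitimate site can be surfaced by collecting anchor targets and keeping the absolute links into the brand’s own domain from a page hosted somewhere else. A sketch using only the standard library’s `html.parser` and `urllib.parse`; `AnchorCollector`, `is_under`, `copied_brand_links`, and the sample jobs URL are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_under(host: str, domain: str) -> bool:
    """True if `host` is `domain` itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def copied_brand_links(html: str, page_host: str, brand_domain: str) -> List[str]:
    """Absolute links into `brand_domain` found on a page hosted somewhere else."""
    if is_under(page_host.lower(), brand_domain):
        return []  # the page really is on the brand's domain
    collector = AnchorCollector()
    collector.feed(html)
    return [h for h in collector.hrefs
            if is_under((urlparse(h).hostname or "").lower(), brand_domain)]


# Hypothetical page on a phishing host with one link back to the real site.
page = ('<a href="https://www.paypal.com/jobs">Jobs</a>'
        '<a href="/signin">Log in</a>')
print(copied_brand_links(page, "email-paypal.info", "paypal.com"))
```

Relative links such as `/signin` have no hostname and are ignored, so only links that reach back to the legitimate domain are reported.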

Here are just a few screenshots of NLPRank’s findings:

[Screenshot: Google/Gmail phishing site]

[Screenshot: Facebook phishing site]

[Screenshot: Wells Fargo phishing site]

[Screenshot: multiple-company typo-squatting on “google”]

[Screenshot: Adobe spoofing using a Dropbox webpage]

[Screenshots: a bunch of Paypal phishing sites]
Here is another Paypal phishing site, reached through a redirect from the domain email-paypal[.]info:

[Screenshot: Paypal phishing site reached from email-paypal[.]info]

The examples above show that NLPRank is succeeding in identifying domain spoofing and targeted phishing attacks, and that it can help defend against APT attacks such as Anunak/Carbanak. Our labs team will continue to develop the NLPRank threat model to discover more of these targeted attack domains and keep our customers safe.
