Cisco Umbrella

Utilizing NLP To Detect APT in DNS

By Jeremiah O'Connor
Posted on March 5, 2015
Updated on September 8, 2021


Imagine that after a nice, relaxing long weekend, you arrive at your job at the bank on Monday morning. While waking up with a cup of coffee, you start checking email. Among the usual messages is one about a security update, and you click it. Security updates are so common these days that another email about one seems normal. What you don’t know is that your system has just been infected, setting off a long chain of events behind one of the biggest thefts in cyber history.

This scenario is similar to how the first successful, large-scale computer bank robbery was launched in January 2013 by a group labeled Carbanak. First described in a recent Kaspersky report, the Carbanak group launched a series of cyber-espionage attacks targeting banks and financial institutions for at least two years. Before that report, in December 2014, Fox-IT published information about a banking Trojan called Anunak, which may have been a precursor to these attacks. These attacks resulted in losses of over one billion dollars across a number of banks in countries such as Ukraine, China, Russia, the U.S., and Germany.

In the attacks, the malicious actors gained entry to an employee’s computer by utilizing spear phishing techniques to install a backdoor, granting them remote access to the system in order to exfiltrate data. They were then able to move laterally across systems and gain access to administrative accounts, which were used to conduct fraudulent money transfers and control ATMs.

As reported by Fox-IT and Kaspersky, these attacks were conducted by an advanced persistent threat (APT) group. OpenDNS Security Labs builds predictive models to track these types of adversarial groups and block domains related to their activities in order to keep our customers safe. To create these models, we mine our large DNS data infrastructure for data about attacks and then uncover the patterns within it. Looking at the data related to these attacks, we found that the domains in this particular Carbanak data set exhibited patterns similar to those of domains associated with DarkHotel and other APT data sets. We were also able to collaborate with Michael Sandee from Fox-IT to gain access to data from the Anunak attacks, which overlapped with the Kaspersky report. Let’s take a look at some of the features we were able to extract from the data sets.

When comparing these domains to the DarkHotel data set and other APT domains, we observed that they were constructed in a similar lexical fashion. One of the spoofing techniques often leveraged is the impersonation of a legitimate software or tech company in an email claiming a required software update.

Some examples from the different sets were as follows:

DarkHotel:

  • adobeupdates[.]com
  • adobeplugs[.]net
  • adoberegister[.]flashserv[.]net
  • microsoft-xpupdate[.]com

Carbanak:

  • update-java[.]net
  • adobe-update[.]net

Examples of APT Domains:

  • gmailboxes[.]com
  • microsoft-update-info[.]com
  • firefoxupdata[.]com

Essentially, we are defining a “malicious language” within the lexical structure of DNS traffic and applying sentiment analysis to FQDNs. To construct this language, we have created a corpus of domains that exhibit a common pattern: adversaries merge certain dictionary words with tech company strings.

Here are some examples from our corpus:

  • facebooklogin-facebook[.]com
  • security-paypal-center[.]com
  • securitycheck-paypal[.]com
  • billingupdate-paypal[.]com
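A minimal sketch of how this “brand string plus dictionary word” pattern could be spotted, assuming two small hand-written vocabularies (`BRANDS` and `KEYWORDS`, both hypothetical) and a greedy longest-match segmenter. The function names are illustrative; this shows the lexical pattern only and is not NLPRank’s actual model or its sentiment analysis.

```python
import re

# Hypothetical, hand-picked vocabularies for illustration only; a real corpus
# would be far larger and derived from labeled DNS data.
BRANDS = {"paypal", "facebook", "adobe", "java", "microsoft", "gmail", "google"}
KEYWORDS = {"login", "security", "center", "check", "billing",
            "update", "updates", "info", "register"}


def segment(token: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation of a token with no separators."""
    words, i = [], 0
    while i < len(token):
        # Try the longest vocabulary word that starts at position i.
        for j in range(len(token), i, -1):
            if token[i:j] in vocab:
                words.append(token[i:j])
                i = j
                break
        else:
            # No vocabulary word starts here: emit one character and move on.
            words.append(token[i])
            i += 1
    return words


def brand_keyword_tokens(domain: str) -> tuple[set[str], set[str]]:
    """Return (brands, keywords) found in the domain name, TLD excluded."""
    name = domain.replace("[.]", ".").lower().rsplit(".", 1)[0]
    vocab = BRANDS | KEYWORDS
    words: list[str] = []
    for part in re.split(r"[.\-]", name):  # split on dots and hyphens
        words.extend(segment(part, vocab))
    return ({w for w in words if w in BRANDS},
            {w for w in words if w in KEYWORDS})


def looks_like_brand_spoof(domain: str) -> bool:
    """Flag names that merge a brand string with a dictionary keyword."""
    brands, keywords = brand_keyword_tokens(domain)
    return bool(brands) and bool(keywords)
```

For example, `billingupdate-paypal[.]com` segments into `billing`, `update` and `paypal` and is flagged, while `paypal.com` alone is not. Greedy segmentation is simple but can mis-split labels; a larger dictionary and a probabilistic segmenter would be more robust.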

We also observed patterns in the WHOIS information for some of the Anunak/Carbanak domains, many of which are registered with Bizcn.com, Inc.

For example:

[Figure: Details for update-java[.]net (traffic and registrar information)]

While conducting our investigations with OpenDNS Investigate, we found multiple suspicious-looking domains advertising “java updates”. They all exist on the same infrastructure, are lexically similar, and exhibit similar patterns:

[Screenshot: OpenDNS Investigate view of these domains]

OpenDNS Security Labs specializes in developing new threat detection models to identify different types of attacks. One of the newest additions to our arsenal is NLPRank, a predictive model that uses natural language processing (NLP) to identify potentially malicious typo-squatting and targeted phishing domains. APT groups often use spear-phishing and spoofed legitimate domains to obfuscate their criminal campaigns, and NLPRank is designed to detect these fraudulent branded domains, which often serve as command-and-control (C2) domains for targeted attacks. To classify these types of attack domains, our system combines heuristics such as NLP, ASN mappings and weightings, WHOIS data patterns, and HTML tag analysis. On the lexical side, NLPRank computes the minimum edit distance on substrings to measure how far a typo-squatting domain is from a legitimate one (e.g., malware.com vs. rnalware.com, linkedin.com vs. 1inkedin.net).
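The post does not spell out NLPRank’s exact substring rule, so the sketch below assumes one plausible reading: slide windows one character shorter than, equal to, and one longer than each brand name over the domain name (TLD removed) and keep the smallest unit-cost edit distance. `closest_brand` and the window sizes are hypothetical choices, not the NLPRank implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def closest_brand(domain: str, brands: list[str]) -> tuple[str, int]:
    """Smallest edit distance between any brand and any similar-length
    substring of the domain name (TLD removed)."""
    name = domain.replace("[.]", ".").lower().rsplit(".", 1)[0]
    best = ("", 10**9)
    for brand in brands:
        n = len(brand)
        # Windows one shorter than, equal to, and one longer than the brand.
        for width in (n - 1, n, n + 1):
            for start in range(max(len(name) - width, 0) + 1):
                d = levenshtein(name[start:start + width], brand)
                if d < best[1]:
                    best = (brand, d)
    return best
```

A distance of 0 means the brand string appears verbatim, while a small nonzero distance marks a typo-squatting candidate. Note that `closest_brand("rnalware.com", ["malware"])` returns `("malware", 1)`: the substring “nalware” is one substitution away, closer than the whole label “rnalware” (distance 2). That is one reason to compare substrings rather than whole labels.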

Let’s step back and discuss, at a high level, how the edit-distance algorithm works. Minimum edit distance is a shortest-path, dynamic-programming algorithm that measures the similarity between two strings. The minimum edit distance between two strings is the minimum number of edits (insertions, deletions, and substitutions) it takes to turn string A into string B. Each edit incurs a penalty, and we are searching for the least-cost path (sequence of edits) from the initial string to the goal string.

  • Initial state: string we’re transforming
  • Operations: insert, delete, substitution
  • Goal state: final string
  • Path cost: minimized # of edits
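Written as the standard dynamic-programming recurrence with unit costs (the cost model the examples in this post use), where D(i, j) is the edit distance between the first i characters of A = a_1 … a_m and the first j characters of B = b_1 … b_n, and [a_i ≠ b_j] is 1 when the characters differ and 0 otherwise:

```latex
D(i,0) = i, \qquad D(0,j) = j, \qquad
D(i,j) = \min
\begin{cases}
D(i-1,\,j) + 1 & \text{delete } a_i \\
D(i,\,j-1) + 1 & \text{insert } b_j \\
D(i-1,\,j-1) + [\,a_i \neq b_j\,] & \text{substitute (free if } a_i = b_j\text{)}
\end{cases}
```

The minimum edit distance is D(m, n), and the path cost of the cheapest edit sequence equals that value.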

Word Example:
Initial String:

i n c e _ p t i o n

Goal String:

_ e x e c u t i o n

In this example there are 5 edits (3 substitutions, 1 deletion, and 1 insertion), making the penalty 5.
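To see how the penalty of 5 falls out, here is a minimal Python sketch (the name `edit_script` is illustrative) that fills the standard unit-cost DP table and then walks back through it to recover one cheapest sequence of edits. Several alignments tie at cost 5 for this pair. The alignment written out above uses 3 substitutions, 1 deletion and 1 insertion, whereas this sketch prefers diagonal moves when breaking ties and so reports five substitutions. Both cost 5.

```python
def edit_script(a: str, b: str) -> tuple[int, list[str]]:
    """Return the unit-cost edit distance and one optimal list of edits."""
    m, n = len(a), len(b)
    # D[i][j] = edit distance between a[:i] and b[:j].
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        D[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + 1,                            # delete
                          D[i][j - 1] + 1,                            # insert
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))   # substitute/match
    # Walk back from the bottom-right cell, preferring diagonal moves on ties.
    ops: list[str] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(f"substitute {a[i - 1]}->{b[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(f"delete {a[i - 1]}")
            i -= 1
        else:
            ops.append(f"insert {b[j - 1]}")
            j -= 1
    return D[m][n], ops[::-1]
```

The same function covers the domain case: `edit_script("g00gle.com", "google.com")` returns `(2, ["substitute 0->o", "substitute 0->o"])`.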

Domain example:
Initial Domain:

g00gle.com

Goal Domain:

google.com

In this example there are 2 edits (2 substitutions), making the penalty 2.

Real-world applications of the edit-distance algorithm include spell-checking, information retrieval, machine translation, speech recognition, computational biology (aligning nucleotide sequences), and now information security. The intuition behind using this algorithm here is that we’re trying to distinguish the language used by malicious domains from the language of benign domains in DNS traffic.

Another way NLPRank detects fraudulent domain behavior is by observing domains hosted on ASNs unassociated with the company they’re spoofing. We leveraged the vast amount of ASN data in OpenDNS SecurityGraph to investigate different types of attacks in our users’ DNS data, and used it to build an ASN map linking legitimate domains to their appropriate ASNs. For example, you would expect an Adobe domain advertising an update to be hosted on an Adobe ASN (e.g., 14365, 44786), or a Java update to come from an Oracle ASN (e.g., 41900, 1215). Instead, both of the Carbanak domains mentioned above that use those company names as substrings were hosted on ASN 44050, PIN-AS Petersburg Internet Network LLC in Russia.
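A sketch of this ASN-consistency check, assuming the domain has already been resolved to the ASN that hosts it (that lookup is out of scope here). `LEGIT_ASNS` is a hypothetical and deliberately partial map built only from the example ASNs in the paragraph above, and `asn_mismatch` is an illustrative name.

```python
# Hypothetical, partial brand-to-ASN map using only the example ASNs quoted
# in the text; a real map would be built from much larger ASN datasets.
LEGIT_ASNS: dict[str, set[int]] = {
    "adobe": {14365, 44786},
    "java": {41900, 1215},     # Java updates are expected from Oracle ASNs
    "oracle": {41900, 1215},
}


def asn_mismatch(domain: str, hosting_asn: int) -> list[str]:
    """Return the brands named in the domain whose known ASNs do not include
    the ASN the domain is actually hosted on."""
    name = domain.replace("[.]", ".").lower()
    return [brand for brand, asns in LEGIT_ASNS.items()
            if brand in name and hosting_asn not in asns]
```

With this map, `asn_mismatch("update-java[.]net", 44050)` returns `["java"]`, while an Adobe-named domain hosted on ASN 14365 returns an empty list. On its own this is a weak signal (CDNs and resellers legitimately host brand content), which is presumably why it is combined with the lexical and WHOIS features rather than used alone.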

NLPRank, which also detected some of these Anunak/Carbanak domains, was recently used to identify a cluster of advanced PayPal phishing attacks that we detailed on this blog. It also identified many similar phishing attacks spoofing major companies besides PayPal, including Google/Gmail, Wells Fargo, Facebook, Dropbox, and Apple/iTunes. We also found that attackers use kits and tools such as HTTrack to copy legitimate sites.

Here is a snippet from the HTML of one of these phishing sites that reveals the tool used to copy the site:

[Image: HTML snippet referencing HTTrack]
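Tool fingerprints like this lend themselves to simple HTML checks. A minimal sketch, assuming the copied page still carries the banner comment HTTrack typically inserts (of the form `<!-- Mirrored from <site> by HTTrack Website Copier/... -->`) or its “Added by HTTrack” marker; the function names are illustrative, and attackers can of course strip these comments.

```python
import re
from typing import Optional

# Assumed signature: the banner comment HTTrack usually leaves in mirrored pages.
MIRROR_RE = re.compile(
    r"<!--\s*Mirrored from\s+(\S+)\s+by\s+HTTrack Website Copier",
    re.IGNORECASE,
)


def httrack_source(html: str) -> Optional[str]:
    """Return the site a page says it was mirrored from, or None."""
    match = MIRROR_RE.search(html)
    return match.group(1) if match else None


def is_httrack_copy(html: str) -> bool:
    """True if either the mirror banner or the 'Added by HTTrack' marker is present."""
    return httrack_source(html) is not None or "added by httrack" in html.lower()
```

When the banner is present, the captured URL also names the legitimate site being spoofed, which can be cross-checked against the brand the domain name imitates.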

One interesting finding was that certain tags were copied directly from the legitimate site. For example, some of the jobs links on spoofed PayPal pages pointed directly to the jobs page on the legitimate PayPal site.
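Copied links like these can be counted with the standard library’s HTML parser. A minimal sketch, assuming we know the host a page was served from and the brand domain it imitates; `links_to_brand`, `LinkCollector` and the example host `spoofed-site.example` are hypothetical names used for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _on_domain(host: str, domain: str) -> bool:
    return host == domain or host.endswith("." + domain)


def links_to_brand(html: str, page_host: str, brand_domain: str) -> list[str]:
    """Absolute links pointing at brand_domain (or its subdomains) from a page
    hosted elsewhere; such links hint that the markup was copied."""
    if _on_domain(page_host.lower(), brand_domain):
        return []  # links on the brand's own site are expected
    parser = LinkCollector()
    parser.feed(html)
    return [href for href in parser.hrefs
            if _on_domain((urlparse(href).hostname or "").lower(), brand_domain)]
```

Relative links such as `/login` have no hostname and are ignored, so only links that leave the spoofed site for the real brand are reported.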

Here are just a few screenshots of NLPRank’s findings:

[Screenshot: Google/Gmail phishing site]
[Screenshot: Facebook phishing site]
[Screenshot: Wells Fargo phishing site]
[Screenshot: Multiple-company typo-squatting on “google”]
[Screenshot: Adobe spoofing using a Dropbox webpage]
[Screenshots: A number of PayPal phishing sites]
[Screenshot: Another PayPal phishing site, reached via a redirect from the domain email-paypal[.]info]

As the examples above show, NLPRank succeeds in identifying domain spoofing and targeted phishing attacks, and it is a robust method for defending against APT attacks such as Anunak/Carbanak. Our labs team will continue to develop the NLPRank threat model to discover more of these targeted attack domains and keep our customers safe. This is just one of the ways that Cisco’s DNS-layer security supports customers.

© 2023 Cisco Umbrella