Cisco Umbrella

The role of country code top-level domains (ccTLDs) in malware classification

By OpenDNS Security Research
Posted on January 18, 2013
Updated on March 6, 2020


Last week we posted an examination of whether the location where a domain is hosted increases its likelihood of being malicious. Indeed, we confirmed that some countries host a significantly higher ratio of malicious to clean sites. But rather than rest on a superficial assumption based on where a domain is hosted, we wanted to explore more deeply the relationship between geography, ccTLDs and malicious domains.

Unlike generic top-level domains (.com, .net), which almost anyone can register, an Internet country code top-level domain (.fr, .tw) is generally reserved for a country or a dependent territory. If a website uses a specific ccTLD, this suggests that its operator intends to target a local audience. That said, registrars have largely relaxed the rules, and many ccTLDs can now be registered by non-local businesses and individuals, which could make ccTLDs less relevant.

The co-occurrence matrix of ccTLDs and the countries clients actually connect from shows a strong correlation: most often, websites opting for a country-specific domain do serve content to a local audience.

[Figure: co-occurrence matrix of ccTLDs and client countries]
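A matrix of this kind can be sketched as follows, assuming a query log of (domain, client country) pairs. The log format and the function name `cctld_country_matrix` are hypothetical, and treating every two-letter ASCII TLD as a ccTLD is a simplification; this is an illustration, not necessarily how the matrix above was produced.

```python
from collections import Counter, defaultdict


def cctld_country_matrix(queries):
    """Row-normalized co-occurrence matrix of ccTLD vs. client country.

    `queries` is an iterable of (domain_name, client_country_code) pairs,
    a hypothetical log format used only for illustration.
    """
    counts = defaultdict(Counter)
    for domain, country in queries:
        tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
        # Simplification: treat any two-letter ASCII TLD as a ccTLD
        # (internationalized ccTLDs are ignored here).
        if len(tld) == 2 and tld.isascii() and tld.isalpha():
            counts[tld][country] += 1
    # Normalize each row so it reads as P(client country | ccTLD).
    matrix = {}
    for tld, row in counts.items():
        total = sum(row.values())
        matrix[tld] = {country: n / total for country, n in row.items()}
    return matrix


log = [("example.ru", "RU"), ("shop.ru", "RU"), ("news.ru", "UA"),
       ("site.fr", "FR"), ("search.com", "US")]
matrix = cctld_country_matrix(log)
print(matrix["ru"])
```

Normalizing per row makes ccTLDs with very different traffic volumes directly comparable, which is what a strong diagonal in the matrix relies on.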

Looking deeper, we observe very different frequency distributions across ccTLDs, which can be explained by linguistic and cultural factors. Building ccTLD-specific models is thus critical to deciding whether a domain should be classified as benign or malicious. Below, I’ll discuss some of the specific models we use.

The servers’ physical geographic diversity

The number of IP addresses and the stability of the set of IP addresses are important signals when determining whether a domain is likely to be malicious. However, it is also very common for entirely benign domains to use multiple IPs, for load balancing, redundancy, reducing latency across a large country, or taking advantage of “elastic” infrastructures.
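The post does not define how stability is measured. One possible measure, assumed here for illustration, is the mean Jaccard similarity between consecutive snapshots of a domain's IP set; the function name `ip_set_stability` and the daily sampling are hypothetical, and the addresses come from documentation ranges.

```python
def ip_set_stability(snapshots):
    """Mean Jaccard similarity between consecutive IP-set snapshots.

    `snapshots` is a chronologically ordered list of sets of IP strings,
    e.g. one set per day (a hypothetical sampling scheme). 1.0 means the
    set never changed; values near 0.0 mean it was replaced wholesale.
    """
    if len(snapshots) < 2:
        return 1.0
    similarities = []
    for previous, current in zip(snapshots, snapshots[1:]):
        union = previous | current
        similarities.append(len(previous & current) / len(union) if union else 1.0)
    return sum(similarities) / len(similarities)


# A load-balanced domain that keeps the same two addresses...
stable = [{"192.0.2.1", "192.0.2.2"}] * 3
# ...versus one whose two addresses are replaced every day.
rotating = [{"198.51.100.1", "198.51.100.2"}, {"203.0.113.7", "203.0.113.8"},
            {"192.0.2.50", "192.0.2.51"}]
print(len(stable[0]), ip_set_stability(stable))      # 2 1.0
print(len(rotating[0]), ip_set_stability(rotating))  # 2 0.0
```

Both domains use the same number of IPs, so the address count alone cannot separate them; the stability of the set can.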

In the following experiment, we use two training sets of .RU domain names, containing only domain names resolving to more than one IP address. All IP addresses seen over a one-week period were considered. One list contains domains known to be benign and the other list contains domains known to be currently used as infection vectors.

On these two sets, we computed, for each name, the mean distance between the country’s geographic median and the physical locations of all servers hosting that name.

[Figure: distribution of mean server distances for benign and malicious .RU domains]
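The per-name distance can be sketched as below, assuming each server IP has already been geolocated to (latitude, longitude) coordinates. The post does not say which distance formula was used; the haversine great-circle distance is an assumption, and the reference point in the example (roughly Moscow) is purely illustrative, not the geographic median used in the analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding pushing `a` slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_distance_km(reference, server_locations):
    """Mean distance from a reference point to every server location of one name."""
    return sum(haversine_km(reference, loc) for loc in server_locations) / len(server_locations)


# Hypothetical inputs: a reference point and geolocated servers for one domain.
reference = (55.75, 37.62)
servers = [(55.75, 37.62), (59.93, 30.36), (52.52, 13.40)]
print(round(mean_distance_km(reference, servers)))
```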

The two distributions show significantly different skewness: hosts serving a benign domain tend to be geographically close, whereas a malicious domain can be served by hosts spread all around the globe.
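To make “skewness” concrete, the sketch below computes the standard sample skewness (Fisher-Pearson coefficient) of a list of per-domain mean distances. The post does not state which estimator was used, and the distances in the usage line are made up for illustration.

```python
def sample_skewness(values):
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5 (biased estimator).

    Positive values indicate a long right tail, e.g. a few domains whose
    servers sit far from the reference point.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Hypothetical per-domain mean distances in km.
print(sample_skewness([120.0, 150.0, 90.0, 200.0, 4800.0]))
```

This is the same quantity `scipy.stats.skew` returns with its default `bias=True`, so either can be used on real data.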

Looking at the number of distinct physical locations also shows how malware can use a fast flux pattern. Fast flux describes domains that exploit the fact that the set of IP addresses returned for a domain name is only valid for a limited period of time (its TTL), which the domain owner fully controls. A botnet operator can leverage this feature to switch very quickly to a different set of hosts serving a malicious payload.

[Figure: number of distinct physical locations per domain, benign vs. malicious]

While 98% of benign domains with multiple IP addresses are served by at most 3 datacenters, with a negligible number of outliers, malicious domains can hop very quickly from one host to another. One of them was even served from 867 physical locations!

Locations   Domain
867         lafdamow.ru
505         girwysca.ru
443         wascadux.ru
418         ajgijuap.ru
374         jilvoqsi.ru
326         enhawcus.ru
289         taosiram.ru
253         hevlehaw.ru
242         diteqciq.ru
200         vehyfgor.ru
196         zurgovod.ru
185         sepsiqbo.ru
147         nuzejviz.ru
145         etujaqhe.ru
119         marsotrip.ru
103         zazzeqan.ru
103         azvaebyn.ru

Having hopped across 92 countries, 665 ASNs, 1486 network prefixes and 2780 IP addresses in 7 days, lafdamow.ru is an obvious outlier, and we quickly blocked it as malware. From a researcher’s point of view, outliers like this are almost impressive in how rapidly they change and move around.
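Per-domain counts like these (countries, ASNs, network prefixes, IP addresses and physical locations over a window) can be gathered roughly as follows. The `Resolution` record, the sample data (reserved `.example` names, documentation IPs and ASNs) and the function names are hypothetical; the post does not describe how geolocation, prefix or ASN data were obtained.

```python
from collections import defaultdict
from typing import NamedTuple


class Resolution(NamedTuple):
    """One observed DNS answer; this record layout is a hypothetical example."""
    domain: str
    ip: str
    prefix: str    # network prefix covering the IP, e.g. "192.0.2.0/24"
    asn: int
    country: str
    location: str  # geolocated site, e.g. a city or datacenter label


def diversity(records):
    """Count distinct countries, ASNs, prefixes, IPs and locations per domain."""
    seen = defaultdict(lambda: defaultdict(set))
    for r in records:
        for field in ("country", "asn", "prefix", "ip", "location"):
            seen[r.domain][field].add(getattr(r, field))
    return {domain: {field: len(values) for field, values in fields.items()}
            for domain, fields in seen.items()}


def top_by_locations(summary, n=10):
    """Rank domains by number of distinct physical locations, most spread first."""
    return sorted(summary.items(), key=lambda item: item[1]["location"], reverse=True)[:n]


records = [
    Resolution("benign.example", "192.0.2.1", "192.0.2.0/24", 64500, "RU", "Moscow"),
    Resolution("benign.example", "192.0.2.2", "192.0.2.0/24", 64500, "RU", "Moscow"),
    Resolution("flux.example", "198.51.100.9", "198.51.100.0/24", 64501, "US", "Dallas"),
    Resolution("flux.example", "203.0.113.4", "203.0.113.0/24", 64502, "BR", "Sao Paulo"),
    Resolution("flux.example", "192.0.2.77", "192.0.2.0/24", 64503, "DE", "Berlin"),
]
for domain, counts in top_by_locations(diversity(records)):
    print(domain, counts)
```

The benign domain uses two IPs but a single location, the pattern the 98% figure above describes; the fast-flux domain spreads each address across a new prefix, ASN and country.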

The requester’s geographic profile

Our intuition, confirmed by the co-occurrence matrix above, is that the frequency distribution of the countries sending traffic to a ccTLD’s domains can be predicted with good accuracy.

The .RU ccTLD, for example, shows this expected distribution for benign sites:

[Figure: requester countries for benign .RU domains]

The vast majority of queries to these .RU domains come from Russia, followed by Ukraine and the US, with the remaining countries almost uniformly distributed.

However, malicious .RU domains show a totally different distribution of requester countries: they receive few queries from Russia, Ukraine, Belarus and Kazakhstan, and the vast majority of their queries come from the US.

[Figure: requester countries for benign vs. malicious .RU domains]

We use the Kolmogorov-Smirnov test to compare these distributions, after discarding countries with high variance and countries that appear in the expected distribution but not in our observations. In our experiments, the result of this test turned out to be an unreliable feature for labeling a domain as malicious. However, it is an extremely important feature for labeling a domain as benign, with only 0.02% false positives.
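A minimal sketch of the Kolmogorov-Smirnov D statistic between an expected and an observed requester-country profile, including the two filtering steps described above. Because KS needs an ordered axis and countries are categorical, ordering countries by decreasing expected frequency is an assumption of this sketch; the function name, the sample counts and any decision threshold on D are hypothetical, since the post does not give them.

```python
def ks_statistic(expected, observed, high_variance=frozenset()):
    """Kolmogorov-Smirnov D statistic between two country distributions.

    `expected` and `observed` map country codes to query counts. Countries
    in `high_variance` are dropped, and countries present in `expected` but
    never seen in `observed` are ignored. Countries are ordered by decreasing
    expected frequency (an assumption: KS needs an ordered axis).
    """
    countries = [c for c in observed if c not in high_variance]
    o_total = sum(observed[c] for c in countries)
    if not countries or o_total == 0:
        raise ValueError("no observed queries left after filtering")
    countries.sort(key=lambda c: (-expected.get(c, 0), c))
    e_total = sum(expected.get(c, 0) for c in countries)
    if e_total == 0:
        return 1.0  # none of the observed countries is expected at all
    e_cum = o_cum = 0
    d = 0.0
    for c in countries:
        e_cum += expected.get(c, 0)
        o_cum += observed[c]
        d = max(d, abs(e_cum / e_total - o_cum / o_total))
    return d


# Hypothetical counts: a benign .RU profile vs. a domain queried mostly from the US.
benign_profile = {"RU": 700, "UA": 150, "US": 80, "BY": 40, "KZ": 30}
suspect = {"US": 90, "RU": 5, "DE": 5}
print(ks_statistic(benign_profile, suspect))
```

A small D means the observed traffic matches the expected profile, which, per the results above, is the direction in which this feature is reliable.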

The lexical features 

Domains used to serve malicious content don’t need to have any meaningful content. In fact, not having any meaningful content could even be a strategy to avoid being seen by search engines. 

Our intuition is that ccTLDs and languages are tightly coupled, so domain names under a specific ccTLD show predictable lexical features. A .RU domain is likely to contain many Russian-specific character sequences that are unlikely to occur in a .CN domain, and sequences of characters that look meaningless by English standards can be very frequent in Russian.

For this reason, we built a training set of .RU domain names known to be benign, from which we computed the frequency distributions of character unigrams through quadrigrams (1- to 4-grams).
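Computing such n-gram frequency distributions can be sketched as follows. The function name and the sample names are hypothetical, and the sketch assumes inputs are registered names such as `pogoda.ru`, from which the ccTLD suffix is stripped before counting.

```python
from collections import Counter


def ngram_frequencies(domains, max_n=4):
    """Relative frequencies of character n-grams (n = 1..max_n) in domain labels.

    The ccTLD suffix is stripped first (e.g. "pogoda.ru" -> "pogoda"), so
    only the registrant-chosen part of the name is modeled.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for domain in domains:
        label = domain.lower().rstrip(".").rsplit(".", 1)[0]
        for n in counts:
            for i in range(len(label) - n + 1):
                counts[n][label[i:i + n]] += 1
    freqs = {}
    for n, counter in counts.items():
        total = sum(counter.values())
        freqs[n] = {gram: c / total for gram, c in counter.items()}
    return freqs


freqs = ngram_frequencies(["moskva.ru", "novosti.ru", "pogoda.ru", "kvartira.ru"])
print(sorted(freqs[2].items(), key=lambda kv: kv[1], reverse=True)[:5])
```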

We then defined a “DGA score” function whose output measures how wrong our guess for the next character of a domain name is, given the 1 to 4 preceding characters and our reference frequency distribution.
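The exact scoring function is not given in the post, so the sketch below is one interpretation: the score is the mean surprisal (bits per character) of each character given its longest previously seen context of 1 to 4 characters, with add-one smoothing. The backoff strategy, the smoothing, the 37-character alphabet and the names `NextCharModel` and `dga_score` are assumptions, and the training names are illustrative.

```python
import math
from collections import Counter, defaultdict

ALPHABET_SIZE = 37  # assumption: a-z, 0-9 and "-" as allowed in hostname labels


def _label(domain):
    """Strip the trailing ccTLD: "pogoda.ru" -> "pogoda"."""
    return domain.lower().rstrip(".").rsplit(".", 1)[0]


class NextCharModel:
    """Next-character model over 1- to 4-character contexts (backoff + add-one)."""

    def __init__(self, benign_domains, max_context=4):
        self.max_context = max_context
        self.counts = defaultdict(Counter)  # context -> counts of the next character
        for domain in benign_domains:
            label = _label(domain)
            for i in range(1, len(label)):
                for k in range(1, min(i, max_context) + 1):
                    self.counts[label[i - k:i]][label[i]] += 1

    def probability(self, label, i):
        """P(label[i] | longest seen context of 1..max_context preceding chars)."""
        for k in range(min(i, self.max_context), 0, -1):
            following = self.counts.get(label[i - k:i])
            if following:
                return (following[label[i]] + 1) / (sum(following.values()) + ALPHABET_SIZE)
        return 1 / ALPHABET_SIZE  # no context ever seen: uniform guess

    def dga_score(self, domain):
        """Mean surprisal in bits per character; higher means harder to predict."""
        label = _label(domain)
        if len(label) < 2:
            return float("nan")  # nothing to predict from
        bits = [-math.log2(self.probability(label, i)) for i in range(1, len(label))]
        return sum(bits) / len(bits)


model = NextCharModel(["moskva.ru", "novosti.ru", "pogoda.ru", "kvartira.ru"])
print(model.dga_score("moskva.ru"), model.dga_score("xqzjwk.ru"))
```

With a realistic training set of many thousands of names, meaningful Russian labels should score low and character soup should score high; a toy training set like this one only demonstrates the mechanics.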

Pseudorandomly generated domain names are usually easy to distinguish from meaningful, human-generated names. To avoid this easy case, in the following experiment we built a set of malicious names known to serve malware but not belonging to a botnet that uses algorithmically generated names.

We computed this DGA score for a separate set of domain names known to be clean, and for the list of malicious names.

[Figure: DGA score distributions for clean and malicious .RU domain names]

While the lexical properties of a name are far from sufficient to classify a domain as malicious or benign, we observe that they are still a significant feature.

Umbrella Security Labs now blocks 80,000,000 malicious, botnet or phishing requests each day. Given the huge variety of malware, there is clearly no one-size-fits-all model. Our team uses the three models described above to detect ccTLD-specific anomalies. While there is certainly much to gain from these models, we are relentless in our quest to identify new models and algorithms that can inform us about the likelihood of a domain being malicious. These models range from general to specific, but they all contribute to greater protection for our customers.

© 2023 Cisco Umbrella