Recently we have been hearing from SOC analysts and researchers that they have been struggling to evaluate the riskiness of domains seen in their environments. Regardless of vertical, we have heard a common theme loud and clear: there is a need to simplify risk indicators to enable security professionals to make better decisions.
As researchers at Cisco Umbrella, we are always looking for ways to summarize and simplify the most essential information for our customers. We browse through hundreds of DNS logs daily, and we see firsthand the threats that organizations face. When it comes to attacks, we know that time is of the essence, which is why we are excited to announce Cisco Umbrella Investigate’s new and improved risk score. Now analysts can get the context needed to understand what factors contribute to a domain’s score, resulting in deeper visibility, faster triage, and better decision making.
As a cloud security solution with a global presence and rich threat intelligence, we’ve gleaned insights and consolidated them into building blocks that will assist a research analyst, SOC team, or anyone trying to quickly determine risk factors for a particular domain.
How we identify risky domains
As researchers, we constantly keep tabs on the latest ransomware, malvertising, and other threats our customers face. And while threats constantly change, some methods used by attackers are automated and reused. This works in our favor, because it makes these patterns easier to pick up.
For example, when writing some of our in-house algorithms to detect phishing domains, we see threat actors returning to common themes. We look at the lexical structure of a domain name and ask: “Are any major brand names being spoofed?” or “Does the domain contain click-bait catchphrases that suggest a scam or spam?” Other times, we want to know if the domain has any lexical structures similar to known malicious domain names that we’ve seen in the past (regardless of keywords or language).
But the lexical characteristics of a domain name only get you so far. So sometimes we want to include behavioral components, such as how much traffic a particular site is getting and how it compares to other abused domain names. From there, we can assess: “Is this the same number of requests, from the same region, to a compromised website?” And given that every domain has a top level domain (TLD), we can also compare the reputation of a domain across different TLDs to identify abuse.
As researchers, we have found that the new components of the risk score help answer these types of critical questions within a matter of seconds.
Introducing the new and improved Investigate risk score
Let’s take a look at the new risk score enhancements. As you can see in the top left of the image below, the Investigate risk score is synthesized into one overall score, much like a credit score. Just as a credit score includes subscores such as account balances, lending history, debt ratios, and other components, the Investigate risk score now includes several new subscores as well.
These new subscores include an emphasis on the lexical characteristics of domain names along with some key behavioral components.
Below the Investigate risk score is a small drop-down widget. Users who want to dig deeper can click it to expand the subscores. Each subscore is normalized between 0 and 100, with 100 indicating the highest risk. Since each subscore conveys something unique, our research team has identified score values that represent low, medium, or high risk. For example, a geo popularity subscore greater than 80 is considered high risk. Similar thresholds have been defined for each subscore.
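To make the banding idea concrete, here is a minimal sketch of how a 0 to 100 subscore could be mapped to low, medium, or high risk. It is an illustration only, not Investigate's implementation: the `risk_band` function, the `Thresholds` type, the `keyword` entry, and every cutoff except the geo popularity high-risk value of 80 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    medium: float  # values above this are at least medium risk
    high: float    # values above this are high risk


# Only the geo popularity "high" cutoff (80) comes from the text above.
# Every other number below is a placeholder assumption.
THRESHOLDS = {
    "geo_popularity": Thresholds(medium=50, high=80),
    "keyword": Thresholds(medium=40, high=70),
}


def risk_band(subscore: str, value: float) -> str:
    """Map a normalized subscore (0-100) to 'low', 'medium', or 'high'."""
    if not 0 <= value <= 100:
        raise ValueError("subscores are normalized to the range 0-100")
    t = THRESHOLDS[subscore]
    if value > t.high:
        return "high"
    if value > t.medium:
        return "medium"
    return "low"
```

Keeping a separate threshold pair per subscore reflects the point that each subscore conveys something different, so a value that is alarming for one may be unremarkable for another.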
Now let’s take a closer look at each of these subscores.
Geo popularity subscore
With our global data centers, we have unparalleled visibility into DNS requests made by clients around the world. For example, we are able to analyze: is a domain getting requests from a country where it typically doesn’t? Did the number of countries requesting a domain suddenly change? Is this domain part of a geographically targeted attack? By analyzing requests to domains across all countries in the world, patterns emerge that allow us to detect anomalous behavior and identify increased risk.
By looking at popularity per country, we can get more context: for example, the CBC is popular in Canada but not Germany, and the BBC is popular in the UK but not Morocco. Neither of these facts is terribly surprising, but in some situations monitoring this type of behavior can indicate malicious activity. For example, as the visual above demonstrates, we may see a US-based domain with a sudden spike of unusually high traffic from Eastern Europe. This could be an indicator of risky activity. We have also seen this model uncover targeted attacks such as DNS tunneling.
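One simple way to picture this kind of check is to compare a domain's usual share of requests per country with its current share and flag countries whose share has jumped. The sketch below is a simplified stand-in for the model described above, not the model itself: the function names, the `min_jump` parameter, and its default of 0.20 are all assumptions.

```python
def country_shares(counts):
    """Convert per-country request counts into fractions of total traffic."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in counts.items()}


def unusual_countries(baseline, current, min_jump=0.20):
    """Return countries whose share of requests rose by more than min_jump.

    baseline and current map a country code to a request count.
    min_jump is a hypothetical tuning knob, not a documented threshold.
    """
    base = country_shares(baseline)
    now = country_shares(current)
    flagged = {}
    for country, share in now.items():
        jump = share - base.get(country, 0.0)
        if jump > min_jump:
            flagged[country] = round(jump, 3)
    return flagged


# Invented counts: "XX" stands in for a country that rarely queries the domain.
baseline = {"US": 900, "CA": 80, "XX": 20}
current = {"US": 500, "CA": 50, "XX": 450}
spikes = unusual_countries(baseline, current)
```

With these invented counts, only `XX` is flagged, because its share of requests rises from 2% to 45% while the other countries' shares fall.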
Keyword subscore
Many phishing attacks still try to take advantage of the weakest link in the security chain: people. If a domain name contains words related to legitimate companies and services, it is more likely that people will be tricked into clicking. This style of social engineering continues to be a common technique of attackers, and we have built a model around detecting when domains are pretending to be something they are not. Spam scanners look for keywords such as “click here” or “account suspended”, and we have adapted some of those algorithms and ideas for the keyword subscore.
The inspiration for this subscore came while we were looking through data generated by our newly seen domains system and noticed that many clearly phishing domains were going undetected. Naively, we could identify these threats by adding patterns to detection lists, but this quickly becomes a maintenance nightmare. Instead, we treat the domain name like the text of an email, and use a modified spam engine to detect the phish.
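The "domain name as email text" idea can be sketched with a classic spam-filter technique. The example below uses a tiny naive Bayes classifier over word-like tokens; the source does not say which engine Umbrella modified, so the choice of naive Bayes, the tokenizer, the class name, and the training domains (all under the reserved `.example` name) are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(domain):
    """Split a domain name into lowercase alphabetic tokens."""
    return [t for t in re.split(r"[^a-z]+", domain.lower()) if t]


class DomainNaiveBayes:
    """Toy naive Bayes scorer; assumes at least one example of each label."""

    def __init__(self):
        self.token_counts = {"phish": Counter(), "benign": Counter()}
        self.doc_counts = Counter()

    def train(self, domain, label):
        self.token_counts[label].update(tokenize(domain))
        self.doc_counts[label] += 1

    def phish_log_odds(self, domain):
        """Positive values lean phishing, negative values lean benign."""
        vocab = set(self.token_counts["phish"]) | set(self.token_counts["benign"])
        score = math.log(self.doc_counts["phish"] / self.doc_counts["benign"])
        for label, sign in (("phish", 1), ("benign", -1)):
            total = sum(self.token_counts[label].values())
            for token in tokenize(domain):
                # Add-one (Laplace) smoothing so unseen tokens never give log(0).
                p = (self.token_counts[label][token] + 1) / (total + len(vocab))
                score += sign * math.log(p)
        return score


# Invented training data, for illustration only.
model = DomainNaiveBayes()
model.train("secure-login-account-update.example", "phish")
model.train("account-suspended-verify.example", "phish")
model.train("weather-forecast.example", "benign")
model.train("city-library.example", "benign")
```

With this toy training set, `model.phish_log_odds("verify-account-login.example")` comes out positive and `model.phish_log_odds("library-weather.example")` comes out negative. The appeal over a hand-maintained pattern list is that new keywords are learned from labeled examples rather than added one rule at a time.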
Domains created using domain generation algorithms (DGAs) are a mainstay of communication between infected machines and command and control servers. DGAs avoid the need to embed the location of control servers directly in the malware, and enable the attackers to regain control of their botnets even in the face of sinkholing and takedowns. Using a generalization of the method behind the Cisco Umbrella DGA score, we can now more reliably predict when hostnames were generated in this fashion, without having to rely on the usual laborious methods of teaching the models what lexical patterns to look for.
Surprisingly, the idea for this subscore came from image recognition, which has recently been making incredible progress. Computers can learn to identify objects, regardless of where they appear in a picture, or which way they’re pointing. Similarly, we used a deep learning model to learn how to distinguish malicious domain names based on the presence or absence of specific character combinations, regardless of where or how they appear in the domain name. Having trained the model on over a million curated domains, we can now draw on what it learned to perform real-time searches for lexical clues indicating risk.
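The deep learning model itself is beyond the scope of a short example, but the underlying intuition, that character combinations matter wherever they occur in the name, can be shown with a much simpler stand-in. The sketch below counts character n-grams and measures how many of a candidate's n-grams also occur in known-bad names. It is not the model described above; the function names and the choice of 3-character n-grams are assumptions.

```python
from collections import Counter


def char_ngrams(label, n=3):
    """Count every run of n consecutive characters in a domain label."""
    label = label.lower()
    return Counter(label[i:i + n] for i in range(len(label) - n + 1))


def shared_ngram_fraction(candidate, known_bad, n=3):
    """Fraction of the candidate's n-grams that also occur in any known-bad label.

    Position is ignored: an n-gram matches wherever it sits in either name.
    """
    bad = set()
    for label in known_bad:
        bad.update(char_ngrams(label, n))
    grams = char_ngrams(candidate, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0  # candidate is shorter than n characters
    return sum(count for gram, count in grams.items() if gram in bad) / total
```

A learned model goes further by weighting combinations and generalizing to unseen ones, but the position-independent matching shown here is the part that parallels recognizing an object anywhere in an image.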
Top Level Domain (TLD) subscore
Not all top level domains are created equal. Over the years, the TLDs of choice for spammers and other criminals have changed according to factors such as cost, ease of batch registration, verification of registrant identity, and abuse complaint policies.
For example, we are all familiar with common domain names like cisco.com. But what about cisco.info? Or cisco.icu? The difference is subtle, but the reputation of each variation hinges on its TLD. From a sample of global traffic, we can distinguish active TLDs (with many unique domains) and tease out those TLDs with the most abuse. With insight into TLD popularity, this score can determine what proportion of abuse is severe, adding a rich behavioral aspect to our TLD score that is unique to Cisco Umbrella customers.
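A rough way to picture the first part of this, separating active TLDs from sparse ones and measuring abuse within each, is to compute the share of unique domains per TLD that have been flagged. The sketch below assumes the input is a list of (domain, is_abusive) pairs and that a TLD needs a minimum number of unique domains to count as active; the function name, the `min_domains` parameter, and the sample data (using reserved names as stand-in TLDs) are all hypothetical.

```python
from collections import defaultdict


def tld_abuse_rates(observations, min_domains=3):
    """Return {tld: fraction of unique domains flagged as abusive}.

    observations is an iterable of (domain, is_abusive) pairs.
    TLDs with fewer than min_domains unique domains are skipped as inactive.
    The TLD is taken as the last dot-separated label, so multi-label public
    suffixes are not handled here.
    """
    seen = defaultdict(set)
    abusive = defaultdict(set)
    for domain, is_abusive in observations:
        domain = domain.lower().rstrip(".")
        tld = domain.rsplit(".", 1)[-1]
        seen[tld].add(domain)
        if is_abusive:
            abusive[tld].add(domain)
    return {
        tld: len(abusive[tld]) / len(domains)
        for tld, domains in seen.items()
        if len(domains) >= min_domains
    }


# Invented observations; "example", "test", and "invalid" stand in for real TLDs.
sample = [
    ("a.example", False), ("b.example", False), ("c.example", True),
    ("a.test", True), ("b.test", True), ("c.test", False),
    ("x.invalid", True),
]
rates = tld_abuse_rates(sample)
```

Counting unique domains rather than raw requests keeps a single heavily queried domain from dominating its TLD's rate, and the activity floor keeps a TLD with one flagged domain from looking 100% abusive.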
Interested in checking out our enhanced Risk Score capability? Contact us today for a demo or free trial for Cisco Umbrella Investigate.
If you are currently an Investigate customer, you already have access to our enhanced Risk Score and the new subscore indicators.