Cisco Umbrella

Research

Using HyperLogLog to Detect Malware Faster Than Ever

Security Research Team
Updated — March 5, 2020 • 4 minute read

Previously, we introduced our real-time API, and Senior Research Scientist Ping Yan recently blogged about how she used it to find Black Friday scams.
The data feed, described in the post mentioned above, is constantly consumed by multiple processors or stream interpreters. In this blog post, we will focus on one processor dedicated to spotting a specific category of suspicious IP addresses.
It is uncommon for an IP address to suddenly have many new domain names mapped to it when there were none before. A hosting service, a load-balancing service, a CDN, or a user moving many domains to a new server can follow this pattern, but such benign cases are both infrequent and relatively easy to distinguish from suspicious activity.
In our research, we define an IP address as “dormant” if fewer than N names mapping to it have been observed in the past 7 days, and as “hyperactive” if at least M names mapping to it have been observed during the past 4 hours.
One stream we generate is a list of recently observed pairs (name, IP address). This stream is a perfect candidate for our task.

{"asn":30962,"name":"dentro.de.","owner":"dentro.de.","rr":"62.108.32.81","server_ip":"82.115.108.50","ts":1386104400,"ttl":3600,"type":"A"}
{"asn":8972,"name":"www.benm.at.","owner":"benm.at.","rr":"80.86.80.177","server_ip":"193.46.215.55","ts":1386104400,"ttl":900,"type":"A"}
{"asn":25847,"name":"model-trains-store.com.","owner":"model-trains-store.com.","rr":"64.64.3.139","server_ip":"64.64.3.136","ts":1386104400,"ttl":14400,"type":"A"}
{"asn":8685,"name":"www.engin.tv.","owner":"engin.tv.","rr":"213.155.113.195","server_ip":"212.58.3.7","ts":1386104400,"ttl":600,"type":"A"}
{"asn":29648,"name":"info-03.surgutneftegas.ru.","owner":"surgutneftegas.ru.","rr":"77.233.191.6","server_ip":"83.149.32.2","ts":1386104400,"ttl":3600,"type":"A"}
{"asn":20485,"name":"info-03.surgutneftegas.ru.","owner":"surgutneftegas.ru.","rr":"62.33.202.6","server_ip":"83.149.32.2","ts":1386104400,"ttl":3600,"type":"A"}
{"asn":3462,"name":"36-233-153-101.dynamic-ip.hinet.net.","owner":"dynamic-ip.hinet.net.","rr":"36.233.153.101","server_ip":"168.95.1.19","ts":1386104400,"ttl":86400,"type":"A"}
{"asn":20773,"name":"www.electronic-thingks.de.","owner":"electronic-thingks.de.","rr":"83.169.26.138","server_ip":"80.237.128.10","ts":1386104400,"ttl":86400,"type":"A"}
{"asn":9198,"name":"89.218.160.130.metro.online.kz.","owner":"metro.online.kz.","rr":"89.218.160.130","server_ip":"212.19.149.53","ts":1386104400,"ttl":86400,"type":"A"}

However, tracking every name observed for every IP can require a lot of memory, which is wasteful when all we need is a set of per-IP counters.
Furthermore, these counters do not have to be accurate. When an IP address becomes “hyperactive,” new names are usually piling up at a very high rate, so the IP will eventually be labeled.
Instead of keeping track of the individual domain names that map to each IP, we use the HyperLogLog algorithm, which we ported to the Rust programming language.
The beauty of this algorithm is that, for a given precision, its per-update cost and memory usage remain constant no matter how many elements are in the set.
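
To make the idea concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration only, not our Rust port: the class name, the choice of SHA-1 as the hash function, and the default precision of 12 bits (4096 registers) are all assumptions made for this example. Each estimator holds a fixed array of small registers, so memory does not grow with the number of names added.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch)."""

    def __init__(self, p=12):
        self.p = p                       # number of bits used to pick a register
        self.m = 1 << p                  # number of registers (fixed memory)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        idx = h >> (64 - self.p)                       # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

    def merge(self, other):
        """Fold another estimator into this one (register-wise maximum)."""
        if self.m != other.m:
            raise ValueError("estimators must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


hll = HyperLogLog()
for i in range(1000):
    hll.add(f"host{i}.example.")
hll.add("host0.example.")  # duplicates do not change the estimate
print(round(hll.count()))
```

The estimate is approximate by design. With 4096 registers the typical relative error is on the order of a couple of percent, which is plenty when the goal is only to tell "a handful of names" from "dozens of names".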
Our stream processor keeps an in-memory set of IPs, and for each IP, two HyperLogLog estimators.
The first (“current”) estimates the number of names recently observed for a given IP. The second (“archive”) estimates the number of names observed more than 4 hours ago.
When a new entry for an IP is read from the stream, we check the age of the “current” estimator. If this estimator has been in use for more than 4 hours, we merge its contents into the “archive” estimator and reset the “current” estimator.
Thanks to the HyperLogLog algorithm, merging is very fast: it runs in constant time regardless of how many names each estimator has seen.
To detect hyperactive IPs that recently transitioned from being dormant, the stream processor estimates the number of names seen for each IP with the “archive” estimator, then with the “current” estimator. If the former is below N (which we empirically set to 3) and the latter is greater than or equal to M (currently 10), we print the current cardinality, the name, and the IP:
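
The rotation and detection logic described above can be sketched as follows. This is a simplified illustration under several assumptions: plain Python sets stand in for the two HyperLogLog estimators so the example stays self-contained (a real processor would use estimators with `add`, `merge`, and `count` operations), the age of the “current” estimator is measured with the record timestamp, and the names `Detector` and `observe` are hypothetical. The thresholds and the 4-hour window are the ones given in the text.

```python
WINDOW_SECONDS = 4 * 3600  # lifetime of the "current" estimator
N = 3                      # dormant threshold (archive must be below this)
M = 10                     # hyperactive threshold (current must reach this)


class IpState:
    """Per-IP state: two estimators plus the start time of the current one."""

    def __init__(self, ts):
        self.current = set()   # stand-in for the "current" HyperLogLog
        self.archive = set()   # stand-in for the "archive" HyperLogLog
        self.current_started = ts


class Detector:
    def __init__(self):
        self.states = {}       # in-memory map: IP -> IpState

    def observe(self, name, ip, ts):
        """Process one (name, IP) pair seen at time ts (seconds).

        Returns (current cardinality, name, ip) when the IP looks dormant in
        the archive but hyperactive in the current window, otherwise None.
        """
        state = self.states.get(ip)
        if state is None:
            state = self.states[ip] = IpState(ts)

        # Rotate: fold an expired "current" estimator into the archive.
        if ts - state.current_started > WINDOW_SECONDS:
            state.archive |= state.current   # merge
            state.current = set()            # reset
            state.current_started = ts

        state.current.add(name)

        if len(state.archive) < N and len(state.current) >= M:
            return len(state.current), name, ip
        return None


detector = Detector()
hits = [detector.observe(f"n{i}.example.", "192.0.2.1", 0) for i in range(10)]
print(hits[-1])
```

An IP that already had a few names in its archive is never reported, however many names arrive later, which is what keeps long-lived shared hosts out of the output.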

88  5fd40.93taotao.com. 23.104.41.152
52  2l7d9.jjrnp.com.    23.244.38.15
153 14q3f.wzstorm.com.  23.244.38.77
107 shishicaizuiyizhongjiangdewanfa.gzhsfisher.com. 23.235.132.36
71  qo73p.yqhxnhcl.com. 172.246.178.62
95  mianfeiqipaiyouxipingtai.gzaqgy.com.    23.244.57.126
136 35441.dlyjzs.com.   23.244.38.85
46  ppyulechengwangzhandizhishishime.5udate.com.    173.234.231.103
99  ouzhoubeijuesai.axcych58.com.   23.244.57.92
45  gongjihuichengyuan.jjkho.com.   5.226.171.35
12  overlay.ringtonematcher.com.    216.137.55.127
46  i-mhow.com. 141.101.117.162

Sorting recent entries of this new stream yields domain names mapping to the most hyperactive IPs:

    571 sge.su
    553 sxo.su

These domains happen to be currently used by the Caphaw trojan.
Filtering by name patterns and TTLs immediately shows more interesting domains (listed below) being used by the Nuclear exploit pack:

     81 thinkmetal.biz
     46 cosmogift.biz
     37 lightcasa.biz
     36 movieprice.biz
     32 moviehello.biz
     31 timequality.biz
     31 infoobesity.biz
     31 comwin.biz
     30 flypanda.biz
     26 expertsurvey.biz
     20 eurosync.biz
     18 spymac.biz
     18 sharerebel.biz
     16 cybervirtual.biz
     10 drcoupon.biz

These domains can be active for a very short period of time, so blocking them as fast as possible is critical.
To put all this in context, the OpenDNS Security Graph is centered on the concept of being fast, predictive, and adaptive. We want to block malware and botnets before they even manifest themselves as a problem. The real-time API, and the stream processors built on it, allow us to react very quickly, even before the data is recorded in our databases. Sketching algorithms such as HyperLogLog make that possible on big data, with little effort, little hardware, and low latency.
