Cisco Umbrella

Research

Using HyperLogLog to Detect Malware Faster Than Ever

By OpenDNS Security Research
Posted on December 5, 2013
Updated on March 5, 2020


Previously, we introduced our real-time API, and Senior Research Scientist Ping Yan recently blogged about how she used it to find Black Friday scams.
The data feed, described in the post mentioned above, is constantly consumed by multiple processors or stream interpreters. In this blog post, we will focus on one processor dedicated to spotting a specific category of suspicious IP addresses.
It is uncommon for an IP address to suddenly have many new domain names map to it when none did before. Of course, a hosting service, a load-balancing service, a CDN, or a user moving a lot of domains to a new server can follow this pattern, but benign cases are both infrequent and relatively easy to distinguish from suspicious activity.
In our research, we define an IP address as “dormant” if fewer than N names mapping to it have been observed in the past 7 days, and as “hyperactive” if more than M names mapping to it have been observed during the past 4 hours.
One stream we generate is a list of recently observed (name, IP address) pairs. This stream is a perfect candidate for our task. In the sample below, "name" is the observed domain name and "rr" is the IP address it maps to.

{"asn":30962,"name":"dentro.de.","owner":"dentro.de.","rr":"62.108.32.81","server_ip":"82.115.108.50","ts":1386104400,"ttl":3600,"type":"A"}
{"asn":8972,"name":"www.benm.at.","owner":"benm.at.","rr":"80.86.80.177","server_ip":"193.46.215.55","ts":1386104400,"ttl":900,"type":"A"}
{"asn":25847,"name":"model-trains-store.com.","owner":"model-trains-store.com.","rr":"64.64.3.139","server_ip":"64.64.3.136","ts":1386104400,"ttl":14400,"type":"A"}
{"asn":8685,"name":"www.engin.tv.","owner":"engin.tv.","rr":"213.155.113.195","server_ip":"212.58.3.7","ts":1386104400,"ttl":600,"type":"A"}
{"asn":29648,"name":"info-03.surgutneftegas.ru.","owner":"surgutneftegas.ru.","rr":"77.233.191.6","server_ip":"83.149.32.2","ts":1386104400,"ttl":3600,"type":"A"}
{"asn":20485,"name":"info-03.surgutneftegas.ru.","owner":"surgutneftegas.ru.","rr":"62.33.202.6","server_ip":"83.149.32.2","ts":1386104400,"ttl":3600,"type":"A"}
{"asn":3462,"name":"36-233-153-101.dynamic-ip.hinet.net.","owner":"dynamic-ip.hinet.net.","rr":"36.233.153.101","server_ip":"168.95.1.19","ts":1386104400,"ttl":86400,"type":"A"}
{"asn":20773,"name":"www.electronic-thingks.de.","owner":"electronic-thingks.de.","rr":"83.169.26.138","server_ip":"80.237.128.10","ts":1386104400,"ttl":86400,"type":"A"}
{"asn":9198,"name":"89.218.160.130.metro.online.kz.","owner":"metro.online.kz.","rr":"89.218.160.130","server_ip":"212.19.149.53","ts":1386104400,"ttl":86400,"type":"A"}
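As a rough illustration of how a processor might pull (name, IP address) pairs out of this stream, here is a minimal Python sketch. It assumes the feed arrives as one JSON object per line, as in the sample above; the function name name_ip_pairs and the decision to skip anything that is not an A record are assumptions made for this example, not a description of the actual processor.

```python
import json
from typing import Iterable, Iterator, Tuple


def name_ip_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, ip) pairs from newline-delimited JSON records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: only A records are of interest; "rr" then holds the IPv4 address.
        if record.get("type") != "A":
            continue
        yield record["name"], record["rr"]


# Two records copied from the sample above.
sample = [
    '{"asn":30962,"name":"dentro.de.","owner":"dentro.de.","rr":"62.108.32.81",'
    '"server_ip":"82.115.108.50","ts":1386104400,"ttl":3600,"type":"A"}',
    '{"asn":8972,"name":"www.benm.at.","owner":"benm.at.","rr":"80.86.80.177",'
    '"server_ip":"193.46.215.55","ts":1386104400,"ttl":900,"type":"A"}',
]
pairs = list(name_ip_pairs(sample))
```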

However, keeping track of all the names observed for all the IPs observed can require quite a lot of memory, especially when all we need is a bunch of counters.
Furthermore, these counters do not have to be exact. When an IP address becomes “hyperactive,” new names usually pile up at a very high rate, so even an approximate count crosses the threshold quickly and the IP still gets labeled.
Instead of keeping track of the individual domain names that mapped to each IP, we use the HyperLogLog algorithm, which we ported to the Rust programming language.
The beauty of this algorithm is that, for a given precision, the cost of adding an element and the memory used by an estimator remain constant no matter how many elements are in the set.
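To make the constant-memory property concrete, here is a small HyperLogLog sketch in Python. It is not the Rust port mentioned above. The precision (12 bits, so 4096 one-byte registers), the use of SHA-1 as the hash function, and the host names fed to it are all assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog estimator with 2**p one-byte registers."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                      # number of registers, fixed for life
        self.registers = bytearray(self.m)

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")            # 64-bit hash value
        idx = h >> (64 - self.p)                         # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)            # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)            # bias constant, m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Made-up names, used only to exercise the estimator.
hll = HyperLogLog()
for i in range(1000):
    hll.add(f"host{i}.example.com.")
estimate = hll.count()
```

However many names are added, the estimator still holds exactly 4096 bytes of register state, and adding a name that was already seen leaves the estimate unchanged.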
Our stream processor keeps an in-memory set of IPs, and for each IP, two HyperLogLog estimators.
The first (“current”) estimates the number of names recently observed for a given IP. The second (“archive”) estimates the number of names observed more than 4 hours ago.
When a new entry for an IP is read from the stream, we check the age of the “current” estimator. If this estimator has been in use for more than 4 hours, we merge its content into the one dedicated to archival, then reset the “current” estimator.
Thanks to the HyperLogLog algorithm, merging is a very fast, constant-time operation: it only takes the register-wise maximum of the two estimators, so its cost depends on the size of an estimator, not on the number of names it has seen.
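The rotation described above could look roughly like the following Python sketch. It works directly on HyperLogLog register arrays; the names IpState, update, observe and merge, the 12-bit precision, and the SHA-1 hash are assumptions for this example, and "other.wzstorm.com." is a made-up name. The sketch only covers the 4-hour rotation and does not model the 7-day horizon of the archive.

```python
import hashlib
from dataclasses import dataclass, field

P = 12                   # assumed precision: 2**12 registers per estimator
M_REGISTERS = 1 << P
WINDOW = 4 * 3600        # the 4-hour window from the text, in seconds


def new_registers() -> bytearray:
    return bytearray(M_REGISTERS)


def observe(registers: bytearray, name: str) -> None:
    """Record one name in a HyperLogLog register array."""
    h = int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:8], "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1
    registers[idx] = max(registers[idx], rank)


def merge(dst: bytearray, src: bytearray) -> None:
    """Merge src into dst: the union of two sketches is the register-wise max."""
    for i, value in enumerate(src):
        if value > dst[i]:
            dst[i] = value


@dataclass
class IpState:
    started: int                                           # ts when "current" began
    current: bytearray = field(default_factory=new_registers)
    archive: bytearray = field(default_factory=new_registers)


def update(states: dict, ip: str, name: str, ts: int) -> IpState:
    """Feed one (name, ip) observation, rotating estimators when needed."""
    state = states.get(ip)
    if state is None:
        state = states[ip] = IpState(started=ts)
    if ts - state.started > WINDOW:
        merge(state.archive, state.current)    # fold "current" into "archive"
        state.current = new_registers()        # reset "current"
        state.started = ts
    observe(state.current, name)
    return state


states = {}
update(states, "23.244.38.77", "14q3f.wzstorm.com.", 1386104400)
before = bytes(states["23.244.38.77"].current)
# Hypothetical second observation, just past the 4-hour window.
update(states, "23.244.38.77", "other.wzstorm.com.", 1386104400 + WINDOW + 1)
```

The merge loop always touches the same number of registers, which is why its cost does not grow with the number of names an estimator has absorbed.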
In order to detect hyperactive IPs that recently transitioned from being dormant, the stream processor estimates, for each IP, the number of distinct names using the “archive” estimator, then the number of distinct names using the “current” estimator. If the former is below N (which we empirically set to 3) and the latter is greater than or equal to M (currently 10), we print the current cardinality, the name, and the IP:

88  5fd40.93taotao.com. 23.104.41.152
52  2l7d9.jjrnp.com.    23.244.38.15
153 14q3f.wzstorm.com.  23.244.38.77
107 shishicaizuiyizhongjiangdewanfa.gzhsfisher.com. 23.235.132.36
71  qo73p.yqhxnhcl.com. 172.246.178.62
95  mianfeiqipaiyouxipingtai.gzaqgy.com.    23.244.57.126
136 35441.dlyjzs.com.   23.244.38.85
46  ppyulechengwangzhandizhishishime.5udate.com.    173.234.231.103
99  ouzhoubeijuesai.axcych58.com.   23.244.57.92
45  gongjihuichengyuan.jjkho.com.   5.226.171.35
12  overlay.ringtonematcher.com.    216.137.55.127
46  i-mhow.com. 141.101.117.162
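The dormant-to-hyperactive check that produces lines like the ones above can be sketched as a small predicate over the two estimates. The thresholds 3 and 10 come from the text; the function names and the tab-separated output format are assumptions made for this example.

```python
N_DORMANT = 3        # "archive" threshold N quoted in the text
M_HYPERACTIVE = 10   # "current" threshold M quoted in the text


def is_newly_hyperactive(archive_count: float, current_count: float,
                         n: int = N_DORMANT, m: int = M_HYPERACTIVE) -> bool:
    """True when the IP was dormant (archive below n) and is now at or above m."""
    return archive_count < n and current_count >= m


def format_alert(current_count: float, name: str, ip: str) -> str:
    """Render one output line: cardinality, name, IP (tab-separated, assumed)."""
    return f"{round(current_count)}\t{name}\t{ip}"


# Estimates are floats in practice; the values here are illustrative.
if is_newly_hyperactive(0.0, 12.2):
    line = format_alert(12.2, "overlay.ringtonematcher.com.", "216.137.55.127")
```

Because the estimates are approximate, an IP sitting right at a threshold may be flagged slightly early or late, which matches the earlier point that the counters do not need to be exact.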

Sorting recent entries of this new stream yields domain names mapping to the most hyperactive IPs:

    571 sge.su
    553 sxo.su

These domains happen to be currently used by the Caphaw trojan.
Filtering by name patterns and TTLs immediately shows more interesting domains (listed below) being used by the Nuclear exploit pack:

     81 thinkmetal.biz
     46 cosmogift.biz
     37 lightcasa.biz
     36 movieprice.biz
     32 moviehello.biz
     31 timequality.biz
     31 infoobesity.biz
     31 comwin.biz
     30 flypanda.biz
     26 expertsurvey.biz
     20 eurosync.biz
     18 spymac.biz
     18 sharerebel.biz
     16 cybervirtual.biz
     10 drcoupon.biz

These domains can be active for a very short period of time, so blocking them as fast as possible is critical.
To put all this in context, the OpenDNS Security Graph is centered on the concept of being fast, predictive, and adaptive. We want to block malware and botnets before they even manifest themselves as a problem. The real-time API, and the stream processors built on it, allow us to react very quickly, even before the data is recorded in our databases. Sketching algorithms such as HyperLogLog make that possible on big data, with little effort, little hardware, and low latency.
