How the infrastructure behind the OpenDNS global network powers Umbrella reporting

By Adam Phelps
Posted on March 19, 2013
Updated on August 31, 2021

The Umbrella Security Labs research team has written frequently about how they’re using Big Data to predict unknown threats, and the OpenDNS and Umbrella product teams have been working to improve the quality and speed of the reports in our users’ Dashboards. The foundation of both efforts is how our infrastructure team handles the data itself. This post explores how OpenDNS handles the massive amount of data we process daily, without downtime or performance impact, and what that means for reporting in the Umbrella Dashboard.

On an average weekday, the DNS resolvers we run at OpenDNS process more than 50 billion queries originating from over 50 million individual IP addresses. These queries are directed at our anycast IPs (208.67.220.220 and 208.67.222.222), which are routed to one of our 20 data centers, and from there to one of our 80 individual resolver hosts. Combined, this produces a huge amount of data. That data must be processed and aggregated to produce the reports available on our users’ Dashboards, and it is sampled by Umbrella Security Labs, who analyze it to detect and predict security threats.

Each query that we log on our resolvers includes the domain being queried, the query’s originating IP, the customer IDs associated with it, and how the query was handled (that is, whether it was resolved normally or blocked due to malware or phishing). These log entries average around 115 bytes each. Our system produces between 50MB/s and 90MB/s of log data, for a total of over 5TB of raw data every day.
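As a quick sanity check, the figures above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes exactly 50 billion queries per day, decimal units (1TB = 10^12 bytes), and a perfectly even query rate, none of which the numbers above promise. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the log volume figures quoted in the post.
# Assumptions (not from the post): exactly 50 billion queries per day,
# decimal units, and traffic spread evenly over the day.

QUERIES_PER_DAY = 50e9       # "more than 50 billion queries" per weekday
BYTES_PER_ENTRY = 115        # average size of one log entry
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(queries: float = QUERIES_PER_DAY,
                       entry_bytes: int = BYTES_PER_ENTRY) -> float:
    """Raw log bytes produced per day."""
    return queries * entry_bytes


def average_rate_bytes_per_sec(total_bytes: float) -> float:
    """Average throughput if the daily volume were spread evenly."""
    return total_bytes / SECONDS_PER_DAY


if __name__ == "__main__":
    total = daily_volume_bytes()
    print(f"{total / 1e12:.2f} TB/day")
    print(f"{average_rate_bytes_per_sec(total) / 1e6:.1f} MB/s average")
```

The daily total works out to 5.75TB, and the average rate lands inside the 50MB/s to 90MB/s range, which is what you would expect if that range reflects the daily low and peak.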

When I started working at OpenDNS in July of 2010, we were receiving only around half the queries we do now, but the analysis system in use then had been built years earlier. It was barely keeping up and clearly wasn’t going to scale much further. Like many other companies that have delved into “Big Data” analysis, we built a Hadoop-based infrastructure to replace the previous installation, and it went live in early 2011.

Our production Hadoop cluster is composed of roughly 30 (and rapidly growing!) heavy-duty Ubuntu servers. Of these, one is the active Namenode, the cluster’s coordinator and Hadoop’s only single point of failure. To mitigate this risk, we synchronously replicate the Hadoop metadata on this machine to a spare Namenode via DRBD. That allows us to rapidly fail over to the spare machine if necessary, and provides a means to update the Namenode without significant downtime. The remaining worker nodes are basically interchangeable, and new ones can easily be added as demand increases.

On our resolvers, query logging is configured to roll over every 4MB, which results in a new log file every few seconds on each resolver. Sitting between our Hadoop cluster and the resolvers is a set of “loader” machines, which pull in these completed log files, combine them into larger chunks, compress them using LZO compression, and finally push them to the Hadoop Distributed File System (HDFS). HDFS replicates data across multiple nodes for redundancy (in our case there are three copies of all data in HDFS). The compressed resolver logs for a day currently average 1.1TB, or about 3.3TB once replicated, so our system’s ~300TB of storage can hold roughly 3 months of raw data.
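The loader step described above (pull small files, combine, compress, push) can be sketched roughly as follows. This is not our actual loader code: the function names are hypothetical, gzip from the Python standard library stands in for LZO, and `push` is a placeholder callback rather than a real HDFS client.

```python
import gzip
from typing import Callable, Iterable, Iterator


def combine_logs(log_files: Iterable[bytes], chunk_size: int) -> Iterator[bytes]:
    """Concatenate small rolled-over log files into chunks of at least chunk_size bytes."""
    buffer = bytearray()
    for data in log_files:
        buffer.extend(data)
        if len(buffer) >= chunk_size:
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        # Flush whatever is left over as a final, smaller chunk.
        yield bytes(buffer)


def compress_chunk(chunk: bytes) -> bytes:
    """Compress one chunk. gzip is a stand-in here; the real system uses LZO."""
    return gzip.compress(chunk)


def load(log_files: Iterable[bytes], chunk_size: int,
         push: Callable[[bytes], None]) -> int:
    """Combine, compress, and push log files; returns the number of chunks pushed."""
    pushed = 0
    for chunk in combine_logs(log_files, chunk_size):
        push(compress_chunk(chunk))  # placeholder for the write to HDFS
        pushed += 1
    return pushed
```

Combining before compressing matters because HDFS is designed around large files; thousands of 4MB files per minute would put far more pressure on the Namenode than a smaller number of big chunks.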

Once all the logs for a given hour have been received, our system launches a variety of Map/Reduce jobs to crunch that data and generate aggregates for our customer-facing reports. The output of these jobs is loaded into HBase (a distributed NoSQL database built on top of HDFS) for future access. When a customer visits the reporting page of the OpenDNS Dashboard, the Dashboard sends a query to one of several Appservers, which queries HBase for the time frame requested, sums up the data, and returns it to the Dashboard for presentation.
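To make the flow above concrete, here is a toy, in-memory version of the idea: a map step that emits one count per log entry, a reduce step that sums counts per key, and a report function that sums hourly rows over a requested time frame. The record layout, key shape, and function names are all assumptions for illustration; a plain dict stands in for the database table, and none of this is the production job code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

Key = Tuple[str, int, str]  # (customer_id, hour, handling)


class LogEntry(NamedTuple):
    hour: int          # hypothetical hour bucket, e.g. hours since some epoch
    customer_id: str
    domain: str
    handling: str      # e.g. "normal" or "blocked"


def map_entry(entry: LogEntry) -> Iterator[Tuple[Key, int]]:
    """Map step: emit a count of 1 keyed by customer, hour, and handling."""
    yield (entry.customer_id, entry.hour, entry.handling), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the counts for each key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def aggregate(entries: Iterable[LogEntry]) -> Dict[Key, int]:
    """Run map then reduce over a batch of log entries."""
    return reduce_counts(pair for e in entries for pair in map_entry(e))


def report(table: Dict[Key, int], customer_id: str,
           start_hour: int, end_hour: int) -> Dict[str, int]:
    """Appserver-style query: sum hourly rows in [start_hour, end_hour)."""
    summary: Dict[str, int] = defaultdict(int)
    for (cust, hour, handling), count in table.items():
        if cust == customer_id and start_hour <= hour < end_hour:
            summary[handling] += count
    return dict(summary)
```

Because the heavy lifting happens once per hour in the batch job, the query at report time only has to add up a handful of pre-aggregated rows, which is what keeps the Dashboard responsive.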

To provide cluster-level failure handling, and to be able to upgrade the production cluster while continuing to return reports to the Dashboard, we have an additional cluster which does nothing but provide a live backup of our HBase data. If the production cluster has failures, or needs to be taken offline, our Appservers can quickly switch to querying the backup HBase. As an additional level of paranoia, we also take a nightly snapshot of this backup HBase, which we copy to a machine outside of the Hadoop cluster.
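One way to picture the switch to the backup cluster is a query helper that tries each data store in order. This is a minimal sketch under stated assumptions: the exception type and function names are invented for the example, the stores are plain callables rather than real database clients, and the automatic try-the-next-one behavior is an assumption; the switch described above may just as well be a manual configuration change.

```python
from typing import Callable, Optional, Sequence


class StoreUnavailable(Exception):
    """Raised by a store (hypothetically) when its cluster cannot answer."""


def query_with_failover(stores: Sequence[Callable[[str], dict]],
                        request: str) -> dict:
    """Try each store in order (production first, then backup)."""
    last_error: Optional[Exception] = None
    for store in stores:
        try:
            return store(request)
        except StoreUnavailable as exc:
            # Remember the failure and fall through to the next store.
            last_error = exc
    raise StoreUnavailable("no store could answer the query") from last_error


# Usage with stand-in stores:
def production(request: str) -> dict:
    raise StoreUnavailable("production cluster offline for upgrade")


def backup(request: str) -> dict:
    return {"request": request, "source": "backup"}


result = query_with_failover([production, backup], "customer=c1")
```

Here `result["source"]` is `"backup"`, since the stand-in production store always fails.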

And that’s the mile-high overview of the infrastructure we use to analyze the massive amount of data our system produces. Our infrastructure is also rapidly evolving as we expand our security research team, develop new reports, and work towards providing real-time analysis and reporting capabilities. If you enjoyed reading about how the OpenDNS Global Network’s infrastructure powers Umbrella’s domain name system security, I encourage you to read more in our recent whitepaper. The paper takes a look at how Umbrella Security Labs is harnessing the massive data sets mentioned above to predict future threat origins.
