Spotlight

OG-Miner: Data Crawling on Steroids

By Thibault Reuille
Posted on April 4, 2017
Updated on April 10, 2020

The Internet moves fast. New websites are created every day, new articles are shared through blogs and social media, fresh data is served through APIs, and emerging threats are repeatedly set up behind bulletproof, ephemeral infrastructure. Monitoring this online activity is part of the daily job of any security researcher, and using the appropriate tools is key to keeping the amount of investigation work manageable.

At OpenDNS Labs, we have been building and using our own custom tools to perform these operations. Today, we are excited to share our homemade web crawler and data aggregator, OG-Miner. OG stands for OpenDNS Graph or Open Graphiti (or even Original Gangsta :p ). It is one of our main internal projects and a central part of many of our analysis and backend processes. Find it on GitHub.

About

A little bit of backstory: when we first started working on OG-Miner, several key design requirements had to be considered. First of all, we were sitting on top of a huge knowledge base, our "OpenDNS Security Graph", built from our authoritative and recursive DNS logs. Each piece of data can be accessed through our Investigate API, which lets security researchers retrieve network metrics and features produced by our statistical models. At some point it became clear that we had to step back and study the topology of the connections built by our algorithms, which meant working with local graphs inside this immense Security Graph. We had to start mining our API data to understand its intrinsic structure.

We quickly realized that one of the simplest and most practical ways to crawl our graph data was to implement a customizable breadth-first traversal. Even if you're not familiar with graph theory, the intuition is straightforward: you start from a given vertex, explore its neighbors, then the neighbors of those neighbors, and so on. In essence, you iterate over the graph level by level, where a vertex's level is its depth, i.e. its distance from the starting vertex.
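To make the idea concrete, here is a minimal sketch of a depth-limited breadth-first traversal. It is not OG-Miner's actual code: the function name `bfs_levels`, the `neighbors` callback and the toy graph are assumptions made purely for illustration.

```python
from collections import deque

def bfs_levels(start, neighbors, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of start.

    `neighbors` is a hypothetical callback returning the vertices adjacent
    to a given vertex (in a real crawler this would be an API lookup).
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_depth:
            continue  # do not expand past the requested depth
        for nxt in neighbors(vertex):
            if nxt not in depth:  # first visit is the shortest path in BFS
                depth[nxt] = depth[vertex] + 1
                queue.append(nxt)
    return depth

# Hypothetical toy graph mixing domains and IPs (documentation-only values).
TOY_GRAPH = {
    "example.com": ["192.0.2.1", "ns1.example.net"],
    "192.0.2.1": ["example.com", "other.example"],
    "ns1.example.net": ["192.0.2.53"],
    "other.example": ["192.0.2.99"],
}

def toy_neighbors(vertex):
    return TOY_GRAPH.get(vertex, [])

result = bfs_levels("example.com", toy_neighbors, max_depth=2)
```

With `max_depth=2`, the vertex `192.0.2.99` is left out because it sits three hops away from the start. Making the neighbor lookup a callback is what keeps the traversal customizable: the same loop works whether neighbors come from a dictionary or from a remote API.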

Combined with an adequate graph visualization tool (see OpenGraphiti), the result was a multitude of beautiful new graph datasets that we were eager to decipher and analyze (see the example below). Data visualization is certainly helpful for research, but it also turned out to be a valuable asset for our engineering, sales, and marketing departments. For the first time, explaining the true nature of our security data and statistical models was a transparent process.

Ex: Infrastructure graph extracted from opendns.com (orange node) with a depth of 3

[Figure: Security graph view of OpenDNS.com]

For us, this was a new perspective on our intelligence platform. But because our data is mostly DNS-centric, the picture was not really complete without enriching our datasets with external APIs and libraries. There are plenty of other great data analytics sources out there, and it would be a shame not to use them. This was the moment we realized we had to go for a modular design to expand the mining capabilities of our tool. You can integrate your own plugins to aggregate data, mine APIs, or apply any other sort of computation to your process (e.g. local libraries, ML/statistical models, application monitoring). The current default installation includes plugins for the OpenDNS Investigate API, VirusTotal, Shodan, MaxMind GeoIP, Selenium, HTTP, SSL, DNS and Whois, and it can easily be extended.
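As a rough illustration of what a modular design of this kind can look like, the sketch below registers enrichment plugins per node type and merges their results. Everything in it is hypothetical: the `plugin` decorator, the `expand` function, the plugin names and the canned data are illustrative assumptions and do not reflect OG-Miner's real plugin interface.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plugin signature: take a node value, return (relation, neighbor) pairs.
Plugin = Callable[[str], List[Tuple[str, str]]]

# Registry mapping a node type (e.g. "domain") to the plugins that can enrich it.
PLUGINS: Dict[str, List[Plugin]] = {}

def plugin(node_type: str):
    """Decorator that registers a function as a plugin for one node type."""
    def register(func: Plugin) -> Plugin:
        PLUGINS.setdefault(node_type, []).append(func)
        return func
    return register

@plugin("domain")
def fake_dns(domain: str) -> List[Tuple[str, str]]:
    # Stand-in for a DNS lookup; uses canned data instead of a network call.
    records = {"example.com": ["192.0.2.1"]}
    return [("resolves_to", ip) for ip in records.get(domain, [])]

@plugin("domain")
def fake_whois(domain: str) -> List[Tuple[str, str]]:
    # Stand-in for a Whois lookup; uses canned data instead of a network call.
    owners = {"example.com": "registrant@example.org"}
    owner = owners.get(domain)
    return [("registered_by", owner)] if owner else []

def expand(node_type: str, value: str) -> List[Tuple[str, str]]:
    """Run every plugin registered for node_type and merge the edges found."""
    edges: List[Tuple[str, str]] = []
    for run in PLUGINS.get(node_type, []):
        edges.extend(run(value))
    return edges

edges = expand("domain", "example.com")
```

In this sketch, adding a new data source means writing one decorated function; the traversal code that calls `expand` never has to change.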

After experimenting with various ideas, we came up with a couple of new detection models, and our engine had to be scalable and automatable to fit into a large-scale, real-time data processing pipeline. We opted for modern technologies such as ZeroMQ and MongoDB and implemented cooperation logic for graph processing. This way, we were able to build a cluster of graph miners (crawlers, aggregators, processors, explorers) working in harmony on the same central persistent graph without running into collision or synchronization issues.
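The post does not describe the cooperation protocol itself, so the following is only an in-process stand-in for the general idea: several workers share one frontier, and a vertex is atomically "claimed" before it is queued so that no two workers process it. Threads, a lock-protected set and `queue.Queue` are used here in place of ZeroMQ and MongoDB; all names (`SharedFrontier`, `crawl`, `GRAPH`) are hypothetical.

```python
import queue
import threading

# Hypothetical toy graph standing in for data returned by remote APIs.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d", "e"],
    "d": ["a"],
    "e": [],
}

class SharedFrontier:
    """Stand-in for the central graph store: remembers claimed vertices."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def claim(self, vertex):
        """Atomically claim a vertex; returns True only for the first caller."""
        with self._lock:
            if vertex in self._claimed:
                return False
            self._claimed.add(vertex)
            return True

def worker(tasks, frontier, processed):
    while True:
        vertex = tasks.get()
        if vertex is None:  # sentinel: no more work
            tasks.task_done()
            break
        for nxt in GRAPH.get(vertex, []):
            if frontier.claim(nxt):  # only one worker ever queues a vertex
                tasks.put(nxt)
        processed.append(vertex)
        tasks.task_done()

def crawl(start, n_workers=3):
    tasks = queue.Queue()
    frontier = SharedFrontier()
    processed = []
    frontier.claim(start)
    tasks.put(start)
    threads = [
        threading.Thread(target=worker, args=(tasks, frontier, processed))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    tasks.join()  # returns once every queued vertex has been processed
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return processed

processed = crawl("a")
```

The order of `processed` depends on thread scheduling, but each reachable vertex appears exactly once. In a distributed setting the same role could be played by an atomic insert into a shared database, which is an assumption rather than something the post confirms.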

These detection models have been running well for a while now, constantly finding and blocking freshly discovered domains. Our engine has reached maturity, and we believe the security community will benefit greatly from this new tool. We are happy to announce the official open-source release of this project at the Kaspersky Security Analyst Summit 2017, taking place on the beautiful island of Saint Martin in the Caribbean.
