
Elasticsearch: You Know, For Logs

By UmbrellaEngineering
Updated — May 27, 2020 • 9 minute read

The data platform team at OpenDNS is always looking at new technologies to improve our real-time search platform. Consequently, we have been keeping a close eye on Elasticsearch for quite a while, and we even use it for some internal tools and metrics.

OpenDNS is now looking at using Elasticsearch as a real-time search engine for our DNS log data. OpenDNS needs a powerful real-time logging and search platform for several reasons. First and foremost is our customers: they need to be able to identify malicious activity on their networks as it is happening so they can respond promptly. Any time spent waiting for the data to come in is time that infections could be spreading or attacks could be gaining momentum. Similarly, we at OpenDNS use this data to monitor our own systems through several different metrics. If something goes wrong, we need to know right away so we can fix the problem before it propagates. Overall, getting data in real time means that the people monitoring it can react in real time. That reaction time could mean the difference between a minor headache and a catastrophic problem.

For Elasticsearch to solve this problem, it not only has to be real-time, but scalable and manageable as well. OpenDNS is growing quickly, so we need a system that can grow with us without introducing technical debt. Also, our engineers don’t like getting paged on Sunday at 3 a.m., so we need a system that can deal with failures automatically without missing a beat.

Part 1: Introduction and Setup

This blog post is the first in a series that will focus on Elasticsearch and how to optimize it for log data. For information on other Elastic products, including their recommended real-time logging stack “ELK,” visit https://www.elastic.co/products.

Furthermore, this series will mainly show examples using Elasticsearch’s REST API, because it is simple and easy to use. That said, Elasticsearch also supports official clients in several languages, listed on the Elasticsearch website.

What is Elasticsearch?

Elasticsearch is a highly scalable search platform based on Apache Lucene. It is built from the ground up for the cloud and supports distributed indices and multitenancy. Since its release in 2010, Elasticsearch has gained many notable users and remains a very active project at Elastic under its creator Shay Banon.

Getting started with Elasticsearch

The Elasticsearch website has great documentation to walk users through installation.
Elasticsearch was designed to be distributed, so to demonstrate its full functionality it is important to set up a cluster of at least three nodes. Creating a three-node cluster should be as simple as running three Elasticsearch instances with the same cluster name. The “cluster.name” variable is found in the main Elasticsearch configuration file, “elasticsearch.yml,” in the “config” folder. By default, a three-node cluster will include two data nodes along with one elected master node.
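For reference, a minimal “elasticsearch.yml” for this setup might contain nothing more than the cluster name and a per-node name (the values here are just illustrative):

cluster.name: dns-logging-demo
node.name: node-1

Starting three instances configured with the same “cluster.name” (and a unique “node.name” each) should be enough for them to discover each other and form the cluster.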

Once Elasticsearch is installed and the cluster is connected, users need a way of visualizing their cluster. Elasticsearch supports a plugin called “Head” that does exactly this, plus a little extra.

To install Head simply run this command from the ‘elasticsearch’ folder:

bin/plugin --install mobz/elasticsearch-head

Once installed, it can be accessed through

http://<hostname>:9200/_plugin/head/

Head gives an intuitive view of the indices and shards of an Elasticsearch cluster, as well as the ability to easily browse and search through its documents. When Head is first opened, it should look something like the following:

This view shows a simple Elasticsearch cluster with three nodes: two data nodes, indicated by the black circles, and one master, indicated by the black star. On the right is an index named “dns-test-1.” Each green box represents a shard, which under the hood is a Lucene inverted index. By default, an Elasticsearch index has five shards, each with one replica. Primary shards are shown with bold borders and replicas with light borders.
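The same high-level picture is available from the REST API as well; for example, the cluster health endpoint reports the cluster status, node count, and shard allocation:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'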

Elasticsearch is designed to work straight out of the box: there is no need to worry about schemas or creating indices yet. As long as each document is given a type and an index name, Elasticsearch will index it.

Example:

curl -XPOST 'http://localhost:9200/logs/log' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com",
    "IP Address" : "127.0.0.1",
    "log_id" : 1
}'

Sending a simple index request will automatically create an index called “logs” and index the given document, with Elasticsearch automatically guessing the data type of each field. Elasticsearch is pretty smart: if formatted properly, the “Timestamp” field will default to the “date” data type and the “log_id” field will default to type “long.” Trickier fields such as “IP Address,” however, will default to a plain string, even though Elasticsearch does have a native “ip” data type. The documents themselves are indexed and stored in JSON format.
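To see exactly which data types Elasticsearch guessed, the generated mapping can be retrieved at any time:

curl -XGET 'http://localhost:9200/logs/_mapping/log?pretty'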

Note that once the data type for a field is set, it cannot be changed. For example, if the following document:

curl -XPOST 'http://localhost:9200/logs/log' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1",
    "log_id" : "abcd"
}'

is indexed after the previous one, the following exception is thrown:

RemoteTransportException[[Reptyl][inet[/10.70.99.146:9301]][indices:data/write/index]]; nested: MapperParsingException[failed to parse [log_id]]; nested: NumberFormatException[For input string: "abcd"];

Mappings

After an initial cluster is set up, an important step is to create a type mapping for the documents being indexed. The data type and other settings of each field for a specific document type are stored in a type mapping, which is configured using the Elasticsearch Put Mapping API. Although a default mapping will be created by Elasticsearch when a new document is indexed, the default data types are often too general. For example, any field containing only an integer will default to type “long.”

In this case, manually setting it to type “integer” might better represent the data and simultaneously save some storage space. Also, any “string” fields should be set to “not_analyzed.” By default, Elasticsearch tokenizes all string fields with an analyzer. This functionality mainly exists to support full-text search and document scoring. For log data, queries are generally exact-match only, so none of this is needed. Setting the “index” option to “not_analyzed” ensures Elasticsearch won’t waste time unnecessarily tokenizing every string field.
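To illustrate why this matters: with the default analyzer, a value like “opendns.com/enterprise-security” is split into separate tokens at index time, so an exact-value lookup such as the following term query sketch would come back empty unless the field is “not_analyzed”:

curl -XPOST 'http://localhost:9200/logs/_search' -d '{
    "query" : { "term" : { "URL" : "opendns.com/enterprise-security" } }
}'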

Before creating a mapping, any default mapping created by Elasticsearch should be deleted in order to avoid conflicts. This will also delete any documents that have been indexed into this default mapping, so be careful. Mappings are deleted with the following command:

curl -XDELETE 'http://localhost:9200/logs/log/_mapping'

Here is an example of a put mapping command that specifies a simple mapping for documents of type “log” belonging to the index “logs,” with four fields and their data types:

$ curl -XPUT 'http://localhost:9200/logs/_mapping/log' -d '
{
    "log" : {
        "properties" : {
            "Timestamp" : {"type" : "date"},
            "URL" : {"type" : "string", "index" : "not_analyzed"},
            "IP Address" : {"type" : "ip"},
            "log_id" : {"type" : "integer"}
        }
    }
}'

Once a mapping is set, it is important to note that indexing a document that matches the mapping type but contains fields not included in the mapping will add those fields to the mapping.

For example, if, after applying the previous mapping, I tried to submit the following index request:

curl -XPOST 'http://localhost:9200/logs/log' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1",
    "log_id" : "2",
    "user" : "John"
}'

The “user” field with type “string” would be added to the mapping automatically.
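If this automatic extension is not desired, dynamic mapping can be disabled per type. As a sketch, adding "dynamic" : "strict" to the mapping makes Elasticsearch reject documents containing unmapped fields instead of silently extending the mapping:

$ curl -XPUT 'http://localhost:9200/logs/_mapping/log' -d '
{
    "log" : {
        "dynamic" : "strict",
        "properties" : {
            "Timestamp" : {"type" : "date"},
            "URL" : {"type" : "string", "index" : "not_analyzed"},
            "IP Address" : {"type" : "ip"},
            "log_id" : {"type" : "integer"}
        }
    }
}'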

Document IDs

If unspecified, Elasticsearch will simply generate an ID for each document. This works fine in some cases, but often users need to be able to assign their own IDs.

In the simplest case, a document ID can be added to an index request itself, as in the following:

curl -XPUT 'http://localhost:9200/logs/log/37' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1"
}'

Simply change the request to “XPUT” and tack the ID onto the end of the URL.
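Once indexed this way, the document can be fetched back directly by its ID:

curl -XGET 'http://localhost:9200/logs/log/37'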
Alternatively, a field can be added to the mapping along with a specified path to pull the ID from the document itself.

The following mapping will tell Elasticsearch to use the “log_id” field as the document ID:

$ curl -XPUT 'http://localhost:9200/logs/_mapping/log' -d '
{
    "log" : {
        "_id" : {"path" : "log_id"},
        "properties" : {
            "Timestamp" : {"type" : "date"},
            "URL" : {"type" : "string", "index" : "not_analyzed"},
            "IP Address" : {"type" : "ip"},
            "log_id" : {"type" : "integer"}
        }
    }
}'

Index Schema and Templates

With a large amount of data coming in every day, it is important to have a well-defined way of partitioning the data into Elasticsearch indices. For log data, it is often intuitive to partition the data into indices based on a time interval, such as daily or hourly. Partitioning data this way comes with several advantages. For one, data expiration becomes very easy: instead of relying on a TTL or other expiration methods, old indices can simply be deleted altogether. Another advantage comes when the data is queried. If a query is only looking for documents from a certain time period, it can be limited to fewer indices instead of having to query the entire cluster. This index schema is especially advantageous in the real-time search use case: since the most recent index will likely be receiving the majority of the traffic, Elasticsearch will maintain a larger cache for it, improving performance.
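For example, with the daily naming scheme described below, expiring a full day of logs is a single request (the index name here is just illustrative):

curl -XDELETE 'http://localhost:9200/dnslog-2015-01-01'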

This process of creating indices, along with settings and mappings, can be automated in Elasticsearch by using an “Index Template.” The job of an index template is to automatically apply mappings and other settings to an index at the time it is created. A basic index template will contain: a mapping for each type to be indexed, the name or wildcard expression matching the indices to which the template should be applied, and the number of shards and replicas each index should contain. All you have to do is index a document with an index name that matches one of your templates and the index will be automatically created using the template (assuming the index doesn’t already exist).

For example, if indexing DNS logs by day, the index naming scheme might look something like “dnslog-YYYY-MM-DD,” with each subsequent index name incremented by one day. It would be too much work to apply settings and mappings to each index individually; instead, a template with a wildcard in the “template” field can be applied to every index matching this scheme.

An index template that would be applied to every “dnslog-YYYY-MM-DD” index would look something like:

curl -XPUT 'http://localhost:9200/_template/dns_template' -d '
{
    "template" : "dnslog-*",
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "log" : {
            "properties" : {
                "Timestamp" : {"type" : "date"},
                "URL" : {"type" : "string", "index" : "not_analyzed"},
                "IP Address" : {"type" : "ip"}
            }
        }
    }
}'

Once this template has been applied, creating a new index with mapping and settings already applied is as simple as sending an index request.

Example:

curl -XPOST 'http://localhost:9200/dnslog-2015-04-09/log' -d '{
    "Timestamp" : "2015-04-09T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1"
}'

After applying the previous template the command above will create the index “dnslog-2015-04-09” containing three shards with one replica and the “log” mapping already applied.
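To confirm that the template took effect, the new index’s settings and mapping can be inspected directly:

curl -XGET 'http://localhost:9200/dnslog-2015-04-09/_settings?pretty'
curl -XGET 'http://localhost:9200/dnslog-2015-04-09/_mapping?pretty'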

Conclusion

In this first post of our series, we have shown that Elasticsearch is flexible enough to be set up for log data using properly configured index templates and type mappings. In the next post in our series, we will explore the scalability and availability of Elasticsearch.

For more information on Elasticsearch, check out their website at http://www.elastic.co.
Continue to: Elasticsearch: You Know, For Logs [Part 2].
