Research

Elasticsearch: You Know, For Logs

By OpenDNS Engineering
Posted on May 5, 2015
Updated on May 27, 2020


The data platform team at OpenDNS is always looking at new technologies to improve our real-time search platform. Consequently, we have been keeping a close eye on Elasticsearch for quite a while, and even use it for some internal tools and metrics.

OpenDNS is now looking at using Elasticsearch as a real-time search engine for our DNS log data. OpenDNS needs a powerful real-time logging and search platform for several reasons. First and foremost are our customers, who need to be able to identify malicious activity on their networks as it is happening so they can respond promptly. Any time spent waiting for the data to come in is time in which infections could be spreading or attacks could be gaining momentum. Similarly, we at OpenDNS use this data to monitor our own systems across several different metrics. If something goes wrong, we need to know right away so we can fix the problem before it propagates. Overall, getting data in real time means that the people monitoring it can react in real time. That reaction time can mean the difference between a minor headache and a catastrophic problem.

For Elasticsearch to solve this problem, it not only has to be real-time, but scalable and manageable as well. OpenDNS is growing quickly, so we need a system that can grow with us without introducing technical debt. Also, our engineers don’t like getting paged on Sunday at 3 a.m., so we need a system that can deal with failures automatically without missing a beat.

Part 1: Introduction and Setup

This blog post is the first in a series that will focus on Elasticsearch and how to optimize it for log data. For information on other Elastic products, including their recommended real-time logging stack "ELK," visit https://www.elastic.co/products.

Furthermore, this series will mainly show examples using Elasticsearch's REST API, because it is simple and easy to use. That said, Elasticsearch also supports clients in several languages, which are listed on the Elasticsearch website.

What is Elasticsearch?

Elasticsearch is a highly scalable search platform based on Apache Lucene. It is built from the ground up for the cloud and supports distributed indices and multitenancy. Since its release in 2010, Elasticsearch has gained many notable users and remains a very active project at Elastic under its creator Shay Banon.

Getting started with Elasticsearch

The Elasticsearch website has great documentation to walk users through installation.
Elasticsearch was designed to be distributed, so to demonstrate its full functionality it is important to set up a cluster of at least three nodes. Creating a three-node cluster should be as simple as running three Elasticsearch instances with the same cluster name. The "cluster.name" variable is found in the main Elasticsearch configuration file, "elasticsearch.yml," in the "config" folder. By default, a three-node cluster will include two data nodes along with one elected master node.
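For example, the relevant part of each node's configuration might look like the fragment below. The cluster and node names are illustrative, not from the original post; the only requirement is that "cluster.name" is identical on all three nodes.

```yaml
# config/elasticsearch.yml -- hypothetical values
cluster.name: logs-demo   # must be identical on all three nodes
node.name: node-1         # unique per node
```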

Once Elasticsearch is installed and the cluster is connected, users need a way of visualizing their cluster. Elasticsearch supports a plugin called "Head" that does exactly this, plus a little extra.

To install Head simply run this command from the ‘elasticsearch’ folder:

bin/plugin --install mobz/elasticsearch-head

Once installed, it can be accessed through

http://<hostname>:9200/_plugin/head/
Head gives an intuitive view of the indices and shards of an Elasticsearch cluster, as well as the ability to easily browse and search through its documents. When Head is first opened, it should look something like the following:

[Screenshot: the Head dashboard showing a three-node cluster and an index named "dns-test-1"]

The screenshot shows a simple Elasticsearch cluster with three nodes: two data nodes, indicated by the black circles, and one master, indicated by the black star. On the right is an index named "dns-test-1." Each green box represents a shard, which under the hood is a Lucene inverted index. By default an Elasticsearch index has five shards, each with one replica. Primary shards are shown with bold borders; replicas are shown with light borders.

Elasticsearch is designed to work straight out of the box; there is no need to worry about schemas or creating indices up front. As long as each document is given a type and an index name, Elasticsearch will index it.

Example:

curl -XPOST 'http://localhost:9200/logs/log' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com",
    "IP Address" : "127.0.0.1",
    "log_id" : 1
}'

Sending a simple index request will automatically create an index called "logs" and index the given document. Elasticsearch will also automatically guess the type of each field when indexing the doc. Elasticsearch is pretty smart: if formatted properly, the "Timestamp" field will default to the "date" data type, and the "log_id" field will default to type "long." Trickier fields such as "IP Address," though, will default to a plain string, even though Elasticsearch has a native "ip" data type. The documents themselves are indexed and stored in JSON format.
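As a rough illustration only (not Elasticsearch's actual implementation), dynamic type detection behaves something like this simplified sketch: numbers map to numeric types, properly formatted timestamps map to dates, and everything else, including IP addresses, falls back to a string.

```python
from datetime import datetime

def guess_es_type(value):
    # Simplified stand-in for Elasticsearch's dynamic type detection;
    # the real rules are more involved and live inside Elasticsearch.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # A properly formatted timestamp is detected as a date.
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
            return "date"
        except ValueError:
            # Anything else -- including an IP address -- stays a string.
            return "string"
    return "object"

doc = {
    "Timestamp": "2009-11-15T14:12:12",
    "URL": "opendns.com",
    "IP Address": "127.0.0.1",
    "log_id": 1,
}
guessed = {field: guess_es_type(value) for field, value in doc.items()}
```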

Note that once the data type for a field is set, it cannot be changed. For example, if the following document:

curl -XPOST 'http://localhost:9200/logs/log' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1",
    "log_id" : "abcd"
}'

is indexed after the previous one, the following exception is thrown:

RemoteTransportException[[Reptyl][inet[/10.70.99.146:9301]][indices:data/write/index]]; nested: MapperParsingException[failed to parse [log_id]]; nested: NumberFormatException[For input string: "abcd"];

Mappings

After an initial cluster is set up, an important step is to create a type mapping for the documents being indexed. The data type and other settings for each field of a specific document type are stored in a type mapping, which is configured using the Elasticsearch Put Mapping API. Although a default mapping will be created by Elasticsearch when a new document is indexed, the default data types are often too general. For example, any field containing only an integer will default to type "long."

In this case, manually setting it to type "integer" might better represent the data and simultaneously save some storage space. Also, any "string" fields should be set to "not analyzed." By default, Elasticsearch tokenizes every string field with an analyzer. This functionality mainly exists to support full-text search and document scoring. Queries over log data are generally exact matches, so none of this is needed. Setting a string field's "index" option to "not_analyzed" ensures Elasticsearch won't waste time unnecessarily tokenizing it.
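To see why this matters for log fields such as URLs, compare what an analyzer does to a URL with what a not_analyzed field stores. The tokenizer below is a crude stand-in for the standard analyzer, for illustration only:

```python
import re

def standard_analyze(text):
    # Crude approximation of Elasticsearch's standard analyzer:
    # lowercase the input and split it on non-alphanumeric characters.
    return [token for token in re.split(r"[^0-9a-zA-Z]+", text.lower()) if token]

url = "opendns.com/enterprise-security"

# Analyzed: the URL is broken into separate terms, so the index no
# longer contains the whole URL as a single value.
analyzed_terms = standard_analyze(url)

# not_analyzed: the whole value is stored as a single term, which is
# what exact-match queries over log data actually need.
not_analyzed_terms = [url]
```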

Before creating a mapping, any default mapping created by Elasticsearch should be deleted in order to avoid conflicts. This will also delete any documents that have been indexed into this default mapping, so be careful. Mappings are deleted with the following command:

curl -XDELETE 'http://localhost:9200/logs/log/_mapping'

Here is an example of a put mapping command that specifies a simple mapping for documents of type "log" belonging to the index "logs," with three fields and their data types:

curl -XPUT 'http://localhost:9200/logs/_mapping/log' -d '
{
    "log" : {
        "properties" : {
            "Timestamp" : {"type" : "date"},
            "URL" : {"type" : "string",
                     "index" : "not_analyzed"},
            "IP Address" : {"type" : "ip"},
            "log_id" : {"type" : "integer"}
        }
    }
}'

Once a mapping is set, note that indexing a document that matches the mapping type but contains fields not included in the mapping will add those fields to the mapping.

For example, if, after applying the previous mapping, I tried to submit the following index request:

curl -XPOST 'http://localhost:9200/logs/log' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1",
    "log_id" : "2",
    "user" : "John"
}'

The “user” field with type “string” would be added to the mapping automatically.

Document IDs

If unspecified, Elasticsearch will simply generate an ID for each document. This works fine in some cases, but often users need to be able to assign their own IDs.

In the simplest case, a document ID can be added to an index request itself, as in the following:

curl -XPUT 'http://localhost:9200/logs/log/37' -d '{
    "Timestamp" : "2009-11-15T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1"
}'

Simply change the request to "XPUT" and tack the ID onto the end of the URL.
Alternatively, an "_id" field can be added to the mapping with a specified path, telling Elasticsearch to pull the ID from the document itself.

The following mapping will tell Elasticsearch to use the “log_id” field as the document ID:

curl -XPUT 'http://localhost:9200/logs/_mapping/log' -d '
{
    "log" : {
        "_id" : {"path" : "log_id"},
        "properties" : {
            "Timestamp" : {"type" : "date"},
            "URL" : {"type" : "string",
                     "index" : "not_analyzed"},
            "IP Address" : {"type" : "ip"},
            "log_id" : {"type" : "integer"}
        }
    }
}'

Index Schema and Templates

With a large amount of data coming in every day, it is important to have a comprehensive way of partitioning the data across Elasticsearch indices. For log data, it is often intuitive to partition the data into indices based on a time interval, such as daily or hourly. Partitioning data this way comes with several advantages. For one, data expiration becomes very easy: instead of relying on a TTL or other expiration methods, old indices can simply be deleted altogether. Another advantage comes at query time. If a query only needs documents from a certain time period, it can be limited to a few indices instead of querying the entire cluster. This index schema is especially advantageous in the real-time search use case: since the most recent index will likely receive the majority of the traffic, Elasticsearch will maintain a larger cache for it, improving performance.
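As a sketch of how index-per-day expiration might work, the hypothetical helper below picks out the daily indices that have aged past a retention window. In practice the list of index names would come from the cluster itself; here it is hard-coded for illustration.

```python
from datetime import date, timedelta

def indices_to_delete(existing_indices, today, retention_days, prefix="dnslog-"):
    # Daily index names are zero-padded ("dnslog-YYYY-MM-DD"), so plain
    # string comparison orders them chronologically.
    cutoff = prefix + (today - timedelta(days=retention_days)).strftime("%Y-%m-%d")
    return [name for name in existing_indices
            if name.startswith(prefix) and name < cutoff]

existing = ["dnslog-2015-04-07", "dnslog-2015-04-08", "dnslog-2015-04-09"]
old = indices_to_delete(existing, today=date(2015, 4, 9), retention_days=1)
```

Each name in `old` could then be removed with a single DELETE request, e.g. `curl -XDELETE 'http://localhost:9200/dnslog-2015-04-07'`.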

This process of creating indices, along with settings and mappings, can be automated in Elasticsearch by using an “Index Template.” The job of an index template is to automatically apply mappings and other settings to an index at the time it is created. A basic index template will contain: a mapping for each type to be indexed, the name or wildcard expression matching the indices to which the template should be applied, and the number of shards and replicas each index should contain. All you have to do is index a document with an index name that matches one of your templates and the index will be automatically created using the template (assuming the index doesn’t already exist).

For example, if indexing DNS logs by day, the index naming schema might look something like "dnslog-YYYY-MM-DD," with each subsequent index name incremented by one day. It would be too much work to apply settings and mappings to each index individually. Instead, a template with a wildcard in the "template" field can be applied to every index matching this schema.

For example, an index template that would be applied to every "dnslog-YYYY-MM-DD" index would look something like:

curl -XPUT 'http://localhost:9200/_template/dns_template' -d '
{
    "template" : "dnslog-*",
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "log" : {
            "properties" : {
                "Timestamp" : {"type" : "date"},
                "URL" : {"type" : "string",
                         "index" : "not_analyzed"},
                "IP Address" : {"type" : "ip"}
            }
        }
    }
}'

Once this template has been applied, creating a new index with mapping and settings already applied is as simple as sending an index request.

Example:

curl -XPOST 'http://localhost:9200/dnslog-2015-04-09/log' -d '{
    "Timestamp" : "2015-04-09T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1"
}'

After applying the previous template, the command above will create the index "dnslog-2015-04-09" containing three shards with one replica and the "log" mapping already applied.
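With the template in place, an indexing pipeline only needs to derive the target index name from each log entry's timestamp. A minimal, hypothetical helper:

```python
from datetime import datetime

def index_for(timestamp):
    # Map a log timestamp onto the daily "dnslog-YYYY-MM-DD" schema
    # matched by the template's wildcard.
    day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    return day.strftime("dnslog-%Y-%m-%d")

# The document from the example above would then be POSTed to
# http://localhost:9200/<index name>/log
target = index_for("2015-04-09T14:12:12")
```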

Conclusion

In this first post of our series, Elasticsearch has shown that it is flexible enough to be set up for log data using properly configured Index Templates and Type Mappings. In the next post in our series, we will explore the scalability and availability of Elasticsearch.

For more information on Elasticsearch, check out their website at http://www.elastic.co.
Continue to: Elasticsearch: You Know, For Logs [Part 2].
