
Elasticsearch: You Know, For Logs [Part 2]

By UmbrellaEngineering
Updated — October 15, 2020 • 6 minute read

Part 2: Scalability and Availability

In Part 1 of this series, Elasticsearch proved that it could be configured to consume log data by using Index Templates and a time-based index schema. To move forward with Elasticsearch, it also needs to scale easily while maintaining high availability. This post first explores three different node roles and how to use them to scale an Elasticsearch cluster while keeping the workload balanced. It then walks through a few different failure cases and how to protect a cluster from them.

Scalability

Scaling horizontally with Elasticsearch is as straightforward as it gets. Adding a node to an Elasticsearch cluster is as simple as firing up a new Elasticsearch instance whose cluster name in the Elasticsearch config file (elasticsearch.yml) matches the rest of the cluster. When the new node comes online, the cluster discovers it automatically and it gets to work right away. As a cluster grows, three different types of nodes can be added. Part of maintaining a stable cluster is making sure that the right number of nodes performs each role.
Data Nodes
Data nodes are the workhorses of the cluster. They are responsible for indexing documents and performing searches as well as other index operations. Adding data nodes is the best way to scale up an Elasticsearch cluster if indexing and/or search performance needs to be improved.
A data node has the following configurations:

node.master: false
node.data: true

Adding these settings ensures that data nodes focus solely on searching and indexing, without the risk of taking on the responsibilities of a master node.
Master Node
As an Elasticsearch cluster gets larger, the cluster state becomes more cumbersome to maintain. The master node does more work as the cluster scales, so it becomes more important to have a dedicated master node, with at least one backup master node in case of a failure. For large clusters, it is recommended to have a total of three dedicated master nodes, of which one will be the master and two will be backups.
A dedicated master node is a node with the following configurations:

node.master: true
node.data: false

These settings prevent the node from storing data thus enabling it to focus solely on its job as a master node.
Load Balancer Node
Additionally, if an Elasticsearch cluster is receiving a high volume of index or search requests, adding some load balancing nodes can take some of the stress off the data nodes in the cluster. A load balancer node is not a master node and it also does not store any data. Its sole responsibility is to handle all HTTP communication.
A load balancer node has the following configurations:

node.master: false
node.data: false

Load balancer nodes take pressure off data nodes by routing search and index requests to the relevant nodes. This prevents requests from being bounced between data and/or master nodes. Additionally, load balancer nodes perform all of the “Scatter and Gather” operations of a search request, allowing data nodes to focus on their primary functions.
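The three roles above come down to two boolean settings. As a quick reference, here is a minimal Python sketch that maps a node.master / node.data pair to the role names used in this post. The function name and the label for the combined case are illustrative assumptions, not part of Elasticsearch.

```python
def node_role(master: bool, data: bool) -> str:
    """Map the node.master / node.data flags to the role names used in this post.

    Hypothetical helper for illustration only; Elasticsearch itself does not
    expose a function like this.
    """
    if master and not data:
        return "dedicated master"
    if data and not master:
        return "data"
    if not master and not data:
        return "load balancer"
    # Both flags true: the node is master-eligible and also stores data.
    return "master-eligible data"


if __name__ == "__main__":
    for flags in [(False, True), (True, False), (False, False), (True, True)]:
        print(flags, "->", node_role(*flags))
```

The last case is the one dedicated roles are meant to avoid: a node that holds data and can also be elected master.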

Maintaining a Balanced Cluster

As a cluster grows, the number of shards needs to grow with it so that the workload stays balanced across the cluster. If the number of data nodes grows beyond the number of shards, some nodes will not be used. Likewise, if the total number of shard copies (primaries plus replicas) is not a multiple of the number of data nodes, the cluster will not be evenly balanced. One way to balance a cluster is to maintain a static number of shards per node. The shard count is configured in the index template, where it can also be updated. For example, to maintain a cluster with two shards per node (one primary and one replica), the following template could be used:

curl -XPUT localhost:9200/_template/template_1 -d '
{
    "template" : "syslog-*",
    "settings" : {
        "number_of_shards" : <Number of data nodes>,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "log" : {
            "properties" : {
                "Timestamp" : {
                    "type" : "date",
                    "fielddata" : {
                        "loading" : "eager"
                    }
                },
                "URL" : {"type" : "string", "index" : "not_analyzed"},
                "IP Address" : {"type" : "ip"}
            }
        }
    }
}
'

Sending this request simply updates the template with the new number of shards; the change applies only to indices created afterwards. Once an index is created, its number of primary shards cannot be changed. We can, however, balance older indices by increasing the number of replicas per shard using the following request:

curl -XPUT 'localhost:9200/dns-test-1/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 1
    }
}'
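Both request bodies can also be generated rather than edited by hand. The following is a minimal Python sketch that builds them from the number of data nodes. The function names are hypothetical, and the mappings section is left out for brevity; only the settings shown in the requests above are reproduced.

```python
import json


def template_body(data_nodes: int, replicas: int = 1) -> dict:
    """Build the settings part of the syslog-* template: one primary per data node."""
    if data_nodes < 1:
        raise ValueError("a cluster needs at least one data node")
    return {
        "template": "syslog-*",
        "settings": {
            "number_of_shards": data_nodes,
            "number_of_replicas": replicas,
        },
    }


def replica_settings_body(replicas: int) -> dict:
    """Build the body for the _settings request that changes the replica count."""
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    # Print JSON that could be passed to curl with -d.
    print(json.dumps(template_body(3), indent=4))
    print(json.dumps(replica_settings_body(2), indent=4))
```

Generating the body this way keeps the shard count tied to the node count, which is the invariant the template is trying to maintain.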

For example, the graphic below shows a two-data-node cluster with a balanced index:
[Screenshot: two-data-node cluster with a balanced index]
With two primary shards per index, each with one replica, the workload is evenly balanced between the data nodes. When a new node is added, the shards are rebalanced automatically:
[Screenshot: the same cluster after a third data node is added]
With three data nodes but only two primary shards per index, the workload is no longer evenly balanced. “Blind Faith” could receive double the traffic of “Jocasta” or “Stanley Stewart”. Now update the template, setting “number_of_shards” to three, up from two, so new indices will be balanced evenly:
[Screenshot: a new index with three primary shards spread across the three data nodes]
Finally, increase the number of replicas of the old index to rebalance it:
[Screenshot: the old index rebalanced with additional replicas]
This last step is not always worthwhile, since it significantly increases the storage requirement of the index, especially if rebalancing requires adding more than one additional replica.
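The arithmetic behind this walkthrough can be sketched in a few lines of Python. This is a simplified model that assumes a cluster is "balanced" when the total number of shard copies divides evenly across the data nodes; it ignores any difference in load between primaries and replicas, and the function names are hypothetical.

```python
from typing import Optional


def shards_per_node(data_nodes: int, primaries: int, replicas: int) -> Optional[int]:
    """Return the shard copies per node if they divide evenly, else None."""
    total_copies = primaries * (1 + replicas)
    if total_copies % data_nodes != 0:
        return None
    return total_copies // data_nodes


def replicas_to_balance(data_nodes: int, primaries: int, replicas: int) -> Optional[int]:
    """Smallest replica count >= the current one that balances the index.

    The search stops at data_nodes - 1 replicas, because copies of the same
    shard must live on different nodes.
    """
    for candidate in range(replicas, data_nodes):
        if (primaries * (1 + candidate)) % data_nodes == 0:
            return candidate
    return None


if __name__ == "__main__":
    print(shards_per_node(2, 2, 1))      # two nodes, two primaries, one replica
    print(shards_per_node(3, 2, 1))      # third node added: no even split
    print(replicas_to_balance(3, 2, 1))  # replicas needed to rebalance the old index
```

Under this model, the old index in the example needs two replicas instead of one, which means six shard copies instead of four. That is the storage cost referred to above.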

Availability

With three different types of nodes, maintaining high availability means handling three different failure cases.
Data Node Failure

The most common case is the failure of a data node. When a data node fails, Elasticsearch automatically rebalances each index by creating a new copy of each lost shard from its surviving replicas. Between the moment a node fails and the moment the lost shards are fully replicated again, the cluster is in the “yellow” state. Data loss is possible if another node fails during this window. Furthermore, no node can store more than one copy of the same shard, so extra replicas may not be reassigned after a failure. They simply wait until the failed node comes back online. For example:
[Screenshot: healthy cluster before the failure]
Node “Flash Thompson” fails:
[Screenshot: cluster after “Flash Thompson” fails]
Cluster health is now “Yellow” while shards ‘0’ and ‘2’ of “dns-test-2” are being replicated. Once that finishes, “dns-test-1” will still have unassigned shards, but every shard will have a stable replica:
[Screenshot: cluster after replication finishes, with unassigned shards in “dns-test-1”]
Cluster health is still “Yellow” because of the unassigned shards in “dns-test-1”, but the cluster is stable since every shard has at least one replica. Now bring the node back online and the cluster returns to normal:
[Screenshot: cluster back to normal after the node rejoins]
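The health colors in this walkthrough follow a simple rule: green when every shard copy is assigned, yellow when all primaries are assigned but some replicas are not, and red when a primary is unassigned. The Python sketch below models that rule together with the one-copy-per-node constraint. The dictionary keys and function names are assumptions made for illustration and do not mirror an Elasticsearch API response.

```python
from typing import Dict, List


def cluster_health(indices: List[Dict[str, int]]) -> str:
    """Derive the cluster color from per-index counts of unassigned shards."""
    if any(index["unassigned_primaries"] > 0 for index in indices):
        return "red"
    if any(index["unassigned_replicas"] > 0 for index in indices):
        return "yellow"
    return "green"


def assignable_copies(replicas: int, data_nodes: int) -> int:
    """Copies of one shard that can be placed, given one copy per node at most."""
    return min(1 + replicas, data_nodes)


if __name__ == "__main__":
    # Hypothetical index with two replicas per shard on a cluster that has
    # dropped from three data nodes to two.
    placed = assignable_copies(replicas=2, data_nodes=2)
    waiting = (1 + 2) - placed  # replicas per shard left unassigned
    print(placed, waiting)
    print(cluster_health([
        {"unassigned_primaries": 0, "unassigned_replicas": waiting},
        {"unassigned_primaries": 0, "unassigned_replicas": 0},
    ]))
```

In this hypothetical case the cluster stays yellow until the third node returns, even though every shard still has one replica.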
Master Node Failure
Another case is the failure of a master node. Here a backup master node is essential, since otherwise a data node will be elected as the master. A data node might not be able to handle the added responsibility of maintaining the cluster state, which could result in all kinds of badness. With a backup in place, a failed master is simply replaced: the backup is elected master and the cluster state is maintained. For example:
[Screenshot: cluster with “Margo Damian” as master and “Warstar” as backup]
“Margo Damian” is currently the master node, but “Warstar” is ready and waiting in case of a failure. If the master node does fail:
[Screenshot: cluster after the master node fails]
“Warstar” is automatically elected as master and the cluster state is maintained.
Furthermore, it is important to set the minimum number of master nodes correctly to prevent the “split brain” problem, where two master nodes are elected within the same cluster. To do this, keep the discovery.zen.minimum_master_nodes setting at a quorum (a strict majority) of the master-eligible nodes in the cluster. For example, for a cluster with three master-eligible nodes, the minimum master nodes should be set to two in Elasticsearch’s configuration.

discovery.zen.minimum_master_nodes: 2

Read more about the split brain problem here.
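The quorum is a strict majority of the master-eligible nodes, which works out to N / 2 + 1 with integer division. A minimal Python sketch of that calculation follows; the function name is illustrative.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum (strict majority) for a given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for count in (1, 2, 3, 5):
        print(count, "master-eligible ->", minimum_master_nodes(count))
```

Note that two master-eligible nodes also give a quorum of two, so losing either one blocks an election. This is one reason three dedicated master nodes is the usual recommendation.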
Load Balancer Node Failure
Unfortunately, Elasticsearch cannot automatically deal with the failure of a load balancer node. Any request sent to the failed node returns an error that must be handled on the client side.
If the node a client sends its index or search requests to fails, the client throws the following exception:

org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: []

Having multiple load balancer nodes should allow the client to handle this exception and redirect the request to an available node.
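One way to picture this client-side handling is a loop that tries each load balancer node in turn. The Python sketch below is a generic illustration, not the Elasticsearch client's actual behavior: the `send` callable, the node addresses, and the `NoNodeAvailable` exception class are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence


class NoNodeAvailable(Exception):
    """Raised when every configured node has failed (hypothetical stand-in)."""


def send_with_failover(nodes: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each node in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return send(node)
        except ConnectionError as err:
            # This node is unreachable; remember why and try the next one.
            last_error = err
    raise NoNodeAvailable(
        f"none of the configured nodes are available: {list(nodes)}"
    ) from last_error


if __name__ == "__main__":
    def fake_send(node: str) -> str:
        # Simulate the first load balancer node being down.
        if node == "lb1:9200":
            raise ConnectionError("connection refused")
        return f"ok from {node}"

    print(send_with_failover(["lb1:9200", "lb2:9200"], fake_send))
```

With only one load balancer node configured, the loop has nowhere to fall back to, which is why more than one is needed for availability.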

Conclusion

In this second part of our series, Elasticsearch has shown that it can be scaled and balanced with relative ease. Additionally, Elasticsearch can maintain a highly available cluster by assigning a specific role to each node and keeping backups available. In the next post in our series on Elasticsearch, we will look into optimizing searching and sorting for log data.
Continue to: Elasticsearch: You Know, For Logs [Part 3].
