• Skip to primary navigation
  • Skip to main content
  • Skip to footer

Cisco Umbrella

Enterprise network security

  • Free Trial
  • Contact us
  • Blog
  • Login
    • Umbrella Login
    • Cloudlock Login
  • Products
    • Product
      • Cisco Umbrella Cloud Security Service
      • Cisco Umbrella Investigate
      • Product Packages
      • Support Packages
    • Functionality
      • DNS-Layer Security
      • Secure Web Gateway
      • Cloud Access Security Broker (CASB)
      • Interactive Intelligence
      • Cloud-Delivered Firewall
    •  
    • Webinar signup
  • Solutions
    • By Need
      • Protect Mobile Users
      • Fast Incident Response
      • Web Content Filtering
      • Shadow IT Discovery & App Blocking
      • Unified Threat Enforcement
      • Reduce Security Infections
      • Secure Direct Internet Access
      • Securing Remote and Roaming Users
    • By Network
      • Protect Guest Wi-Fi
      • SD-WAN Security
      • Off-Network Endpoint Security
    • By Industry
      • Higher Education Security
      • K-12 Schools Security
      • Healthcare, Retail and Hospitality Security
      • Enterprise Cloud Security
      • Small Business Cybersecurity
      • Our Customers
      • Customer Stories
    • Ransomware Defense for Dummies book
  • Why Us
    • Fast Reliable Cloud
      • Cloud Security Infrastructure
      • Cloud Network Status
      • Cloud Network Activity
      • Recursive DNS Services
      • Top Reasons to Trial
      • Getting Started
    • Unmatched Intelligence
      • Cyber Attack Prevention
      • Interactive Intelligence
    • Extensive Integrations
      • IT Security Integrations
      • Hardware Integrations
      • Meraki Integration
      • Cisco SD-WAN
    • Navigation-dropdown-promo-free-trial_102820
  • Resources
    • Content Library
      • Top Resources
      • Analyst Reports
      • Case Studies
      • Customer Videos
      • Datasheets
      • eBooks
      • Infographics
      • Solution Briefs
    • International Documents
      • Deutsch/German
      • Español/Spanish
      • Français/French
      • Italiano/Italian
      • 日本語/Japanese
    • Cisco Umbrella Blog
      • Latest Posts
      • Security Posts
      • Research Posts
      • Threats Posts
      • Product Posts
      • Spotlight
    • For Customers
      • Support
      • Customer Success Hub
      • Umbrella Deployment Hub
      • Customer Success Webinars
      • What’s New
      • Cisco Umbrella Studio
  • Trends & Threats
    • Market Trends
      • Rise of Remote Workers
      • Secure Internet Gateway (SIG)
      • Secure Access Service Edge (SASE)
    • Security Threats
      • Ransomware
      • Cryptomining Malware Protection
      • Cybersecurity Threat Landscape
    •  
    • 2020 Cybersecurity trends
  • Partners
    • Channel Partners
      • Partner Program
      • Become a Partner
    • Service Providers
      • Secure Connectivity
      • Managed Security for MSSPs
      • Managed IT for MSPs
    •  
    • Become a partner
  • Free Trial Signup
  • Umbrella Login
  • Cloudlock Login
  • Contact Us
Spotlight

Docker Container Scheduling as a Bin Packing Problem

By Philip Thomas
Posted on May 6, 2015
Updated on October 15, 2020

Share

Facebook0Tweet0LinkedIn0

hackathon
For the internal OpenDNS engineering hackathon earlier this month, I used data from our Quadra system to develop a Docker container scheduler. The tool combines historical data about container resource consumption with a mathematical model to best decide which host should run each container. This formulation is a type of bin packing problem, and I used the JuliaOpt project’s JuMP package to formulate the solution in the Julia programming language.

Problem Statement

Given pools consisting of a quantity of identical Docker containers, each with known historical memory, CPU, and network I/O usage, and given a finite number of hosts each with known memory, CPU, and network I/O capacity – assign each container in a pool to a single host. The sum of memory and network I/O of the containers on the host must be less than its capacity. Because it is the main resource constraint for hosts, minimize the expected CPU usage on each host. In addition, attempt to keep different containers from the same pool on different hosts to create redundancy.
Screen Shot 2015-05-06 at 12.57.38 PM

Data

Collecting historical data about the resource usage of each container proved to be the most time-consuming portion of the process. The Quadra system logs resource consumption for each container in InfluxDB. So, my hackathon project starts by using this API to get a list of currently-running containers and their resource consumption over the last hour. Data about running Docker containers is accessible through the Docker Stats API introduced in v1.5.0 or by using cAdvisor.
Resource consumption by containers tends to be highly variable. For some pools, such as web apps, traffic changes proportional to web visitors throughout a day. For pools used by headless browser tests, resource consumption is often low with periodic bursts in CPU. Staging environments use comparatively little resources.
During the formulation, I assume that the host overhead is negligible for everything but the resource consumption.

Calculation

The hackathon container scheduling system was written using the Julia programming language. I formulated the problem using the JuMP package from JuliaOpt. JuMP provides a metalanguage for expressing optimization problems, then passes off the actual calculation to configurable solvers. I used the open-source CBC solver from the COIN-OR project.

After running the calculation, the script makes API calls to move containers between hosts.

Speed

This problem’s complexity is NP-Hard, so the actual calculation times with the branch and bound algorithm became infeasible for real-world use when I tested more than 200 pools of containers. Solving a small data set of 20 pools across 4 hosts took seconds, but 200 pools across 10 hosts never converged after 8 hours on my Macbook. Fortunately the algorithm is parallelizable, so it is possible to utilize all resources on a multi-core device.
The calculation takes so long because it seeks a convergent, optimal solution. However, the practical application of this system means that slight variations away from the optimal solution are viable. This is due to uncertainty in the data used for the calculation and the excess capacity on each host. A way to deal with this summarized in the “Formulation” section bullet point below.

Possible Improvements

  • Data – When pulling historical data – instead of using an average, use the (average + standard deviation) or (average + 2*standard deviation) in order to account for fluctuations in container resource consumption.
  • Formulation – Instead of treating the problem as an optimization with an objective function – treat it as a constraint programming problem. To do this in the script – set a static CPU limit, modify the constraint to be less than this variable, and delete the objective function.

Applications

The system is designed to run on a cron job – perhaps every hour – in order to continuously reallocate containers between hosts as their resources change and new pools get added.
The most practical application of this system that I found was capacity planning. With slight modification, it is possible to use this script to determine the minimum number of hosts required to service all containers. Decreasing hosts, even by one, has the potential for significant cost recovery.
Disaster recovery is another application. If a host goes down and a new one needs to be provisioned to take its place, this system can reallocate all containers before the new host is available, then again reallocate when the new host is available.

Next Steps

The current scheduler is a proof of concept. Its most important use may be capacity planning for selecting the quantity and resources of hosts used for Quadra. To learn more about optimization in Julia, check out my previous blog post on the topic.

Previous Post:

Previous Article

Next Post:

Next Article

Follow Us

  • Twitter
  • Facebook
  • LinkedIn
  • YouTube

Footer Sections

What we make

  • Cloud Security Service
  • DNS-Layer Network Security
  • Secure Web Gateway
  • Security Packages

Who we are

  • Cisco Umbrella Blog
  • Cloud Network Status
  • Cloud Network Activity
  • OpenDNS is now Cisco Umbrella

Learn more

  • Events
  • Careers
  • Support
  • Cisco Umbrella Live Demo
  • Contact Sales
Umbrella by Cisco
208.67.222.222+208.67.220.220
2620:119:35::35+2620:119:53::53
Sign up for a Free Trial
  • Cisco Online Privacy Statement
  • Terms of Service
  • Sitemap

© 2021 Cisco Umbrella