For the internal OpenDNS engineering hackathon earlier this month, I used data from our Quadra system to develop a Docker container scheduler. The tool combines historical data about container resource consumption with a mathematical model to best decide which host should run each container. This formulation is a type of bin packing problem, and I used the JuliaOpt project’s JuMP package to formulate the solution in the Julia programming language.
Given pools consisting of a quantity of identical Docker containers, each with known historical memory, CPU, and network I/O usage, and given a finite number of hosts each with known memory, CPU, and network I/O capacity – assign each container in a pool to a single host. The sum of memory and network I/O of the containers on the host must be less than its capacity. Because it is the main resource constraint for hosts, minimize the expected CPU usage on each host. In addition, attempt to keep different containers from the same pool on different hosts to create redundancy.
Collecting historical data about the resource usage of each container proved to be the most time-consuming portion of the process. The Quadra system logs resource consumption for each container in InfluxDB. So, my hackathon project starts by using this API to get a list of currently-running containers and their resource consumption over the last hour. Data about running Docker containers is accessible through the Docker Stats API introduced in v1.5.0 or by using cAdvisor.
Resource consumption by containers tends to be highly variable. For some pools, such as web apps, traffic changes proportional to web visitors throughout a day. For pools used by headless browser tests, resource consumption is often low with periodic bursts in CPU. Staging environments use comparatively little resources.
During the formulation, I assume that the host overhead is negligible for everything but the resource consumption.
The hackathon container scheduling system was written using the Julia programming language. I formulated the problem using the JuMP package from JuliaOpt. JuMP provides a metalanguage for expressing optimization problems, then passes off the actual calculation to configurable solvers. I used the open-source CBC solver from the COIN-OR project.
After running the calculation, the script makes API calls to move containers between hosts.
This problem’s complexity is NP-Hard, so the actual calculation times with the branch and bound algorithm became infeasible for real-world use when I tested more than 200 pools of containers. Solving a small data set of 20 pools across 4 hosts took seconds, but 200 pools across 10 hosts never converged after 8 hours on my Macbook. Fortunately the algorithm is parallelizable, so it is possible to utilize all resources on a multi-core device.
The calculation takes so long because it seeks a convergent, optimal solution. However, the practical application of this system means that slight variations away from the optimal solution are viable. This is due to uncertainty in the data used for the calculation and the excess capacity on each host. A way to deal with this summarized in the “Formulation” section bullet point below.
- Data – When pulling historical data – instead of using an average, use the (average + standard deviation) or (average + 2*standard deviation) in order to account for fluctuations in container resource consumption.
- Formulation – Instead of treating the problem as an optimization with an objective function – treat it as a constraint programming problem. To do this in the script – set a static CPU limit, modify the constraint to be less than this variable, and delete the objective function.
The system is designed to run on a cron job – perhaps every hour – in order to continuously reallocate containers between hosts as their resources change and new pools get added.
The most practical application of this system that I found was capacity planning. With slight modification, it is possible to use this script to determine the minimum number of hosts required to service all containers. Decreasing hosts, even by one, has the potential for significant cost recovery.
Disaster recovery is another application. If a host goes down and a new one needs to be provisioned to take its place, this system can reallocate all containers before the new host is available, then again reallocate when the new host is available.
The current scheduler is a proof of concept. Its most important use may be capacity planning for selecting the quantity and resources of hosts used for Quadra. To learn more about optimization in Julia, check out my previous blog post on the topic.