Operating distributed computing systems at scale brings a variety of challenges. A minor issue, such as the wrong version of a small software library, can take a whole application offline. To smooth the transition from development to operations with regard to dependencies, environment, and testing, OpenDNS has adopted the open-source Docker containerization technology. Interfacing Docker containers with existing dedicated infrastructure across 23 data centers posed a distinct set of routing challenges, which we solved by combining Generic Routing Encapsulation and Border Gateway Protocol.
Containerization provides operating system-level virtualization. Unlike hypervisor-based tools such as VMware, containers share the host kernel instead of booting a full guest operating system, and this direct access to the kernel improves speed. Docker specifically brings Git-like simplicity to managing system images. In short, packaging applications in Docker takes the guesswork out of deployments by replicating the exact same environment between development and production.
Impetus at OpenDNS
Containerization at OpenDNS started as a hackathon project in August 2013, when a team built a small platform to demonstrate its benefits. Since then, an OpenDNS Engineering Infrastructure team has built a comprehensive platform as a service (PaaS) based on Docker. By abstracting away the underlying infrastructure, the PaaS lets engineering focus on development rather than dependency management and monitoring. The PaaS itself implements service methodologies from the Twelve-Factor App.
The OpenDNS Global Network includes 23 data centers plus significant cloud infrastructure in Amazon Web Services (AWS). Routing network traffic between the dedicated hardware and the cloud infrastructure poses a challenge, and the OpenDNS PaaS project began as a way to replicate the performance of dedicated hardware systems in cloud networks like AWS. The experiment described below routes around one limitation we hit in AWS.
OpenDNS wanted to run hundreds of containers, each potentially with its own privately routable IP address, on a single EC2 host. The problem is that an EC2 instance can have only a limited number of secondary IP addresses.
Two Quick Definitions
- Generic Routing Encapsulation (GRE) is used to encapsulate a protocol and route it over IP. In our solution we encapsulate an IP packet inside another IP packet. This allows us to build tunnels through third-party networks.
- Border Gateway Protocol (BGP) is a routing protocol used to announce reachability of IP networks. This protocol is most frequently used between ISPs. Below we’ll be using it between one of our routers and EC2 instances.
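To make the "IP packet inside another IP packet" idea concrete, here is a minimal Python sketch that builds the bytes of a GRE-encapsulated packet by hand. It assumes the basic 4-byte GRE header with no optional fields and a plain 20-byte IPv4 header; the function names and all addresses are hypothetical and chosen only for illustration. In practice the kernel's tunnel interface does this work, so treat it as a picture of the packet layout rather than as how our hosts are configured.

```python
import socket
import struct

GRE_PROTO = 47            # IP protocol number assigned to GRE
ETHERTYPE_IPV4 = 0x0800   # GRE "protocol type" value for an IPv4 payload


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int, proto: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (no options, TTL 64)."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = ipv4_checksum(header)  # fill in the checksum field
    return struct.pack("!BBHHHBBH4s4s", *fields)


def gre_encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Wrap an IPv4 packet in a basic GRE header and an outer IPv4 header."""
    gre_header = struct.pack("!HH", 0, ETHERTYPE_IPV4)  # no flags, version 0
    outer = ipv4_header(tunnel_src, tunnel_dst,
                        len(gre_header) + len(inner_packet), GRE_PROTO)
    return outer + gre_header + inner_packet


# Hypothetical addresses: a container-to-container packet (empty payload)
# carried between two tunnel endpoints.
inner = ipv4_header("10.0.1.5", "10.0.2.7", 0, 6)
packet = gre_encapsulate(inner, "192.0.2.1", "198.51.100.1")
print(len(packet))  # 44 bytes: 20 outer IPv4 + 4 GRE + 20 inner IPv4
```

Networks between the tunnel endpoints only see the outer header, which is why the inner private addresses never need to be routable there.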
Our solution was to use GRE to create an overlay network spanning OpenDNS’s network and our AWS Virtual Private Cloud (VPC) to get around routing limitations in EC2.
The first approach was to allocate /24 prefixes (carved from a larger block, e.g. 10.0.0.0/16) and statically route them to individual Docker hosts running in EC2. This gives each host enough IP addresses to support about 250 containers with unique IPs.
Extending this concept, we set up BGP sessions between each Docker host and our router. Each host announces its own address space. This allowed us to go from a single large prefix to individual /32s with minimal configuration on the router. Using Quagga as the BGP daemon on our Docker hosts allowed us to redistribute /32s from each host’s routing table into BGP. Tools like Pipework could be used here to wire up container IPs and the routing table, with Quagga automatically announcing the resulting routes via BGP.
Moving a container from one host to another becomes straightforward in this scenario. Replicate the container on a new host, assign it the same IP address, and then destroy the old container. BGP converges within a couple of seconds and re-routes traffic to the new host.
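The per-host allocation can be sketched with Python's standard `ipaddress` module. The host names and the `allocate_host_prefixes` helper are hypothetical; only the 10.0.0.0/16 block and the /24 size come from the example above.

```python
import ipaddress


def allocate_host_prefixes(block, host_names, new_prefix=24):
    """Hand out consecutive subnets of `block`, one per host."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=new_prefix)
    return dict(zip(host_names, subnets))


# Hypothetical host names, for illustration only.
allocation = allocate_host_prefixes(
    "10.0.0.0/16", ["docker-host-a", "docker-host-b", "docker-host-c"]
)

for name, prefix in allocation.items():
    # Subtract the network and broadcast addresses of each /24.
    print(name, prefix, prefix.num_addresses - 2)
```

A /16 splits into 256 such /24s, and each /24 leaves 254 usable addresses, which is in line with the roughly 250 containers per host mentioned above.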
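The effect of per-container /32 announcements, and of moving a container, can be illustrated with a toy model of the router's table. This is not BGP and not our router's implementation: `RouteTable`, the host names, and the addresses are hypothetical, and the only behavior modeled is longest-prefix matching plus announce and withdraw.

```python
import ipaddress


class RouteTable:
    """Toy model of a router's view: prefix -> set of next hops."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        network = ipaddress.ip_network(prefix)
        self.routes.setdefault(network, set()).add(next_hop)

    def withdraw(self, prefix, next_hop):
        network = ipaddress.ip_network(prefix)
        hops = self.routes.get(network, set())
        hops.discard(next_hop)
        if not hops:
            self.routes.pop(network, None)

    def lookup(self, address):
        """Return the next hops of the most specific matching prefix."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return set()
        best = max(matches, key=lambda net: net.prefixlen)
        return set(self.routes[best])


table = RouteTable()
table.announce("10.0.1.0/24", "host-a")   # coarse per-host prefix
table.announce("10.0.1.5/32", "host-a")   # one container's address

# Move the container: the new host announces, then the old one withdraws.
table.announce("10.0.1.5/32", "host-b")
table.withdraw("10.0.1.5/32", "host-a")

print(table.lookup("10.0.1.5"))  # the /32 wins over the /24
print(table.lookup("10.0.1.9"))  # still covered only by the /24
```

Because the /32 is more specific than the /24, traffic for the moved container follows the new announcement while every other address in the /24 keeps going to the original host.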
What happens if you don’t take the old container down and there are two containers with the same IP address? Load balancing! Most routers can be configured for Equal-Cost Multi-Path (ECMP) routing to accomplish this.
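One common way to spread traffic over equal-cost paths is to hash each flow's 5-tuple, so that packets of one connection always take the same path. The sketch below assumes that approach; real routers use their own, often hardware-specific, hash functions, and the `ecmp_next_hop` helper, host names, and addresses here are hypothetical.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple.

    `flow` is (src_ip, dst_ip, protocol, src_port, dst_port).
    """
    candidates = sorted(next_hops)  # stable order regardless of input order
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


# Two hosts announce the same container address (hypothetical names).
hops = {"host-a", "host-b"}

for src_port in range(40000, 40005):
    flow = ("203.0.113.10", "10.0.1.5", "tcp", src_port, 443)
    print(src_port, ecmp_next_hop(flow, hops))
```

Hashing per flow rather than per packet keeps a TCP connection pinned to one container, which matters because the two containers do not share connection state.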
This leads to powerful options such as creating fault-tolerant active-active load balancing across multiple hosts, as well as using anycast across our hybrid environment, which we discussed in a previous blog post.
Utilizing Docker has made operating distributed computing systems between dedicated and cloud data centers transparent. As our internal PaaS tools evolve, we continue to automate more tasks, but some steps are still manual. For instance, we currently configure the GRE tunnels and BGP peers on each Docker host and router by hand. This does not scale, and we need to look at how to automate it. We used BGP as our dynamic routing protocol, but we could explore OSPF as an alternative. We have been using private IP address space in our tests, but we would also like to experiment with using public IP addresses in the same way. Lastly, we are excited to start using IPv6 addresses in our containers on EC2 and truly remove any address limitations.
We’d love to hear from you if you’re using a similar method or have alternate solutions!