At OpenDNS we are very proud of our uptime (100% since our launch in 2006) and work hard to maintain our status as the best in the industry. I work on the OpenDNS operations team, and each day we process over 50 billion DNS queries. Simply processing the requests is a huge job, but we’re also tasked with ensuring that each of these queries is answered as quickly as possible. One of the technologies that helps us maintain our great availability and speed is called Anycast. In this blog post we’ll explain what Anycast is, how we use it and how it helps us maintain our awesome availability.
Duplicate IP Addresses
Conventional networking wisdom tells us that an IP address should be unique. Two hosts with the same IP address could lead to all kinds of strange and unexpected results, right? While normally this is true, there are scenarios in which multiple hosts with the same IP address can work really well. This is exactly what Anycast is.
With Anycast, the same IP address (for example, the OpenDNS nameservers 208.67.222.222 & 208.67.220.220) exists on multiple servers around the world. OpenDNS currently operates clusters of nameservers at 13 unique locations around the world and will soon add 6 more European sites, bringing the total to 19 sites, each with numerous DNS resolver instances. All of these DNS servers operate with the same IP addresses, which means that there are over a hundred machines with an IP address of 208.67.222.222.
BGP and Peering
So how does this all work? The answer lies in the protocol that makes routing on the Internet work. BGP, the Border Gateway Protocol, is the routing protocol used on the Internet by all providers today.
OpenDNS has many direct connections with other providers, which are called BGP peerings. Over each of these peerings we run BGP as a routing protocol and both parties announce what network prefixes can be reached through this connection.
When selecting new data centers we look for locations that enable us to “peer” with many other networks. Having all those direct connections with other networks improves redundancy as it increases the number of available paths to our users and content providers. Best of all, it improves performance because the round trip time (RTT) is reduced. At OpenDNS we have many direct peerings with a large number of other networks around the world and we continue to add more peerings. To give you an idea, in the last two months alone we’ve added more than 200 new peering sessions.
Like any other routing protocol, BGP tries to send packets along the shortest path to a destination. In BGP, path length is measured using the AS path (the AS_PATH attribute), which is the sequence of Autonomous System (AS) numbers a route announcement has passed through. Each AS number represents a network or service provider; the OpenDNS AS number, for example, is 36692.
We use BGP to announce the same IP address ranges from all our locations in the world and utilize the Internet routing system to make sure that our users will use the DNS server that is closest to them.
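To make the idea of "shortest AS path wins" concrete, here is a minimal sketch in Python. It is only an illustration: real BGP implementations evaluate several other attributes (local preference, for instance) before they compare AS path length, and the prefix, the neighbor names and every AS number other than 36692 are placeholders taken from the ranges reserved for documentation.

```python
from dataclasses import dataclass

OPENDNS_AS = 36692  # the OpenDNS AS number mentioned in this post


@dataclass(frozen=True)
class Route:
    prefix: str      # announced prefix (placeholder documentation range)
    as_path: tuple   # AS numbers, nearest neighbor first, origin AS last
    via: str         # hypothetical label for the neighbor we learned it from


def best_route(routes):
    """Pick the route with the fewest AS hops.

    Simplification: only AS path length is compared. On a tie, the
    route that appears first in the list is returned.
    """
    return min(routes, key=lambda route: len(route.as_path))


# Three ways an ISP might learn the same Anycast prefix.
# 64500-64502 are documentation AS numbers, not real providers.
routes = [
    Route("192.0.2.0/24", (64500, OPENDNS_AS), "transit provider A"),
    Route("192.0.2.0/24", (OPENDNS_AS,), "direct peering"),
    Route("192.0.2.0/24", (64501, 64502, OPENDNS_AS), "transit provider B"),
]

print(best_route(routes).via)  # direct peering
```

The directly peered route wins because its AS path contains a single AS, which is the same reason direct peerings tend to attract traffic in practice.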
Let’s look at an example for a user in Miami and assume this user is a customer of a regional Internet service provider in Florida. The ISP has a direct peering with OpenDNS and also has connections to two different large (Tier 1) transit providers (illustrated as provider A and B in the drawing below).
In this scenario, the user’s OpenDNS traffic goes to our data center in Miami and is routed over the direct path between the ISP and OpenDNS, because we have a direct BGP peering there (illustrated as the orange line). Direct connections are preferred because they have a shorter AS path, save both parties money and typically have a lower RTT.
The diagram above describes a typical scenario where we peer with local and regional ISPs and also have two or three transit providers at each site. When looking at this diagram it’s pretty obvious that this setup offers a lot of redundancy (many connections). Should something happen with the peering connection, traffic will fail over to one of the numerous alternative transit paths.
Should something happen to OpenDNS’s Miami data center, the routes are automatically withdrawn and BGP will quickly re-route to the closest alternative OpenDNS site. As routing protocols typically select the route with the shortest path, this user will most likely be routed to one of our servers in Dallas or Ashburn.
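The failover can be sketched in a few lines of Python. This is a toy model of the Florida ISP’s view, not a description of our actual routing tables: the AS paths are invented, every AS number other than 36692 is a documentation placeholder, and only AS path length is compared.

```python
def select_site(announcements):
    """Return the site whose announcement has the shortest AS path.

    `announcements` maps a site name to the AS path over which the
    Anycast prefix is currently reachable. Returns None when no site
    is announcing the prefix. On a tie, the first site listed wins.
    """
    if not announcements:
        return None
    return min(announcements, key=lambda site: len(announcements[site]))


# Hypothetical view from the ISP: Miami is reached over the direct
# peering, Dallas and Ashburn over transit providers (64500, 64501).
announcements = {
    "Miami": [36692],
    "Dallas": [64500, 36692],
    "Ashburn": [64501, 36692],
}

before = select_site(announcements)

# The Miami site goes away and its route is withdrawn.
del announcements["Miami"]
after = select_site(announcements)

print(before, "->", after)  # Miami -> Dallas
```

Dallas is chosen here only because it is listed first and ties with Ashburn on path length. The point is that the client keeps using the same IP address while the network picks a new destination.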
Luckily, failing over to a different data center doesn’t happen very often, but it’s good to know it will work seamlessly when needed. I should also mention that within each data center we run a number of identical instances of our DNS servers, and each instance has the same IP addresses. In other words, we use the same Anycast technique inside each data center and rely on standard routing technologies for load balancing.
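This post does not go into which routing mechanism spreads the load inside a data center. One common approach, assumed here purely for illustration, is equal-cost multipath (ECMP) routing, where a router hashes each flow and uses the result to pick one of several equally good next hops. A rough Python sketch of that idea, with hypothetical instance names and placeholder addresses:

```python
import hashlib


def pick_instance(flow, instances):
    """Map a flow to one instance by hashing its identifying fields.

    `flow` is a tuple such as (source IP, source port, destination IP,
    destination port, protocol). The same flow always maps to the same
    instance as long as the instance list does not change.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical resolver instances that all answer on the same Anycast address.
instances = ["resolver-1", "resolver-2", "resolver-3", "resolver-4"]

# A client query: documentation source address, Anycast destination, UDP port 53.
flow = ("198.51.100.7", 53124, "208.67.222.222", 53, "udp")

print(pick_instance(flow, instances))
```

Real routers use their own hash functions in hardware; SHA-256 is used here only because it gives a stable result across runs.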
Monitoring and Management
Using Anycast alone would create some interesting challenges, for example when we need to collect performance and health statistics from each server. Imagine a simple health check, such as a DNS request to the Anycast address 208.67.222.222. It would only test the DNS server closest to our monitoring servers. To solve this, each DNS server has multiple addresses: a Unicast address for management and monitoring, as well as an Anycast address. This allows us to have detailed performance statistics for each server and helps us with troubleshooting.
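A per-server health check along these lines could look roughly like the Python sketch below. It is not our monitoring code: the function names, the test hostname and the query ID are assumptions made for the example, and it uses only the standard library to build a minimal DNS query in the RFC 1035 wire format.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID used to match a reply to our query


def build_query(name, query_id=QUERY_ID):
    """Build a minimal DNS query for an A record."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name as length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def is_healthy(server_ip, name="example.com", timeout=2.0):
    """Return True if the server answers the query with RCODE 0."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    if len(reply) < 12:  # shorter than a DNS header
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == QUERY_ID and is_response and rcode == 0


def check_all(unicast_addresses):
    """Check every server on its own Unicast address, not the Anycast one."""
    return {host: is_healthy(ip) for host, ip in unicast_addresses.items()}


# Example call with hypothetical hostnames and placeholder addresses:
# check_all({"mia-dns-1": "192.0.2.10", "mia-dns-2": "192.0.2.11"})
```

Because each query goes to a Unicast address, a failure can be attributed to one specific machine, which a query to the Anycast address could never tell us.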
It’s probably clear now why we decided to use Anycast technology for our services. Anycast allows us to easily scale our service globally by just adding more data centers and servers, all with the same IP address. We use BGP to achieve load balancing and failover within the data center as well as globally between data centers.
All of this works on the network layer, and in the case of a failover there is no need to make changes to load balancers, proxy servers, DNS servers, etc. This makes a failover event totally transparent to OpenDNS users. Because the IP address doesn’t change, no changes are required on the client side and there are no TTL caching issues.
Finally, we have many direct peering connections with other content providers and ISPs to make sure we have the shortest possible routes to our users. All this results in fast lookups and awesome availability.