Every day, the Cisco Umbrella global network processes over 250 billion recursive DNS requests. Simply processing these recursive DNS requests is a huge job, but we’re also tasked with ensuring that each of these queries is answered as quickly as possible. One of the technologies that helps us maintain our great availability and speed is called anycast routing. In this blog post we’ll explain what anycast routing is, how we use it, and how it helps us maintain our 100% uptime and availability for our customers.
The problem with duplicate IP addresses
Conventional networking wisdom tells us that every host should have a unique IP address. Two hosts using the same IP address could cause packets to be misdirected, with unexpected results. However, there are scenarios in which multiple hosts with the same IP address can work together effectively. This is exactly what anycast routing is.
With anycast routing, the same IP address (for example, the Cisco Umbrella nameservers 208.67.222.222 & 208.67.220.220) exists on multiple servers around the world. Umbrella currently operates clusters of nameservers at 30+ unique locations around the world, each with numerous DNS resolver instances. All these DNS servers operate with the same IP addresses, which means that there are hundreds of machines across the globe with an IP address of 208.67.222.222.
Border Gateway Protocol (BGP) and peering for anycast routing
So how does this all work? The answer lies in the protocol that makes routing possible. The Border Gateway Protocol (BGP) is the routing protocol used by all internet service providers today to connect their networks with others across the globe.
Cisco Umbrella has many direct connections with other providers via a process called BGP peering. Over each peering, we run BGP as a routing protocol. Both parties announce via BGP what network prefixes can be reached through this connection.
When selecting data centers, we look for locations that enable us to “peer” with many other networks. Having all those direct connections with other networks improves redundancy as it increases the number of available paths to our users and content providers. Best of all, it improves speed and reduces latency because the round-trip time (RTT) is reduced.
Like other routing protocols, BGP generally prefers the shortest path to a destination. (BGP itself does not carry user traffic; it tells routers which path to forward packets along.) In the case of BGP, the shortest path is determined by looking at the ASPATH (the AS_PATH attribute), which is the sequence of Autonomous System (AS) numbers that a route announcement has passed through. Each AS number represents a network or service provider (like Cisco Umbrella, which has the AS number of 36692).
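To make the shortest-path idea concrete, here is a minimal sketch of how a router might pick between several announcements for the same prefix by comparing AS path lengths. The `Route` class, the `best_route` function, the prefix, and the transit AS numbers (64500 to 64502, taken from the range reserved for documentation) are all illustrative assumptions. Real BGP implementations evaluate other attributes, such as local preference, before they compare path lengths.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """One BGP announcement for a prefix (hypothetical, simplified)."""
    prefix: str
    as_path: Tuple[int, ...]  # AS numbers the announcement passed through
    via: str                  # human-readable label for the neighbor


def best_route(routes: List[Route]) -> Route:
    """Pick the route with the fewest AS numbers in its path.

    Simplification: real BGP checks several attributes before
    AS path length and uses further tie-breakers after it.
    """
    return min(routes, key=lambda r: len(r.as_path))


# Three ways to reach the same illustrative anycast prefix.
routes = [
    Route("208.67.222.0/24", (64500, 36692), "transit provider A"),
    Route("208.67.222.0/24", (36692,), "direct peering"),
    Route("208.67.222.0/24", (64501, 64502, 36692), "transit provider B"),
]

best = best_route(routes)
print(best.via)
```

In this sketch the direct peering wins because its path contains only one AS number, 36692.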
Cisco Umbrella uses BGP to announce the same IP address ranges from all data center locations in the world, and relies on the internet routing system to make sure that our users reach whichever DNS server is closest to them in network terms (that is, the one reachable over the fewest network hops).
How Cisco Umbrella uses anycast routing
When you connect to a cloud security service, performance is critical. Using that service cannot break or slow down your internet connection. Cisco Umbrella has delivered 100% business uptime since 2006 by using anycast routing. Umbrella’s network of data centers is co-located with the top internet exchange points (IXPs) on every continent where we have a presence. IXPs are locations where organizations either physically or virtually connect their routers to exchange data.
Let’s look at an example for a user in Miami. This user is a customer of a regional ISP based in southern Florida. The ISP has a direct peering with Cisco Umbrella and has connections to two different large (Tier 1) transit providers (which are illustrated as provider A and provider B in the drawing below).
In this scenario, Umbrella traffic to and from this user will go to our data center in Miami and is routed over the direct path between the ISP and Umbrella. This is because we have a direct BGP peering here (shown by the green line). Direct connections are preferred because they have a shorter ASPATH. They also save both parties money and typically result in a lower RTT.
The diagram above describes a typical scenario where we peer with local and regional ISPs, but we also have two or three transit providers at each site. Looking at the diagram, it’s clear that this setup offers redundancy: there are several different paths that traffic from this user can take to reach Umbrella. If something happens to disable the peering connection, the traffic will automatically fail over to one of the alternative transit paths. Like a detour on the highway, the traffic can continue on its merry way to the destination, and the user has an undisturbed internet experience.
Let’s imagine something happens to Umbrella’s Miami data center, like a major hurricane that knocks out the power for a long time. (Note: all data centers have backup power generators to prevent sudden failures, but those data centers may be shut down carefully during prolonged outages to protect them in case of power spikes.) The routes to this data center will be withdrawn automatically, and BGP will quickly re-route any traffic from this region to the closest alternative Umbrella data center. As routing protocols typically select the route with the shortest path, this user will most likely be routed to one of our servers in Texas.
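The failover described here can be sketched as a tiny simulation. This is only an illustration under stated assumptions: the `AnycastView` class, its method names, and the AS path lengths assigned to each site are hypothetical, chosen so that Miami is closest to our example user and Texas is next. It is not a description of Umbrella’s actual routing configuration.

```python
from typing import Dict, Optional


class AnycastView:
    """One network's view of the sites announcing an anycast prefix.

    Hypothetical model: each site is stored with the AS path length
    seen from this network, and the shortest path wins.
    """

    def __init__(self) -> None:
        self._path_len: Dict[str, int] = {}

    def announce(self, site: str, as_path_len: int) -> None:
        """Record that a site is announcing the prefix."""
        self._path_len[site] = as_path_len

    def withdraw(self, site: str) -> None:
        """Remove a site's announcement, e.g. after it shuts down."""
        self._path_len.pop(site, None)

    def best_site(self) -> Optional[str]:
        """Return the site with the shortest path, or None if none remain."""
        if not self._path_len:
            return None
        return min(self._path_len, key=self._path_len.get)


# Illustrative path lengths as seen from the Miami user's ISP.
view = AnycastView()
view.announce("Miami", 1)
view.announce("Texas", 2)
view.announce("another site", 3)

before = view.best_site()
view.withdraw("Miami")  # the Miami data center goes offline
after = view.best_site()
print(before, "->", after)
```

The user keeps sending queries to the same IP address throughout. Only the network’s choice of destination changes.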
Luckily, failing over to a different data center doesn’t happen very often, but it’s good to know it will work seamlessly when needed. Failures can happen within a data center, too. Within each Umbrella data center, we run several identical instances of our DNS servers. Each DNS instance has the same IP addresses, and we use the same anycast routing technology within each data center as well. All of this is invisible to the user, but it boosts reliability.
Monitoring and management
Using anycast routing, while useful, can result in some interesting challenges. Imagine a simple health check, where we need to collect performance and health statistics from each DNS server. If we send a DNS request to the anycast address 208.67.222.222, this will only test the DNS server closest to our monitoring servers. To solve this problem, each DNS server can be identified by multiple addresses. Each server has a unicast address for management and monitoring purposes as well as an anycast address. This allows us to have detailed performance statistics for each specific server and helps us with troubleshooting any issues with a DNS instance.
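A per-server health check along these lines could look like the sketch below. It builds a minimal DNS query by hand and sends it to a unicast address, so a specific server is tested instead of whichever anycast instance happens to be closest. The function names, the probe name `example.com`, and the idea of passing in a mapping of server labels to unicast addresses are assumptions for illustration, not Umbrella’s monitoring tooling.

```python
import socket
import struct
from typing import Callable, Dict


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (type 1, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, then a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server_ip: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Return True if server_ip answers the query without a DNS error code."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    # Healthy: full header, matching ID, response bit set, RCODE of 0.
    return (
        len(reply) >= 12
        and reply[:2] == query[:2]
        and bool(reply[2] & 0x80)
        and (reply[3] & 0x0F) == 0
    )


def check_all(servers: Dict[str, str],
              probe_fn: Callable[[str], bool] = probe) -> Dict[str, bool]:
    """Probe every server by its unicast address; map label to health."""
    return {label: probe_fn(ip) for label, ip in servers.items()}
```

Calling `check_all` with a dictionary such as `{"resolver-1": "192.0.2.10"}` (an address from the documentation range, used here as a placeholder) returns one health result per server. Probing 208.67.222.222 instead would only ever report on the nearest instance.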
Anycast routing is the key to reliability
To sum it up: anycast routing allows us to easily scale our cloud security service globally by just adding more data centers and servers, all with the same IP address. No involvement is needed from our users when we add new server capacity. We use BGP to achieve load balancing and failover within the data centers as well as globally between data centers.
In the unusual case of a failover, there is no need to make changes to load balancers, proxy servers, DNS servers, etc. This makes a failover event or other configuration change totally transparent to Cisco Umbrella users. Because the IP address doesn’t change, no changes are required on the client side, and there are no other challenges like TTL caching issues.
Cisco Umbrella has 30+ data centers across the globe and direct peering connections with 1,000+ CDNs and ISPs to make sure we have the shortest possible routes to our users, wherever they are. Our 100+ million global enterprise and consumer users get fast, reliable, secure internet access, with 100% uptime since 2006.
But that’s not all Umbrella can do. Learn more about how Cisco Umbrella DNS-layer security protects from cyberattacks before they even start.