It’s been a week since the Akamai Edge DNS Service outage. In that time, a bevy of news articles, op-eds, and think pieces have popped up discussing how this most recent domain name system (DNS) failure proves that major enterprises like Amazon, American Airlines, Oracle Cloud, and UPS need to improve their network infrastructure.
This consensus, while true, is unhelpful for most internet users. Small businesses and large corporations may rely on digital entities like Amazon or Oracle Cloud, but they have no power over how these companies invest their infrastructure dollars. So when DNS outages occur, most internet users resign themselves to waiting.
But what if there were a better way to deal with DNS outages from the end-user side? Well, as it turns out, there is. On July 22, 2021 – while most people on the internet were stuck reading variations on an error message – many companies deploying Cisco Umbrella kept up with critical business operations. That’s because our DNS-layer security platform comes with a feature called SmartCache. SmartCache allows Cisco Umbrella recursive DNS servers to bypass unresponsive authoritative DNS servers and connect to websites using the last known IP address. Few, if any, DNS security providers offer this kind of functionality to their customers. And, for users looking to avoid the downtime caused by DNS failures outside their control, Cisco Umbrella’s SmartCache represents an easy-to-implement, consumer-centric solution.
What caused the Akamai DNS outage?
Akamai Technologies operates a network of authoritative DNS servers, marketed as the Akamai Edge DNS Platform. On July 22, Akamai engineers pushed out a software configuration update that triggered a bug in their DNS system. This bug caused users to experience widespread DNS failures when trying to access thousands of websites. After a little over an hour, Akamai engineers rolled back the update and the Akamai Edge DNS servers resumed normal operation.1
If the connection between a single bug in a DNS server and a massive internet outage feels a bit nebulous to you, you’re not alone – we even put together a blog post to help users understand how the Domain Name System works. But for the sake of understanding the Akamai outage – and planning for similar future outages – there are three things about the Domain Name System you need to know:
- Every computer on the internet, including the servers that host websites, has a unique IP address that is used to connect to it.
- Authoritative DNS servers, like those operated by Akamai, store the domain names they are responsible for and their associated IP addresses, much like a phone book.
- Recursive DNS servers, like those operated by your Internet Service Provider (ISP) or Cisco Umbrella, query authoritative DNS servers to convert the domain name you input into your browser to an IP address your computer can use to connect to a website server.
When the Akamai authoritative DNS servers went down, the phone book containing IP addresses for sites like Home Depot, Costco, and LastPass became inaccessible. And because recursive DNS servers couldn’t find the IP addresses they were looking for, they provided internet users with an error message informing them of the DNS failure.
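To make that failure mode concrete, here is a minimal sketch of a recursive resolver that keeps no cache and depends entirely on an authoritative server. It is a toy model, not real DNS code: the `AuthoritativeServer` class, the `resolve` function, the `shop.example` domain, and the `192.0.2.10` address are all hypothetical names chosen for illustration.

```python
class AuthoritativeServer:
    """Toy stand-in for an authoritative DNS server: a phone book of names to IPs."""

    def __init__(self, records):
        self.records = dict(records)
        self.online = True  # flip to False to simulate an outage

    def lookup(self, domain):
        if not self.online:
            raise TimeoutError("authoritative server did not respond")
        return self.records[domain]


def resolve(domain, authoritative):
    """Toy recursive resolver with no cache: it asks the authoritative server every time."""
    try:
        return authoritative.lookup(domain)
    except TimeoutError:
        return None  # the user would see a DNS failure here


auth = AuthoritativeServer({"shop.example": "192.0.2.10"})

before_outage = resolve("shop.example", auth)  # phone book is reachable
auth.online = False                            # simulate the outage
during_outage = resolve("shop.example", auth)  # phone book is unreachable
```

In this sketch the website itself never changes or goes offline. The only thing that fails is the lookup, which is why the outage looked so large from the user's side.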
Are DNS outages common?
Considering DNS forms the foundation of internet connectivity, DNS outages are surprisingly common. In 2020, a DNS failure at Cloudflare took sites like Discord, Politico, and Shopify offline. In 2016, a Distributed Denial of Service (DDoS) attack took Dyn DNS servers offline, keeping users from sites like Squarespace, Verizon, Twitter, and the Swedish Government, among others. And these are just a few large examples. Smaller DNS outages occur more often and can affect a single entity – like the DNS failure that took Salesforce offline in 2021 or the DNS public resolver issue at Google that crippled user connectivity in 2018.
In today’s interconnected, cloud-based world, each of these outages represents hours of unplanned and unproductive downtime for companies large and small. This is where Cisco Umbrella comes in.
How Cisco Umbrella can help you prepare for the next DNS outage
Remember when we discussed how recursive DNS servers query authoritative DNS servers to find the IP address associated with a domain name? Most recursive DNS servers, like those used by your ISP, only cache these records for a short period of time. After the time limit expires, the record is deleted and the recursive DNS server has to query the authoritative DNS server again to find the right IP address for a given domain name.
However, the Cisco Umbrella recursive DNS servers don’t delete the expired records. Instead, they mark them as expired and store them in a database containing records gathered from billions of recursive DNS queries. Most of the time, these expired records aren’t used for very much. But during an authoritative DNS outage, Cisco Umbrella recursive DNS servers can use these expired records to connect users to the last known IP address for the domain they’re trying to access.
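The general idea of falling back to an expired record can be sketched in a few lines. This is a simplified illustration of the concept only, not the SmartCache implementation: the `StaleServingCache` class, the 300-second time limit, the `upstream` function, and the example domains are all assumptions made for the sketch. The current time is passed in explicitly so the behavior is easy to follow.

```python
class StaleServingCache:
    """Toy resolver cache that keeps expired records and serves them during an outage."""

    def __init__(self, upstream, ttl_seconds=300):
        self.upstream = upstream  # callable: domain -> IP, raises TimeoutError on outage
        self.ttl = ttl_seconds    # hypothetical time limit for a fresh record
        self.entries = {}         # domain -> (ip, time the record was stored)

    def resolve(self, domain, now):
        cached = self.entries.get(domain)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0], "fresh"      # record is still within its time limit
        try:
            ip = self.upstream(domain)     # record expired or missing: ask upstream
        except TimeoutError:
            if cached is not None:
                return cached[0], "stale"  # fall back to the last known IP address
            return None, "failed"          # never seen this domain, nothing to fall back on
        self.entries[domain] = (ip, now)
        return ip, "refreshed"


state = {"outage": False}  # shared flag used to simulate an authoritative outage


def upstream(domain):
    """Hypothetical authoritative lookup for the example."""
    if state["outage"]:
        raise TimeoutError("authoritative server did not respond")
    return {"shop.example": "192.0.2.10"}[domain]


cache = StaleServingCache(upstream, ttl_seconds=300)
first = cache.resolve("shop.example", now=0)        # looked up and stored
second = cache.resolve("shop.example", now=100)     # served from the fresh record
state["outage"] = True
third = cache.resolve("shop.example", now=1000)     # expired, but served as last known IP
fourth = cache.resolve("other.example", now=1000)   # never cached, so the query fails
```

The fourth lookup shows the limit of this approach: a resolver can only fall back on domains it has already seen, which is why the size of the stored record set matters.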
So, how did this feature play out during the Akamai outage? The graph below shows the DNS timeouts our engineers tracked for Cisco Umbrella users during the event:
The red area of the graph indicates DNS failures caused by an inability to obtain IP addresses from authoritative DNS servers. The sudden spike near the left-hand side of the graph was caused by the Akamai outage – you’ll notice the number of failures drops dramatically near the right-hand side of the graph when the issue is resolved.
The green area of the graph is far more interesting, though. It indicates situations where the Cisco Umbrella recursive DNS servers were able to connect users to websites using the last known IP address. This is SmartCache functionality in action. For the duration of the Akamai outage, Cisco Umbrella recursive DNS servers were able to complete between 40% and 50% of queries using SmartCache data, allowing many customers to connect to sites they regularly use while large portions of the Internet were down. And since the Cisco Umbrella network continues to grow, you can expect the number of records in the SmartCache system to expand.
Don’t let DNS failures slow you down
Of course, safeguarding connectivity during a DNS outage is just one of the ways our team provides value to customers. Want to learn more about the capabilities of Cisco Umbrella? Register for one of our live demos or sign up for a Cisco Umbrella free trial today!
1 @akamai “Akamai Summarizes Service Disruption [Tweet]” Twitter.