Editor’s note: This post is the second in a two-part series on BGP. Read part one here.
The last half-decade has witnessed a number of events that perfectly exemplify what happens when BGP goes wrong. Whether caused purposely by malicious attackers or by accident from routing leaks, the effect is essentially the same: everyone has a bad time. In these cases, many companies and content providers end up becoming collateral damage even if they aren’t directly involved. What makes BGP failures so concerning is their ability to affect nearly every major Internet service within a short time frame. But what’s most surprising about the protocol, which is in charge of routing nearly all the Internet’s traffic, is that it runs on a system of trust between providers.
The Internet’s Telephone Game
BGP can get complicated, and it sits at the core of what can go wrong with Internet routing. It starts with what are known as autonomous systems (AS).
An autonomous system is a collection of IP networks managed autonomously–hence the name–under one routing policy.
Each AS gets a unique number (ASN) from its regional Internet registry to identify itself. Each AS is connected to at least one other AS, which connects it to the rest of the Internet.
A single AS can be connected to dozens or even hundreds of other autonomous systems, through what is called peering (see part one). It’s easy to picture the entire Internet as a graph, with each node on the graph representing an AS.
Each AS in the graph tells the rest of the Internet, via its neighbors, which IP networks (often called prefixes) it is willing to receive traffic for. This process is often referred to as “announcing.”
Andree Toonk, founder of BGPmon and manager of network engineering at OpenDNS, says that announcing is the way for a company to establish which prefixes it would like the world to know. Essentially it’s a way of posting a billboard on the Internet superhighway to say, “Hey, I’m GoodShoes, Inc.! And I’m over here!” Routers then use that information to find the best route to that company.
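To make the graph picture concrete, here is a minimal sketch of how one announcement might spread from neighbor to neighbor. Everything in it is an assumption for illustration: the topology, the `propagate` helper, and the rule that each AS simply keeps the shortest path it hears. The ASNs and the prefix come from ranges reserved for documentation. Real BGP applies business policy and filtering on top of this.

```python
from collections import deque

# Hypothetical topology: each key is an AS, each value lists its neighbors.
# ASNs are taken from the range reserved for documentation (64496-64511).
NEIGHBORS = {
    64496: [64497, 64498],
    64497: [64496, 64499],
    64498: [64496, 64499],
    64499: [64497, 64498, 64500],
    64500: [64499],
}


def propagate(origin, prefix):
    """Flood one announcement and return the path each AS ends up using.

    Simplification: every AS keeps the first (and therefore shortest) path
    it hears and passes it on to all neighbors, with no policy or filters.
    """
    best_path = {origin: [origin]}
    queue = deque([origin])
    while queue:
        asn = queue.popleft()
        for neighbor in NEIGHBORS[asn]:
            if neighbor in best_path:
                continue  # this AS already holds a path at least as short
            # The stored path starts at the receiving AS and ends at the origin.
            best_path[neighbor] = [neighbor] + best_path[asn]
            queue.append(neighbor)
    return {asn: (prefix, path) for asn, path in best_path.items()}


routes = propagate(64500, "203.0.113.0/24")
for asn in sorted(routes):
    prefix, path = routes[asn]
    print(f"AS{asn} reaches {prefix} via {' -> '.join(map(str, path))}")
```

Note that no AS in this sketch checks whether AS64500 is actually entitled to the prefix. Each one just repeats what it was told, which is the telephone game in miniature.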
This process is important because there are announcements that are only supposed to stay between two autonomous systems and are not intended for further external use. When one of them incorrectly propagates the announcement for the prefixes to the world, it is called a “leak.” Leaks happen fairly regularly, and are typically due to a configuration mistake. As a result, traffic is redirected through a different network, which can cause issues like performance problems and security-related concerns.
BGP hijacks are a similar problem. With a BGP leak, an AS inserts itself in the middle of a path, but in a hijack an AS simply claims it owns the prefixes and should receive all traffic for them.
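The difference between the two shows up in the AS path itself. The sketch below is a rough, assumed heuristic, not how any particular monitoring product works: the `classify` function and its parameters are hypothetical, and it presumes the prefix owner knows its own origin ASN and which networks normally carry its routes.

```python
def classify(as_path, expected_origin, expected_transit):
    """Roughly label an observed AS path for one monitored prefix.

    as_path is ordered the way BGP shows it: nearest AS first, origin last.
    expected_transit is the set of ASNs that normally appear on the path.
    """
    origin = as_path[-1]
    if origin != expected_origin:
        # Someone else claims to own the prefix.
        return "possible hijack"
    unexpected = [asn for asn in as_path[:-1] if asn not in expected_transit]
    if unexpected:
        # The right origin, but an unfamiliar AS sits in the middle.
        return "possible leak"
    return "expected"


# Hypothetical owner AS64500, normally reached through AS64499 and AS64497.
usual = {64497, 64499}
normal = classify([64497, 64499, 64500], 64500, usual)
leaked = classify([64497, 64510, 64499, 64500], 64500, usual)
hijacked = classify([64497, 64511], 64500, usual)
print(normal, leaked, hijacked, sep=" | ")
```

A heuristic like this will raise false alarms whenever a legitimate new upstream appears, so in practice it would be a prompt for a human to look, not a verdict.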
BGP Leaks: Simple, Yet Costly
Tier 1 provider Level3 recently became an unfortunate case study on the drastic effects of a mistaken BGP leak.
It started when a Malaysian ISP, Telekom Malaysia, incorrectly announced a large number of routes–BGPmon estimates about 179,000 of them.
Level3 then incorrectly accepted those routes as good and, by propagating the new routes to all its customers, re-routed a massive flood of traffic to the ISP in Malaysia. It was a flood Telekom Malaysia did not intend and was not equipped to handle, and the ISP began dropping packets on a large scale. This event is essentially a mistaken “man-in-the-middle.” As a result, a number of major Internet services like Snapchat, Skype, and Google ended up with degraded service for about three hours.
Three hours might not seem like much time, but in Internet time, where millions of transactions take place every second, it can be extremely costly. And, according to Toonk, there are a number of different ways these leaks can occur, and aside from some basic filtering, there’s not much to prevent them from happening.
“There’s a whole philosophical debate as to who owns more responsibility in incidents like these,” he said. “Given how large Level3 is, and its pivotal role on the Internet, it has the greater responsibility in fixing issues like this because it’s what the company does. But obviously if the Malaysian telecom [company] had not made the error in the first place, it never would have happened.”
BGP Hijacks: Simple, Yet Profitable
One famous example of an entity conducting a BGP hijack on purpose was the Turkish hijack of Google DNS, OpenDNS, and Level3 in the spring of 2014. At the time, a corruption scandal was brewing, and to circumvent blocks placed on Twitter and YouTube by large ISPs in Turkey, civilians began using free DNS services. In a ploy to stop the dissenters from spreading their message, Turk Telekom began hijacking DNS traffic using BGP announcements.
Malicious hijacking can cause some damaging results. Toonk commented that if a hijack is successful, “the rate of impact can be severe.”
But not all hijacks are politically motivated. Some are monetarily motivated, like the case of a large-scale hijack affecting at least 19 ISPs conducted by an attacker looking to steal Bitcoins. Going back to the concept of time during these sorts of attacks–or even in mistaken leaks like the one involving Telekom Malaysia–every second is damaging. The hacker who perpetrated the Bitcoin theft kept each traffic hijack to around 30 seconds on average. Even so, after at least 22 attacks, it was enough time for him to steal around $9,000 a day, and an estimated $83,000 total.
Routing Better: Time for a BGP Change
Hijacks and leaks occur because the BGP framework–the system that runs the entire Internet–currently operates on a system of trust, and wasn’t designed with security built into it. There is currently no easy way to verify information from a provider.
“In BGP I just have to trust whatever other providers are telling me is true,” Toonk said. “But there’s no way to verify for sure. On the web, for instance, we have HTTPS. With a certain degree of trust, you can tell if the site you’re on is legitimate. With BGP, if you say you can reach Google, I just have to trust that is true.”
And currently, Toonk said, there’s no easy way for the owner of an IP network to even prevent leaks and hijacks from occurring. ISPs and users of BGP can only monitor for incorrect routing events. This reality was confirmed by Job Snijders, an IP developer for NTT America, a major Tier 1 ISP.
According to Snijders, there were a number of measures Level3 could have taken before and following the major route leak on June 12. He referenced an open letter to Level3 that suggested a number of measures the provider could have taken to prevent the leak from happening in the first place. One was maximum prefix limits–a way of setting a maximum number of new prefixes a peer can announce. Snijders said it’s up to providers and routers to change their behavior to prevent new announcements and routing changes from being accepted in bulk automatically.
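The idea behind a maximum prefix limit can be sketched in a few lines. This is an illustrative model only: the `PeerSession` class, its `receive` method, and the limit of three are hypothetical, and the sketch assumes the session is simply shut down once the limit is exceeded. Real routers expose this as a per-neighbor configuration setting with vendor-specific options.

```python
class PeerSession:
    """Toy model of a BGP session guarded by a maximum prefix limit."""

    def __init__(self, peer_asn, max_prefixes):
        self.peer_asn = peer_asn
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.established = True

    def receive(self, prefix):
        """Accept one announced prefix, or drop the session if over the limit."""
        if not self.established:
            return False
        is_new = prefix not in self.prefixes
        if is_new and len(self.prefixes) >= self.max_prefixes:
            # Far more routes than this peer should ever send: stop trusting
            # the session and discard everything learned from it.
            self.established = False
            self.prefixes.clear()
            return False
        self.prefixes.add(prefix)
        return True


# Hypothetical peer that is expected to announce at most three prefixes.
session = PeerSession(peer_asn=64497, max_prefixes=3)
results = [session.receive(f"10.0.{i}.0/24") for i in range(5)]
print(results, session.established)
```

The trade-off is that the limit has to sit above the peer's normal route count but well below a full leak. In this model, a peer that normally sends a few hundred routes and suddenly sends 179,000 would trip it almost immediately.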
Some of the responsibility to prevent leaks also falls on manufacturers. Routers are by default often set to advertise every route they can possibly reach. “It would be a great win if manufacturers would provide network operators with a configuration knob that changes this default behavior,” he said. “Leaks occur when filters don’t properly prevent this from happening.”
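What might such a filter look like? One widely used convention, not described by Snijders here and offered only as an assumed example, is to re-advertise routes based on the business relationship with the neighbor they were learned from. The `should_export` function and the relationship labels below are hypothetical names for illustration.

```python
def should_export(learned_from, sending_to):
    """Decide whether a route may be passed on to a neighbor.

    Both arguments are relationship labels: "customer", "peer", "provider",
    or "self" for routes the AS originates itself (learned_from only).
    """
    if learned_from in ("customer", "self"):
        # Own routes and customer routes can be announced to everyone.
        return True
    # Routes from peers and providers go to customers only. Sending them
    # anywhere else would offer free transit, which is the classic leak.
    return sending_to == "customer"


for learned in ("self", "customer", "peer", "provider"):
    for target in ("customer", "peer", "provider"):
        verdict = "export" if should_export(learned, target) else "filter"
        print(f"learned from {learned:8} -> send to {target:8}: {verdict}")
```

A router that advertises everything by default behaves as if this function always returned true, which is why the default matters so much.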
RPKI (Resource Public Key Infrastructure) could also help validate which ASNs are authorized to originate which IP prefixes, but, similar to DNSSEC, it’s running into difficulties with adoption, as not everyone in the industry is convinced it is the best fix.
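In broad strokes, RPKI-based origin validation compares each announcement against signed records stating which ASN may originate a prefix and how specific the announcement may be. The sketch below skips all of the cryptography and models only the comparison, following the valid / invalid / not-found outcomes used in origin validation. The `ROA` class, the `validate` function, and the sample records are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Simplified authorization record: who may originate which prefix."""
    prefix: str
    max_length: int
    asn: int


# Hypothetical record using a documentation prefix and documentation ASN.
ROAS = [ROA("203.0.113.0/24", 24, 64500)]


def validate(prefix, origin_asn, roas=ROAS):
    """Return "valid", "invalid", or "not-found" for one announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this record says nothing about the announced prefix
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by a record but matching none of them means the claim is bad.
    return "invalid" if covered else "not-found"


print(validate("203.0.113.0/24", 64500))  # authorized origin
print(validate("203.0.113.0/24", 64511))  # wrong origin AS
print(validate("203.0.113.0/25", 64500))  # more specific than allowed
print(validate("192.0.2.0/24", 64500))    # no record covers this prefix
```

The last case hints at the adoption problem: while many prefixes have no record at all, operators cannot simply reject everything that fails to validate.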
However, not all preventative measures need to be highly technical. Toonk says monitoring like the type that BGPmon provides can be a huge help in limiting attack or leak time and troubleshooting efforts. Proper filtering on the ISP side can also prevent a lot of issues.
Or, as Snijders added, sometimes it might be as simple as picking up the phone. “When you see a route leak incident happen, get on the phone,” he said. “It doesn’t matter whether you are a customer, supplier, or just a bystander. Literally keep calling until you get through to someone who can fix the issue.”
In the end, the system of trust that is holding up the Internet can also be used for good. “Perseverance always wins,” Snijders said. “We’re in this together. We need to help each other.”