Following up on our data visualization series, today we’re going to explore a brand new idea with OpenGraphiti. Every day, network engineers manipulate real and virtual wires to connect people in the most efficient and reliable way possible. After decades of construction and evolution, it is fascinating to step back and contemplate the shape of this giant ecosystem. In this blog post, we will share some techniques to visualize the structure of the AS network: the backbone of our modern communication.
What is an Autonomous System?
First things first, let’s explain a little bit what we’re dealing with here. An autonomous system (AS) is a collection of routers whose prefixes and routing policies are under common administrative control. That could be an ISP, a big company (Google, Facebook, …), a university, or any other large organization. Virtually, an AS represents a group of IP prefixes that have been assigned to that organization and exposes the same routing rules outside the AS for the whole infrastructure.
A unique number is assigned to each AS; we refer to these as Autonomous System Numbers, or ASNs. They are essential because they uniquely identify each network on the Internet. In order to maintain the stability and consistency of the whole network, ASes use a well-defined routing protocol: the Border Gateway Protocol (BGP), which exchanges routing and reachability information between autonomous systems.
In summary, autonomous systems can be seen as high-level prefix routers. The AS network is a dynamic graph that evolves every day. The exact count depends on the source, but there are about 47,000 autonomous systems today, linked by a multitude of routing rules. In this article we will showcase an interesting way to explore this large graph in 3D using OpenGraphiti.
The first step in this visualization process is to build graph datasets based on real BGP data. There are several sources on the Internet, but for our research we’re going to use routeviews.org. The URL given in the reference section points to a repository of BGP routing tables updated every 2 hours. This repository also contains incremental updates to those tables, published every 15 minutes. Both are stored in a binary format that must be decoded with the bgpdump tool.
Once you decode a file, this is how it looks:
...
TABLE_DUMP2|1415052005|B|126.96.36.199|8492|188.8.131.52/24|8492 9002 2914 36692|IGP|184.108.40.206|0|0|8492:1101 9002:9002 9002:64615|NAG||
TABLE_DUMP2|1415052005|B|220.127.116.11|293|18.104.22.168/24|293 2914 36692|IGP|22.214.171.124|0|0||NAG||
TABLE_DUMP2|1415052005|B|126.96.36.199|200130|188.8.131.52/24|200130 1299 3356 36692|IGP|184.108.40.206|0|0|1299:4000 1299:20000 1299:20500|NAG||
...
We will only focus on two of these pipe-separated fields in this article, the 6th and the 7th:
...
220.127.116.11/24|8492 9002 2914 36692
18.104.22.168/24|293 2914 36692
22.214.171.124/24|200130 1299 3356 36692
...
Those specific fields define the BGP routing table: the first one is the IP prefix, and the second one is the AS path to reach it. For instance, any IP belonging to AS 8492 that wants to reach an IP in 126.96.36.199/24 will have to go through 9002, then 2914, and finally 36692. In this case, 36692 is the OpenDNS ASN. Since AS 36692 relies on AS 2914 for its routing, we say that AS 2914 is an upstream provider for 36692, and 36692 is a downstream customer of 2914. In these few lines, we observe that AS 36692 has 2 upstream providers (2914 and 3356), which have upstream providers of their own, and so on.
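To make the field extraction concrete, here is a minimal Python sketch that pulls the prefix and the AS path out of one decoded line. The function name `parse_bgpdump_line` is ours, not part of bgpdump, and two choices in it are assumptions rather than something described in this article: consecutive repeats of the same ASN are collapsed, and any token that is not a plain number (such as an AS set in braces) is skipped.

```python
from typing import List, Optional, Tuple


def parse_bgpdump_line(line: str) -> Optional[Tuple[str, List[int]]]:
    """Return (prefix, as_path) for one pipe-delimited record, or None."""
    fields = line.strip().split("|")
    if len(fields) < 7 or fields[0] != "TABLE_DUMP2":
        return None  # not a routing table record we know how to read

    prefix = fields[5]  # 6th field: the announced IP prefix
    as_path: List[int] = []
    for token in fields[6].split():  # 7th field: space-separated AS path
        if not token.isdigit():
            continue  # assumption: ignore AS sets and other non-numeric tokens
        asn = int(token)
        if not as_path or as_path[-1] != asn:
            as_path.append(asn)  # assumption: collapse consecutive repeats
    return prefix, as_path


# First record of the excerpt shown earlier in this article.
sample = (
    "TABLE_DUMP2|1415052005|B|126.96.36.199|8492|188.8.131.52/24|"
    "8492 9002 2914 36692|IGP|184.108.40.206|0|0|"
    "8492:1101 9002:9002 9002:64615|NAG||"
)
print(parse_bgpdump_line(sample))
```

The last AS in the returned path is the one that originates the prefix, which is 36692 for this record.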
Here is a diagram representing the BGP routing information of those last 3 lines:
Great! Now we understand how to read part of these BGP routing tables, but what does that tell us? Well, if we read the complete file, we can build a list of every AS along with its upstream providers and downstream customers. In other words, we have enough information to create the full AS graph, where each node is an AS and each edge represents a BGP route (an upstream/downstream relationship). In our case, we chose to create a directed graph with all the upstream relationships: AS node A is directly connected to B if B is an upstream provider of A. Note that an edge from A to B does not imply an edge from B to A, hence the directed relationship (see ‘directed graph’ in the reference section).
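As a rough illustration of that construction, the sketch below turns a list of AS paths into a directed adjacency map. It follows the simplified reading used in this article, where each AS in a path is treated as an upstream provider of the AS immediately to its right; real BGP relationships also include peering, which this sketch does not try to detect. The name `build_upstream_graph` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_upstream_graph(as_paths: Iterable[List[int]]) -> Dict[int, Set[int]]:
    """Map each AS to the set of ASes seen directly upstream of it."""
    upstream: Dict[int, Set[int]] = defaultdict(set)
    for path in as_paths:
        # Walk adjacent pairs: the left AS carries traffic toward the right AS.
        for provider, customer in zip(path, path[1:]):
            if provider != customer:  # ignore repeated ASNs
                upstream[customer].add(provider)  # directed edge customer -> provider
    return dict(upstream)


# The three AS paths from the excerpt above.
paths = [
    [8492, 9002, 2914, 36692],
    [293, 2914, 36692],
    [200130, 1299, 3356, 36692],
]
graph = build_upstream_graph(paths)
print(sorted(graph[36692]))  # [2914, 3356]
```

An adjacency map of sets keeps duplicate routes from creating duplicate edges, which matters because the same pair of ASes shows up in many prefixes.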
Enriching the model with RIR information
Let’s take it a step further: we decided to enrich our graph dataset with country code information. To do that, we had to parse some RIR data. An RIR (Regional Internet Registry) is an organization that manages the allocation and registration of Internet resources within a particular region of the world. That includes IP addresses and, of course, AS numbers.
There are five of those:
- AfriNIC for Africa.
- ARIN for the United States, Canada, Antarctica and some parts of the Caribbean region.
- APNIC for Asia, Australia, New Zealand and neighboring countries.
- LACNIC for Latin America and parts of the Caribbean region.
- RIPE NCC for Europe, Russia, the Middle East and Central Asia.
You can find a couple of links to the RIR data in the references section. As you can see, this data is pretty straightforward to parse: each line contains an AS number and a country code, which can be extracted and used as attributes of each AS node in our graph.
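The sketch below shows one way to do that parsing, assuming the files follow the pipe-delimited "delegated" statistics layout published by the RIRs (registry|cc|type|start|value|date|status), where an `asn` record covers `value` consecutive AS numbers starting at `start`. The sample lines are made up for illustration and use ASNs reserved for documentation; the function name is hypothetical.

```python
from typing import Dict, Iterable


def parse_rir_asn_countries(lines: Iterable[str]) -> Dict[int, str]:
    """Build an ASN -> country code map from delegated-statistics lines."""
    asn_country: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("|")
        # Keep only full ASN records; this skips the version and summary lines.
        if len(fields) < 7 or fields[2] != "asn":
            continue
        cc, start, count = fields[1], fields[3], fields[4]
        if cc in ("", "*") or not (start.isdigit() and count.isdigit()):
            continue  # no usable country code or malformed numbers
        for asn in range(int(start), int(start) + int(count)):
            asn_country[asn] = cc
    return asn_country


# Illustrative input only, not real registry data.
sample_lines = [
    "2|example|20141103|3|19700101|20141103|+0000",
    "example|*|asn|*|2|summary",
    "example|UA|asn|64496|1|20140101|allocated",
    "example|CA|asn|64500|2|20140101|assigned",
    "example|CA|ipv4|192.0.2.0|256|20140101|assigned",
]
print(parse_rir_asn_countries(sample_lines))
```

The resulting dictionary can be looked up while building the graph to attach a country attribute to each AS node.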
AS Network: Country View
As we mentioned earlier in this article, the AS network is fairly large. Visualizing such a graph requires a process that is a bit out of the scope of this article; it will definitely be discussed in a later one. Today, we will break down our huge AS graph into smaller pieces and see what goes on at the country level. In order to understand how each country establishes its connectivity to the Internet, we designed an algorithm that extracts a subgraph of all the ASNs with a given country code and their adjacent neighbors.
The algorithm works as follows:
1. We pick a country code, for example ‘UA’ (Ukraine).
2. We extract all BGP edges connecting at least 1 AS with that country code.
3. We extract all the AS nodes connected by the edges extracted in step 2.
4. We store the result in a JSON file, and voilà!
After running this program for every country code, we obtain a list of JSON files, each containing the subgraph of its country code and adjacent ASNs.
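The extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the program we ran: the function name, the "??" placeholder for unknown countries, and the JSON layout (`nodes` and `edges` lists) are all assumptions made for the example, and the layout is not claimed to match what OpenGraphiti expects. The ASNs are documentation-reserved numbers.

```python
import json
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]  # (AS, upstream AS)


def country_subgraph(
    edges: Iterable[Edge], asn_country: Dict[int, str], country: str
) -> dict:
    """Keep edges touching at least one AS of `country`, plus their endpoints."""
    kept = [
        (a, b)
        for a, b in edges
        if asn_country.get(a) == country or asn_country.get(b) == country
    ]
    # Every endpoint of a kept edge becomes a node, whatever its country.
    nodes = sorted({asn for edge in kept for asn in edge})
    return {
        "country": country,
        "nodes": [{"id": n, "country": asn_country.get(n, "??")} for n in nodes],
        "edges": [{"src": a, "dst": b} for a, b in kept],
    }


# Illustrative data only.
edges = [(64496, 64500), (64500, 64501), (64497, 64496)]
asn_country = {64496: "UA", 64497: "UA", 64500: "RU", 64501: "US"}

subgraph = country_subgraph(edges, asn_country, "UA")
print(json.dumps(subgraph, indent=2))
```

Calling `country_subgraph` once per distinct value in `asn_country` and writing each result with `json.dump` would produce one file per country code.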
Results & Visuals
The Canadian network has some interesting properties. It is big, highly connected and fairly complex. Its structure is pretty standard for any big country with a developed Internet infrastructure. In the picture below we can observe its major hotspots (in bright red). The 2 main ones are #852: TELUS Communications Inc., and #577: BACOM – Bell Canada.
The Singaporean network exposes noticeable features. First, it’s obviously much smaller than the previous one. But in this one we can clearly see that even though most ASNs are Singaporean, the ‘Internet frontier’ with adjacent countries usually relies on only one or two ASNs. The link with Bangladesh, on the left of the picture, is one example. This topology is indeed pretty common and is even more apparent in the next example.
Here the pattern is even clearer. This picture highlights 2 big clusters. The one on the right is almost entirely Ukrainian, while the one on the left, with a very high number of connections, is almost exclusively composed of non-Ukrainian ASNs (Russia, US, UK …). This means that most of Ukraine relies on that one huge ASN (#9002: RETN Limited) to reach the rest of the Internet.
Different levels of connectivity
Those 3 pictures give us a fascinating and unique way to look at the topology of our Internet infrastructure at the country level. Now what does that tell us? In network engineering we can differentiate 3 types of providers: Tier 1, Tier 2 and Tier 3. Tier 1 networks sit at the core and offer the best communication channels; there are only a few of them in the world. Tier 2s purchase transit from Tier 1s, and usually offer access to Tier 3s, which typically rely only on them. This 3-level backbone infrastructure is applied in many variations depending on the geography, politics and economy of each country. It is indeed fascinating to study.
Today, we’ve shown a unique way to explore our Internet backbone and infrastructure. There is a lot more to say about it, which we will cover in another article.
We will see that taking a step back and looking at a system as a whole can offer a perspective that holds the key to many topology-based detection algorithms. We hope you enjoyed reading this article and hope you’ll be waiting for the next one. In the meantime, we are happy to share with you a video of the Ukrainian network. This dataset was presented at BlackHat USA 2014 this summer: