Researchers and scientists use data visualizations to better understand data and communicate results. Good visualizations can provide insight into a dataset that might otherwise be overlooked. In this post, we’ll go through the process of creating graphic insight from an abstract dataset by building an actual data visualization step-by-step.
First: The Data
To build this visualization, we start with a 10-minute log chunk of raw DNS data that gets dumped into an Amazon S3 bucket. Log chunks are the rawest form of data the OpenDNS research team uses for analysis, and they make a good place to start talking about the life of a data visualization. If you want to know more about the process of getting log chunks, check out this post from Josh Pyorre.
Log chunks are text data that won’t be useful for our visualization without some cleaning and parsing. The goal of the visualization is to see what the traffic looks like when resolver requests are connected in the order they were received. To create latitude and longitude coordinates from IP addresses, I used the MaxMind API available via pip. This is all put together in a browser-consumable .csv file (see some of the Python code below).
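As a rough illustration of that pipeline, here is a minimal sketch of turning an ordered list of request IPs into a .csv of connected coordinate pairs. It is not the code used for this post: the `lookup` callable stands in for whatever geolocation client you use (its interface is an assumption), the column names are hypothetical, and the sample IPs and coordinates are made up for the example.

```python
import csv
import io


def geolocate_in_order(ips, lookup):
    """Return (lat, lon) pairs in request order, skipping IPs with no location.

    `lookup` is a hypothetical callable: ip -> (lat, lon) or None.
    """
    coords = []
    for ip in ips:
        location = lookup(ip)
        if location is not None:
            coords.append(location)
    return coords


def write_segments(coords, out):
    """Write one CSV row per pair of consecutive requests; return the row count."""
    writer = csv.writer(out)
    # Column names are an assumption, chosen for readability in the browser.
    writer.writerow(["start_lat", "start_lon", "end_lat", "end_lon"])
    rows = 0
    for (lat1, lon1), (lat2, lon2) in zip(coords, coords[1:]):
        writer.writerow([lat1, lon1, lat2, lon2])
        rows += 1
    return rows


# Made-up sample data using documentation-only IP ranges.
fake_locations = {
    "192.0.2.1": (34.05, -118.24),
    "198.51.100.7": (41.88, -87.63),
    "203.0.113.9": (25.76, -80.19),
}
ordered_ips = ["192.0.2.1", "10.0.0.1", "198.51.100.7", "203.0.113.9"]

coords = geolocate_in_order(ordered_ips, fake_locations.get)
buffer = io.StringIO()
row_count = write_segments(coords, buffer)
```

Each row describes one line on the globe: the location of one request and the location of the request that followed it. IPs that cannot be located are simply dropped here, which is one possible choice among several.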
Next: The Visualization
The first step is to create a scene with lighting. Next comes the globe on which all of the lines will sit, which is easy to create in Three.js. To help make your planet look more realistic, you can find earth textures that contain elevation/bump maps with a quick Google search; GPUs can then sample those 2D texture images and project the texture onto your 3D globe object using UV mapping. You can use as many textures as you like to render your earth replica, but only basic earth texture mapping is required. For the sake of clarity, less is definitely more here; the simpler your graph or chart is, the more likely its insight is to be easily understood.
After finding appropriate textures, you can use the handy THREE.ImageUtils.loadTexture method to load the image, making sure to pass in the THREE.UVMapping object. Then, Three.js will map your textures onto a sphere geometry with the dimensions you select to create your planet.
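To make the idea of UV mapping concrete, here is a small Python sketch of the arithmetic that relates a point on the globe to a pixel in a flat texture. It assumes an equirectangular earth image (longitude across the width, latitude down the height) with the image origin at the top-left; the function names are hypothetical, and Three.js handles all of this for you internally.

```python
def latlon_to_uv(lat, lon):
    """Map latitude/longitude in degrees to (u, v) in the range [0, 1].

    Assumes an equirectangular texture: u grows eastward from -180 degrees,
    v grows northward from the south pole.
    """
    u = (lon + 180.0) / 360.0
    v = (lat + 90.0) / 180.0
    return u, v


def uv_to_pixel(u, v, width, height):
    """Map (u, v) to a pixel in an image whose origin is the top-left corner."""
    x = min(int(u * width), width - 1)
    # Image rows count downward, so v is flipped.
    y = min(int((1.0 - v) * height), height - 1)
    return x, y


# The point at latitude 0, longitude 0 lands in the middle of the texture.
center_uv = latlon_to_uv(0.0, 0.0)
center_pixel = uv_to_pixel(center_uv[0], center_uv[1], 2048, 1024)
```

With a 2048x1024 texture, the center of the map corresponds to pixel (1024, 512), and the north pole at the antimeridian corresponds to the top-left corner.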
With a nice planet now in place, it’s time to make our line segments. Each line is a spline curve, which is a general curve defined by piecewise polynomial functions. Each spline is divided into 8 evenly spaced pieces between the starting and ending latitude/longitude pairs. 3D coordinates are then created for each of the points using an arc radius determined by the radius of our earth replica and the position of the point along the spline.
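One way to read that description is sketched below in Python. The details are assumptions made for illustration: the sine-shaped lift that raises the arc above the surface, the `arc_height` parameter, and the y-up axis convention are not taken from the original code, and the simple linear interpolation of latitude and longitude does not handle paths that cross the antimeridian.

```python
import math


def latlon_to_xyz(lat, lon, radius):
    """Convert latitude/longitude in degrees to 3D coordinates (y axis up)."""
    phi = math.radians(lat)
    theta = math.radians(lon)
    x = radius * math.cos(phi) * math.cos(theta)
    y = radius * math.sin(phi)
    z = -radius * math.cos(phi) * math.sin(theta)
    return (x, y, z)


def arc_control_points(start, end, globe_radius, arc_height, pieces=8):
    """Return pieces + 1 control points arcing from start to end.

    start and end are (lat, lon) pairs. The radius grows from the globe
    surface to globe_radius + arc_height at the midpoint and back again.
    """
    points = []
    for i in range(pieces + 1):
        t = i / pieces  # position along the spline, from 0 to 1
        lat = start[0] + (end[0] - start[0]) * t
        lon = start[1] + (end[1] - start[1]) * t
        radius = globe_radius + arc_height * math.sin(math.pi * t)
        points.append(latlon_to_xyz(lat, lon, radius))
    return points


# Hypothetical example: an arc from Los Angeles to Chicago on a globe of radius 100.
control_points = arc_control_points((34.05, -118.24), (41.88, -87.63), 100.0, 20.0)
```

Splitting the path into 8 pieces yields 9 control points. The first and last sit on the surface of the globe, and the middle one sits `arc_height` above it.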
The spline's getPoint method is then used to create small line segments that are joined together to form a geometric curve connecting domains that were resolved one after another. The number of line segments is determined by the number of control points; the more control points there are, the more computationally expensive each line is to draw, because each segment requires its own attributes to direct its placement and coloration.
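The sampling step can be sketched in Python as follows. This is an illustration rather than the Three.js implementation: it assumes a Catmull-Rom spline that passes through every control point, and `get_point` and `sample_segments` are hypothetical names modeled on the idea of evaluating the curve at a parameter t between 0 and 1.

```python
import math


def _catmull_rom(p0, p1, p2, p3, t):
    """Interpolate one coordinate between p1 and p2 (t in [0, 1])."""
    v0 = (p2 - p0) * 0.5
    v1 = (p3 - p1) * 0.5
    t2 = t * t
    t3 = t2 * t
    return ((2 * p1 - 2 * p2 + v0 + v1) * t3
            + (-3 * p1 + 3 * p2 - 2 * v0 - v1) * t2
            + v0 * t + p1)


def get_point(control_points, t):
    """Evaluate the spline at t in [0, 1]; end points are clamped."""
    last = len(control_points) - 1
    position = last * t
    index = min(int(math.floor(position)), last)
    weight = position - index
    p0 = control_points[max(index - 1, 0)]
    p1 = control_points[index]
    p2 = control_points[min(index + 1, last)]
    p3 = control_points[min(index + 2, last)]
    return tuple(_catmull_rom(a, b, c, d, weight)
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_segments(control_points, segment_count):
    """Return segment_count joined (start, end) line segments along the spline."""
    points = [get_point(control_points, i / segment_count)
              for i in range(segment_count + 1)]
    return list(zip(points, points[1:]))


# Hypothetical control points; in the visualization these would be the arc points.
example_points = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
segments = sample_segments(example_points, 16)
```

Each segment ends where the next begins, so the pieces read as one continuous curve. Raising `segment_count` gives a smoother line at the cost of more per-segment data to store and draw.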
Below you can see how the line segments look when each one is given its own random color.
Finally: The Completed Visualization
These are some images from the interactive version of the visualization.
Here, Los Angeles logs are blueish green and Chicago logs are white:
Miami query logs, one of our busiest resolvers:
Hong Kong query logs are green:
Amsterdam query logs are white:
Any number of improvements to this visualization would help it reveal clearer patterns. For instance, encoding the color of the line segments with Investigate data would give insight into attacks, would be relatively easy to do, and would create a beautiful effect.
Another interesting improvement would be to render the connections using edge bundling, a force-directed visualization technique that bundles line segments along the same path, much the same way one might organize wires. The effect removes visual clutter and helps make patterns stand out more clearly. To learn more about how it works, check out edge bundling research online.