At OpenDNS, we frequently observe hosts and domain names whose features are similar to what we’ve previously seen with hosts serving malware, but still don’t exactly match a known pattern.

While we want to block malicious content as soon as possible, we also want to keep false positives to a minimum. We don’t want to label a benign web site as Trojan/Generic.Win32 just because it is using Nginx, a non-English name, a previously inactive IP, and a dynamic DNS service. We need to gather evidence.

Injected iframes on compromised sites can often be spotted right on the home page of a web site – but other infection vectors, other stages and C&C servers are less obvious. They might not use HTTP at all, and when they do, the exact path has to be known.

This is why we occasionally need to study what kind of traffic is being sent to these names before labeling and possibly permanently blocking a new cluster of suspicious names. DNS records used by malware are rarely signed, and a DNS resolver can be used to temporarily redirect queries sent to highly suspicious domains to a box dedicated to passive traffic analysis instead, often referred to as a “sinkhole”. In this post, we’ll take a closer look at how we built a scalable DNS sinkhole.

The C10M problem

It’s time for web servers to handle ten thousand clients simultaneously, don’t you think?

Introduced circa 1999, Dan Kegel’s “C10K problem” refers to different techniques to handle ten thousand clients simultaneously on a single server, even though the traditional BSD sockets API had clearly not been designed to do so.

Networking libraries such as Google’s Nitro hide the complexity of these techniques.

Leveraging the same techniques, a high number of simultaneous clients has been achieved on different platforms such as Node.js, Clojure or Erlang.

However, these achievements still required beefy servers, and a large amount of memory.

Robert Graham’s excellent talk Defending the Internet at scale pointed out that the kernel network stack is a major bottleneck for scaling network servers. Data has to bounce back and forth between the kernel and userland processes, requiring scheduling, dispatching, copying and system calls.

Nonetheless, 10 million connections or more can easily be achieved on commodity hardware by bypassing the kernel, and letting userland servers do the heavy lifting. He demonstrated this by releasing Masscan, a tool that can scan the entire Internet in under 6 minutes.

The TCP 3-way handshake

A TCP packet includes a header and an optional payload. When a connection to a server is being initiated by a client, two important 32-bit values are involved: a sequence number, and an acknowledgment number.

A typical TCP connection begins with the following handshake:
– The client sends a packet that includes Sc0, a client-chosen sequence number.
– The server replies with a packet that includes Ss0, a server-chosen sequence number, as well as (Sc0 + 1)
– The client sends a packet that includes (Sc0 + 1), as well as (Ss0 + 1). This third packet can include a payload, although in practice the payload usually follows in a separate packet.

Although extensions exist in order to speed up the initial handshake (TFO, TCPCT), they are not widely deployed yet.

Between the second and the third steps, the client checks that the packet received from the server actually includes (Sc0 + 1). After the third step, the server checks that the received packet actually includes (Ss0 + 1). If this is not the case, the packet is discarded. Provided that Sc0 and Ss0 cannot be predicted by an attacker, this mechanism makes blind injections more time-consuming than with connectionless protocols such as UDP.
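The sequence and acknowledgment numbers exchanged during the handshake can be illustrated with a small sketch. It only models the arithmetic, not real packets; the function names (`handshake`, `client_accepts`, `server_accepts`) are hypothetical and chosen for this example.

```python
MOD = 2 ** 32  # sequence and acknowledgment numbers are 32-bit and wrap around


def next_seq(value):
    """Return value + 1 modulo 2**32, as used for acknowledgment numbers."""
    return (value + 1) % MOD


def handshake(sc0, ss0):
    """Return the three handshake packets as dicts of flags, seq and ack."""
    syn = {"flags": "SYN", "seq": sc0, "ack": None}
    syn_ack = {"flags": "SYN/ACK", "seq": ss0, "ack": next_seq(sc0)}
    ack = {"flags": "ACK", "seq": next_seq(sc0), "ack": next_seq(ss0)}
    return [syn, syn_ack, ack]


def client_accepts(sc0, syn_ack):
    """The client checks that the server acknowledged Sc0 + 1."""
    return syn_ack["ack"] == next_seq(sc0)


def server_accepts(ss0, ack):
    """The server checks that the client acknowledged Ss0 + 1."""
    return ack["ack"] == next_seq(ss0)
```

An off-path attacker who cannot observe Ss0 has to guess a 32-bit value for `server_accepts` to succeed, which is what makes blind injection expensive.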

The flip side is that the server has to remember what value had been chosen for Ss0 in order to verify the acknowledgement number sent by the client at the 3rd step.

In addition, keeping state for every pending connection can quickly consume a lot of memory and CPU time. SYN flood attacks used to be a popular type of denial-of-service attack exploiting this.

Terminating a TCP connection

In order to gracefully terminate a connection, three or four packets are normally required. Each side has to indicate its intent to close the connection, after verifying that the previously received packet acknowledges the correct sequence number. This usually means that states have to be kept until the very last packet.

Although the intent is usually to indicate an error, sending a packet with the RST flag also terminates a connection. And unlike the previous scheme, the connection is terminated immediately and both ways.


Released circa 2000, IPtrap was a tool to log unexpected connections to well-known TCP ports. IPtrap2 is a complete rewrite, designed to securely and reliably process a high amount of traffic.

The host we are running IPtrap2 on has two physical network interfaces:

– The first interface is a management interface that can only be reached from our local network.

– The second interface is active, but not configured. It doesn’t have any IP addresses and the kernel doesn’t know of any routes to send packets through this interface. In fact, the kernel TCP/IP stack totally ignores it.

In this context, how can packets get routed to this interface?

All it actually takes for this to happen is to have an entry for it in the ARP table of the router.

Even if the kernel TCP/IP stack ignores a physical network adapter, raw ethernet frames received by it can still be inspected by userland applications, provided that they have enough privileges to do so; raw frames can be injected as well.

Thus, we run a minimal application that listens for ARP requests coming from the router, and replies with the IP address and the MAC address we want the router to fill its ARP table with.
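As an illustration of what such a reply looks like on the wire, here is a minimal sketch that builds an ARP reply as a raw Ethernet frame. It is not the application we run; the function names and the addresses in the usage note are placeholders chosen for this example.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_REPLY = 2


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6-byte binary form."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_reply(our_mac, our_ip, router_mac, router_ip):
    """Build a 42-byte Ethernet frame telling the router that our_ip is at our_mac."""
    ethernet = struct.pack(
        "!6s6sH",
        mac_to_bytes(router_mac),  # destination MAC
        mac_to_bytes(our_mac),     # source MAC
        ETHERTYPE_ARP,
    )
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                         # hardware type: Ethernet
        0x0800,                    # protocol type: IPv4
        6,                         # hardware address length
        4,                         # protocol address length
        ARP_REPLY,
        mac_to_bytes(our_mac),     # sender hardware address
        socket.inet_aton(our_ip),  # sender protocol address (the sinkhole IP)
        mac_to_bytes(router_mac),  # target hardware address
        socket.inet_aton(router_ip),
    )
    return ethernet + arp
```

For example, `build_arp_reply("02:00:00:00:00:01", "192.0.2.10", "02:00:00:00:00:fe", "192.0.2.1")` returns the bytes of the frame. On Linux, a frame like this could be written to an `AF_PACKET` raw socket bound to the unconfigured interface, which requires elevated privileges.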

This is enough to have the router accept packets sent through this interface, as well as having it forward packets received for the sinkhole IP address.

IPtrap2 itself also bypasses the kernel and directly reads and injects ethernet frames from/to the non-configured interface.

When a TCP packet to initiate a new connection is received, and this packet’s sequence number is Sc0, IPtrap2 replies with an acknowledgement whose sequence number Ss0 is the output of a keyed hash function:

Ss0 = Hk(source-ip || destination-ip || source-port || destination-port || uts)

(with uts being a global time stamp updated every 64 seconds.)

After receiving this acknowledgement, a client can follow through and send a new packet, this time with a payload.

In order to mitigate spoofing attacks, this packet must include an acknowledgment number (Ss0 + 1).

When such a packet is received, the source and destination IP addresses and ports are decoded and a possible valid authentication tag for this packet is computed:

t1 = 1 + Hk(source-ip || destination-ip || source-port || destination-port || uts)

If the acknowledgment number sent by the client equals t1, that is, if t1 = (Ss0 + 1), the client is very likely to have initiated the connection.

If the received tag doesn’t match what was expected, it might be due to the global time stamp having changed between the acknowledgment sent by the server, and the tag verification. We thus compute a valid tag for the previous value of uts:

t2 = 1 + Hk(source-ip || destination-ip || source-port || destination-port || uts - 64)

If the acknowledgment number (Ss0 + 1) sent by the client doesn’t match t2 either, the packet is ignored. This is a common technique to mitigate some denial-of-service attacks (SYN Cookies).

If (Ss0 + 1) matches t1 or t2, the metadata and the payload are sent to a ZeroMQ socket for further inspection (see below), and a packet acknowledging the client data and closing the connection (ACK/RST) is sent as a response.

Note that we never have to keep any state. Using a keyed hash function lets us verify acknowledgment numbers sent by clients without having to keep track of what sequence numbers were initially sent to them.

As a result, memory usage is constant and no lookups are required, no matter how many clients are simultaneously connected or connecting.
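The stateless check described above can be sketched as follows. This is a minimal illustration, not IPtrap2's actual code: HMAC-SHA256 truncated to 32 bits stands in for the keyed hash Hk, whose exact construction is not specified here, and the names `current_uts`, `cookie` and `ack_is_valid` are hypothetical.

```python
import hashlib
import hmac
import socket
import struct

PERIOD = 64    # seconds between updates of the global time stamp (uts)
MOD = 2 ** 32  # TCP sequence numbers are 32-bit


def current_uts(now):
    """Round a Unix time down to the start of its 64-second period."""
    return int(now) // PERIOD * PERIOD


def cookie(key, src_ip, dst_ip, src_port, dst_port, uts):
    """Ss0 = Hk(src-ip || dst-ip || src-port || dst-port || uts), as a 32-bit integer."""
    msg = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
           + struct.pack("!HHq", src_port, dst_port, uts))
    # Assumption: HMAC-SHA256 truncated to 4 bytes plays the role of Hk.
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(key, src_ip, dst_ip, src_port, dst_port, ack, now):
    """Recompute t1 and t2 from the packet itself; nothing is looked up."""
    uts = current_uts(now)
    t1 = (1 + cookie(key, src_ip, dst_ip, src_port, dst_port, uts)) % MOD
    t2 = (1 + cookie(key, src_ip, dst_ip, src_port, dst_port, uts - PERIOD)) % MOD
    return ack in (t1, t2)
```

Everything needed for the check is either in the received packet or global (the key and the time stamp), which is why memory usage does not grow with the number of clients.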

Full-featured TCP/IP stacks are way more complicated. In particular, IPtrap2 doesn’t deal with fragmentation. However, its simplicity makes it fast, scalable, and a relief for system operators since it is isolated from the kernel network stack, and no kernel tuning is required.

Processing the output

Because it is decoding raw ethernet frames, IPtrap2 doesn’t “listen” to a specific set of TCP ports. It actually captures any TCP packet, on any port. Obviously, none of these can be used for amplification attacks, as the server negotiates connections but never sends any payloads.

The output of nmap scanning the sinkhole is thus quite unusual.

Only the first packet containing a payload is accepted, and the connection is dropped immediately after. This initial packet is all we need for our research. It is enough to have an idea of what protocol is being used, and get a path in addition to a domain name hosting an HTTP service.
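To give an idea of how much a single payload can reveal, here is a small sketch that extracts the method, path and host name from a captured first packet when it looks like an HTTP request. It is a simplified, hypothetical parser written for this example, not one of our classifiers.

```python
def parse_http_request(payload):
    """Return (method, path, host) if payload looks like an HTTP request, else None."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ")
    # A request line has exactly three parts: method, target, version.
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None
    method, path, _version = parts
    host = None
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace")
            break
    return (method.decode("ascii", "replace"),
            path.decode("ascii", "replace"),
            host)
```

A payload that does not start with a well-formed request line, such as a TLS handshake or a custom binary protocol, yields `None`, which is itself a useful hint about the protocol in use.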

Sinkholed data is primarily meant to be processed by real time classifiers. This is why the output of IPtrap2 is also a ZeroMQ stream, allowing multiple models to simultaneously process the same data.

Packets are currently encoded as JSON objects, soon to be replaced by Cap’n Proto in order to handle binary payloads better and faster.
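The limitation with binary payloads comes from JSON strings having to be valid text: arbitrary bytes need an extra encoding step. The sketch below uses Base64 to illustrate this. The field names and the choice of Base64 are assumptions made for this example and do not describe the actual IPtrap2 output format.

```python
import base64
import json


def encode_event(src_ip, src_port, dst_ip, dst_port, payload):
    """Serialize connection metadata and a binary payload as a JSON object."""
    return json.dumps({
        "src_ip": src_ip,      # hypothetical field names
        "src_port": src_port,
        "dst_ip": dst_ip,
        "dst_port": dst_port,
        # Raw bytes cannot be stored in a JSON string, hence Base64.
        "payload_b64": base64.b64encode(payload).decode("ascii"),
    })


def decode_payload(event_json):
    """Recover the original payload bytes from a serialized event."""
    return base64.b64decode(json.loads(event_json)["payload_b64"])
```

Base64 grows the payload by a third and has to be undone by every consumer, which is part of what a binary serialization format avoids.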

The OpenDNS sinkhole is a welcome addition to our toolbox in order to keep blocking malware as soon as possible while still having a very low number of false positives.
