Some time ago, my teammate Yariv blogged about the OpenDNS Intelligent Proxy, which allows us to go beyond the DNS layer and block malicious HTTP traffic. Since then, our team has been focused on other projects: taking ownership of and consolidating the landers, one of the oldest parts of our infrastructure, which freed up more than 70 servers, and working on some exciting new features we’ll talk about when the time comes.
Today I want to go over the Intelligent Proxy, and the technology that powers it (namely Nginx), in a little more detail.
Conventionally, a proxy is configured explicitly, either in your OS’s network settings or within a particular program, such as Chrome or Firefox in the case of HTTP proxying.
In addition, protocols are in place to ensure the proxy server can always determine the client’s intended destination at the time of the request. For instance, an HTTP client talking to an explicitly configured proxy puts the full URL in the request line, or names the host and port in a CONNECT request for HTTPS. But as Yariv explained in his post, we’ve taken an unconventional approach. Instead of proxying everything (explicitly configured or not), we selectively re-route requests for suspicious domains to our proxy via the DNS layer. This selectivity is great for reducing latency, load, and impact, but it also introduces some interesting engineering challenges, mainly around identifying users and determining what the original destination was.
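To make the selective re-routing concrete, here is a minimal Python sketch of the resolver-side decision. It assumes a simple in-memory set of suspicious domains and a single proxy address. The names `SUSPICIOUS_DOMAINS`, `PROXY_IP`, `UPSTREAM_RECORDS`, and `resolve` are hypothetical, and the addresses are placeholders from the documentation ranges. This is an illustration of the idea, not OpenDNS’s actual resolver logic.

```python
from typing import Optional

# Hypothetical classification data; a real resolver would consult a threat feed.
SUSPICIOUS_DOMAINS = {"some-website.net"}

# Placeholder addresses (RFC 5737 documentation ranges), assumed for illustration.
PROXY_IP = "192.0.2.10"
UPSTREAM_RECORDS = {
    "some-website.net": "198.51.100.7",
    "example.org": "203.0.113.5",
}


def is_suspicious(domain: str) -> bool:
    """Return True if the domain, or any parent domain, is flagged."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" so subdomains inherit the flag.
    return any(".".join(labels[i:]) in SUSPICIOUS_DOMAINS for i in range(len(labels)))


def resolve(domain: str) -> Optional[str]:
    """Answer an A query, pointing only suspicious domains at the proxy."""
    if is_suspicious(domain):
        return PROXY_IP
    return UPSTREAM_RECORDS.get(domain.lower().rstrip("."))


print(resolve("some-website.net"))      # 192.0.2.10
print(resolve("cdn.some-website.net"))  # 192.0.2.10
print(resolve("example.org"))           # 203.0.113.5
```

Every other domain resolves normally and never touches the proxy, which is where the latency and load savings come from.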
For example, when a user tries to browse to “some-website.net,” the OpenDNS resolvers return the IP address of the nearest Intelligent Proxy server if the domain is classified as suspicious. The client (e.g. Firefox or Chrome) has no knowledge of this and assumes the IP address it received belongs to the server actually hosting “some-website.net.” For plain HTTP, it’s easy to determine the original destination, because HTTP/1.1 requires the Host header to be set on every request, and modern browsers include it correctly. Shared hosting providers, for example, rely on this header to serve multiple websites behind a single IP. Similarly, HTTPS traffic can be proxied by taking advantage of the Server Name Indication (SNI) extension of the TLS protocol. SNI carries the requested host name in the client’s first handshake message (the ClientHello), which is sent in plaintext. For other ports and protocols, the process is more complex, and sometimes impossible.
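To illustrate how the original destination can be recovered in these two cases, here is a minimal Python sketch. One function reads the Host header from a raw HTTP/1.1 request. The other walks a TLS ClientHello to find the SNI host name. The function names are hypothetical, and this is not the Intelligent Proxy’s actual code. The SNI parser assumes the whole ClientHello arrives in the first TLS record, and it does only enough bounds checking to return None on truncated input.

```python
import struct
from typing import Optional


def host_from_http_request(raw: bytes) -> Optional[str]:
    """Return the Host header value from a raw HTTP/1.1 request, or None."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    # Decode as ISO-8859-1 so every header octet maps to a character.
    lines = head.decode("iso-8859-1").split("\r\n")
    for line in lines[1:]:  # skip the request line, e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip()  # may still include a ":port" suffix
    return None


def sni_from_client_hello(data: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None.

    Assumes the whole ClientHello arrives in the first TLS record.
    """
    try:
        if data[0] != 0x16:  # record content type 22 = handshake
            return None
        pos = 5  # skip record header: type (1), version (2), length (2)
        if data[pos] != 0x01:  # handshake type 1 = ClientHello
            return None
        pos += 4  # skip handshake type (1) and length (3)
        pos += 2 + 32  # skip client_version and random
        pos += 1 + data[pos]  # skip session_id
        (cipher_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + cipher_len  # skip cipher_suites
        pos += 1 + data[pos]  # skip compression_methods
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                # Layout: list length (2), name_type (1), name length (2), name
                name_type = data[pos + 2]
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                start = pos + 5
                if name_type == 0 and start + name_len <= len(data):  # host_name
                    return data[start:start + name_len].decode("ascii")
                return None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed input
    return None
```

A proxy accepting connections on port 80 would use the first function, and one on port 443 the second. Many other protocols have no comparable field naming the intended host, which is what makes them hard to proxy this way.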
Another important concept is the distinction between a “forward” proxy and a “reverse” proxy. A forward proxy serves a group of clients, acting as a single point of access and querying origin server(s) on behalf of the clients. This is the type of proxy you use when configuring one in your OS or in a browser like Firefox, as mentioned earlier.
A reverse proxy does the opposite and acts as a single point of access for multiple server components, such as CGI scripts, file servers, or databases. These proxies are also commonly used as load balancers and SSL termination points.
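The difference comes down to who chooses the upstream server. The minimal Python sketch below contrasts the two. The forward proxy takes the destination from the client’s request: either the full URL that an explicitly configured client sends, or the Host header otherwise. The reverse proxy picks one of its own components from a routing table. The function names, routes, and private backend addresses are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical internal components sitting behind a reverse proxy.
REVERSE_ROUTES = {
    "/cgi-bin/": "http://10.0.0.11:9000",
    "/static/": "http://10.0.0.12:8080",
}
DEFAULT_BACKEND = "http://10.0.0.10:8000"


def forward_upstream(request_target: str, host_header: str) -> str:
    """Forward proxy: the client's request names the origin server."""
    parts = urlsplit(request_target)
    if parts.scheme:  # absolute-form, e.g. "GET http://example.org/ HTTP/1.1"
        return f"{parts.scheme}://{parts.netloc}"
    # origin-form ("/path"): fall back to the Host header
    return f"http://{host_header}"


def reverse_upstream(request_target: str) -> str:
    """Reverse proxy: the proxy's own routing table picks the backend."""
    path = urlsplit(request_target).path
    for prefix, backend in REVERSE_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return DEFAULT_BACKEND
```

The forward function trusts whatever the client asked for. The reverse function ignores the client’s notion of a destination entirely and only decides which internal component should answer.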
Based on this, our Intelligent Proxy is a forward proxy when it comes to serving client requests that have been routed to it. But it also has some reverse proxying to do internally, especially as we add new features and new layers of data inspection. As I mentioned at the beginning, the technology we chose is Nginx, and readers familiar with Nginx will know it’s designed to be a reverse proxy, not a forward proxy.
I’ll discuss some more of the unconventional approaches we’ve taken as a result, and challenges we had to solve, in my next post.