If you’re using a dual-stacked machine and have multiple search domains in your resolv.conf, a single short hostname can resolve to multiple addresses that correspond to completely different hosts.
At OpenDNS we care a lot about DNS and issues related to DNS resolution, so when we came across some odd behaviors in resolving hosts on dual-stacked machines (meaning they have both IPv4 and IPv6 addresses), we wanted the world to know. Hopefully the following can help prepare users for some of the gotchas and maybe even inspire certain operating-system maintainers to release a few patches.
Our test environment was fairly straightforward: we set up an authoritative nameserver with three different subdomains defined. We gave the first subdomain an IPv4 address only, the second an IPv6 address only, and the third both IPv4 and IPv6 addresses.
# Domain                   TTL  Type  Address
gw.network1.opendns.com    30   A     1.1.1.1
gw.network2.opendns.com    30   AAAA  2001:470:b:1fc::1
gw.network3.opendns.com    30   A     2.2.2.2
gw.network3.opendns.com    30   AAAA  2001:470:b:1fc::2
We then took four machines running four different operating systems, configured them to be dual-stacked, and set up their resolv.conf with network{1,2,3}.opendns.com as search domains:
# configuration syntax varies across OSes
nameserver 67.215.93.85
search network1.opendns.com network2.opendns.com network3.opendns.com
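The following minimal Python sketch shows how a stub resolver might expand the short name “gw” into fully qualified candidates using that search list. It rests on two simplifying assumptions. First, a name without a trailing dot is tried against each search domain in order. Second, a later `search` line replaces an earlier one, as glibc's resolv.conf handling does. Real resolvers also apply options such as `ndots` and may try the bare name, and this sketch ignores both. The helper names are hypothetical.

```python
# Hypothetical helpers: a simplified model of search-list expansion.
# Real resolvers also honour "ndots" and may try the bare name; ignored here.

SAMPLE_RESOLV_CONF = """\
# configuration syntax varies across OSes
nameserver 67.215.93.85
search network1.opendns.com network2.opendns.com network3.opendns.com
"""


def parse_search_domains(resolv_conf: str) -> list[str]:
    """Return the search domains; a later 'search' line replaces an earlier one."""
    domains: list[str] = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]
    return domains


def candidate_names(name: str, search: list[str]) -> list[str]:
    """Fully qualified names to try, in search-list order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    return [f"{name}.{domain}." for domain in search]


if __name__ == "__main__":
    search = parse_search_domains(SAMPLE_RESOLV_CONF)
    for fqdn in candidate_names("gw", search):
        print(fqdn)
```

For “gw” this yields gw.network1.opendns.com., gw.network2.opendns.com. and gw.network3.opendns.com., in that order. That is the order in which we expect the queries to go out.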
We then ran tcpdump to watch the outbound DNS requests and used a simple wrapper program around getaddrinfo() to perform the lookups.
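The original wrapper program isn't shown. A rough Python equivalent might look like the sketch below, offered as an illustration rather than the tool we actually used. It calls `socket.getaddrinfo()` with `AF_UNSPEC`, asks for TCP stream sockets, and formats each result in the “family - socket type - protocol - address” layout used in the outputs that follow. The label tables and the `describe()` function are hypothetical choices made for this example.

```python
import socket
import sys

# Human-readable labels matching the output format used in this post.
FAMILY_LABELS = {socket.AF_INET: "AF_INET (IPv4)", socket.AF_INET6: "AF_INET6 (IPv6)"}
SOCKTYPE_LABELS = {socket.SOCK_STREAM: "SOCK_STREAM", socket.SOCK_DGRAM: "SOCK_DGRAM"}
PROTO_LABELS = {socket.IPPROTO_TCP: "IPPROTO_TCP", socket.IPPROTO_UDP: "IPPROTO_UDP"}


def describe(host: str) -> list[str]:
    """Resolve host via the system getaddrinfo() and format each result."""
    results = socket.getaddrinfo(
        host, None, socket.AF_UNSPEC, socket.SOCK_STREAM, socket.IPPROTO_TCP
    )
    lines = []
    for family, socktype, proto, _canonname, sockaddr in results:
        lines.append(
            " - ".join([
                FAMILY_LABELS.get(family, str(family)),
                SOCKTYPE_LABELS.get(socktype, str(socktype)),
                PROTO_LABELS.get(proto, str(proto)),
                sockaddr[0],  # address string only; port/flowinfo/scope ignored
            ])
        )
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 getaddrinfo.py gw
    for line in describe(sys.argv[1]):
        print(line)
```

The results come back in whatever order the system resolver returns them. Any address sorting is left to the OS, which is exactly the behavior under test here.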
What should happen when we try to resolve the hostname “gw”?
If an OS follows the “prefer IPv6” standard, we would expect to get “2001:470:b:1fc::1” (the IPv6 address of gw.network2.opendns.com). We expect this both because the OS should walk the search domains in the order network1, network2, network3, and because it should prefer IPv6.
For OSes that follow the Happy Eyeballs RFC (RFC 6555) or its general approach, we would expect to get “1.1.1.1”, since that is the first address found when walking the search domains in order.
So let’s have a look at what the OSes actually did…
OSX – 10.9.3
# sudo tcpdump -n -s 1500 -i en0 udp port 53
10.10.10.165.63835 > 67.215.93.85.53: 4769+ A? gw.network1.opendns.com. (41)
10.10.10.165.55987 > 67.215.93.85.53: 4925+ AAAA? gw.network1.opendns.com. (41)
67.215.93.85.53 > 10.10.10.165.63835: 4769 1/0/0 A 1.1.1.1 (57)
67.215.93.85.53 > 10.10.10.165.55987: 4925 0/0/0 (41)
10.10.10.165.62135 > 67.215.93.85.53: 3489+ AAAA? gw.network2.opendns.com. (41)
67.215.93.85.53 > 10.10.10.165.62135: 3489 1/0/0 AAAA 2001:470:b:1fc::1 (69)
From the tcpdump output we can see that OSX sends A (IPv4) and AAAA (IPv6) queries for the name in the first search domain (gw.network1.opendns.com) simultaneously.
The A query gets a valid answer, but the AAAA query gets a NODATA response (no error, zero answers). The OS therefore continues down the search domain list looking for a AAAA answer. It next sends a AAAA query for gw.network2.opendns.com, which returns a valid AAAA record. Let’s now look at what the getaddrinfo() call actually returned:
# ./getaddrinfo gw
AF_INET6 (IPv6) - SOCK_STREAM - IPPROTO_TCP - 2001:470:b:1fc::1
AF_INET (IPv4) - SOCK_STREAM - IPPROTO_TCP - 1.1.1.1
This is where things get odd. The OS’s underlying resolver library has returned two answers for “gw”, but the hosts behind the two IPs are completely different. The IPv4 answer “1.1.1.1” belongs to gw.network1.opendns.com, while the IPv6 answer “2001:470:b:1fc::1” belongs to gw.network2.opendns.com. At this point it is up to the calling application to choose which answer or protocol it prefers, yet nothing indicates that the two addresses correspond to completely different hosts. That’s bad.
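Mixed answers like these can arise from a simple rule: each address family walks the search list on its own and stops at the first name that has a record of that type. The sketch below is a simplified model of the OSX behavior under that assumption. A hypothetical in-memory copy of our test zone stands in for real DNS queries. This is not Apple’s resolver code.

```python
# Hypothetical in-memory copy of the test zone: name -> {record type: [addresses]}.
ZONE = {
    "gw.network1.opendns.com.": {"A": ["1.1.1.1"]},
    "gw.network2.opendns.com.": {"AAAA": ["2001:470:b:1fc::1"]},
    "gw.network3.opendns.com.": {"A": ["2.2.2.2"], "AAAA": ["2001:470:b:1fc::2"]},
}

SEARCH = ["network1.opendns.com", "network2.opendns.com", "network3.opendns.com"]


def query(fqdn: str, rtype: str) -> list[str]:
    """Stand-in for one DNS query; an empty list means NODATA or NXDOMAIN."""
    return ZONE.get(fqdn, {}).get(rtype, [])


def per_family_search(name: str, search: list[str]) -> list[tuple[str, str]]:
    """Each record type walks the search list independently (OSX-like model).

    Returns (fqdn, address) pairs, AAAA first, so we can see which
    name each address actually came from.
    """
    results = []
    for rtype in ("AAAA", "A"):
        for domain in search:
            fqdn = f"{name}.{domain}."
            answers = query(fqdn, rtype)
            if answers:
                results.extend((fqdn, addr) for addr in answers)
                break  # this family stops at its first answer
    return results


if __name__ == "__main__":
    for fqdn, addr in per_family_search("gw", SEARCH):
        print(f"{addr:20} from {fqdn}")
```

The model keeps track of which name each address came from, and the two addresses turn out to come from two different names. That per-address origin is precisely the information missing from the getaddrinfo() output above.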
Linux – Ubuntu 12.04
67.215.92.110.14766 > 67.215.93.85.53: 1096+ AAAA? gw.network1.opendns.com. (41)
67.215.93.85.53 > 67.215.92.110.14766: 1096 0/0/0 (41)
67.215.92.110.21993 > 67.215.93.85.53: 51327+ AAAA? gw.network2.opendns.com. (41)
67.215.93.85.53 > 67.215.92.110.21993: 51327 1/0/0 AAAA 2001:470:b:1fc::1 (69)
67.215.92.110.32972 > 67.215.93.85.53: 50275+ A? gw.network1.opendns.com. (41)
67.215.93.85.53 > 67.215.92.110.32972: 50275 1/0/0 A 1.1.1.1 (57)
AF_INET6 (IPv6) - SOCK_STREAM - IPPROTO_TCP - 2001:470:b:1fc::1
AF_INET (IPv4) - SOCK_STREAM - IPPROTO_TCP - 1.1.1.1
Linux doesn’t fare much better. It starts by resolving an IPv6 address, but once it has found one it goes back to the first search domain and resolves an IPv4 address as well. Just like OSX, Linux’s getaddrinfo() returns addresses for both gw.network1.opendns.com and gw.network2.opendns.com.
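The query order in the Linux capture can be reproduced with a small sketch. The rule is to finish the AAAA search across the search list first, then start the A search again from the first domain. The zone dictionary and function name are hypothetical stand-ins for real DNS traffic. The sketch models only the query order we observed, not glibc’s actual implementation.

```python
# Hypothetical in-memory copy of the test zone: name -> {record type: [addresses]}.
ZONE = {
    "gw.network1.opendns.com.": {"A": ["1.1.1.1"]},
    "gw.network2.opendns.com.": {"AAAA": ["2001:470:b:1fc::1"]},
    "gw.network3.opendns.com.": {"A": ["2.2.2.2"], "AAAA": ["2001:470:b:1fc::2"]},
}

SEARCH = ["network1.opendns.com", "network2.opendns.com", "network3.opendns.com"]


def linux_like_trace(name: str, search: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Return the (rtype, fqdn) queries in the order sent, plus the addresses found.

    AAAA is searched to completion first; the A search then restarts
    from the top of the search list, matching the capture above.
    """
    trace: list[tuple[str, str]] = []
    addresses: list[str] = []
    for rtype in ("AAAA", "A"):
        for domain in search:
            fqdn = f"{name}.{domain}."
            trace.append((rtype, fqdn))
            answers = ZONE.get(fqdn, {}).get(rtype, [])
            if answers:
                addresses.extend(answers)
                break  # move on to the next record type
    return trace, addresses


if __name__ == "__main__":
    trace, addresses = linux_like_trace("gw", SEARCH)
    for rtype, fqdn in trace:
        print(f"{rtype}? {fqdn}")
    print(addresses)
```

The model sends the three queries seen in the tcpdump output, in the same order. It ends up with one address from gw.network2.opendns.com and one from gw.network1.opendns.com.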
FreeBSD 10.*
10.11.13.129.34506 > 67.215.93.85.53: 44935+ A? gw.network1.opendns.com. (41)
67.215.93.85.53 > 10.11.13.129.34506: 44935 1/0/0 A 1.1.1.1 (57)
10.11.13.129.55819 > 67.215.93.85.53: 44936+ AAAA? gw.network1.opendns.com. (41)
67.215.93.85.53 > 10.11.13.129.55819: 44936 0/0/0 (41)
AF_INET (IPv4) - SOCK_STREAM - IPPROTO_TCP - 1.1.1.1
FreeBSD restores our faith in resolving principles. The tcpdump output shows that it queried both an A and a AAAA record for the first domain in the resolver search list. Because it found a match (the A record), it stopped there, and getaddrinfo() returned only the matching IPv4 address.
If we repeat this test but move the search domain of “network2” ahead of “network1”, we can see that now (correctly) only the IPv6 address is returned.
10.11.13.129.34506 > 67.215.93.85.53: 44935+ A? gw.network2.opendns.com. (41)
67.215.93.85.53 > 10.11.13.129.34506: 44935 NXDomain 0/1/0 (116)
10.11.13.129.55819 > 67.215.93.85.53: 44936+ AAAA? gw.network2.opendns.com. (41)
67.215.93.85.53 > 10.11.13.129.55819: 44936 1/0/0 AAAA[|domain]
AF_INET6 (IPv6) - SOCK_STREAM - IPPROTO_TCP - 2001:470:b:1fc::1
If the test is then repeated with “network3” as the first search domain, we (correctly) get both the IPv4 and IPv6 addresses of gw.network3.opendns.com:
AF_INET6 (IPv6) - SOCK_STREAM - IPPROTO_TCP - 2001:470:b:1fc::2
AF_INET (IPv4) - SOCK_STREAM - IPPROTO_TCP - 2.2.2.2
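The FreeBSD results above fit a simple model. For each candidate name in search-list order, query both A and AAAA. Stop at the first name that returns any data, and return only that name’s addresses. The sketch below implements that model under stated assumptions, with a hypothetical in-memory zone standing in for DNS. It reproduces the three search orders tested above, but it is not FreeBSD’s resolver source.

```python
from typing import Optional

# Hypothetical in-memory copy of the test zone: name -> {record type: [addresses]}.
ZONE = {
    "gw.network1.opendns.com.": {"A": ["1.1.1.1"]},
    "gw.network2.opendns.com.": {"AAAA": ["2001:470:b:1fc::1"]},
    "gw.network3.opendns.com.": {"A": ["2.2.2.2"], "AAAA": ["2001:470:b:1fc::2"]},
}


def first_match_search(name: str, search: list[str]) -> tuple[Optional[str], list[str]]:
    """Return (fqdn, addresses) for the first candidate name with any A/AAAA data."""
    for domain in search:
        fqdn = f"{name}.{domain}."
        records = ZONE.get(fqdn, {})
        # Both families are queried for the same name; AAAA listed first.
        addresses = records.get("AAAA", []) + records.get("A", [])
        if addresses:
            return fqdn, addresses  # every address belongs to this one name
    return None, []


if __name__ == "__main__":
    search = ["network1.opendns.com", "network2.opendns.com", "network3.opendns.com"]
    print(first_match_search("gw", search))
```

Because the search stops at the first name that answers, every returned address belongs to the same host. The design choice is that the search list picks the host, and address-family preference only comes into play among that one host’s addresses.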
Windows 8.1
Microsoft’s current offering also gets a passing grade for dual-stacked search domain use. Windows 8.1 behaved exactly like FreeBSD in all test scenarios, giving all the expected answers. This shouldn’t come as a big surprise, as parts of the Windows network stack have historically been derived from BSD source, and we’d be surprised if the resolver libraries weren’t borrowed as well.
As the results above show, resolving names on a dual-stacked machine that uses search domains can lead to some confusing results. The recent OS offerings from OSX and Ubuntu leave the onus on the calling application to decide how to deal with the coming dual-stacked world. Fortunately, though, FreeBSD and Windows show that some implementations foresaw this issue and implemented the expected behavior.