I want to include a caching name server (BIND 9.3.4) in an appliance. /etc/resolv.conf points to localhost, and the caching BIND server forwards every request to the DHCP-assigned DNS server, which with Fusion's NAT is 172.16.8.2. Fusion's name server behaves a little oddly. First, it overwrites every TTL with 5 seconds. What is the reason for this?
But where things actually break is with IPv6 queries. Programs like ntpd try to get an AAAA record first and, if that fails, look for an A record. When asked for a nonexistent AAAA record, the response from 172.16.8.2 somehow poisons the BIND cache: for the next 5 seconds, a request for the A record also returns just a CNAME, so the program cannot resolve the host. Forwarding to the upstream DNS server instead works fine, and so does VirtualBox's NAT DNS server. Any ideas?
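The AAAA-then-A fallback described above can be sketched with getaddrinfo(). This is a generic illustration, not ntpd's code: lookup_family_v6_first is a hypothetical name, and the extra_flags parameter exists only so the sketch can be exercised offline with AI_NUMERICHOST. With a cache poisoned as described, the first (AAAA) lookup fails as expected, but the fallback (A) lookup fails too.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: try an IPv6 lookup (AAAA) first, then fall back to
   IPv4 (A). Returns the address family that resolved, or -1 if neither did. */
int lookup_family_v6_first(const char *host, int extra_flags)
{
    static const int families[] = { AF_INET6, AF_INET };
    assert(host != NULL);

    for (size_t i = 0; i < sizeof families / sizeof families[0]; i++) {
        struct addrinfo hints, *res = NULL;

        memset(&hints, 0, sizeof hints);
        hints.ai_family = families[i];   /* AF_INET6 asks for AAAA, AF_INET for A */
        hints.ai_socktype = SOCK_DGRAM;  /* one result per address (NTP runs over UDP) */
        hints.ai_flags = extra_flags;

        if (getaddrinfo(host, NULL, &hints, &res) == 0) {
            int family = res->ai_family;
            freeaddrinfo(res);
            return family;
        }
        /* lookup failed: fall through to the next address family */
    }
    return -1;
}
```

Called with extra_flags set to 0, the function performs real DNS lookups, which is the case the poisoned BIND cache breaks.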
I just noticed the same thing on my VMware Fusion 3, with exactly the same symptoms.
To begin demonstrating the problem, the DNS label "purple.the-7.net" holds the following resource records (RRs):
$ dig @ns1.the-7.net purple.the-7.net IN ANY +norecurse +vc +noall +answer

; <<>> DiG 9.6.1-P1 <<>> @ns1.the-7.net purple.the-7.net IN ANY +norecurse +vc +noall +answer
; (2 servers found)
;; global options: +cmd
purple.the-7.net.    300    IN    A       64.71.156.44
purple.the-7.net.    300    IN    KEY     512 3 3 CL6UZhTjW3mcP7QP5dtOVD1AO0OHjHLhbVIU0JJXoxCt85nFNyx01r6q eGswFz05tWc/Mpuk+E3sybnt1shzJWLWLaiSTUoJC6+RszLNQfHQep2P GLiQqTbZUPZZ45trDuppON79Sl71WZZyy2u0FLSGrrV5tb6AvRgX32wE EOoRW2O9QR0LG0oQXbJZL3/WpTpd33kSs+8nyV+bW7BfjtsQqydcfNvV tOEdoPUBtu/q5bCqefmvyoowuTlQG9NHW73E8j0OQkEgeg1xlS++91Bg vkkTyONfUePIL81Q2+qEHZPOyg67KWtK+66z6qW3EUrQ+K13R/7ZZtMt s4uMw8eb+8UvsKOKF4YS9vRvgQu71BkXU4uAudJTSEgVjOQyZaj4XbYv vBfwwSU7u2RWrsKPB3kkohq7mZWPcbWF4cdOCnrecJQEG+Q9POFsdG/U x7eoOtMoQs6UX4kFTTrZlL7MQV4Gw738Caoq6cWIM6xAuEReFJjgJqZt /7SNXV/P6SsRVAPDS6OPr4UgdDhxv9EUYOiL
purple.the-7.net.    0      IN    AAAA    2001:470:1f01:622::c
purple.the-7.net.    300    IN    A6      64 ::c colo1-net.the-7.net.
$
As seen above, the label holds no record of type SRV, for instance.
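Now, when a correctly operating name server (10.0.0.1 in this case) is queried for a type that does not exist at the label, the response should contain no answer records: it returns NOERROR with an empty answer section and the zone's SOA record in the authority section: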
$ dig @10.0.0.1 purple.the-7.net IN SRV

; <<>> DiG 9.6.1-P1 <<>> @10.0.0.1 purple.the-7.net IN SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44166
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;purple.the-7.net.            IN    SRV

;; AUTHORITY SECTION:
the-7.net.           60    IN    SOA    ns1.the-7.net. hostmaster.the-7.net. 2006090954 10800 3600 604800 60

;; Query time: 281 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Jan 8 18:11:53 2010
;; MSG SIZE rcvd: 85

$
However, VMware Fusion's DNS proxy (192.168.240.2) behaves differently: it answers with a CNAME that points to the name itself plus an A record, both with a 5-second TTL:
$ dig @192.168.240.2 purple.the-7.net IN SRV

; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 purple.the-7.net IN SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45144
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;purple.the-7.net.            IN    SRV

;; ANSWER SECTION:
purple.the-7.net.    5    IN    CNAME    purple.the-7.net.
purple.the-7.net.    5    IN    A        64.71.156.44

;; Query time: 60 msec
;; SERVER: 192.168.240.2#53(192.168.240.2)
;; WHEN: Fri Jan 8 18:13:14 2010
;; MSG SIZE rcvd: 80

$
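The difference between these two replies is already visible in the 12-byte DNS header (RFC 1035, section 4.1.1). The following is a minimal C sketch, assuming a raw reply buffer such as the one res_nsend() fills in; classify_reply and enum reply_kind are hypothetical names. 10.0.0.1's reply is NOERROR with an empty answer section, i.e. a NODATA response in RFC 2308 terms, while the proxy's reply carries two answer records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a DNS reply from its header alone. */
enum reply_kind { REPLY_MALFORMED, REPLY_ANSWER, REPLY_NODATA, REPLY_NXDOMAIN, REPLY_OTHER };

/* Read a 16-bit field in network (big-endian) byte order. */
static unsigned get16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

enum reply_kind classify_reply(const unsigned char *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 12 || !(msg[2] & 0x80))   /* too short, or QR bit clear: not a response */
        return REPLY_MALFORMED;

    unsigned rcode = msg[3] & 0x0F;     /* RCODE: low 4 bits of the second flags byte */
    unsigned ancount = get16(msg + 6);  /* number of records in the answer section */

    if (rcode == 3)                     /* NXDOMAIN: the name does not exist at all */
        return REPLY_NXDOMAIN;
    if (rcode != 0)                     /* SERVFAIL, REFUSED, ... */
        return REPLY_OTHER;
    if (ancount == 0)                   /* NOERROR with no answers: NODATA (a referral
                                           looks the same; this sketch ignores it) */
        return REPLY_NODATA;
    return REPLY_ANSWER;
}
```

Fed the header bytes of the two replies above (flags qr rd ra with ANSWER: 0 for 10.0.0.1; flags qr aa with ANSWER: 2 for the proxy), it returns REPLY_NODATA and REPLY_ANSWER respectively.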
Right now this problem causes my FreeBSD guest to emit spurious warnings:
$ grep 'AAAA' /var/log/messages | tail
Jan 8 18:07:55 blue firefox-bin: gethostby*.getanswer: asked for "www.bind9.net IN AAAA", got type "A"
Jan 8 18:07:56 blue firefox-bin: gethostby*.getanswer: asked for "www.zytrax.com IN AAAA", got type "A"
Jan 8 18:07:56 blue firefox-bin: gethostby*.getanswer: asked for "www.faqs.org IN AAAA", got type "A"
Jan 8 18:08:13 blue firefox-bin: gethostby*.getanswer: asked for "ftp.is.co.za IN AAAA", got type "A"
Jan 8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "www.dnssec-tools.org IN AAAA", got type "A"
Jan 8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "www.freesoft.org IN AAAA", got type "A"
Jan 8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "dnsjava.org IN AAAA", got type "A"
Jan 8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "dnsruby.rubyforge.org IN AAAA", got type "A"
Jan 8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "lists.isc.org IN AAAA", got type "A"
Jan 8 18:18:53 blue firefox-bin: gethostby*.getanswer: asked for "versioncheck.addons.mozilla.org IN AAAA", got type "A"
$
Here, Firefox is trying to resolve those domain names into IPv6 address (AAAA) records, and the resolver library it calls complains because the DNS server (VMware's proxy) insists on returning records of the wrong type (A).
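The condition behind those log messages can be sketched as follows. This is not FreeBSD's resolver code; it is a minimal C sketch with hypothetical function names that walks the answer section of a raw reply and counts records whose type is neither the question's type nor CNAME, roughly what a getanswer()-style parser rejects and logs.

```c
#include <assert.h>
#include <stddef.h>

#define TYPE_CNAME 5u   /* RFC 1035 TYPE code for CNAME */

/* Read a 16-bit field in network (big-endian) byte order. */
static unsigned get16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Skip a possibly compressed domain name at msg[pos]; returns the offset
   just past it, or 0 if the name is malformed or truncated. */
static size_t skip_name(const unsigned char *msg, size_t len, size_t pos)
{
    while (pos < len) {
        unsigned l = msg[pos];
        if (l == 0)
            return pos + 1;                        /* root label ends the name */
        if ((l & 0xC0) == 0xC0)
            return pos + 2 <= len ? pos + 2 : 0;   /* compression pointer ends it */
        if (l & 0xC0)
            return 0;                              /* reserved label type */
        pos += 1 + l;
    }
    return 0;
}

/* Count answer records whose TYPE is neither QTYPE nor CNAME.
   Returns -1 if the reply is malformed or does not have exactly one question. */
int count_wrong_type_answers(const unsigned char *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 12 || get16(msg + 4) != 1)
        return -1;
    unsigned ancount = get16(msg + 6);

    size_t pos = skip_name(msg, len, 12);          /* QNAME */
    if (pos == 0 || pos + 4 > len)
        return -1;
    unsigned qtype = get16(msg + pos);
    pos += 4;                                      /* QTYPE and QCLASS */

    int wrong = 0;
    for (unsigned i = 0; i < ancount; i++) {
        pos = skip_name(msg, len, pos);            /* owner name */
        if (pos == 0 || pos + 10 > len)
            return -1;
        unsigned type = get16(msg + pos);
        pos += 10 + get16(msg + pos + 8);          /* TYPE, CLASS, TTL, RDLENGTH, RDATA */
        if (pos > len)
            return -1;
        if (type != qtype && type != TYPE_CNAME)
            wrong++;
    }
    return wrong;
}
```

Applied to the proxy's reply to "purple.the-7.net IN SRV" (a CNAME plus an A record), it returns 1: the CNAME is tolerated, the A record is not.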
Could we expect this problem to be fixed in a future version, hopefully soon?
Thanks,
Eugene
P.S. By the way, the DNS records shown here are live examples under my administration, not fictitious ones; developers are encouraged to make experimental queries.
I reproduced the same results you presented. Short of a fix in the DNS proxy, you can reconfigure VMware's dhcpd to hand out an upstream DNS server instead or, if you roam between networks, a public DNS provider such as OpenDNS or Google Public DNS.
This is the section I modified in /Library/Application Support/VMware Fusion/vmnet8:
subnet 172.16.208.0 netmask 255.255.255.0 {
    range 172.16.208.128 172.16.208.254;
    option broadcast-address 172.16.208.255;
    option domain-name-servers 208.67.222.222;
    option domain-name "localdomain";
    default-lease-time 1800;        # default is 30 minutes
    max-lease-time 7200;            # default is 2 hours
    option routers 172.16.208.2;
}
For the change to take effect, I restarted Fusion's NAT services by running vmnet-cli --stop and then vmnet-cli --start as root in /Library/Application Support/VMware Fusion. After I renewed the guest's DHCP lease, /etc/resolv.conf listed my reconfigured DNS server, and dig reported correct results.
Yes, adding those lines mostly solves the problem. The only remaining bug is the TTL rewrite (every TTL is fixed at 5 seconds for some reason); although this violates the DNS standards, I don't think it will cause serious problems in practice.
Thank you,
Eugene
Are you sure this happens with prohibitHostLookup set? The 5-second TTL is used only when natd invents the reply altogether: either when prohibitHostLookup is not set and the request contained the ".localdomain" suffix, or when prohibitHostLookup is not set and the response received had a zero answer count (ancount). If prohibitHostLookup is set, you should get exactly the records that the host's res_nsend() returns for the request you sent from the guest.
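To make that rule easy to check against observed replies, here is a hypothetical C model of it. natd's source is not shown in this thread, so the function name and parameters are assumptions: natd substitutes its own reply (with a 5-second TTL) only when prohibitHostLookup is off and either the queried name ends in .localdomain or the host's answer count is zero.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive suffix test; DNS names compare case-insensitively and
   may carry a trailing root dot ("host.localdomain."). */
static bool name_has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    if (n > 0 && name[n - 1] == '.')
        n--;                                    /* ignore the trailing root dot */
    if (n < s)
        return false;
    for (size_t i = 0; i < s; i++)
        if (tolower((unsigned char)name[n - s + i]) != tolower((unsigned char)suffix[i]))
            return false;
    return true;
}

/* Hypothetical model of the rule described above (not VMware's code):
   true means natd invents its own reply with a 5-second TTL,
   false means it relays the host resolver's reply verbatim. */
bool natd_synthesizes_reply(bool prohibit_host_lookup, const char *qname,
                            unsigned host_ancount)
{
    assert(qname != NULL);
    if (prohibit_host_lookup)
        return false;                           /* relay what res_nsend() returned */
    return name_has_suffix(qname, ".localdomain") || host_ancount == 0;
}
```

Under this model, a google.com A query that the host answers with six records should never come back with a 5-second TTL, whatever the setting of prohibitHostLookup.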
Yes, it does happen with prohibitHostLookup set. In fact, the TTL appears to be rewritten in every record returned by VMware's proxy (192.168.240.2):
$ dig @192.168.240.2 purple.the-7.net IN SRV    # to make sure prohibitHostLookup is working; purple.the-7.net has no SRV records

; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 purple.the-7.net IN SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27180
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;purple.the-7.net.            IN    SRV

;; AUTHORITY SECTION:
the-7.net.           5    IN    SOA    ns1.the-7.net. hostmaster.the-7.net. 2006090954 10800 3600 604800 60

;; Query time: 3 msec
;; SERVER: 192.168.240.2#53(192.168.240.2)
;; WHEN: Wed Jan 13 23:40:08 2010
;; MSG SIZE rcvd: 85

$ dig @192.168.240.2 google.com IN A

; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 google.com IN A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55425
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.                  IN    A

;; ANSWER SECTION:
google.com.          5    IN    A     74.125.19.103
google.com.          5    IN    A     74.125.19.104
google.com.          5    IN    A     74.125.19.105
google.com.          5    IN    A     74.125.19.106
google.com.          5    IN    A     74.125.19.147
google.com.          5    IN    A     74.125.19.99

;; AUTHORITY SECTION:
google.com.          5    IN    NS    ns1.google.com.
google.com.          5    IN    NS    ns3.google.com.
google.com.          5    IN    NS    ns4.google.com.
google.com.          5    IN    NS    ns2.google.com.

;; Query time: 4 msec
;; SERVER: 192.168.240.2#53(192.168.240.2)
;; WHEN: Wed Jan 13 23:40:13 2010
;; MSG SIZE rcvd: 196

$ dig @192.168.240.2 google.com IN A

; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 google.com IN A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45467
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.                  IN    A

;; ANSWER SECTION:
google.com.          5    IN    A     74.125.19.99
google.com.          5    IN    A     74.125.19.103
google.com.          5    IN    A     74.125.19.104
google.com.          5    IN    A     74.125.19.105
google.com.          5    IN    A     74.125.19.106
google.com.          5    IN    A     74.125.19.147

;; AUTHORITY SECTION:
google.com.          5    IN    NS    ns2.google.com.
google.com.          5    IN    NS    ns3.google.com.
google.com.          5    IN    NS    ns4.google.com.
google.com.          5    IN    NS    ns1.google.com.

;; Query time: 3 msec
;; SERVER: 192.168.240.2#53(192.168.240.2)
;; WHEN: Wed Jan 13 23:40:15 2010
;; MSG SIZE rcvd: 196

$ dig @10.0.0.1 google.com IN A

; <<>> DiG 9.6.1-P1 <<>> @10.0.0.1 google.com IN A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62337
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.                  IN    A

;; ANSWER SECTION:
google.com.          264       IN    A     74.125.19.147
google.com.          264       IN    A     74.125.19.99
google.com.          264       IN    A     74.125.19.103
google.com.          264       IN    A     74.125.19.104
google.com.          264       IN    A     74.125.19.105
google.com.          264       IN    A     74.125.19.106

;; AUTHORITY SECTION:
google.com.          155100    IN    NS    ns1.google.com.
google.com.          155100    IN    NS    ns3.google.com.
google.com.          155100    IN    NS    ns4.google.com.
google.com.          155100    IN    NS    ns2.google.com.

;; Query time: 2 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Wed Jan 13 23:40:23 2010
;; MSG SIZE rcvd: 196

$ dig @10.0.0.1 google.com IN A

; <<>> DiG 9.6.1-P1 <<>> @10.0.0.1 google.com IN A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12864
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.                  IN    A

;; ANSWER SECTION:
google.com.          262       IN    A     74.125.19.106
google.com.          262       IN    A     74.125.19.147
google.com.          262       IN    A     74.125.19.99
google.com.          262       IN    A     74.125.19.103
google.com.          262       IN    A     74.125.19.104
google.com.          262       IN    A     74.125.19.105

;; AUTHORITY SECTION:
google.com.          155098    IN    NS    ns3.google.com.
google.com.          155098    IN    NS    ns1.google.com.
google.com.          155098    IN    NS    ns2.google.com.
google.com.          155098    IN    NS    ns4.google.com.

;; Query time: 4 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Wed Jan 13 23:40:25 2010
;; MSG SIZE rcvd: 196

$
As shown above, the TTL is fixed at 5 seconds in every result from VMware's proxy, whereas in results from the upstream DNS server it decrements in real time, as expected (264, then 262 two seconds later).
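This comparison can be automated. The following is a minimal C sketch, with hypothetical names, assuming raw reply buffers (for example from res_nsend()). It extracts the smallest TTL in the answer section and checks whether it fell by roughly the time elapsed between two queries, as it does for 10.0.0.1 (264, then 262 two seconds later) but not for the proxy (5, then 5).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read 16- and 32-bit fields in network (big-endian) byte order. */
static unsigned get16(const unsigned char *p) { return ((unsigned)p[0] << 8) | p[1]; }
static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

/* Skip a possibly compressed domain name at msg[pos]; returns the offset
   just past it, or 0 if the name is malformed or truncated. */
static size_t skip_name(const unsigned char *msg, size_t len, size_t pos)
{
    while (pos < len) {
        unsigned l = msg[pos];
        if (l == 0)
            return pos + 1;                          /* root label */
        if ((l & 0xC0) == 0xC0)
            return pos + 2 <= len ? pos + 2 : 0;     /* compression pointer */
        if (l & 0xC0)
            return 0;                                /* reserved label type */
        pos += 1 + l;
    }
    return 0;
}

/* Smallest TTL in the answer section, or -1 if there are no answers
   or the reply cannot be parsed. */
long min_answer_ttl(const unsigned char *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 12)
        return -1;
    unsigned qdcount = get16(msg + 4), ancount = get16(msg + 6);
    size_t pos = 12;
    for (unsigned i = 0; i < qdcount; i++) {         /* skip QNAME, QTYPE, QCLASS */
        pos = skip_name(msg, len, pos);
        if (pos == 0 || pos + 4 > len)
            return -1;
        pos += 4;
    }
    long min = -1;
    for (unsigned i = 0; i < ancount; i++) {
        pos = skip_name(msg, len, pos);
        if (pos == 0 || pos + 10 > len)
            return -1;
        uint32_t raw = get32(msg + pos + 4);         /* TTL follows TYPE and CLASS */
        long ttl = (raw & 0x80000000u) ? 0 : (long)raw;  /* MSB set counts as 0 (RFC 2181, sec. 8) */
        pos += 10 + get16(msg + pos + 8);            /* fixed fields plus RDATA */
        if (pos > len)
            return -1;
        if (min < 0 || ttl < min)
            min = ttl;
    }
    return min;
}

/* A caching server's TTL should drop by about the elapsed time between two
   queries (1 s of slack for rounding). A proxy that pins every TTL to a
   constant shows no drop; a refreshed entry (TTL jumps up) also fails. */
bool ttl_counts_down(long first, long second, long elapsed_s)
{
    long drop = first - second;
    return drop >= elapsed_s - 1 && drop <= elapsed_s + 1;
}
```

The 1-second slack means queries should be spaced at least two seconds apart; with a one-second gap, a pinned TTL would still pass the check.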
Regarding your point that VMware Fusion uses the result returned by res_nsend() verbatim: I wrote a simple program to check whether Mac OS X's implementation of res_nsend() is the culprit that rewrites the TTL, but res_nsend() seems fine; the TTLs it reports are the real, decrementing ones (3m23s for the A records below):
$ cat res_send_test.c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <err.h>
#include <resolv.h>
#include <stdio.h>
#include <string.h>
#include <sysexits.h>

int
main(int argc, char **argv)
{
    unsigned char query[1024], reply[1024];
    int query_len, reply_len;
    struct __res_state res0;
    res_state res = &res0;

    if (argc != 2)
        errx(EX_USAGE, "usage: res_send_test <domain name>");

    /* Initialize a private resolver state from /etc/resolv.conf. */
    memset(res, 0, sizeof(*res));
    if (res_ninit(res) == -1)
        errx(EX_OSERR, "res_ninit() failed");
    res->options |= RES_DEBUG;  /* this parse-prints result received from server */

    /* Build a standard query for the A record of argv[1], using the same state. */
    query_len = res_nmkquery(res, ns_o_query,   /* state, op */
        argv[1], ns_c_in, ns_t_a,               /* dname, class, type */
        NULL, 0, NULL,                          /* data, datalen, newrr_in */
        query, sizeof(query));                  /* buf, buflen */
    if (query_len == -1)
        errx(EX_UNAVAILABLE, "res_nmkquery() failed");

    /* Send the query; RES_DEBUG prints the reply exactly as received. */
    reply_len = res_nsend(res, query, query_len, reply, sizeof(reply));
    if (reply_len == -1)
        errx(EX_UNAVAILABLE, "res_nsend() failed");

    res_nclose(res);
    return 0;
}
$ cc -g -O0 -Wall -Werror res_send_test.c -o res_send_test -lresolv
$ ./res_send_test google.com
;; res_send()
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48117
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; google.com, type = A, class = IN
;; Querying server (# 1) address = 10.0.0.1
;; new DG socket
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48117
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0
;; google.com, type = A, class = IN
google.com.    3m23s          IN    A     74.125.19.104
google.com.    3m23s          IN    A     74.125.19.105
google.com.    3m23s          IN    A     74.125.19.106
google.com.    3m23s          IN    A     74.125.19.147
google.com.    3m23s          IN    A     74.125.19.99
google.com.    3m23s          IN    A     74.125.19.103
google.com.    1d18h16m14s    IN    NS    ns4.google.com.
google.com.    1d18h16m14s    IN    NS    ns3.google.com.
google.com.    1d18h16m14s    IN    NS    ns1.google.com.
google.com.    1d18h16m14s    IN    NS    ns2.google.com.
$
Hope this helps,
Eugene