6 Replies Latest reply on Jan 14, 2010 12:38 AM by EugeneKim

    Caching DNS behind Fusion's NAT

    cernvm Lurker

      I want to include a caching name server (BIND 9.3.4) in an appliance.  /etc/resolv.conf points to localhost, and the caching BIND server forwards every request to the DHCP-assigned DNS server, which in the case of Fusion's NAT is 172.16.8.2.  Fusion's name server seems to act a little oddly.  First, it overwrites all TTLs with 5 seconds.  What is the reason for doing so?

       

      Where it really breaks, though, is with IPv6 queries.  Programs like ntpd try to get an AAAA record first, and if that fails they try to find an A record.  When asked for a non-existent AAAA record, the response from 172.16.8.2 somehow poisons the BIND cache: for the next 5 seconds, a request for A also returns just a CNAME, so the program cannot resolve the host.  I tried forwarding to the upstream DNS server instead, which works fine.  VirtualBox's NAT DNS server also works.  Any ideas?

        • 1. Re: Caching DNS behind Fusion's NAT
          EugeneKim Novice

          I just noticed the same thing on my VMware Fusion 3, with exactly the same symptoms.

           

          To begin demonstrating the problem, the DNS label "purple.the-7.net" holds the following resource records (RRs):

           

          $ dig @ns1.the-7.net purple.the-7.net IN ANY +norecurse +vc +noall +answer
          
          ; <<>> DiG 9.6.1-P1 <<>> @ns1.the-7.net purple.the-7.net IN ANY +norecurse +vc +noall +answer
          ; (2 servers found)
          ;; global options: +cmd
          purple.the-7.net.     300     IN     A     64.71.156.44
          purple.the-7.net.     300     IN     KEY     512 3 3 CL6UZhTjW3mcP7QP5dtOVD1AO0OHjHLhbVIU0JJXoxCt85nFNyx01r6q eGswFz05tWc/Mpuk+E3sybnt1shzJWLWLaiSTUoJC6+RszLNQfHQep2P GLiQqTbZUPZZ45trDuppON79Sl71WZZyy2u0FLSGrrV5tb6AvRgX32wE EOoRW2O9QR0LG0oQXbJZL3/WpTpd33kSs+8nyV+bW7BfjtsQqydcfNvV tOEdoPUBtu/q5bCqefmvyoowuTlQG9NHW73E8j0OQkEgeg1xlS++91Bg vkkTyONfUePIL81Q2+qEHZPOyg67KWtK+66z6qW3EUrQ+K13R/7ZZtMt s4uMw8eb+8UvsKOKF4YS9vRvgQu71BkXU4uAudJTSEgVjOQyZaj4XbYv vBfwwSU7u2RWrsKPB3kkohq7mZWPcbWF4cdOCnrecJQEG+Q9POFsdG/U x7eoOtMoQs6UX4kFTTrZlL7MQV4Gw738Caoq6cWIM6xAuEReFJjgJqZt /7SNXV/P6SsRVAPDS6OPr4UgdDhxv9EUYOiL
          purple.the-7.net.     0     IN     AAAA     2001:470:1f01:622::c
          purple.the-7.net.     300     IN     A6     64 ::c colo1-net.the-7.net.
          $ 

           

          As seen above, the label holds no record of type SRV, for instance.

           

          Now, when a correctly operating nameserver (10.0.0.1 in this case) is queried for a label with a nonexistent type, the response should include no "answer" records:

           

          $ dig @10.0.0.1 purple.the-7.net IN SRV 
          
          ; <<>> DiG 9.6.1-P1 <<>> @10.0.0.1 purple.the-7.net IN SRV
          ; (1 server found)
          ;; global options: +cmd
          ;; Got answer:
          ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44166
          ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
          
          ;; QUESTION SECTION:
          ;purple.the-7.net.          IN     SRV
          
          ;; AUTHORITY SECTION:
          the-7.net.          60     IN     SOA     ns1.the-7.net. hostmaster.the-7.net. 2006090954 10800 3600 604800 60
          
          ;; Query time: 281 msec
          ;; SERVER: 10.0.0.1#53(10.0.0.1)
          ;; WHEN: Fri Jan  8 18:11:53 2010
          ;; MSG SIZE  rcvd: 85
          
          $ 

           

          However, VMware Fusion's DNS proxy (192.168.240.2) behaves differently:

           

          $ dig @192.168.240.2 purple.the-7.net IN SRV
          
          ; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 purple.the-7.net IN SRV
          ; (1 server found)
          ;; global options: +cmd
          ;; Got answer:
          ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45144
          ;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
          
          ;; QUESTION SECTION:
          ;purple.the-7.net.          IN     SRV
          
          ;; ANSWER SECTION:
          purple.the-7.net.     5     IN     CNAME     purple.the-7.net.
          purple.the-7.net.     5     IN     A     64.71.156.44
          
          ;; Query time: 60 msec
          ;; SERVER: 192.168.240.2#53(192.168.240.2)
          ;; WHEN: Fri Jan  8 18:13:14 2010
          ;; MSG SIZE  rcvd: 80
          
          $ 
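The difference between the two replies is already visible in the fixed 12-byte DNS header. Below is a minimal C sketch that decodes that header and tests for a "NODATA" reply (NOERROR with an empty answer section). The struct and function names are made up for illustration, and the two byte arrays are reconstructed by hand from the id, flags, and section counts that dig printed above; they are not captured packets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from the fixed 12-byte DNS message header (RFC 1035).
 * The struct and function names here are illustrative only. */
struct dns_header_summary {
    uint16_t id;
    bool     qr;       /* set in responses */
    bool     aa;       /* authoritative answer */
    bool     rd;       /* recursion desired */
    bool     ra;       /* recursion available */
    unsigned rcode;    /* 0 = NOERROR */
    uint16_t qdcount, ancount, nscount, arcount;
};

/* Read a 16-bit big-endian (network order) integer. */
static uint16_t read_u16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the header; returns false if the message is shorter than 12 bytes. */
bool parse_dns_header(const unsigned char *msg, size_t len,
                      struct dns_header_summary *out)
{
    if (len < 12)
        return false;
    uint16_t flags = read_u16(msg + 2);
    out->id      = read_u16(msg);
    out->qr      = (flags & 0x8000) != 0;
    out->aa      = (flags & 0x0400) != 0;
    out->rd      = (flags & 0x0100) != 0;
    out->ra      = (flags & 0x0080) != 0;
    out->rcode   = flags & 0x000F;
    out->qdcount = read_u16(msg + 4);
    out->ancount = read_u16(msg + 6);
    out->nscount = read_u16(msg + 8);
    out->arcount = read_u16(msg + 10);
    return true;
}

/* A response with NOERROR and no answer records: "name exists, type does not". */
bool is_nodata(const struct dns_header_summary *h)
{
    return h->qr && h->rcode == 0 && h->ancount == 0;
}

/* Header as reported by dig for 10.0.0.1: id 44166, flags qr rd ra,
 * QUERY 1, ANSWER 0, AUTHORITY 1, ADDITIONAL 0 (hand-built, not a capture). */
const unsigned char upstream_reply_header[12] =
    { 0xAC, 0x86, 0x81, 0x80, 0, 1, 0, 0, 0, 1, 0, 0 };

/* Header as reported by dig for 192.168.240.2: id 45144, flags qr aa,
 * QUERY 1, ANSWER 2, AUTHORITY 0, ADDITIONAL 0 (hand-built, not a capture). */
const unsigned char fusion_reply_header[12] =
    { 0xB0, 0x58, 0x84, 0x00, 0, 1, 0, 2, 0, 0, 0, 0 };
```

Feeding upstream_reply_header through parse_dns_header() and is_nodata() classifies it as NODATA, while fusion_reply_header does not qualify because its ANCOUNT is 2, which is what a caching resolver downstream would end up storing.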

           

          Right now this problem causes my FreeBSD guest to emit spurious warnings:

           

          $ grep 'AAAA' /var/log/messages | tail  
          Jan  8 18:07:55 blue firefox-bin: gethostby*.getanswer: asked for "www.bind9.net IN AAAA", got type "A"
          Jan  8 18:07:56 blue firefox-bin: gethostby*.getanswer: asked for "www.zytrax.com IN AAAA", got type "A"
          Jan  8 18:07:56 blue firefox-bin: gethostby*.getanswer: asked for "www.faqs.org IN AAAA", got type "A"
          Jan  8 18:08:13 blue firefox-bin: gethostby*.getanswer: asked for "ftp.is.co.za IN AAAA", got type "A"
          Jan  8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "www.dnssec-tools.org IN AAAA", got type "A"
          Jan  8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "www.freesoft.org IN AAAA", got type "A"
          Jan  8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "dnsjava.org IN AAAA", got type "A"
          Jan  8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "dnsruby.rubyforge.org IN AAAA", got type "A"
          Jan  8 18:08:23 blue firefox-bin: gethostby*.getanswer: asked for "lists.isc.org IN AAAA", got type "A"
          Jan  8 18:18:53 blue firefox-bin: gethostby*.getanswer: asked for "versioncheck.addons.mozilla.org IN AAAA", got type "A"
          $ 

           

          Here, Firefox is trying to resolve those domain names into IPv6 address records (AAAA), and the resolver library (called by Firefox) is aggravated by the fact that the DNS server (VMware's proxy) insists upon returning records of a wrong type (A).
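To make the resolver's complaint concrete, here is a small C sketch of the kind of check that produces such a warning. It is a simplified model and not FreeBSD's actual getanswer code: the struct, enum, and function names are invented for illustration, and it assumes that CNAME records are the only answer type that may legitimately differ from the queried type.

```c
#include <stddef.h>
#include <stdio.h>

/* Standard DNS type codes; the enum names themselves are illustrative. */
enum { RR_A = 1, RR_CNAME = 5, RR_AAAA = 28, RR_SRV = 33 };

/* A stripped-down answer record: owner name and type only. */
struct answer_rr {
    const char *owner;
    int         type;
};

const char *rr_type_name(int type)
{
    switch (type) {
    case RR_A:     return "A";
    case RR_CNAME: return "CNAME";
    case RR_AAAA:  return "AAAA";
    case RR_SRV:   return "SRV";
    default:       return "?";
    }
}

/* Return the index of the first answer whose type is neither the queried
 * type nor CNAME, or -1 if every answer is acceptable. */
int first_unexpected_answer(int qtype, const struct answer_rr *rrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (rrs[i].type != qtype && rrs[i].type != RR_CNAME)
            return (int)i;
    }
    return -1;
}

/* Format a warning in the same shape as the log lines above. */
int format_mismatch(char *buf, size_t buflen,
                    const char *qname, int qtype, int got_type)
{
    return snprintf(buf, buflen, "asked for \"%s IN %s\", got type \"%s\"",
                    qname, rr_type_name(qtype), rr_type_name(got_type));
}
```

With the proxy's answer section from the SRV query (a CNAME followed by an A record), first_unexpected_answer(RR_SRV, ...) points at the A record, which is the situation the log messages describe.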

           

          Could we expect this problem to be fixed in a future version, hopefully soon?

           

          Thanks,

          Eugene

           

          P.S. By the way, the DNS records shown here are live examples under my administration, not fictitious ones; developers are encouraged to make experimental queries.

          • 2. Re: Caching DNS behind Fusion's NAT
            rcardona2k Champion

            I reproduced the same results you presented.  Short of a fix in the DNS proxy, you can reconfigure VMware's dhcpd to return an upstream DNS server or, if you roam, select a public DNS provider like OpenDNS or Google DNS.

             

            This is the section I modified in /Library/Application Support/VMware Fusion/vmnet8:

            subnet 172.16.208.0 netmask 255.255.255.0 {
                 range 172.16.208.128 172.16.208.254;
                 option broadcast-address 172.16.208.255;
                 option domain-name-servers 208.67.222.222;
                 option domain-name localdomain;
                 default-lease-time 1800;                # default is 30 minutes
                 max-lease-time 7200;                    # default is 2 hours
                 option routers 172.16.208.2;
            }

             

            For the change to take effect, I restarted Fusion's NAT services with vmnet-cli --stop and --start as root in /Library/Application Support/VMware Fusion.  After I renewed my DHCP client lease, /etc/resolv.conf had my reconfigured DNS server and dig reported the correct results.

            • 3. Re: Caching DNS behind Fusion's NAT
              petr Champion
              VMware Employees

              Can you verify that adding

               

              [dns]
              prohibitHostLookup = 1

               

              to "/Library/Application Support/VMware Fusion/vmnet8/nat.conf" & restarting natd fixes things for your setup?

               


              • 4. Re: Caching DNS behind Fusion's NAT
                EugeneKim Novice

                Yes, adding those lines mostly solves the problem.  The only remaining bug is the TTL rewrite (it is fixed at 5 seconds for some reason); although it does violate the DNS standards, I don't think it will cause serious problems in practice.

                 

                Thank you,

                Eugene

                • 5. Re: Caching DNS behind Fusion's NAT
                  petr Champion
                  VMware Employees

                  Are you sure it happens with prohibitHostLookup set?  The 5-second TTL is used only when natd invents the reply altogether: that is, when prohibitHostLookup is not set and either the request contained the ".localdomain" suffix or the response received had a zero ancount.  If prohibitHostLookup is set, you should get the exact record that the host's res_nsend() returns for the request you sent from the guest.
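For readers following along, the rule described above can be written down as a small predicate. This is only a sketch of the description in this post, not natd's source: the function names, the case-insensitive suffix match, and the handling of a trailing root dot are all assumptions made for the example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* TTL reported on replies that the proxy makes up itself (per this thread). */
#define SYNTHETIC_TTL_SECONDS 5

/* True if qname ends in ".localdomain". Ignoring case and one trailing
 * root dot is an assumption of this sketch. */
bool has_localdomain_suffix(const char *qname)
{
    static const char suffix[] = ".localdomain";
    size_t m = sizeof(suffix) - 1;
    size_t n = strlen(qname);

    if (n > 0 && qname[n - 1] == '.')
        n--;
    if (n < m)
        return false;
    for (size_t i = 0; i < m; i++) {
        if (tolower((unsigned char)qname[n - m + i]) != suffix[i])
            return false;
    }
    return true;
}

/* Hypothetical model of the stated rule: a reply is synthesized only when
 * prohibitHostLookup is off AND (the name is under .localdomain OR the
 * upstream response carried no answer records). */
bool reply_is_synthesized(bool prohibit_host_lookup, const char *qname,
                          unsigned upstream_ancount)
{
    if (prohibit_host_lookup)
        return false;
    return has_localdomain_suffix(qname) || upstream_ancount == 0;
}
```

Under this model, the SRV query for purple.the-7.net (upstream ancount of zero, prohibitHostLookup unset) falls into the synthesized branch, which matches the CNAME plus A reply with TTL 5 shown earlier in the thread.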

                  • 6. Re: Caching DNS behind Fusion's NAT
                    EugeneKim Novice

                    Yes, it does happen with prohibitHostLookup set.  In fact, the TTL seems to be rewritten in all the records returned by VMware's proxy (192.168.240.2):

                     

                    $ dig @192.168.240.2 purple.the-7.net IN SRV # to make sure prohibitHostLookup is working; purple.the-7.net has no SRV records
                    
                    ; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 purple.the-7.net IN SRV
                    ; (1 server found)
                    ;; global options: +cmd
                    ;; Got answer:
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27180
                    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
                    
                    ;; QUESTION SECTION:
                    ;purple.the-7.net.          IN     SRV
                    
                    ;; AUTHORITY SECTION:
                    the-7.net.          5     IN     SOA     ns1.the-7.net. hostmaster.the-7.net. 2006090954 10800 3600 604800 60
                    
                    ;; Query time: 3 msec
                    ;; SERVER: 192.168.240.2#53(192.168.240.2)
                    ;; WHEN: Wed Jan 13 23:40:08 2010
                    ;; MSG SIZE  rcvd: 85
                    
                    $ dig @192.168.240.2 google.com IN A
                    
                    ; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 google.com IN A
                    ; (1 server found)
                    ;; global options: +cmd
                    ;; Got answer:
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55425
                    ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0
                    
                    ;; QUESTION SECTION:
                    ;google.com.               IN     A
                    
                    ;; ANSWER SECTION:
                    google.com.          5     IN     A     74.125.19.103
                    google.com.          5     IN     A     74.125.19.104
                    google.com.          5     IN     A     74.125.19.105
                    google.com.          5     IN     A     74.125.19.106
                    google.com.          5     IN     A     74.125.19.147
                    google.com.          5     IN     A     74.125.19.99
                    
                    ;; AUTHORITY SECTION:
                    google.com.          5     IN     NS     ns1.google.com.
                    google.com.          5     IN     NS     ns3.google.com.
                    google.com.          5     IN     NS     ns4.google.com.
                    google.com.          5     IN     NS     ns2.google.com.
                    
                    ;; Query time: 4 msec
                    ;; SERVER: 192.168.240.2#53(192.168.240.2)
                    ;; WHEN: Wed Jan 13 23:40:13 2010
                    ;; MSG SIZE  rcvd: 196
                    
                    $ dig @192.168.240.2 google.com IN A
                    
                    ; <<>> DiG 9.6.1-P1 <<>> @192.168.240.2 google.com IN A
                    ; (1 server found)
                    ;; global options: +cmd
                    ;; Got answer:
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45467
                    ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0
                    
                    ;; QUESTION SECTION:
                    ;google.com.               IN     A
                    
                    ;; ANSWER SECTION:
                    google.com.          5     IN     A     74.125.19.99
                    google.com.          5     IN     A     74.125.19.103
                    google.com.          5     IN     A     74.125.19.104
                    google.com.          5     IN     A     74.125.19.105
                    google.com.          5     IN     A     74.125.19.106
                    google.com.          5     IN     A     74.125.19.147
                    
                    ;; AUTHORITY SECTION:
                    google.com.          5     IN     NS     ns2.google.com.
                    google.com.          5     IN     NS     ns3.google.com.
                    google.com.          5     IN     NS     ns4.google.com.
                    google.com.          5     IN     NS     ns1.google.com.
                    
                    ;; Query time: 3 msec
                    ;; SERVER: 192.168.240.2#53(192.168.240.2)
                    ;; WHEN: Wed Jan 13 23:40:15 2010
                    ;; MSG SIZE  rcvd: 196
                    
                    $ dig @10.0.0.1 google.com IN A
                    
                    ; <<>> DiG 9.6.1-P1 <<>> @10.0.0.1 google.com IN A
                    ; (1 server found)
                    ;; global options: +cmd
                    ;; Got answer:
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62337
                    ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0
                    
                    ;; QUESTION SECTION:
                    ;google.com.               IN     A
                    
                    ;; ANSWER SECTION:
                    google.com.          264     IN     A     74.125.19.147
                    google.com.          264     IN     A     74.125.19.99
                    google.com.          264     IN     A     74.125.19.103
                    google.com.          264     IN     A     74.125.19.104
                    google.com.          264     IN     A     74.125.19.105
                    google.com.          264     IN     A     74.125.19.106
                    
                    ;; AUTHORITY SECTION:
                    google.com.          155100     IN     NS     ns1.google.com.
                    google.com.          155100     IN     NS     ns3.google.com.
                    google.com.          155100     IN     NS     ns4.google.com.
                    google.com.          155100     IN     NS     ns2.google.com.
                    
                    ;; Query time: 2 msec
                    ;; SERVER: 10.0.0.1#53(10.0.0.1)
                    ;; WHEN: Wed Jan 13 23:40:23 2010
                    ;; MSG SIZE  rcvd: 196
                    
                    $ dig @10.0.0.1 google.com IN A
                    
                    ; <<>> DiG 9.6.1-P1 <<>> @10.0.0.1 google.com IN A
                    ; (1 server found)
                    ;; global options: +cmd
                    ;; Got answer:
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12864
                    ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0
                    
                    ;; QUESTION SECTION:
                    ;google.com.               IN     A
                    
                    ;; ANSWER SECTION:
                    google.com.          262     IN     A     74.125.19.106
                    google.com.          262     IN     A     74.125.19.147
                    google.com.          262     IN     A     74.125.19.99
                    google.com.          262     IN     A     74.125.19.103
                    google.com.          262     IN     A     74.125.19.104
                    google.com.          262     IN     A     74.125.19.105
                    
                    ;; AUTHORITY SECTION:
                    google.com.          155098     IN     NS     ns3.google.com.
                    google.com.          155098     IN     NS     ns1.google.com.
                    google.com.          155098     IN     NS     ns2.google.com.
                    google.com.          155098     IN     NS     ns4.google.com.
                    
                    ;; Query time: 4 msec
                    ;; SERVER: 10.0.0.1#53(10.0.0.1)
                    ;; WHEN: Wed Jan 13 23:40:25 2010
                    ;; MSG SIZE  rcvd: 196
                    
                    $ 
                    

                     

                     

                    As shown above, the TTL is fixed at 5 seconds in all the results from VMware's proxy, whereas in the results from the upstream DNS server it counts down in real time, as expected.
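The comparison above can be automated with a few lines of C. The sketch below classifies a pair of TTL observations taken some seconds apart; the type and function names are invented for the example, and the one-second tolerance is an assumption to allow for timestamp rounding. A constant TTL is only a hint, since a record re-fetched between the two queries could also show the same value twice.

```c
#include <stdlib.h>

enum ttl_behavior { TTL_COUNTS_DOWN, TTL_CONSTANT, TTL_OTHER };

/* One observation: when it was taken (in seconds) and the TTL reported. */
struct ttl_sample {
    long     when;
    unsigned ttl;
};

/* Compare two observations of the same record from the same server. */
enum ttl_behavior classify_ttl(const struct ttl_sample *first,
                               const struct ttl_sample *second)
{
    long elapsed = second->when - first->when;
    long drop    = (long)first->ttl - (long)second->ttl;

    if (elapsed <= 0)
        return TTL_OTHER;           /* samples out of order or simultaneous */
    if (drop == 0)
        return TTL_CONSTANT;        /* TTL did not move at all */
    if (drop > 0 && labs(drop - elapsed) <= 1)
        return TTL_COUNTS_DOWN;     /* TTL fell in step with wall-clock time */
    return TTL_OTHER;
}
```

Using the seconds field of the dig timestamps above, the proxy samples (13, TTL 5) and (15, TTL 5) come out as TTL_CONSTANT, while the upstream samples (23, TTL 264) and (25, TTL 262) come out as TTL_COUNTS_DOWN.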

                     

                    On your point about VMware Fusion's verbatim use of the result returned by res_nsend(): I wrote a simple program to check if Mac OS X's implementation of res_nsend() is the culprit that rewrites TTL, but res_nsend() seems just fine:

                     

                    $ cat res_send_test.c
                    #include <sys/types.h>
                    #include <netinet/in.h>
                    #include <arpa/nameser.h>
                    #include <err.h>
                    #include <resolv.h>
                    #include <stdio.h>
                    #include <string.h>
                    #include <sysexits.h>
                    
                    int
                    main(int argc, char **argv)
                    {
                        unsigned char query[1024], reply[1024];
                        int query_len, reply_len;
                        struct __res_state res0;
                        res_state res = &res0;
                    
                        if (argc != 2)
                            errx(EX_USAGE, "usage: res_send_test <domain name>");
                    
                        /* Start from a zeroed resolver state, then load the system configuration. */
                        memset(res, 0, sizeof(*res));
                    
                        if (res_ninit(res) == -1)
                            errx(EX_UNAVAILABLE, "res_ninit() failed");
                        res->options |= RES_DEBUG; /* this parse-prints result received from server */
                    
                        /* Build a standard query for the A record of the given name. */
                        query_len = res_mkquery(ns_o_query, /* op */
                                                argv[1], ns_c_in, ns_t_a, /* dname, class, type */
                                                NULL, 0, NULL, /* data, datalen, newrr_in */
                                                query, sizeof(query) /* buf, buflen */);
                        if (query_len == -1)
                            errx(EX_UNAVAILABLE, "res_mkquery() failed");
                    
                        /* Send the query to the configured name server; RES_DEBUG prints the reply. */
                        reply_len = res_nsend(res, query, query_len, reply, sizeof(reply));
                        if (reply_len == -1)
                            errx(EX_UNAVAILABLE, "res_nsend() failed");
                    
                        return 0;
                    }
                    $ cc -g -O0 -Wall -Werror res_send_test.c -o res_send_test -lresolv
                    $ ./res_send_test google.com
                    ;; res_send()
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48117
                    ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
                    ;;     google.com, type = A, class = IN
                    ;; Querying server (# 1) address = 10.0.0.1
                    ;; new DG socket
                    ;; got answer:
                    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48117
                    ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0
                    ;;     google.com, type = A, class = IN
                    google.com.          3m23s IN A     74.125.19.104
                    google.com.          3m23s IN A     74.125.19.105
                    google.com.          3m23s IN A     74.125.19.106
                    google.com.          3m23s IN A     74.125.19.147
                    google.com.          3m23s IN A     74.125.19.99
                    google.com.          3m23s IN A     74.125.19.103
                    google.com.          1d18h16m14s IN NS  ns4.google.com.
                    google.com.          1d18h16m14s IN NS  ns3.google.com.
                    google.com.          1d18h16m14s IN NS  ns1.google.com.
                    google.com.          1d18h16m14s IN NS  ns2.google.com.
                    $ 
                    

                     

                     

                    Hope this helps,

                    Eugene