heinemml
Contributor

VMware giving malformed DNS response

Hi,

I'm running a Xubuntu 16.04 guest in a VMware Fusion 8.1 installation on OS X 10.11.5. The VM uses a NAT interface.

When I issue a DNS request for a name that is not fully qualified, the response takes roughly 10 seconds.

~$ time host foobar
;; connection timed out; no servers could be reached

real 0m10.026s
user 0m0.020s
sys 0m0.000s

So I looked at what was happening on the network. The queries leaving my Mac and the responses coming back to it look fine.

But when I look at the traffic on the vmnet8 interface, I see malformed responses from the VMware DNS.

So here is my outgoing request:

Frame 34: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface 0
Ethernet II, Src: Vmware_6c:df:ea (00:0c:29:6c:df:ea), Dst: Vmware_eb:ae:81 (00:50:56:eb:ae:81)
Internet Protocol Version 4, Src: 172.16.251.200, Dst: 172.16.251.2
User Datagram Protocol, Src Port: 33964 (33964), Dst Port: 53 (53)
Domain Name System (query)
    Transaction ID: 0xe702
    Flags: 0x0100 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data: Unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        foobar: type A, class IN
            Name: foobar
            [Name Length: 6]
            [Label Count: 1]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
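For reference, the query bytes can be rebuilt from the decoded fields above using the RFC 1035 wire format. This is a minimal Python sketch (the function names are mine, not from any library), checked against the DNS part of Frame 34 copied from the hex dump in the PS:

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels (RFC 1035, section 3.1)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # zero-length root label ends the name

def build_query(txid: int, name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """Header (flags 0x0100 = standard query with RD set, QDCOUNT 1) plus one question."""
    header = struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0)
    return header + encode_qname(name) + struct.pack("!HH", qtype, qclass)

# DNS part of Frame 34, i.e. the query dump from offset 0x2a onwards
captured = bytes.fromhex(
    "e7 02 01 00 00 01 00 00 00 00 00 00 "
    "06 66 6f 6f 62 61 72 00 00 01 00 01"
)

print(build_query(0xE702, "foobar") == captured)  # True
```

A well-formed query is 24 bytes: a 12-byte header followed by one 12-byte question.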

And this is the response I'm getting from the DNS.

Frame 36: 94 bytes on wire (752 bits), 94 bytes captured (752 bits) on interface 0
Ethernet II, Src: Vmware_eb:ae:81 (00:50:56:eb:ae:81), Dst: Vmware_6c:df:ea (00:0c:29:6c:df:ea)
Internet Protocol Version 4, Src: 172.16.251.2, Dst: 172.16.251.200
User Datagram Protocol, Src Port: 53 (53), Dst Port: 33964 (33964)
Domain Name System (query)
    Transaction ID: 0x4500
    Flags: 0x0034 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..1. .... = AD bit: Set
        .... .... ...1 .... = Non-authenticated data: Acceptable
    Questions: 1624
    Answer RRs: 16384
    Authority RRs: 16401
    Additional RRs: 58740
[Malformed Packet: DNS]
    [Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
        [Malformed Packet (Exception occurred)]
        [Severity level: Error]
        [Group: Malformed]

So the transaction ID clearly does not match the one in the query, all the other header fields look broken, and the payload is far too short for the record counts it claims.
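Unpacking the first 12 bytes of that payload as a DNS header reproduces exactly the numbers Wireshark shows and makes the size problem concrete. This is a minimal Python sketch with illustrative names. The 52-byte payload is copied from the hex dump of Frame 36 in the PS (offset 0x2a to the end):

```python
import struct

def parse_header(msg: bytes) -> dict:
    """Unpack the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(msg) < 12:
        raise ValueError("shorter than a DNS header")
    txid, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    return {"id": txid, "flags": flags, "qr": flags >> 15,
            "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar}

# UDP payload of Frame 36 (52 bytes)
payload = bytes.fromhex(
    "45 00 00 34 06 58 40 00 40 11 e5 74 ac 10 fb c8 ac 10 fb 02 "
    "84 ac 00 35 00 20 f9 a2 e7 02 01 00 00 01 00 00 00 00 00 00 "
    "06 66 6f 6f 62 61 72 00 00 01 00 01"
)
hdr = parse_header(payload)

# Every question needs at least 5 bytes (root label + QTYPE + QCLASS),
# so the claimed question count alone cannot fit in the bytes received.
min_size = 12 + 5 * hdr["qdcount"]
print(hex(hdr["id"]), hdr["qr"], hdr["qdcount"], len(payload), min_size)
# 0x4500 0 1624 52 8132
```

A header claiming 1624 questions needs at least 8132 bytes, but only 52 arrive, so any parser has to reject it.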

As a result, Linux does not recognize the response and waits until a timeout occurs. That's why it takes 10 seconds.
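The long wait follows from how a stub resolver treats a reply it cannot match: it drops the datagram and keeps listening until its timer expires. Below is a simplified Python model of that loop. The matching rules are a subset of what RFC 5452 asks resolvers to check (ID and question section only; the address and port checks are left out), and the function names are hypothetical. This is not glibc's or host's actual code.

```python
import socket
import struct
import time

def response_matches(query: bytes, reply: bytes) -> bool:
    """Simplified reply check: same ID, QR bit set, question section echoed.
    (Illustrative only; real resolvers also check source address and port.)"""
    if len(reply) < len(query):
        return False
    (qid,) = struct.unpack("!H", query[:2])
    rid, flags = struct.unpack("!HH", reply[:4])
    if rid != qid or not flags & 0x8000:
        return False
    # Single-question query: the question runs from byte 12 to its end.
    return reply[12:len(query)] == query[12:]

def wait_for_reply(sock: socket.socket, query: bytes, timeout: float):
    """Drop non-matching datagrams until a match arrives or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        sock.settimeout(remaining)
        try:
            reply = sock.recv(512)
        except socket.timeout:
            return None          # the caller only ever sees a timeout
        if response_matches(query, reply):
            return reply         # anything else is silently discarded
```

With the malformed packet from Frame 36 as the only reply, `wait_for_reply` returns `None` only after the full timeout. The garbage datagram is read instantly, but it never ends the wait.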

I can reproduce this with other Linux guests as well.

Is this a known problem? Are there any known fixes?

Cheers,

Michael

PS: Here are the binary dumps of the two packets:

Query:

0000   00 50 56 eb ae 81 00 0c 29 6c df ea 08 00 45 00  .PV.....)l....E.
0010   00 34 06 58 40 00 40 11 e5 74 ac 10 fb c8 ac 10  .4.X@.@..t......
0020   fb 02 84 ac 00 35 00 20 f9 a2 e7 02 01 00 00 01  .....5. ........
0030   00 00 00 00 00 00 06 66 6f 6f 62 61 72 00 00 01  .......foobar...
0040   00 01                                            ..

Response:

0000   00 0c 29 6c df ea 00 50 56 eb ae 81 08 00 45 00  ..)l...PV.....E.
0010   00 50 ff 10 00 00 80 11 ec 9f ac 10 fb 02 ac 10  .P..............
0020   fb c8 00 35 84 ac 00 3c 7a c6 45 00 00 34 06 58  ...5...<z.E..4.X
0030   40 00 40 11 e5 74 ac 10 fb c8 ac 10 fb 02 84 ac  @.@..t..........
0040   00 35 00 20 f9 a2 e7 02 01 00 00 01 00 00 00 00  .5. ............
0050   00 00 06 66 6f 6f 62 61 72 00 00 01 00 01        ...foobar.....
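Comparing the two dumps suggests what the bogus "DNS header" actually is. The 52-byte UDP payload of the response appears to be the complete IPv4 datagram of the original query (IP header, UDP header and DNS message), sent back unchanged. This Python sketch checks that. It assumes plain Ethernet II framing without a VLAN tag, and the helper names are my own:

```python
import struct

query_frame = bytes.fromhex(
    "00 50 56 eb ae 81 00 0c 29 6c df ea 08 00 45 00 "
    "00 34 06 58 40 00 40 11 e5 74 ac 10 fb c8 ac 10 "
    "fb 02 84 ac 00 35 00 20 f9 a2 e7 02 01 00 00 01 "
    "00 00 00 00 00 00 06 66 6f 6f 62 61 72 00 00 01 "
    "00 01"
)
response_frame = bytes.fromhex(
    "00 0c 29 6c df ea 00 50 56 eb ae 81 08 00 45 00 "
    "00 50 ff 10 00 00 80 11 ec 9f ac 10 fb 02 ac 10 "
    "fb c8 00 35 84 ac 00 3c 7a c6 45 00 00 34 06 58 "
    "40 00 40 11 e5 74 ac 10 fb c8 ac 10 fb 02 84 ac "
    "00 35 00 20 f9 a2 e7 02 01 00 00 01 00 00 00 00 "
    "00 00 06 66 6f 6f 62 61 72 00 00 01 00 01"
)

ETH_LEN = 14  # Ethernet II header, no 802.1Q tag

def ip_datagram(frame: bytes) -> bytes:
    """Return the IPv4 datagram carried in an Ethernet II frame."""
    ip = frame[ETH_LEN:]
    (total_len,) = struct.unpack("!H", ip[2:4])
    return ip[:total_len]

def udp_payload(frame: bytes) -> bytes:
    """Return the UDP payload of an Ethernet/IPv4/UDP frame."""
    ip = ip_datagram(frame)
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    (udp_len,) = struct.unpack("!H", ip[ihl + 4:ihl + 6])
    return ip[ihl + 8:ihl + udp_len]

payload = udp_payload(response_frame)
print(len(payload), payload == ip_datagram(query_frame))  # 52 True
```

If that reading is correct, the strange header values are just the query's IP header. 0x4500 is version/IHL and TOS, flags 0x0034 is the total length (52), 1624 (0x0658) is the IP identification, 16384 (0x4000) is the Don't Fragment flag, 16401 (0x4011) is TTL 64 plus protocol 17 (UDP), and 58740 (0xe574) is the header checksum. It would also explain why the Response bit reads 0.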

11 Replies
Mikero
Community Manager

If it's the issue I'm thinking of, we fixed the bug in 8.1.1.

If you are using 8.1.1 and the issue persists I'd love to know about it so we can get our engineers to take a look.

-
Michael Roy - PM/PMM: Fusion & Workstation
heinemml
Contributor

Hi,

I'm using 'Version 8.1.1 (3771013)'.

In case it is of interest: my installation was originally 7.x on Yosemite and was upgraded all the way to 8.1.1 as updates became available. The problem happens with a freshly installed VM as well as with a VM that was created with an earlier version.

Mikero
Community Manager

Hm, interesting... We should look into that.

fubvmware

-
Michael Roy - PM/PMM: Fusion & Workstation
fubvmware
VMware Employee

Our networking engineer is looking into it.

nancyz
VMware Employee

~$ time host foobar
;; connection timed out; no servers could be reached

real 0m10.026s
user 0m0.020s
sys 0m0.000s

So what's the result of

~$ time host foobar

on your Mac host?


I tried it on my VM and got an answer:

test.localdomain is an alias for host.example.com.
host.example.com has address 10.x.x.x.
host.example.com is an alias for host.example.com.
host.example.com has address 10.x.x.x.

real 0m0.524s
user 0m0.016s
sys 0m0.004s

heinemml
Contributor

On my Mac I get the correct response:

time host foo
Host foo not found: 2(SERVFAIL)
host foo  0.00s user 0.01s system 13% cpu 0.113 total

Your example resolves a host that actually exists (the line "test.localdomain is an alias for host.example.com." shows that). Yes, that works.

The problem is that the response is broken when you use a name that is not fully qualified.

When I run "host foobar", two queries are actually sent to the DNS server: "foobar.localdomain" and "foobar". The response for "foobar.localdomain" arrives instantly and is correct, saying that the host is unknown. The response for "foobar" also arrives instantly but is malformed (as shown in my post above). host discards this answer and waits for a correct one that never comes, hence the timeout.
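The order of the two queries comes from the resolver search list. "foobar" has no dots, so with the default ndots:1 the search domain (localdomain here) is appended first and the bare name is tried last. This is a simplified Python model of the ndots rule described in resolv.conf(5). The function is hypothetical and ignores details such as stopping at the first successful answer:

```python
from typing import List

def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Order in which a resolv.conf-style stub resolver tries a name.

    Simplified model of the `ndots` rule in resolv.conf(5): a trailing dot
    means "absolute, no search"; a name with at least `ndots` dots is tried
    as given first; otherwise the search domains are tried first and the
    bare name last. Real resolvers stop at the first successful answer.
    """
    if name.endswith("."):
        return [name]
    searched = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + searched
    return searched + [name]

print(candidate_names("foobar", ["localdomain"]))
# ['foobar.localdomain', 'foobar']
```

In this model the bare, single-label name "foobar" is always the last query. That is the query whose malformed answer then stalls the lookup.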

The communication between VMware and the outside network works correctly (I can see that in Wireshark), so the malformed response must be generated by VMware's internal DNS.

heinemml
Contributor

I just tested with a Windows 8 VM and got the same behaviour (using ping, as host is not available): `ping foo.foo` gives an instant error, while `ping foo` takes several seconds before failing when the request times out.

nancyz
VMware Employee

Hi heinemml

Frame 36: 94 bytes on wire (752 bits), 94 bytes captured (752 bits) on interface 0
Ethernet II, Src: Vmware_eb:ae:81 (00:50:56:eb:ae:81), Dst: Vmware_6c:df:ea (00:0c:29:6c:df:ea)
Internet Protocol Version 4, Src: 172.16.251.2, Dst: 172.16.251.200
User Datagram Protocol, Src Port: 53 (53), Dst Port: 33964 (33964)
Domain Name System (query)
    Transaction ID: 0x4500
    Flags: 0x0034 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..1. .... = AD bit: Set
        .... .... ...1 .... = Non-authenticated data: Acceptable
    Questions: 1624
    Answer RRs: 16384
    Authority RRs: 16401
    Additional RRs: 58740
[Malformed Packet: DNS]
    [Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
        [Malformed Packet (Exception occurred)]
        [Severity level: Error]
        [Group: Malformed]

This should be the packet the DNS server sends to the VM, so why does it show '0... .... .... .... = Response: Message is a query'? It should be a response.

Also, I still could not reproduce your problem in my environment.

I installed a Xubuntu VM. This is the result from the VM:

vmware@ubuntu:~/Desktop$ time host foo
Host foo not found: 3(NXDOMAIN)

real 0m0.466s
user 0m0.020s
sys  0m0.000s

and I didn't find '[Malformed Packet: DNS]' in my sniffer file.

What does your /etc/resolv.conf look like?

heinemml
Contributor

Hi,

thanks for your time trying to reproduce this.

I took another look into the problem. It appears on all the MacBooks in our office when they use our internal network, which has its own Windows Server based DNS server.

I have now tried different DNS servers, changing the DNS in the OS X preferences, not in the VM. When I use our provider's DNS, I get an instant response inside the VM.

So I compared the different responses and found a pattern.

I always query with the command "host foobar"

Result when I use our provider's DNS, or Google's (8.8.8.8): the server response code is "No such name":

Flags: 0x8183 Standard query response, No such name
    1... .... .... .... = Response: Message is a response
    .000 0... .... .... = Opcode: Standard query (0)
    .... .0.. .... .... = Authoritative: Server is not an authority for domain
    .... ..0. .... .... = Truncated: Message is not truncated
    .... ...1 .... .... = Recursion desired: Do query recursively
    .... .... 1... .... = Recursion available: Server can do recursive queries
    .... .... .0.. .... = Z: reserved (0)
    .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
    .... .... ...0 .... = Non-authenticated data: Unacceptable
    .... .... .... 0011 = Reply code: No such name (3)

which is then happily passed down to the VM.

Result when I use our internal DNS server: the response code is "Server failure":

Flags: 0x8182 Standard query response, Server failure
    1... .... .... .... = Response: Message is a response
    .000 0... .... .... = Opcode: Standard query (0)
    .... .0.. .... .... = Authoritative: Server is not an authority for domain
    .... ..0. .... .... = Truncated: Message is not truncated
    .... ...1 .... .... = Recursion desired: Do query recursively
    .... .... 1... .... = Recursion available: Server can do recursive queries
    .... .... .0.. .... = Z: reserved (0)
    .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
    .... .... ...0 .... = Non-authenticated data: Unacceptable
    .... .... .... 0010 = Reply code: Server failure (2)
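Decoding both flag words bit by bit shows that they differ only in the response code (the low four bits). This is a small Python decoder with illustrative names. It uses the RFC 1035 bit layout plus the AD and CD bits that DNSSEC later assigned out of the old Z field:

```python
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def decode_flags(flags: int) -> dict:
    """Split a 16-bit DNS flags word into its individual fields."""
    return {
        "qr": (flags >> 15) & 1,        # 1 = response
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,
        "tc": (flags >> 9) & 1,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "z": (flags >> 6) & 1,
        "ad": (flags >> 5) & 1,         # "Answer authenticated" in Wireshark
        "cd": (flags >> 4) & 1,         # "Non-authenticated data" in Wireshark
        "rcode": RCODES.get(flags & 0xF, str(flags & 0xF)),
    }

for word in (0x8183, 0x8182):
    print(hex(word), decode_flags(word)["rcode"])
# 0x8183 NXDOMAIN
# 0x8182 SERVFAIL
```

Apart from the RCODE, the two answers are identical: both have QR, RD and RA set, and every other bit is clear.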

This is where the problem starts: the VMware DNS then retries multiple times before sending a malformed response to the VM.

Screen Shot 2016-06-17 at 10.42.34.png

Okay, so how could you reproduce this? I found another case where this is happening.

If you use a DNS server that refuses to serve you, you get similar behaviour.

So try using the T-Online DNS (217.5.100.185), which will refuse to serve you because it is for T-Online customers only. You should see similar behaviour.

Summary:

If the DNS gives a "No such name" response, it is passed down to the VM and everything is fine. If the DNS gives a "Server failure" or "REFUSED" response, VMware retries multiple times and then sends a malformed response to the VM.
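To compare servers quickly without host, a small probe can send one query and report what comes back and how long it took. This is a minimal Python sketch with hypothetical names. It reads only the first datagram and labels anything that lacks the query's ID or the QR bit as "malformed":

```python
import os
import socket
import struct
import time

RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}

def build_query(name: str, txid: int) -> bytes:
    """Recursive A/IN query for `name` in RFC 1035 wire format."""
    labels = [p.encode("ascii") for p in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(p)]) + p for p in labels) + b"\x00"
    return struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0) + qname + struct.pack("!HH", 1, 1)

def classify(query: bytes, reply: bytes) -> str:
    """Name the reply's RCODE, or 'malformed' if it does not answer `query`."""
    if len(reply) < 12:
        return "malformed"
    rid, flags = struct.unpack("!HH", reply[:4])
    if rid != struct.unpack("!H", query[:2])[0] or not flags & 0x8000:
        return "malformed"
    rcode = flags & 0xF
    return RCODE_NAMES.get(rcode, f"RCODE {rcode}")

def probe(server: str, name: str, timeout: float = 5.0):
    """Send one query to server:53; return (classification, elapsed seconds)."""
    txid = struct.unpack("!H", os.urandom(2))[0]  # random transaction ID
    query = build_query(name, txid)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            reply = sock.recv(512)
        except socket.timeout:
            return "timeout", time.monotonic() - start
    return classify(query, reply), time.monotonic() - start
```

For example, after setting the Mac's DNS to 217.5.100.185, you could run probe("172.16.251.2", "foobar") inside the guest and probe("217.5.100.185", "foobar") on the Mac. (172.16.251.2 is the NAT DNS address from my capture; yours may differ.) Comparing the two should show whether the NAT DNS turns the server's refusal into something malformed or slow.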

ChipMcK
Hot Shot

using a NAT Interface

NAT does not appear on the network.

Do you want Bridged, which allows the VM to be seen/addressed by the internet?

heinemml
Contributor
Contributor

ChipMcK, sorry, I don't understand what you are referring to. Could you elaborate?
