MihailsA
Contributor

DLR control VM can't ping anything

Greetings all,

I've stumbled upon strange behavior of a DLR on one installation (NSX 6.3). Long story short: although it operates as a router and routes packets between internal networks with no issues detected, it seems unable to complete ARP requests for itself.

What is working:

1. Routing between internal networks: VMs on one Logical Switch can ping VMs on another Logical Switch, with the gateway interfaces for each network being on the affected DLR.

2. VMs can ping the affected router's interface IPs.

3. The EDGE router, which is the default gateway for the affected DLR, can ping the DLR's IP on the interface in the Interconnect network (a separate Logical Switch).

4. A separate interface on the EDGE router, created for testing purposes only and connected to one of the internal networks, can ping VMs on the corresponding Logical Switch.

What is not working:

1. DLR cannot ping its default gateway (EDGE router).

2. DLR cannot ping VMs mentioned above.

3. DLR cannot DHCP Relay.

4. DLR cannot participate in OSPF with the EDGE router.

5. When I run 'show arp' on the DLR control VM, the result is empty: there are no ARP entries.

Firewalls are disabled for now. So it seems to me that the problem is that the DLR cannot complete ARP requests for its own OS, rather than anything in the routing part (which would affect communication between networks, and that is not the case).

Can anybody help identify the problem here?

5 Replies
MihailsA
Contributor

Update:

I tried the approach described in VMware KB 2117818, "Ping from the uplink interface of a NSX DLR to Edge fails". It was written for NSX 6.0.x and 6.1.x, but it seems to apply to 6.3.x as well. As soon as I configured OSPF between the DLR and the EDGE, pings from the DLR to the EDGE started to flow.

However, DLR -> VM pings still fail. I checked another installation and see the same behavior there. Maybe this is expected behavior? Can somebody confirm whether you can ping VMs on internal interfaces from the DLR? I think I have seen it work in the past, which is why I took it for granted, but those might just be false memories.

0 Kudos
RaymundoEC
VMware Employee

hi there sir,

could you please check whether the DLR instances on the ESXi hosts are correct?

The procedure is as follows:

Connect to the ESXi host via an SSH session.

#net-vdr -I -l     <dash uppercase I as in India, dash lower case l as in Lemon>

Use the VDR name as input to the next command:

net-vdr -l --route <DLR instance name from the output of the previous command>

With that you can verify the routing table for the instance, and also whether the instance is running OK on the ESXi host.

hope this helps

+vRay
0 Kudos
MihailsA
Contributor

Hi Raymundo,

Judging by the output of the commands you suggested, everything seems OK. The last three rows in the routing table are the internal networks containing the VMs I am trying to ping. I'm not sure how to interpret the Gateway "0.0.0.0" here. Could you post an example from one of your DLRs?

Vdr Name:               default+edge-30
Vdr Id:                 0x00001770
Number of Lifs:         4
Number of Routes:       31
Number of Neighbors:    4
State:                  Enabled
Controller IP:          10.11.12.212
Control Plane IP:       10.11.12.41
Control Plane Active:   Yes
Num unique nexthops:    1
Generation Number:      0
Edge Active:            Yes

Destination  GenMask          Gateway      Flags Ref Origin   UpTime Interface
-----------  -------          -------      ----- --- ------   ------ ---------
0.0.0.0      0.0.0.0          10.69.64.1   UG    1   AUTO     758429 177000000002
10.1.1.0     255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.11.12.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.0.0    255.255.255.252  10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.0.4    255.255.255.252  10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.16.0   255.255.255.248  10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.17.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.31.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.32.0   255.255.255.248  10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.33.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.47.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.48.0   255.255.255.248  10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.49.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.63.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
10.69.64.0   255.255.255.248  0.0.0.0      UCI   1   MANUAL   758442 177000000002
10.69.65.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000a
10.69.66.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   754664 17700000000c
10.69.79.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000b
0 Kudos
RaymundoEC
VMware Employee

In my case, for this customer, I have only one ULS attached to the UDLR, with network 198.4.138.0. The gateway 0.0.0.0 means that no gateway is defined for that route; I'm learning all the other routes using OSPF here too.

[root@esxmgmt01:~] net-vdr -l --route default+edge-fa537a63-a674-4d89-84d4-5c333334b678

VDR default+edge-fa537a63-a674-4d89-84d4-5c333334b678 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination      GenMask          Gateway          Flags    Ref Origin   UpTime     Interface
-----------      -------          -------          -----    --- ------   ------     ---------
192.168.216.0    255.255.255.128  0.0.0.0          UCI      1   MANUAL   733031     4e2000000002
192.168.216.128  255.255.255.128  0.0.0.0          UCI      1   MANUAL   657212     4e2000000003
198.4.138.0      255.255.255.0    0.0.0.0          UCI      1   MANUAL   733030     4e200000000a

So in your case, these three are the networks of your Logical Switches:

10.69.65.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000a
10.69.66.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   754664 17700000000c
10.69.79.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000b
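
For anyone who wants to separate the connected networks from the learned routes in a longer table, here is a minimal Python sketch. It assumes the eight whitespace-separated columns shown in this thread (Destination, GenMask, Gateway, Flags, Ref, Origin, UpTime, Interface) and treats the C flag as "connected", per the legend above. The names SAMPLE and parse_routes are made up for illustration and are not part of any NSX tooling.

```python
import ipaddress

# Four rows copied from the 'net-vdr -l --route' output posted earlier in
# this thread, with the fused columns separated again.
SAMPLE = """\
10.1.1.0     255.255.255.0    10.69.64.1   UG   1   AUTO     758423 177000000002
10.69.16.0   255.255.255.248  10.69.64.1   UG   1   AUTO     758423 177000000002
10.69.64.0   255.255.255.248  0.0.0.0      UCI  1   MANUAL   758442 177000000002
10.69.65.0   255.255.255.0    0.0.0.0      UCI  1   MANUAL   758437 17700000000a
"""


def parse_routes(text):
    """Parse whitespace-separated route rows into a list of dictionaries."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # not a route row (blank line, legend, instance summary)
        dest, mask, gateway, flags, _ref, origin, _uptime, lif = fields
        try:
            network = ipaddress.ip_network(f"{dest}/{mask}")
        except ValueError:
            continue  # header or separator line
        routes.append({
            "network": network,
            "gateway": gateway,
            "connected": "C" in flags,  # assumption: C means directly connected
            "origin": origin,
            "lif": lif,
        })
    return routes


if __name__ == "__main__":
    for route in parse_routes(SAMPLE):
        kind = "connected" if route["connected"] else f"via {route['gateway']}"
        print(f"{route['network']} {kind} (LIF {route['lif']})")
```

Rows whose gateway is 0.0.0.0 come out as "connected", which matches how the three LS networks above are read.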

So try to get a packet capture like this:

#net-stats -l

Note the PortNum of the port your VM is connected to.

#pktcap-uw --switchport <SwitchPortID> --dir 0 -o /tmp/LEAVINGDLR.pcap

#pktcap-uw --switchport <SwitchPortID> --dir 1 -o /tmp/ENTERINGDLR.pcap

then

#tcpdump-uw -enr  /tmp/LEAVINGDLR.pcap

#tcpdump-uw -enr  /tmp/ENTERINGDLR.pcap

and check where your ping is being dropped.
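
If the captures are long, a small script can help spot echo requests that never got a reply. The sketch below is only an illustration: it assumes you have saved the text output of both tcpdump-uw runs and pass the combined lines in, and that ICMP lines look like the usual tcpdump form ("A > B: ICMP echo request, id N, seq M"). The function name unanswered_requests and the sample lines (MAC addresses, timestamps, VM IP) are invented for the example and are not taken from a real capture.

```python
import re

# Matches the IP/ICMP part of a tcpdump text line; any Ethernet header
# printed before it by the -e option is ignored.
ICMP_RE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+) > (?P<dst>\d+\.\d+\.\d+\.\d+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_requests(lines):
    """Return (src, dst, id, seq) for every echo request with no matching reply."""
    pending = {}
    for line in lines:
        match = ICMP_RE.search(line)
        if match is None:
            continue
        if match["kind"] == "request":
            key = (match["src"], match["dst"], match["id"], match["seq"])
            pending[key] = line
        else:
            # A reply travels in the opposite direction, so swap src and dst.
            key = (match["dst"], match["src"], match["id"], match["seq"])
            pending.pop(key, None)
    return list(pending)


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    "10:00:01.000000 00:50:56:aa:aa:aa > 00:50:56:bb:bb:bb, ethertype IPv4 (0x0800), "
    "length 98: 10.69.65.1 > 10.69.65.10: ICMP echo request, id 1, seq 1, length 64",
    "10:00:01.000400 00:50:56:bb:bb:bb > 00:50:56:aa:aa:aa, ethertype IPv4 (0x0800), "
    "length 98: 10.69.65.10 > 10.69.65.1: ICMP echo reply, id 1, seq 1, length 64",
    "10:00:02.000000 00:50:56:aa:aa:aa > 00:50:56:bb:bb:bb, ethertype IPv4 (0x0800), "
    "length 98: 10.69.65.1 > 10.69.65.10: ICMP echo request, id 1, seq 2, length 64",
]

if __name__ == "__main__":
    for src, dst, icmp_id, seq in unanswered_requests(SAMPLE_LINES):
        print(f"no reply seen: {src} -> {dst} id={icmp_id} seq={seq}")
```

If a request shows up in one capture but the reply shows up in neither, the drop is most likely on the far side of that port; if no ICMP lines appear at all, it is worth looking for ARP lines in the same output first.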

hope this helps

+vRay
0 Kudos
MrSegga
Contributor

As far as I understand, this is expected behavior. See the VMware Knowledge Base.