Greetings all,
I've stumbled upon strange behavior of a DLR router in one installation (NSX 6.3). Long story short: although it operates as a router and routes packets between internal networks without any detected issues, it seems unable to complete ARP requests for itself.
What is working:
1. Routing between internal networks: VMs in one Logical Switch can ping VMs in another Logical Switch, with the gateway interface for each network on the affected DLR.
2. VMs can ping affected router's interface IPs.
3. The EDGE router, which is the default gateway for the affected DLR, can ping the DLR's IP on its interface in the Interconnect network (a separate Logical Switch).
4. A separate interface on the EDGE router, created for testing purposes only and connected to one of the internal networks, can ping the VMs in the corresponding Logical Switch.
What is not working:
1. DLR cannot ping its default gateway (EDGE router).
2. DLR cannot ping VMs mentioned above.
3. DLR cannot perform DHCP relay.
4. DLR cannot participate in OSPF with the EDGE router.
5. When I run 'show arp' on the DLR control VM, the result is empty: there are no ARP entries.
Firewalls are disabled for now. So it seems to me that the problem is that the DLR cannot complete ARP requests for its own OS, not in the routing part (a routing problem would also break communication between networks, which is not the case).
Can anybody help identify the problem here?
Update:
I tried the same approach as described in VMware KB 2117818, "Ping from the uplink interface of a NSX DLR to Edge fails", which was written for NSX 6.0.x and 6.1.x but seems to apply to 6.3.x as well. As soon as I configured OSPF between the DLR and the EDGE, pings from the DLR to the EDGE started to flow.
However, DLR -> VM pings still fail. I checked another installation and see the same behavior there. Maybe this is expected behavior? Can somebody confirm whether you can ping VMs on internal interfaces from a DLR? I think I have seen that work in the past, which is why I took it for granted, but those might just be false memories.
Hi there sir,
could you please check whether the DLR instances on the ESXi host are correct?
The procedure is as follows:
Connect to the ESXi host via an SSH session.
#net-vdr -I -l   (dash uppercase I as in India, dash lowercase l as in Lemon)
Use the VDR name from that output as the input to the next command:
#net-vdr -l --route <name of the DLR instance on the ESXi host, from the previous command>
With that you can verify the routing table of the instance, and also whether the instance is running correctly on the ESXi host.
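If there are several DLR instances on the host, the two steps above can be chained. This is a minimal sketch, assuming `net-vdr -I -l` prints one "Vdr Name:   <name>" line per instance and that instance names contain no spaces; check that against your own output first. The helper name vdr_names is hypothetical.

```shell
#!/bin/sh
# Sketch: print the route table of every DLR instance on this ESXi host.
# Assumption: `net-vdr -I -l` prints one "Vdr Name:   <name>" line per
# instance and the name contains no spaces. Verify against your own output.

# Read `net-vdr -I -l` output on stdin and print the instance names.
vdr_names() {
    awk '/Vdr Name:/ { print $NF }'
}

# Only call net-vdr when it exists, so vdr_names can be tried on any box.
if command -v net-vdr >/dev/null 2>&1; then
    net-vdr -I -l | vdr_names | while read -r vdr; do
        echo "=== $vdr ==="
        net-vdr -l --route "$vdr"
    done
fi
```

Keeping the name extraction in its own function makes it easy to adjust if your build of net-vdr formats the instance list differently.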
hope this helps
Hi Raymundo,
From the commands you advised, everything seems to be OK. The last three rows in the routing table are the internal networks where the VMs I'm trying to ping reside. I'm not sure how to interpret Gateway "0.0.0.0" here. Can you post an example from one of your DLRs?
Vdr Name:                 default+edge-30
Vdr Id:                   0x00001770
Number of Lifs:           4
Number of Routes:         31
Number of Neighbors:      4
State:                    Enabled
Controller IP:            10.11.12.212
Control Plane IP:         10.11.12.41
Control Plane Active:     Yes
Num unique nexthops:      1
Generation Number:        0
Edge Active:              Yes

Destination     GenMask          Gateway      Flags  Ref  Origin  UpTime  Interface
-----------     -------          -------      -----  ---  ------  ------  ---------
0.0.0.0         0.0.0.0          10.69.64.1   UG     1    AUTO    758429  177000000002
10.1.1.0        255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.11.12.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.0.0       255.255.255.252  10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.0.4       255.255.255.252  10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.16.0      255.255.255.248  10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.17.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.31.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.32.0      255.255.255.248  10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.33.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.47.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.48.0      255.255.255.248  10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.49.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.63.0      255.255.255.0    10.69.64.1   UG     1    AUTO    758423  177000000002
10.69.64.0      255.255.255.248  0.0.0.0      UCI    1    MANUAL  758442  177000000002
10.69.65.0      255.255.255.0    0.0.0.0      UCI    1    MANUAL  758437  17700000000a
10.69.66.0      255.255.255.0    0.0.0.0      UCI    1    MANUAL  754664  17700000000c
10.69.79.0      255.255.255.0    0.0.0.0      UCI    1    MANUAL  758437  17700000000b
In my case, for this customer, I have only one ULS attached to the UDLR, with network 198.4.138.0. Gateway 0.0.0.0 means that no gateway is defined for that route; I'm learning all the other routes via OSPF here too.
[root@esxmgmt01:~] net-vdr -l --route default+edge-fa537a63-a674-4d89-84d4-5c333334b678
VDR default+edge-fa537a63-a674-4d89-84d4-5c333334b678 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]
Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
192.168.216.0 255.255.255.128 0.0.0.0 UCI 1 MANUAL 733031 4e2000000002
192.168.216.128 255.255.255.128 0.0.0.0 UCI 1 MANUAL 657212 4e2000000003
198.4.138.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 733030 4e200000000a
So in your case, these three are the networks of your Logical Switches:
10.69.65.0      255.255.255.0    0.0.0.0      UCI    1    MANUAL  758437  17700000000a
10.69.66.0      255.255.255.0    0.0.0.0      UCI    1    MANUAL  754664  17700000000c
10.69.79.0      255.255.255.0    0.0.0.0      UCI    1    MANUAL  758437  17700000000b
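Per the legend in the route table output, a C flag marks a network attached directly to one of the DLR's own interfaces (LIFs), so there is no next hop and the Gateway column reads 0.0.0.0. To pull just those connected networks out of a long table, a small awk filter can key on the Flags column. A sketch, assuming the whitespace-separated column order Destination GenMask Gateway Flags Ref Origin UpTime Interface; the function name connected_routes is hypothetical.

```shell
#!/bin/sh
# Sketch: list the directly connected networks in `net-vdr -l --route` output.
# Assumption: whitespace-separated columns in the order
# Destination GenMask Gateway Flags Ref Origin UpTime Interface.

connected_routes() {
    # Keep only route rows (first field is a dotted address) whose Flags
    # column ($4) contains "C" (Connected); print network, mask and LIF.
    awk '$1 ~ /^[0-9.]+$/ && $4 ~ /C/ { print $1, $2, $8 }'
}
```

Usage example: net-vdr -l --route default+edge-30 | connected_routes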
So try to take a packet capture like this:
#net-stats -l
Note the PortNum to which your VM is connected.
#pktcap-uw --switchport <SwitchPortID> --dir 0 -o /tmp/LEAVINGDLR.pcap
#pktcap-uw --switchport <SwitchPortID> --dir 1 -o /tmp/ENTERINGDLR.pcap
then
#tcpdump-uw -enr /tmp/LEAVINGDLR.pcap
#tcpdump-uw -enr /tmp/ENTERINGDLR.pcap
and check where your ping is being dropped.
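The capture steps above can be wrapped into one script that looks up the VM's port, runs both captures at the same time and prints only ICMP and ARP frames. This is a minimal sketch under several assumptions: `net-stats -l` prints PortNum in the first column and ClientName (e.g. "<vm>.eth0") in the last, the VM name contains no spaces, tcpdump-uw accepts a standard pcap filter expression such as `icmp or arp`, and a 30-second window is long enough to repeat the failing ping from the DLR. The function names and the port suffix on the .pcap files are hypothetical.

```shell
#!/bin/sh
# Sketch: capture both directions on a VM's switch port while the DLR pings
# the VM, then show only ICMP and ARP frames from each capture.
# Assumptions: `net-stats -l` prints PortNum first and ClientName last
# (e.g. "<vm>.eth0"), the VM name has no spaces, and a 30 s window is
# long enough to repeat the failing ping. Function names are hypothetical.

# Read `net-stats -l` on stdin; print the PortNum of each vNIC of VM "$1".
vm_ports() {
    awk -v vm="$1" 'index($NF, vm ".") == 1 { print $1 }'
}

# Capture --dir 0 and --dir 1 on port "$1" for "$2" seconds (default 30).
capture_port() {
    port=$1
    secs=${2:-30}
    out0=/tmp/LEAVINGDLR-$port.pcap
    out1=/tmp/ENTERINGDLR-$port.pcap
    pktcap-uw --switchport "$port" --dir 0 -o "$out0" &
    p0=$!
    pktcap-uw --switchport "$port" --dir 1 -o "$out1" &
    p1=$!
    sleep "$secs"            # ping the VM from the DLR during this window
    kill "$p0" "$p1"
    wait "$p0" "$p1" 2>/dev/null
    for f in "$out0" "$out1"; do
        echo "=== $f ==="
        tcpdump-uw -enr "$f" icmp or arp
    done
}

# Run only on an ESXi host and only when a VM name is given.
if [ -n "$1" ] && command -v pktcap-uw >/dev/null 2>&1; then
    net-stats -l | vm_ports "$1" | while read -r port; do
        capture_port "$port"
    done
fi
```

Running both directions concurrently means one ping attempt shows up in both files, so you can see whether the ARP request or echo request reaches the VM and whether the reply makes it back.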
hope this helps
As far as I understand, this is expected behavior. See the VMware Knowledge Base.