5 Replies Latest reply on Nov 20, 2017 12:06 AM by MrSegga

    DLR control VM can't ping anything

    MihailsA Lurker

      Greetings all,

       

      I've stumbled upon strange behavior of a DLR on one installation (NSX v.6.3). Long story short, although it operates as a router and routes packets between internal networks with no issues detected, it seems to be unable to complete ARP requests for itself.

       

      What is working:

      1. Routing between internal networks: VMs on one Logical Switch can ping VMs on another Logical Switch, with the gateway interfaces for each network being on the affected DLR.

      2. VMs can ping affected router's interface IPs.

      3. The EDGE router, which is the default gateway for the affected DLR, can ping the DLR's IP on the interface in the Interconnect network (a separate Logical Switch).

      4. A separate interface on the EDGE router, created for testing purposes only and connected to one of the internal networks, can ping VMs in the corresponding Logical Switch.

       

      What is not working:

      1. DLR cannot ping its default gateway (EDGE router).

      2. DLR cannot ping VMs mentioned above.

      3. DLR cannot DHCP Relay.

      4. DLR cannot participate in OSPF with the EDGE router.

      5. When I run 'show arp' on the DLR control VM, the result is empty: there are no ARP entries.

       

      Firewalls are disabled for now. So it seems to me that the problem is that the DLR cannot complete ARP requests for its own OS, rather than anything in the routing part (which would affect communication between networks, and that is not the case).

       

      Can anybody help identify the problem here?

        • 1. Re: DLR control VM can't ping anything
          MihailsA Lurker

          Update:

          I tried the same approach as described in VMware KB 2117818, "Ping from the uplink interface of a NSX DLR to Edge fails", which was written for NSX 6.0.x and 6.1.x but seems to apply to 6.3.x as well. As soon as I configured OSPF between the DLR and the EDGE, pings from the DLR to the EDGE started to flow.

           

          However, DLR -> VM pings still fail. I checked one other installation and I see the same behavior there. Maybe that is expected behavior? Can somebody confirm whether you can ping VMs on internal interfaces from the DLR? I think I have seen that work in the past, which is why I took it for granted, but that might be a false memory.

          • 2. Re: DLR control VM can't ping anything
            RaymundoEC Hot Shot
            vExpert, VMware Employees

            hi there sir,

             

            can you please be so kind to check if DLR instances in ESXi are correct?

             

            the procedure is as follows:

             

            connect to ESXi host via ssh session

            #net-vdr -I -l     <dash uppercase I as in India, dash lower case l as in Lemon>

            use the VDR name as an input to the next command

            net-vdr -l --route <name of instance of DLR in the ESXi host according to last run command>

             

            with that you can verify the routing for the instance, and also whether the instance is running OK on the ESXi host.

             

            hope this helps

            • 3. Re: DLR control VM can't ping anything
              MihailsA Lurker

              Hi Raymundo,

               

              From the commands you advised it seems everything is OK. The last three rows in the routing table are the internal networks in which the VMs that I try to ping are. I'm not sure how to interpret Gateway "0.0.0.0" here. Can you post some example from one of your DLRs?

               

              Vdr Name:               default+edge-30
              Vdr Id:                 0x00001770
              Number of Lifs:         4
              Number of Routes:       31
              Number of Neighbors:    4
              State:                  Enabled
              Controller IP:          10.11.12.212
              Control Plane IP:       10.11.12.41
              Control Plane Active:   Yes
              Num unique nexthops:    1
              Generation Number:      0
              Edge Active:            Yes

               

               

              Destination  GenMask          Gateway      Flags Ref Origin   UpTime Interface
              -----------  -------          -------      ----- --- ------   ------ ---------
              0.0.0.0      0.0.0.0          10.69.64.1   UG    1   AUTO     758429 177000000002
              10.1.1.0     255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.11.12.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.0.0    255.255.255.252  10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.0.4    255.255.255.252  10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.16.0   255.255.255.248  10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.17.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.31.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.32.0   255.255.255.248  10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.33.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.47.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.48.0   255.255.255.248  10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.49.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.63.0   255.255.255.0    10.69.64.1   UG    1   AUTO     758423 177000000002
              10.69.64.0   255.255.255.248  0.0.0.0      UCI   1   MANUAL   758442 177000000002
              10.69.65.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000a
              10.69.66.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   754664 17700000000c
              10.69.79.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000b
              • 4. Re: DLR control VM can't ping anything
                RaymundoEC Hot Shot
                VMware Employees, vExpert

                in my case, for this customer, I have only one ULS attached to the UDLR, with network 198.4.138.0. A gateway of 0.0.0.0 means that no gateway is defined for that route; I'm learning all the routes using OSPF here too

                 

                 

                [root@esxmgmt01:~] net-vdr -l --route default+edge-fa537a63-a674-4d89-84d4-5c333334b678

                 

                 

                VDR default+edge-fa537a63-a674-4d89-84d4-5c333334b678 Route Table

                Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]

                Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

                 

                 

                Destination      GenMask          Gateway          Flags    Ref Origin   UpTime     Interface

                -----------      -------          -------          -----    --- ------   ------     ---------

                192.168.216.0    255.255.255.128  0.0.0.0          UCI      1   MANUAL   733031     4e2000000002

                192.168.216.128  255.255.255.128  0.0.0.0          UCI      1   MANUAL   657212     4e2000000003

                198.4.138.0      255.255.255.0    0.0.0.0          UCI      1   MANUAL   733030     4e200000000a

                 

                so for your case those three are your networks for the LS

                10.69.65.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000a
                10.69.66.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   754664 17700000000c
                10.69.79.0   255.255.255.0    0.0.0.0      UCI   1   MANUAL   758437 17700000000b
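
If it helps to separate the directly connected networks from the routes learned via the gateway, here is a minimal Python sketch (not an NSX tool) that reads route rows copied from the `net-vdr -l --route` output. It assumes each row has exactly the eight whitespace-separated columns shown in the header, and it treats a row as connected when the flags contain `C` (Connected, per the legend) and the gateway is 0.0.0.0. The names `parse_route_line`, `is_connected`, and `SAMPLE` are made up for this illustration; the sample rows are taken from the table posted above.

```python
import ipaddress

def parse_route_line(line):
    """Split one route-table row into named fields (assumes 8 columns)."""
    dest, mask, gateway, flags, ref, origin, uptime, lif = line.split()
    return {
        "network": ipaddress.ip_network(f"{dest}/{mask}"),
        "gateway": gateway,
        "flags": flags,
        "origin": origin,
        "lif": lif,
    }

def is_connected(route):
    """Assumption: 'C' in flags plus gateway 0.0.0.0 marks a connected LIF network."""
    return "C" in route["flags"] and route["gateway"] == "0.0.0.0"

# Rows copied from the route table posted earlier in this thread.
SAMPLE = """\
0.0.0.0      0.0.0.0          10.69.64.1   UG   1   AUTO     758429 177000000002
10.69.64.0   255.255.255.248  0.0.0.0      UCI  1   MANUAL   758442 177000000002
10.69.65.0   255.255.255.0    0.0.0.0      UCI  1   MANUAL   758437 17700000000a
"""

routes = [parse_route_line(l) for l in SAMPLE.splitlines() if l.strip()]
connected = [str(r["network"]) for r in routes if is_connected(r)]
print(connected)
```

The default route and the other `UG` rows go through 10.69.64.1, while the `UCI` rows are the networks that sit directly on the DLR's own interfaces.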

                 

                so try to get a packet capture like this:

                 

                #net-stats -l

                note the PortNum of the port your VM is connected to

                #pktcap-uw --switchport <SwitchPortID> --dir 0 -o /tmp/LEAVINGDLR.pcap

                #pktcap-uw --switchport <SwitchPortID> --dir 1 -o /tmp/ENTERINGDLR.pcap

                then

                #tcpdump-uw -enr  /tmp/LEAVINGDLR.pcap

                #tcpdump-uw -enr  /tmp/ENTERINGDLR.pcap

                and check where your ping is being dropped.
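
To make the comparison less manual, here is a rough Python sketch that scans the text printed by `tcpdump-uw -enr` and lists ICMP echo requests that never got a matching reply. It assumes the usual tcpdump line format ("src > dst: ICMP echo request, id N, seq M"); the function name `unanswered_requests` and the sample lines (MAC addresses, timestamps, IPs) are invented for illustration and are not from a real capture.

```python
import re

# Matches the IP part of a tcpdump ICMP echo line; with -e the MAC
# addresses come earlier on the line and are skipped by search().
ICMP_RE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+) > (?P<dst>\d+\.\d+\.\d+\.\d+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)

def unanswered_requests(lines):
    """Return (src, dst, id, seq) for every echo request with no reply seen."""
    pending = set()
    for line in lines:
        m = ICMP_RE.search(line)
        if not m:
            continue
        if m["kind"] == "request":
            pending.add((m["src"], m["dst"], m["id"], m["seq"]))
        else:
            # A reply travels dst -> src, so swap the addresses to match.
            pending.discard((m["dst"], m["src"], m["id"], m["seq"]))
    return sorted(pending)

# Hypothetical output lines, combined from both capture files.
sample = [
    "10:00:00.000001 00:50:56:00:00:01 > 00:50:56:00:00:02, ethertype IPv4 (0x0800), "
    "length 98: 10.69.65.1 > 10.69.65.10: ICMP echo request, id 7, seq 1, length 64",
    "10:00:00.000200 00:50:56:00:00:02 > 00:50:56:00:00:01, ethertype IPv4 (0x0800), "
    "length 98: 10.69.65.10 > 10.69.65.1: ICMP echo reply, id 7, seq 1, length 64",
    "10:00:01.000001 00:50:56:00:00:01 > 00:50:56:00:00:02, ethertype IPv4 (0x0800), "
    "length 98: 10.69.65.1 > 10.69.65.10: ICMP echo request, id 7, seq 2, length 64",
]

missing = unanswered_requests(sample)
print(missing)
```

Since each pcap file holds only one direction, the text output of both files has to be fed in together for the request/reply matching to mean anything.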

                 

                hope this helps

                • 5. Re: DLR control VM can't ping anything
                  MrSegga Lurker

                  As far as I understand, this is expected behavior. See the VMware Knowledge Base.