3 Replies Latest reply on Apr 7, 2015 8:14 AM by chanaka_ek

    NSX - Ping issues between DLR and Edge gateway on the transit network

    chanaka_ek Novice
    vExpert

      Hi,

       

      I've deployed NSX in a POC environment and am having some odd issues. I've deployed a distributed logical router (DLR) with two internal interfaces (connected to the app and web network segments) and an uplink interface connected to a transit network (192.168.10.0/29). I've also deployed an Edge Services Gateway with an internal interface connected to the same transit network (192.168.10.0/29) and an uplink interface connected to the outside world.

       

      The issue is that when I connect to the Edge Services Gateway with PuTTY and ping the DLR's uplink interface on its transit network IP address (192.168.10.2), I don't get a response. The firewall is set to accept all traffic on both the DLR and the Edge.

       

      Does anyone have any ideas? Note that the DLR has its default gateway configured, pointing at the Edge gateway's IP on the transit network (as this is the only northbound connection the DLR has).

       

      Cheers

       

      Attached is a rough drawing of the topology. Ping fails from 192.168.10.1 to 192.168.10.2
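
A quick sanity check of the addressing described above, as a minimal sketch: it only confirms that both transit addresses from the post are usable hosts in 192.168.10.0/29, so the ping should be plain same-subnet traffic with no routing involved. The function names are illustrative and not part of any NSX tooling.

```python
import ipaddress

# Values taken from the post: the Edge is 192.168.10.1, the DLR uplink is 192.168.10.2.
TRANSIT_NET = ipaddress.ip_network("192.168.10.0/29")
EDGE_IP = ipaddress.ip_address("192.168.10.1")
DLR_UPLINK_IP = ipaddress.ip_address("192.168.10.2")


def is_usable_host(ip, network):
    """True if ip lies inside network and is neither its network nor its broadcast address."""
    return ip in network and ip not in (network.network_address, network.broadcast_address)


def same_l2_segment(a, b, network):
    """True if both addresses are usable hosts of the same subnet (no routing needed)."""
    return is_usable_host(a, network) and is_usable_host(b, network)


if __name__ == "__main__":
    print("usable hosts:", [str(h) for h in TRANSIT_NET.hosts()])
    print("same segment:", same_l2_segment(EDGE_IP, DLR_UPLINK_IP, TRANSIT_NET))
```

If this check passes, the addressing itself is not the problem and the fault lies somewhere in how the segment is delivered to the two interfaces.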

        • 1. Re: NSX - Ping issues between DLR and Edge gateway on the transit network
          grosas Enthusiast
          VMware Employees

          Hi chanaka_ek

           

          The DLR gateway configuration shouldn't matter for this type of connectivity.  Your intra-Logical Switch communication, though, is totally dependent on VXLAN functioning correctly.

           

          Do you have control over the hosts in your POC? Is the PGW01 Edge Gateway in the same cluster as your DLR?  If so, you can place the VMs together on the same host to isolate the issue. If you can achieve reachability that way, you may have an issue at the upstream switch or at the VTEP.

           

          Are you able to test VTEP to VTEP successfully between all hosts?  At a glance, it sounds to me like the VTEP function may not be working wherever you have PGW01 deployed.  I would first review the ESXi hosts where your PGW01 Edge is deployed to ensure the VXLAN configuration is healthy. You could deploy a tiny VM to the transit network for easier troubleshooting.

           

          What do you see for the following:

           

          esxcli network vswitch dvs vmware vxlan list (are you seeing your VXLAN VMKNIC counted?)

           

          esxcli network vswitch dvs vmware vxlan network mapping list --vds-name [vdsname] --vxlan-id [vxlan-id for your transit network]

           

          As long as all the components (DLR interface and Edge interface in this case) are correctly connected to the same Logical Switch with the FW off, you should focus on VXLAN functionality for each ESXi host subscribed to that Logical Switch.

          • 2. Re: NSX - Ping issues between DLR and Edge gateway on the transit network
            chanaka_ek Novice
            vExpert

            Hi Grosas,

             

            Thanks for the reply....

             

            Yes, all the components are in the same compute & Edge cluster, and VTEP and VXLAN communication is fine between all hosts.

             

            I logged a call with GSS in the end, and one of the engineers found that the netcpa service on each host was in a bad state: it didn't have the correct information about the logical router instances. A restart of the netcpa service seems to have re-established the connection back to the controller node (RabbitMQ service on port 5671), and the hosts are now up to date with the configuration spec. I can also now ping the previously unpingable IPs.
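
For anyone who wants a rough check that the port mentioned above (5671) is reachable at all, here is a minimal sketch of a plain TCP connect test. It is only an illustration: `tcp_port_open` is a made-up helper name, the address in the usage note is a placeholder, and a successful connect from wherever the script runs does not prove that netcpa on a given ESXi host holds correct logical router information.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        # create_connection performs the TCP handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Usage would look like `tcp_port_open("192.0.2.10", 5671)`, with the placeholder address replaced by the real endpoint in your environment.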

             

            They are doing a root cause analysis to see why the netcpa service went wrong and will get back to me. I will post an update once they do.

             

            Thanks for your help

             

            Cheers

             

            Chan

            • 3. Re: NSX - Ping issues between DLR and Edge gateway on the transit network
              chanaka_ek Novice
              vExpert

              FYI - this turned out to be an issue with NSX 6.1.2 which is fixed in 6.1.3. There is no KB for the issue as yet, nor is it mentioned as fixed in the 6.1.3 release notes, but the VMware GSS engineer confirmed that it's fixed.

               

              The temporary workaround is to stop and start the netcpa daemon on the ESXi hosts of the compute & edge cluster.

               

              See more details on my blog: http://chansblog.com/nsx-6-1-2-bug-dlr-interface-communication-issues-how-to-troubleshoot-using-net-vdr-command/