5 Replies Latest reply on Jan 8, 2018 6:57 AM by canero

    Local Egress not work

    cspenpen_Yotta Lurker

      Hi All,

       

      I am running a test lab, shown in the picture below.

      When I set the locale ID on the UDLR, VMs in the two sites can ping each other.

      But their traffic does not take the local north/south egress; it behaves like an L2 path.

       

      I have checked everything, for example:

      The UCC (Universal Controller Cluster) and the ESXi hosts can ping each other.

      The UCC (Universal Controller Cluster) and the UDLR can ping each other.

      The UCC (Universal Controller Cluster) and the NSX Manager can ping each other.

      localegress01.png

       

      But it still does not behave like local egress.

       

      LAB environment:

      Each site has only one ESXi host for the lab.

      NSX version is 6.3

      VCSA version is 6.5

       

      Is there anything else I can troubleshoot?

       

      Thanks a lot.

       

       

       

      localegress.png

        • 1. Re: Local Egress not work
          lhoffer Hot Shot
          VMware Employees, vExpert

          Assuming that you've also already set the locale ID on the applicable clusters as well (Installation > Host Preparation > Actions > Change Locale ID) and confirmed that traffic isn't getting blocked by DFW or the ESG firewalls, how far does a trace route from VMs inside the environment to whatever you're trying to reach on the outside get (i.e. does it reach the UDLR, ESG, etc)?

           

          Also, it appears that you've configured default gateways for egress routing (make sure you've got appropriate locale ID configured on the ESG default gateway config too) so assuming that to be the case, how are you advertising the 10.10.10.0/24 prefix from UDLR to ESG and ESG to physical network?

          • 2. Re: Local Egress not work
            canero Hot Shot

            In addition, two UDLR Control VMs may be needed: the UDLR Control VM for SiteA programs the DLR instances on the ESX hosts in SiteA, and the UDLR Control VM for SiteB programs the DLR instances on the ESX hosts in SiteB. During normal operation, DLR instances in SiteA should see the ESG01-A internal IP 192.168.55.1 as their default gateway, and DLR instances in SiteB should see the ESG01-B internal IP 192.168.62.2 as their default gateway.

             

            Is there a dynamic routing protocol (OSPF or BGP) configured for local egress or ingress?

             

            Is it possible to check the routing tables on the ESX hosts in SiteA and SiteB for the next hops of the routes being tested? If the traceroute from the SiteB VM 10.10.10.3 towards 192.168.64.2 goes 10.10.10.1-->192.168.55.1-->192.168.63.2-->192.168.101.2, then most probably the UDLR Control VM in SiteB is not deployed or has problems with its routing adjacencies.
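
The traceroute check described above can be sketched in a few lines of Python. This is only an illustration: the helper name `egress_site` and the mapping are hypothetical, and the two ESG internal IPs are the ones quoted in this reply. The idea is to look for the first ESG internal address in the hop list and report which site's ESG the traffic left through.

```python
# Hypothetical helper for reading traceroute output; not part of NSX.
# ESG internal IPs as quoted in this thread.
ESG_INTERNAL_IPS = {
    "192.168.55.1": "SiteA (ESG01-A)",
    "192.168.62.2": "SiteB (ESG01-B)",
}


def egress_site(hops):
    """Return the label of the first ESG internal IP found in the hop list, or None."""
    for hop in hops:
        if hop in ESG_INTERNAL_IPS:
            return ESG_INTERNAL_IPS[hop]
    return None


# Path from the reply: a SiteB VM whose traffic leaves through ESG01-A.
hops = ["10.10.10.1", "192.168.55.1", "192.168.63.2", "192.168.101.2"]
print(egress_site(hops))
```

If a SiteB VM reports the SiteA ESG here, its traffic is not using local egress for that destination.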

             

            Also how does the routing change if ESG01-A is powered off?

             

            https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/com.vmware.nsx.troubleshooting.doc/GUID-18EDB577-1903-4110-8A0B-FE9647ED82B6.html

             

            The routing table on a host can also be checked from the Central CLI on the NSX Manager with this command:

            show logical-router host hostID dlr dlrID route

            http://cloudmaniac.net/nsx-central-cli-operations-troubleshooting/

            HostID can be found with:

            • To retrieve controllers information / ID: show controller list all
            • To retrieve clusters information / ID: show cluster all
              • To retrieve hosts information / ID in a specific cluster: show cluster cluster-id
            • To retrieve logical switches information / ID: show logical-switch list all
            • To retrieve distributed logical routers information / ID: show logical-router list all
            • To retrieve edges information / ID: show edge all

             

             

            dlrID on the host can be found with:

            https://kb.vmware.com/s/article/2145273
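
The lookup order implied by the commands above can be written out as a small Python sketch. The function name and the example IDs (`domain-c1`, `host-1`, `edge-1`) are hypothetical placeholders; the command strings themselves are the ones listed in this reply. It only builds the strings to type into the Central CLI and does not talk to NSX Manager.

```python
# Hypothetical helper: assembles the Central CLI commands from this reply
# in the order you would run them. It does not connect to NSX Manager.
def host_route_lookup_commands(cluster_id, host_id, dlr_id):
    return [
        "show cluster all",                  # find the cluster ID
        f"show cluster {cluster_id}",        # find the host ID in that cluster
        "show logical-router list all",      # find the DLR ID
        f"show logical-router host {host_id} dlr {dlr_id} route",
    ]


# Placeholder IDs, for illustration only.
for command in host_route_lookup_commands("domain-c1", "host-1", "edge-1"):
    print(command)
```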

            • 3. Re: Local Egress not work
              cspenpen_Yotta Lurker

              Assuming that you've also already set the locale ID on the applicable clusters as well (Installation > Host Preparation > Actions > Change Locale ID) and confirmed that traffic isn't getting blocked by DFW or the ESG firewalls, how far does a trace route from VMs inside the environment to whatever you're trying to reach on the outside get (i.e. does it reach the UDLR, ESG, etc)?

               

              VMs in both sites can reach the UDLR, ESG, ESXi hosts, and UCC in both sites. The DFW and ESG firewall rules are set to Accept.

               

              Also, it appears that you've configured default gateways for egress routing (make sure you've got appropriate locale ID configured on the ESG default gateway config too) so assuming that to be the case, how are you advertising the 10.10.10.0/24 prefix from UDLR to ESG and ESG to physical network?

              UDLR, ESG and physical network run OSPF.

               

              make sure you've got appropriate locale ID configured on the ESG default gateway config too

              Do you mean setting the locale ID on the UDLR, as in the picture below?

               

              Thanks for reply.

              • 4. Re: Local Egress not work
                cspenpen_Yotta Lurker

                Hi,

                 

                I changed the IP addresses between vyos and the ESG; see the picture below.

                 

                In addition, two UDLR Control VMs may be needed: the UDLR Control VM for SiteA programs the DLR instances on the ESX hosts in SiteA, and the UDLR Control VM for SiteB programs the DLR instances on the ESX hosts in SiteB. During normal operation, DLR instances in SiteA should see the ESG01-A internal IP 192.168.55.1 as their default gateway, and DLR instances in SiteB should see the ESG01-B internal IP 192.168.62.2 as their default gateway.

                Yes, see the picture below.

                Is there a dynamic routing protocol (OSPF or BGP) configured for local egress or ingress?

                 

                 

                only OSPF.

                 

                Is it possible to check the routing tables on the ESX hosts in SiteA and SiteB for the next hops of the routes being tested?

                ESXi on SiteA and SiteB can ping VM 10.10.10.2-3 and gateway.

                 

                If the traceroute from the SiteB VM 10.10.10.3 towards 192.168.64.2 goes 10.10.10.1-->192.168.55.1-->192.168.63.2-->192.168.101.2, then most probably the UDLR Control VM in SiteB is not deployed or has problems with its routing adjacencies.

                The results are in the pictures below.

                SiteB VM traceroute SiteA VM.

                SiteB VM traceroute vyos01-B.

                SiteB VM traceroute vyos01-A.

                 

                Also how does the routing change if ESG01-A is powered off?

                When ESG01-A is powered off, the 10.2.31.0/24, 10.2.32.0/24, and 192.168.101.0/24 subnets are removed, as in the picture below.

                VMs in SiteA and SiteB can still ping each other.

                 

                Thanks for reply.

                • 5. Re: Local Egress not work
                  canero Hot Shot

                  On the Site2 UDLR, is it possible that there is a (forgotten) static route for 10.2.31.0/24 with next hop 192.168.61.2? The line begins with the letter S, which indicates a static route. Since a static route takes priority over OSPF-learned routes, it may be the one preferred.

                   

                  Similarly, on the Site1 UDLR there is a static route for 10.2.32.0/24 with next hop 192.168.62.2. Since the Edge, Vyos-A, and Vyos-B are also OSPF-enabled, is it possible to remove these static routes and test again?
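
To illustrate why a leftover static route would win, here is a minimal Python sketch of route selection by longest prefix and then lowest administrative distance. Several things in it are assumptions, not taken from the lab: the distance values (1 for static, 110 for OSPF) are common defaults and may differ on your platform, the OSPF entry with next hop 192.168.62.2 is a hypothetical alternative, and `best_route` is an illustrative name.

```python
import ipaddress

# Assumed administrative distances, for illustration only.
ADMIN_DISTANCE = {"static": 1, "ospf": 110}


def best_route(routes, destination):
    """Pick the route with the longest matching prefix, then the lowest distance."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,
            ADMIN_DISTANCE[r["source"]],
        ),
    )


site2_routes = [
    # Static route described above for the Site2 UDLR.
    {"prefix": "10.2.31.0/24", "source": "static", "next_hop": "192.168.61.2"},
    # Hypothetical OSPF-learned route for the same prefix.
    {"prefix": "10.2.31.0/24", "source": "ospf", "next_hop": "192.168.62.2"},
]

print(best_route(site2_routes, "10.2.31.1")["next_hop"])

# With the static route removed, the OSPF-learned route is selected instead.
ospf_only = [r for r in site2_routes if r["source"] != "static"]
print(best_route(ospf_only, "10.2.31.1")["next_hop"])
```

Under these assumptions the static entry is chosen as long as it exists, which is why removing it is a useful test.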

                   

                  Is the requirement that the traceroute from the SiteB VM to Vyos-A goes through 192.168.62.2? (The traceroute from the SiteB VM to Vyos-B already uses local egress, as it goes through 192.168.62.2, so local egress seems to be working for OSPF-learned routes.)