VMware Networking Community
cspenpen_Yotta
Contributor

Local Egress not working

Hi All,

I am running a test lab as shown in the picture below.

When I set the locale ID on the UDLR, the VMs in the two sites can ping each other.

However, their traffic does not use the local north/south egress; it behaves just like an L2 path.

I have checked everything I could think of, for example:

The UCC (Universal Controller Cluster) and the ESXi hosts can ping each other.

The UCC (Universal Controller Cluster) and the UDLR can ping each other.

The UCC (Universal Controller Cluster) and the NSX Manager can ping each other.

localegress01.png

But local egress still does not work.

LAB environment:

Each site has only one ESXi host for the lab.

NSX version is 6.3

VCSA version is 6.5

Is there anything else I can troubleshoot?

Thanks a lot.

localegress.png

5 Replies
lhoffer
VMware Employee

Assuming that you have already set the locale ID on the applicable clusters as well (Installation > Host Preparation > Actions > Change Locale ID) and confirmed that traffic isn't being blocked by the DFW or the ESG firewalls, how far does a traceroute from VMs inside the environment get toward whatever you're trying to reach on the outside (e.g. does it reach the UDLR, the ESG, etc.)?

Also, it appears that you've configured default gateways for egress routing (make sure you've got the appropriate locale ID configured on the ESG default gateway config too). Assuming that is the case, how are you advertising the 10.10.10.0/24 prefix from the UDLR to the ESG, and from the ESG to the physical network?

cnrz
Expert

In addition, two UDLR Control VMs may be needed: the UDLR Control VM for SiteA programs the DLR instances on the ESX hosts in SiteA, and the UDLR Control VM for SiteB programs the DLR instances on the ESX hosts in SiteB. During normal operations, the DLR instances in SiteA should see the ESG01-A internal IP 192.168.55.1 as their default gateway, and the DLR instances in SiteB should see the ESG01-B internal IP 192.168.62.2 as their default gateway.
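A rough mental model of the intended result, as a sketch only: this is not NSX code, and the locale ID strings are made-up placeholders (real locale IDs are UUIDs). As I understand it, each route carries the locale ID of the site where it was learned, and a host only ends up with the routes whose locale ID matches its own. The two next hops are the ESG internal IPs mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    locale_id: str  # locale ID of the site where the route was learned


def routes_for_host(routes, host_locale_id):
    """Keep only the routes whose locale ID matches the host's locale ID."""
    return [r for r in routes if r.locale_id == host_locale_id]


# Hypothetical placeholder locale IDs (assumption, for illustration only).
LOCALE_A = "locale-site-a"
LOCALE_B = "locale-site-b"

routes = [
    Route("0.0.0.0/0", "192.168.55.1", LOCALE_A),  # default route via ESG01-A
    Route("0.0.0.0/0", "192.168.62.2", LOCALE_B),  # default route via ESG01-B
]

site_b_routes = routes_for_host(routes, LOCALE_B)
print([r.next_hop for r in site_b_routes])
```

If hosts in SiteB show the SiteA next hop in their routing table, the locale ID on the cluster or on the UDLR is the first thing I would re-check.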

Is a dynamic routing protocol (OSPF or BGP) configured for local egress or ingress?

Is it possible to check the routing tables on the ESX hosts in SiteA and SiteB for the next hops of the routes being tested? If the traceroute from SiteB VM 10.10.10.3 towards 192.168.64.2 goes 10.10.10.1 --> 192.168.55.1 --> 192.168.63.2 --> 192.168.101.2, then most probably the UDLR Control VM on SiteB is not deployed, or it has problems with its routing adjacencies.

Also, how does the routing change if ESG01-A is powered off?
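To make the traceroute check concrete, here is a small sketch that looks at a list of hops and reports which site's ESG the traffic left through. The function names are my own, and the only addresses used are the two ESG internal IPs and the example path from this post.

```python
# ESG internal IPs as described in this thread.
ESG_INTERNAL_IP = {"SiteA": "192.168.55.1", "SiteB": "192.168.62.2"}


def egress_site(hops):
    """Return the site whose ESG appears first in the hop list, or None."""
    ip_to_site = {ip: site for site, ip in ESG_INTERNAL_IP.items()}
    for hop in hops:
        if hop in ip_to_site:
            return ip_to_site[hop]
    return None


def is_local_egress(vm_site, hops):
    """True if the traffic leaves through the ESG of the VM's own site."""
    return egress_site(hops) == vm_site


# Example path from a SiteB VM, taken from the question above.
hops = ["10.10.10.1", "192.168.55.1", "192.168.63.2", "192.168.101.2"]
print(egress_site(hops))               # SiteA
print(is_local_egress("SiteB", hops))  # False
```

A SiteB VM whose second hop is 192.168.55.1 is leaving through SiteA, which is the symptom described above.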

https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/com.vmware.nsx.troubleshooting.doc/GUID-18EDB5...

The routing table on a host can also be checked from the Central CLI on the NSX Manager with this command:

show logical-router host hostID dlr dlrID route

http://cloudmaniac.net/nsx-central-cli-operations-troubleshooting/

HostID can be found with:

  • To retrieve controllers information / ID: show controller list all
  • To retrieve clusters information / ID: show cluster all
    • To retrieve hosts information / ID in a specific cluster: show cluster cluster-id
  • To retrieve logical switches information / ID: show logical-switch list all
  • To retrieve distributed logical routers information / ID: show logical-router list all
  • To retrieve edges information / ID: show edge all

How to find the dlrID on the host is described in:

https://kb.vmware.com/s/article/2145273
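If you want to run the same check for several hosts, a small helper that only builds the Central CLI command strings shown above may save some typing. This is just a sketch: the function names are mine, the IDs in the example are hypothetical placeholders, and the commands still have to be pasted into the NSX Manager CLI by hand.

```python
def _check_id(value, name):
    """Reject empty IDs or IDs containing whitespace."""
    if not value or any(ch.isspace() for ch in value):
        raise ValueError(f"invalid {name}: {value!r}")
    return value


def lookup_commands(cluster_id):
    """Commands used to find host IDs and DLR IDs, in the order listed above."""
    _check_id(cluster_id, "cluster_id")
    return [
        "show cluster all",
        f"show cluster {cluster_id}",
        "show logical-router list all",
    ]


def route_table_command(host_id, dlr_id):
    """Command that prints the DLR routing table as seen by one host."""
    _check_id(host_id, "host_id")
    _check_id(dlr_id, "dlr_id")
    return f"show logical-router host {host_id} dlr {dlr_id} route"


# Hypothetical placeholder IDs; use the values returned by the lookups.
for host in ("host-siteA", "host-siteB"):
    print(route_table_command(host, "dlr-id"))
```

Comparing the output for one host per site should show directly whether the two sites have different default next hops.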

cspenpen_Yotta
Contributor

"Assuming that you've also already set the locale ID on the applicable clusters as well (Installation > Host Preparation > Actions > Change Locale ID) and confirmed that traffic isn't getting blocked by DFW or the ESG firewalls, how far does a trace route from VMs inside the environment to whatever you're trying to reach on the outside get (i.e. does it reach the UDLR, ESG, etc)?"

pastedImage_1.png pastedImage_2.png

The VMs in both sites can reach the UDLR, ESG, ESXi hosts and UCC in both sites. The DFW and ESG firewall rules are set to Accept.

"Also, it appears that you've configured default gateways for egress routing (make sure you've got appropriate locale ID configured on the ESG default gateway config too) so assuming that to be the case, how are you advertising the 10.10.10.0/24 prefix from UDLR to ESG and ESG to physical network?"

The UDLR, the ESG and the physical network all run OSPF.

"make sure you've got appropriate locale ID configured on the ESG default gateway config too"

Do you mean setting the locale ID on the UDLR, as in the pictures below?

pastedImage_6.png pastedImage_8.png

Thanks for the reply.

cspenpen_Yotta
Contributor

Hi,

I changed the IP addresses between VyOS and the ESG; see the picture below.

pastedImage_0.png

"As addition there may be need for 2 UDLR Control VMs, UDLR Control VM for SiteA programs the DLR instances on the ESX hosts in SiteA, and UDLR Control VM for SiteB programs the DLR instances on ESX hosts in SiteB. During normal operations, DLR instances on SiteA should see ESG01-A internal IP 192.168.55.1 as its default gateway, and DLR instances on SiteB should see ESG01-B internal IP 192.168.62.2 as their default gateway."

Yes, see the pictures below.

pastedImage_3.png pastedImage_2.png

"Is there a dynamic routing protocol (OSPF or BGP) configured for local egress or ingress ?"

Only OSPF.

"Is it possible to check the routing tables on ESX hosts on SiteA and SiteB for the Next-hops for the routes that is tested?"

The ESXi hosts in SiteA and SiteB can ping the VMs 10.10.10.2-3 and the gateway.

pastedImage_9.png pastedImage_8.png

"If the traceroute from SiteB VM 10.10.10.3 towards 192.168.64.2 goes as 10.10.10.1-->192.168.55.1-->192.168.63.2-->192.168.101.2, then most probably UDLR Control VM on SiteB is not deployed or may have problems with routing adjacencies?"

The results are shown in the picture below:

Traceroute from the SiteB VM to the SiteA VM.

Traceroute from the SiteB VM to vyos01-B.

Traceroute from the SiteB VM to vyos01-A.

pastedImage_15.png

"Also how does the routing change if ESG01-A is powered off?"

When ESG01-A is powered off, the 10.2.31.0/24, 10.2.32.0/24 and 192.168.101.0/24 subnets are removed, as shown in the picture below.

The VMs in SiteA and SiteB can still ping each other.

pastedImage_17.png

Thanks for the reply.

cnrz
Expert

On the Site2 UDLR, is it possible that there is a (forgotten) static route for 10.2.31.0/24 with next hop 192.168.61.2? The letter S at the beginning of the line indicates a static route. Since a static route takes priority over OSPF-learned routes, it may be the one that is preferred.

Similarly, on the Site1 UDLR there is a static route for 10.2.32.0/24 with next hop 192.168.62.2. Since the Edge, Vyos-A and Vyos-B are also OSPF-enabled, is it possible to remove these static routes and test again?

Is the requirement that the traceroute from the SiteB VM to Vyos-A goes through 192.168.62.2? (The traceroute from the SiteB VM to Vyos-B already shows local egress, as it goes through 192.168.62.2, so local egress seems to be working for OSPF-learned routes.)

pastedImage_2.png
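To illustrate why a leftover static route would win, here is a minimal sketch of route selection: longest prefix first, then lowest administrative distance. The distance values are common defaults used here as an assumption, not values read from this lab, and the OSPF-learned next hop in the example is hypothetical; only the static route is taken from the routing table in the screenshot.

```python
import ipaddress
from dataclasses import dataclass

# Illustrative administrative distances (assumption, not read from the lab).
ADMIN_DISTANCE = {"static": 1, "ospf": 110}


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    source: str  # "static" or "ospf"


def best_route(routes, destination):
    """Pick the longest matching prefix, then the lowest admin distance."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix).prefixlen,
            ADMIN_DISTANCE[r.source],
        ),
    )


static_route = Route("10.2.31.0/24", "192.168.61.2", "static")  # from the Site2 UDLR
ospf_route = Route("10.2.31.0/24", "192.168.62.2", "ospf")      # hypothetical OSPF route

print(best_route([static_route, ospf_route], "10.2.31.10").next_hop)  # 192.168.61.2
print(best_route([ospf_route], "10.2.31.10").next_hop)                # 192.168.62.2
```

With the static route removed, the OSPF-learned route is the only candidate left, which is why testing again without the static routes is worth a try.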
