roity57
Enthusiast

NSX 6.3.5 Route Reflection issue

I have a scenario where routes learned from a Juniper SRX firewall are being reflected back to it from a DLR behind an ESG in NSX. The SRX peers with two ESGs using ECMP. The ESGs advertise the routes to the DLR, the DLR advertises them BACK to one of the two ESGs (the "second one"), and that ESG advertises them back to the SRX: presto, a route loop. AS path loop prevention does not work because the ESG advertises only its own next-hop AS, not the whole AS path.

Scenario #1 – physical networks advertised to ESG.
2 x ESG (PLR) learning from Physical Firewall and distributed to UDLR and local DLR.
The DLR is then reflecting the learned route BACK to the second PLR.
Second PLR re-advertises the network it’s learning from the Physical Firewall back to the Physical Firewall.
Physical Firewall is installing the route! (luckily benign as OSPF has the cheapest cost)

Scenario #2 – virtual wires advertised to ESG from UDLR.
2 x ESG (PLR) learning Virtual Wires from UDLR and distributed to local DLR.
The DLR is then reflecting the learned route BACK to the second PLR.
ESG installs the route, but it’s benign as it can see the real preferred path via its local AS direct connected network instead of the peer AS path.

I've attached a diagram for clarity. I've already logged a support request with VMware, but figured I'd post here to see whether anyone else has observed this issue. The odd thing is that I can't reproduce it in an NSX 6.3.2 environment set up in practically the same fashion. My workaround in 6.3.5 is to set up ingress route filters on the second ESG so that it does not receive the routes.

Accepted Solutions
roity57
Enthusiast

For the benefit of the community, I can now advise that GSS investigated this issue and published a VMware Knowledge Base article, which I've pasted below.

Document Id


53221


Symptoms


This article applies if your NSX BGP configuration satisfies all of the following conditions:

  • You are using a private AS range (AS numbers 64512 to 65535) to peer your BGP domain.
  • The DLR, ESGs, or upstream routers are peering through eBGP with their private AS.
  • You are utilizing ECMP between your DLR and ESGs.
  • You are running NSX for vSphere 6.3.5.

When all of the above conditions are observed, you might experience routing loops in your NSX domain:

  • When you run the show ip bgp command on the DLR or ESG, routes from eBGP neighbors do not display the full AS path.
  • On one or more ESGs, you observe that northbound routes are pointing to the DLR as the next hop.


Cause


There are two parts to this problem. Together they cause this issue.

  1. In NSX for vSphere 6.3.5 and earlier, the ESG/DLR strips the private AS before advertising to eBGP peers, causing loss of AS-Path information.
  2. In NSX for vSphere 6.3.5, send-side loop detection is disabled for BGP, which can potentially cause a routing loop.
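A minimal Python sketch of why losing the AS path defeats loop prevention. It is a simplified model, not NSX's implementation: it only models the receive-side check (reject a path that already contains your own AS), and the AS numbers and the assumption that both ESGs share one AS are hypothetical, chosen for illustration.

```python
# Private 16-bit AS numbers, as cited in the KB (64512 to 65535).
PRIVATE_AS_RANGE = range(64512, 65536)

def strip_private_as(as_path):
    """Drop private AS numbers from an AS path (models cause 1)."""
    return [asn for asn in as_path if asn not in PRIVATE_AS_RANGE]

def advertise(as_path, local_as, strip_private):
    """Return the AS path a router sends to an eBGP peer."""
    path = strip_private_as(as_path) if strip_private else list(as_path)
    return [local_as] + path

def accepts(as_path, local_as):
    """Receive-side loop check: reject a path that contains our own AS."""
    return local_as not in as_path

def trace(origin_as, hop_as_list, strip_private):
    """Walk a route through successive eBGP hops and back to its origin."""
    path = [origin_as]  # path as first advertised by the originator
    for hop_as in hop_as_list:
        if not accepts(path, hop_as):
            return f"rejected by AS {hop_as}"
        path = advertise(path, hop_as, strip_private)
    if accepts(path, origin_as):
        return "loop: route returned to the origin and was accepted"
    return f"rejected by AS {origin_as}"

# Hypothetical AS numbers: firewall 65001, both ESGs 65002, DLR 65003.
FIREWALL, ESG, DLR = 65001, 65002, 65003
hops = [ESG, DLR, ESG]  # first ESG -> DLR -> second ESG -> back to firewall

print(trace(FIREWALL, hops, strip_private=True))
print(trace(FIREWALL, hops, strip_private=False))
```

With stripping enabled, every hop sees only the sender's AS, so no router ever finds itself in the path and the route makes it back to the firewall. With the full path preserved, the second ESG finds its own AS in the path and drops the route.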

Impact / Risks



Resolution



Workaround


To work around this issue, do one of the following:

  • Allow only NSX internal routes from your DLR to its upstream neighbors. You can implement outbound filters on the DLR towards the ESGs that permit only the internal networks.

    OR
  • Configure iBGP between the DLR and ESGs.

    Note: For the second workaround, you need to add on the DLR either a default gateway or a route to the network between the ESG and the upstream router, with the ESG as the next hop. Alternatively, implement default information originate on all of the ESGs. If you implement default information originate on your ESGs, you will need to apply the appropriate filters to block the propagation of the default route on the upstream router.
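The logic of the first workaround (an outbound filter on the DLR that permits only internal networks) can be sketched in Python as below. This only models the permit/deny decision; it is not NSX configuration syntax, and the 172.16.0.0/16 internal range and the sample prefixes are hypothetical placeholders.

```python
import ipaddress

# Hypothetical internal (NSX logical network) range; replace with your own.
INTERNAL_NETWORKS = [ipaddress.ip_network("172.16.0.0/16")]

def permit_outbound(prefix):
    """Permit a prefix only if it lies inside one of the internal networks."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.version == internal.version and net.subnet_of(internal)
        for internal in INTERNAL_NETWORKS
    )

def filter_advertisements(prefixes):
    """Return the prefixes the DLR would still advertise to the ESGs."""
    return [p for p in prefixes if permit_outbound(p)]

# Routes in the DLR's table: one internal, two learned from upstream.
candidates = ["172.16.10.0/24", "10.0.0.0/8", "0.0.0.0/0"]
print(filter_advertisements(candidates))
```

Because the DLR then never re-advertises externally learned prefixes, the second ESG has nothing to reflect back to the firewall, regardless of what the AS path looks like.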
