VMware Networking Community
Taupin
Contributor

Wrong BGP next hop programming

Hi,

Please consider the topology below:

DLR1(AS65001)             --->           ESG1(AS65001)        ------------------------------> Physical Router(AS65002)

192.168.1.1/30                    192.168.1.2/30                      192.168.100.3/24            192.168.100.1/24

- A static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1

- ESG1 advertises the static route 172.16.0.0/19 to the Physical Router

- The static route 172.16.0.0/19 is advertised from ESG1 to the Physical Router with 192.168.1.1 as the next hop, which is wrong, and the Physical Router does not install the route

I believe the NEXT_HOP attribute is programmed incorrectly when ESG1 advertises the route to the Physical Router. The next hop must be 192.168.100.3 instead of the Physical Router's IP address (i.e. 192.168.1.1)

Please find attached output from ESG, the output includes:

- Routes advertised to Physical router 192.168.100.1

- Routes received from DLR1 : 192.168.1.1

- directly connected routes on ESG1

Best Regards

Abdelfatah ELARFAOUI

21 Replies
Taupin
Contributor

To reproduce the issue, apply the configuration in one of the following orders:

- Create the static route, then redistribute the static route

or

- Enable redistribution with static and connected routes selected, then create the static route, then clear the BGP session with the physical router

Taupin
Contributor

Sorry for the typo.

Corrected description:

- A static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1

- ESG1 advertises the static route 172.16.0.0/19 to the Physical Router

- The static route 172.16.0.0/19 is advertised from ESG1 to the Physical Router with 192.168.100.1 as the next hop, which is wrong, and the Physical Router does not install the route

I believe the NEXT_HOP attribute is programmed incorrectly when ESG1 advertises the route to the Physical Router. The next hop must be 192.168.100.3 instead of the Physical Router's own IP address (i.e. 192.168.100.1)

Please find attached output from ESG, the output includes:

- Routes advertised to Physical router 192.168.100.1

- Routes received from DLR1 : 192.168.1.1

- directly connected routes on ESG1
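
One possible reading of why the physical router does not install the route: a next hop equal to the receiver's own address cannot be used for forwarding, and RFC 4271 treats such a NEXT_HOP as semantically incorrect. Below is a minimal Python sketch of that receiver-side check, using the addresses from this thread. The function name and parameters are hypothetical, and it does not model any particular router's implementation.

```python
import ipaddress


def next_hop_is_acceptable(next_hop: str, receiver_ip: str,
                           sender_ip: str, shared_subnet: str) -> bool:
    """Receiver-side sanity check on NEXT_HOP for a directly connected eBGP peer.

    Hypothetical helper: a simplified reading of the RFC 4271 rule, not vendor code.
    """
    nh = ipaddress.ip_address(next_hop)
    # A router cannot use its own address as the next hop of a received route.
    if nh == ipaddress.ip_address(receiver_ip):
        return False
    # Otherwise accept the sender's session address or any address on the shared subnet.
    return (nh == ipaddress.ip_address(sender_ip)
            or nh in ipaddress.ip_network(shared_subnet))


# Physical router = 192.168.100.1, ESG1 uplink = 192.168.100.3
print(next_hop_is_acceptable("192.168.100.1", "192.168.100.1",
                             "192.168.100.3", "192.168.100.0/24"))  # False
print(next_hop_is_acceptable("192.168.100.3", "192.168.100.1",
                             "192.168.100.3", "192.168.100.0/24"))  # True
```

Under this reading, dropping the route is the expected reaction on the physical router's side, and the fault lies in what ESG1 puts in NEXT_HOP.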

Taupin
Contributor

If I remove the static route and then recreate it, the next hop is programmed correctly and the route is advertised to the Physical Router with the ESG1 IP address 192.168.100.3, so the route is installed on the Physical Router. Please check the attached output from ESG1.

Once I clear the BGP session with the physical router, ESG1 again advertises the static route with 192.168.100.1 as the next hop, which is wrong.

bayupw
Leadership

Hi, BGP next hop is not changed on iBGP sessions.

The VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 also explains this topic on page 68:

In eBGP/iBGP route exchange, when a route is advertised into iBGP, the next hop is carried unchanged into the iBGP domain.

This may create dependencies on external routing domain stability or connectivity.

To avoid external route reachability issues, the BGP next-hop-self feature or redistribution of a connected interface from which the next hop is learned is required.

The BGP next-hop-self is not supported in current implementation, thus it is necessary to redistribute the ESG uplink interface (e.g., two VLANs that connect to physical routers) into the iBGP session towards the DLR. Proper filtering should be enabled on the ESG to make sure the uplinks’ addresses are not advertised back to physical routers as this can cause loops/failures.

The solution is to redistribute the ESG1 uplink 192.168.100.3/24 into BGP towards the DLR so DLR can reach the physical router 192.168.100.1
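
As a rough illustration of why the DLR needs the uplink subnet: a BGP route is only usable if its next hop resolves through the routing table. The sketch below assumes a hypothetical DLR table that initially holds only the 192.168.1.0/30 transit link. `resolve_next_hop` is an illustrative helper, not an NSX function.

```python
import ipaddress


def resolve_next_hop(next_hop, routing_table):
    """Return the longest matching prefix covering next_hop, or None if unreachable."""
    nh = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(p) for p in routing_table
               if nh in ipaddress.ip_network(p)]
    if not matches:
        return None
    # Longest prefix match: the most specific covering route wins.
    return str(max(matches, key=lambda net: net.prefixlen))


# Assumed DLR table: only the DLR-ESG transit link is known at first.
dlr_routes = ["192.168.1.0/30"]
print(resolve_next_hop("192.168.100.1", dlr_routes))  # None

# After ESG1 redistributes its connected uplink into iBGP towards the DLR.
dlr_routes.append("192.168.100.0/24")
print(resolve_next_hop("192.168.100.1", dlr_routes))  # 192.168.100.0/24
```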

If you need more info on BGP around this specific topic, see these links

BGP: Frequently Asked Questions - Cisco

http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html

http://www.getnetworking.net/bgp/bgp-next-hop-self

One question: what is the requirement behind the static route on DLR1?
If you are configuring a static route to summarise the logical switch networks behind the DLR, this is normally done on the ESG, as per the design guide

pastedImage_17.png

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw
bayupw
Leadership

There is probably a routing loop in which the physical router advertises the static route back to the ESG, or there is a static route configured with that next hop.

Could you share your static routes and the route filtering configuration?

Taupin
Contributor

Hi,

Please review my topology: ESG1 to Physical Router is an eBGP session, so as per the RFC and normal eBGP behavior the next hop will be changed!

Best Regards

Abdelfatah ELARFAOUI

IP/MPLS Expert, CCIE R&S

Taupin
Contributor

There is no loop, my friend. The topology is really a flat topology!!

I have included the procedure to reproduce this strange behavior, which is not aligned with the BGP RFC

Best Regards

Abdelfatah ELARFAOUI

IP/MPLS Expert | CCIE R&S

https://www.linkedin.com/in/elarfaouiabdelfatah/

bayupw
Leadership

Sorry, I missed the ASN detail in your first post. Which NSX version are you using? I'll try to simulate it in my lab too

bayupw
Leadership

I couldn't find the attachment on this reply

Taupin
Contributor

Hi,

Thanks for your reply. The NSX version is 6.3.1.

To reproduce this issue, please follow the procedure below:

- Topology:

DLR2 (with default GW and directly connected subnets 172.16.1.0/24, 172.16.2.0/24,172.16.3.0/24) [192.168.1.1/30]----------> [192.168.1.2/30] DLR1 [192.168.13.1/24]-------------> [192.168.13.2/24]  ESG1 [192.168.100.3/24]---------------------------------------> [192.168.100.1/24] Physical Router

- DLR2 uses a default gateway 0.0.0.0/0 pointing to DLR1, which is in fact an ESG

- DLR1 and ESG1 are in the same AS 65001

- The Physical Router is in AS 65002

- DLR1 is configured with a summary route 172.16.0.0/21 pointing to DLR2

- DLR1 redistributes the static route and its directly connected routes to ESG1

- ESG1 redistributes its directly connected routes
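
As a quick sanity check on the addressing above, the summary 172.16.0.0/21 can be tested against the three subnets connected to DLR2. This is a minimal sketch using Python's standard `ipaddress` module and only the prefixes quoted in this post:

```python
import ipaddress

# Summary route configured on DLR1 and the subnets directly connected to DLR2.
summary = ipaddress.ip_network("172.16.0.0/21")
connected = [ipaddress.ip_network(p)
             for p in ("172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24")]

for net in connected:
    print(net, "covered" if net.subnet_of(summary) else "NOT covered")

# First and last address of the summary range.
print(summary[0], "-", summary[-1])  # 172.16.0.0 - 172.16.7.255
```

All three subnets fall inside the /21. The summary also covers 172.16.0.0/24 and 172.16.4.0/24 through 172.16.7.0/24, which are not listed as connected on DLR2, so traffic for those ranges would be attracted towards DLR1 as well.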

To reproduce:

- Create the static route, then redistribute the static route

As a workaround, I delete the static route and recreate it; the next hop is then reprogrammed correctly by ESG1, but once I clear the BGP sessions the issue comes back

Best Regards

Abdelfatah ELARFAOUI

IP/MPLS Expert , CCIE R&S

https://www.linkedin.com/in/elarfaouiabdelfatah/

bayupw
Leadership

Before I continue to test the scenario, are you saying that you have DLR2 behind DLR1?

Please note that building a multi-tier topology using only DLR instances is not supported and connecting multiple DLRs to a single ESG on shared VXLAN logical switch is also not supported

See VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 page 73 on Unsupported Topologies

pastedImage_0.png

Taupin
Contributor

Please consider DLR1 to be an ESG.

Best Regards

Abdelfatah ELARFAOUI

Taupin
Contributor

Hi,

Did you get the chance to simulate this in your lab?

Thanks

Taupin
Contributor

Hi,

Could you please share your findings? I can help with debugging if you could not reproduce the issue

Thanks

bayupw
Leadership

Not yet, probably this weekend. Will let you know how it goes

Taupin
Contributor

Hi,

Did you get the chance to reproduce the issue?

Thanks

Abdelfatah ELARFAOUI

IP/MPLS Expert, CCIE R&S, JNCIE-SP

https://www.linkedin.com/in/elarfaouiabdelfatah/

Taupin
Contributor

Last try ... If no answer from your side I will consider it as a Bug!!

Thanks

cnrz
Expert

eBGP and iBGP next-hop processing differ. As noted in the previous messages, next-hop-self removes the need to make the next-hop address reachable, which otherwise requires a static route (or another routing protocol such as OSPF) so that the physical router can reach the DLR IP address used as the next hop.

For optimal routing, the next hop announced to iBGP peers is not changed. So if the next hop (the DLR IP address on the DLR-PLR transit link) is not reachable from the physical router, the routes (logical switch subnets) announced from the DLR to the Edge are flagged as unreachable and are not installed in the physical router's routing table.

eBGP announces the next hop differently from iBGP: the next hop announced by the Edge to the physical router is the Edge itself instead of the DLR IP address. In that case next-hop-self may not be necessary, since the physical router already knows the IP address on its directly connected link to the Edge.
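
The two rules described above for routes learned from a BGP neighbor can be sketched as follows. This is a simplified model: it ignores next-hop-self and the third-party next-hop case on a shared subnet, and `advertised_next_hop` is a hypothetical name rather than anything from NSX.

```python
def advertised_next_hop(session_type: str, learned_next_hop: str,
                        local_session_ip: str) -> str:
    """Next hop placed on a BGP-learned route when it is re-advertised.

    Simplified, hypothetical model of the default behavior described in this thread.
    """
    if session_type == "ibgp":
        # iBGP: NEXT_HOP is carried unchanged.
        return learned_next_hop
    if session_type == "ebgp":
        # eBGP: the advertising router sets its own session address.
        return local_session_ip
    raise ValueError(f"unknown session type: {session_type}")


# ESG1 re-advertising a route learned from DLR1 (192.168.1.1) to the physical router.
print(advertised_next_hop("ebgp", "192.168.1.1", "192.168.100.3"))  # 192.168.100.3

# ESG1 re-advertising a route learned from the physical router to DLR1 over iBGP.
print(advertised_next_hop("ibgp", "192.168.100.1", "192.168.1.2"))  # 192.168.100.1
```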

One exception may be routes that are redistributed into eBGP (from static or OSPF) rather than originated locally on the Edge or learned from another BGP neighbor. In that case eBGP may behave like iBGP (i.e. announce the DLR IP address as the next hop instead of itself). Although this is again done for route optimality, it may create problems without next-hop-self.

This link may be helpful about the next-hops on BGP different scenarios:

http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html

BGP next hop of a locally originated routes

When a router originates a BGP route configured with a network router configuration command or through route redistribution (redistribute router configuration command), it sets the BGP next hop to the IGP next hop (the same value you’d find in the IP routing table). BGP next hop is set to 0.0.0.0 for routes with unknown next hops – connected interfaces, static routes to null 0 or summary routes configured with aggregate-address router configuration command.

When a BGP route with missing next hop is sent to BGP neighbors, the BGP next hop is set to the source IP address of the BGP session.
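
The quoted behavior for locally originated routes can be modelled in a few lines. This is only a sketch of the description above, not of the NSX Edge implementation, and both function names are hypothetical.

```python
UNSPECIFIED = "0.0.0.0"


def originate_next_hop(igp_next_hop=None):
    """BGP next hop stored when a route is originated locally.

    Routes without an IGP next hop (connected, null, summary) get 0.0.0.0.
    """
    return igp_next_hop if igp_next_hop else UNSPECIFIED


def next_hop_on_send(bgp_next_hop, session_source_ip):
    """Next hop actually sent: a missing next hop becomes the session source IP."""
    return session_source_ip if bgp_next_hop == UNSPECIFIED else bgp_next_hop


# Connected route redistributed on ESG1, sent over the session sourced from 192.168.100.3.
print(next_hop_on_send(originate_next_hop(), "192.168.100.3"))  # 192.168.100.3

# Redistributed static route whose IGP next hop is 192.168.1.1: the value is kept.
print(next_hop_on_send(originate_next_hop("192.168.1.1"), "192.168.100.3"))  # 192.168.1.1
```

Under this model a redistributed static route keeps its IGP next hop when advertised, which is the case the exception above refers to.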

If this is mandated by the RFC, every BGP implementation should behave this way; if it is left optional, the behavior may vary between products.

So if the same test is done by configuring BGP between the DLR and the Edge instead of redistributing a static route on the Edge, the next hop may change to the Edge IP address. The DLR-Edge session may be iBGP or eBGP, since in both cases the Edge learns these routes from BGP instead of originating them locally from static routes.

In the SDDC reference designs, OSPF is recommended for some designs and eBGP or iBGP for others:

https://pubs.vmware.com/vmware-validated-design-41/topic/com.vmware.ICbase/PDF/vmware-validated-desi...

The VMware Validated Design documentation supports the following configuration:

  • Use eBGP between the physical environment (ToR) and ECMP-enabled NSX Edge (ESG) devices.
  • Use iBGP between NSX ESGs and UDLRs and DLRs.
  • On the NSX ESGs, configure route redistribution between the physical and software-defined infrastructure
Taupin
Contributor

Hi,

Please refer to previous replies if you are interested in reproducing this issue!!!

Best Regards

Abdelfatah ELARFAOUI

IP/MPLS Expert CCIE/JNCIE/VCIX6-NV certified
