Hi,
Please consider the topology below:
DLR1 (AS65001) [192.168.1.1/30] ---> [192.168.1.2/30] ESG1 (AS65001) [192.168.100.3/24] ---> [192.168.100.1/24] Physical Router (AS65002)
- Static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1
- ESG1 advertises static route 172.16.0.0/19 to the Physical Router
- Static route 172.16.0.0/19 is advertised from ESG1 to the Physical Router with 192.168.1.1 as next hop, which is wrong, and the Physical Router did not install the route
I believe the next-hop attribute is programmed incorrectly when ESG1 advertises the route to the Physical Router. The next hop must be 192.168.100.3 instead of the Physical Router's IP address (i.e. 192.168.1.1)
Please find attached the output from the ESG. It includes:
- Routes advertised to Physical router 192.168.100.1
- Routes received from DLR1 : 192.168.1.1
- directly connected routes on ESG1
Best Regards
Abdelfatah ELARFAOUI
To reproduce the issue, please follow one of the configuration orders below:
- Create the static route, then redistribute static routes
or
- Enable redistribution with static and connected routes selected, then create the static route, then clear the BGP session with the Physical Router
Sorry for typo:
Correct description:
- Static route 172.16.0.0/19 is configured on DLR1 and redistributed via BGP to ESG1
- ESG1 advertises static route 172.16.0.0/19 to the Physical Router
- Static route 172.16.0.0/19 is advertised from ESG1 to the Physical Router with 192.168.100.1 as next hop, which is wrong, and the Physical Router did not install the route
I believe the next-hop attribute is programmed incorrectly when ESG1 advertises the route to the Physical Router. The next hop must be 192.168.100.3 instead of the Physical Router's IP address (i.e. 192.168.100.1)
Please find attached the output from the ESG. It includes:
- Routes advertised to Physical router 192.168.100.1
- Routes received from DLR1 : 192.168.1.1
- directly connected routes on ESG1
If I remove the static route and then recreate it, the next hop is programmed correctly and the route is advertised to the Physical Router with the ESG1 IP address 192.168.100.3, so the route is installed on the Physical Router. Please check the attached output from ESG1.
Once I clear the BGP session with the Physical Router, ESG1 again advertises the static route with 192.168.100.1 as next hop, which is wrong.
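The Physical Router refusing the route is consistent with RFC 4271 section 5.1.3, which says a BGP speaker shall not install a route with itself as the next hop. A minimal Python sketch of that receive-side check, using the addresses from this thread (the function name `accepts_next_hop` is hypothetical and is not any vendor's API):

```python
import ipaddress

def accepts_next_hop(next_hop, local_interfaces):
    """Return False when the advertised next hop is one of the receiver's own addresses."""
    nh = ipaddress.ip_address(next_hop)
    own = [ipaddress.ip_interface(i).ip for i in local_interfaces]
    return nh not in own

# Physical Router interface taken from the topology in this thread.
physical_router = ["192.168.100.1/24"]

print(accepts_next_hop("192.168.100.1", physical_router))  # False: next hop is the router itself
print(accepts_next_hop("192.168.100.3", physical_router))  # True: next hop is ESG1's uplink
```

This only models the self-next-hop check; a real implementation also verifies that the next hop is resolvable.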
Hi, BGP next hop is not changed on iBGP sessions.
The VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 also explains this topic on page 68:
In eBGP/iBGP route exchange, when a route is advertised into iBGP, the next hop is carried unchanged into the iBGP domain.
This may create dependencies on external routing domain stability or connectivity.
To avoid external route reachability issues, the BGP next-hop-self feature or redistribution of a connected interface from which the next hop is learned is required.
The BGP next-hop-self is not supported in current implementation, thus it is necessary to redistribute the ESG uplink interface (e.g., two VLANs that connect to physical routers) into the iBGP session towards the DLR.
Proper filtering should be enabled on the ESG to make sure the uplinks’ addresses are not advertised back to physical routers as this can cause loops/failures.
The solution is to redistribute the ESG1 uplink 192.168.100.3/24 into BGP towards the DLR, so that the DLR can reach the physical router at 192.168.100.1
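To illustrate the reachability point, here is a rough Python sketch that checks whether a BGP next hop resolves against a set of known prefixes. The two DLR prefix lists are assumptions built from the addresses in this thread, and `resolves` is a hypothetical helper, not an NSX command:

```python
import ipaddress

def resolves(next_hop, known_prefixes):
    """True if next_hop falls inside at least one prefix in the routing table."""
    nh = ipaddress.ip_address(next_hop)
    return any(nh in ipaddress.ip_network(p) for p in known_prefixes)

# Assumed DLR table before the ESG uplink is redistributed: transit link only.
dlr_before = ["192.168.1.0/30"]
# Assumed DLR table after the ESG uplink subnet is redistributed into iBGP.
dlr_after = dlr_before + ["192.168.100.0/24"]

print(resolves("192.168.100.1", dlr_before))  # False: next hop unreachable
print(resolves("192.168.100.1", dlr_after))   # True: next hop now resolvable
```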
If you need more info on BGP around this specific topic, see these links
BGP: Frequently Asked Questions - Cisco
http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html
http://www.getnetworking.net/bgp/bgp-next-hop-self
Question, what is the requirement behind the static route on DLR1?
If you are going to configure a static route to summarise the logical switch networks behind the DLR, this is normally done on the ESG, as per the design guide.
There is probably a routing loop where the physical router advertises the static route back to the ESG, or has a static route with that next hop.
Could you share your static routes and the route filtering configuration?
Hi,
Please review my topology: ESG1 to Physical Router is an eBGP session!! So as per the RFC and normal eBGP behavior, the next hop will be changed!
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert, CCIE R&S
There is no loop my friend, the topology is really a Flat topology!!
I have included the procedure to reproduce this strange behavior which is not aligned with BGP RFC
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert | CCIE R&S
Sorry, I missed the ASN# detail in your first post. Which NSX version are you using? I'll try to simulate it in my lab too
I couldn't find the attachment on this reply
Hi,
Thanks for your reply, NSX version 6.3.1
To reproduce this issue, please follow the procedure below:
- Topology:
DLR2 (with default GW and directly connected subnets 172.16.1.0/24, 172.16.2.0/24,172.16.3.0/24) [192.168.1.1/30]----------> [192.168.1.2/30] DLR1 [192.168.13.1/24]-------------> [192.168.13.2/24] ESG1 [192.168.100.3/24]---------------------------------------> [192.168.100.1/24] Physical Router
- DLR2 uses default GW 0.0.0.0/0 pointing to DLR1, which is in fact an ESG
- DLR1 and ESG1 are in the same AS 65001
- Physical Router is in AS 65002
- DLR1 is configured with summary route 172.16.0.0/21 pointing to DLR2
- DLR1 redistributes static and directly connected routes to ESG1
- ESG1 redistributes directly connected routes
To reproduce:
- Create the static route, then redistribute static routes
As a workaround, I delete the static route and recreate it; the next hop is then reprogrammed correctly by ESG1, but once I clear the BGP sessions, the issue is reproduced
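As a quick sanity check on the summary used in this topology, the following Python snippet confirms that 172.16.0.0/21 covers the three connected subnets behind DLR2. It only uses the prefixes listed above and the standard `ipaddress` module:

```python
import ipaddress

summary = ipaddress.ip_network("172.16.0.0/21")
connected = ["172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24"]

# Map each connected subnet to whether the summary route contains it.
covered = {p: ipaddress.ip_network(p).subnet_of(summary) for p in connected}
print(covered)

# First and last address of the summary range.
print(summary[0], summary[-1])  # 172.16.0.0 172.16.7.255
```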
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert , CCIE R&S
Before I continue to test the scenario, are you saying that you have DLR2 behind DLR1?
Please note that building a multi-tier topology using only DLR instances is not supported and connecting multiple DLRs to a single ESG on shared VXLAN logical switch is also not supported
See VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 page 73 on Unsupported Topologies
Please consider DLR1 as an ESG.
Best Regards
Abdelfatah ELARFAOUI
Hi,
Did you get a chance to simulate this in your lab?
Thanks
Hi,
Could you please share your findings? I can help with debugging if you could not reproduce the issue
Thanks
Not yet, probably this weekend. Will let you know how it goes
Hi,
Did you get the chance to reproduce the issue?
Thanks
Abdelfatah ELARFAOUI
IP/MPLS Expert, CCIE R&S, JNCIE-SP
Last try ... If no answer from your side I will consider it as a Bug!!
Thanks
eBGP and iBGP next-hop processing are different. As noted in the previous messages, next-hop-self removes the need to make the next-hop address reachable, which would otherwise require a static route (or another routing protocol such as OSPF) from the physical router to the DLR IP address.
For optimal routing, the next hop announced to iBGP peers is not changed. So if the next hop (the DLR IP address on the DLR-PLR transit link) is not reachable from the physical router, the routes (logical switch subnets) announced from the DLR to the Edge are not installed in the physical router's routing table, because they are flagged as unreachable.
eBGP announces the next hop differently from iBGP: the next hop announced by the Edge to the physical router is the Edge itself instead of the DLR IP address. In that case next-hop-self may not be necessary, since the physical router already knows the IP address on the directly connected Edge-to-physical link.
One exception may be when the routes are redistributed into eBGP (from static or OSPF) instead of being originated locally on the Edge or learned from another BGP neighbor. In that case eBGP may behave similarly to iBGP (i.e. announce the DLR IP address as next hop instead of itself). Again, for route optimality, this may create problems without next-hop-self.
This link may be helpful about the next-hops on BGP different scenarios:
http://blog.ipspace.net/2011/08/bgp-next-hop-processing.html
When a router originates a BGP route configured with a network router configuration command or through route redistribution (redistribute router configuration command), it sets the BGP next hop to the IGP next hop (the same value you’d find in the IP routing table). BGP next hop is set to 0.0.0.0 for routes with unknown next hops – connected interfaces, static routes to null 0 or summary routes configured with aggregate-address router configuration command.
When a BGP route with missing next hop is sent to BGP neighbors, the BGP next hop is set to the source IP address of the BGP session.
If this behavior is required by the RFC, every BGP implementation should behave this way; if it is left optional, it may differ between products.
So if the same test is done by configuring BGP between the DLR and the Edge instead of redistributing a static route on the Edge, the next hop may change to the Edge IP address. The DLR-Edge session may be iBGP or eBGP, since in both cases the Edge learns these routes from BGP instead of originating them locally from static routes.
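The next-hop rules discussed above can be sketched as a small Python model. This is a simplification under stated assumptions: it ignores third-party next hops on shared subnets and any vendor-specific behavior, and the function names `originate` and `advertise` are hypothetical:

```python
UNSET = "0.0.0.0"  # BGP next hop of a locally originated route with no known next hop

def originate(igp_next_hop=None):
    """Next hop stored for a locally originated route: the IGP next hop, else 0.0.0.0."""
    return igp_next_hop if igp_next_hop else UNSET

def advertise(bgp_next_hop, session_type, session_source_ip):
    """Next hop placed in the update sent to a neighbor (simplified model)."""
    if bgp_next_hop == UNSET:
        return session_source_ip   # missing next hop: use the session source address
    if session_type == "ibgp":
        return bgp_next_hop        # carried unchanged into the iBGP domain
    return session_source_ip       # eBGP default: the sender becomes the next hop

# ESG1 learned 172.16.0.0/19 from DLR1 (next hop 192.168.1.1) and sends it over eBGP.
print(advertise("192.168.1.1", "ebgp", "192.168.100.3"))    # 192.168.100.3
# A connected route originated on ESG1 and sent over eBGP.
print(advertise(originate(), "ebgp", "192.168.100.3"))      # 192.168.100.3
# An external route passed from ESG1 to the DLR over iBGP keeps its next hop.
print(advertise("192.168.100.1", "ibgp", "192.168.1.2"))    # 192.168.100.1
```

Under this model, the route from DLR1 would reach the Physical Router with 192.168.100.3 as next hop, which is the behavior expected earlier in the thread.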
In the SDDC reference designs, OSPF is recommended for some designs and eBGP or iBGP for others:
The VMware Validated Design documentation supports the following configuration:
Hi,
Please refer to previous replies if you are interested in reproducing this issue!!!
Best Regards
Abdelfatah ELARFAOUI
IP/MPLS Expert CCIE/JNCIE/VCIX6-NV certified