The DLR Control VM is deployed in Active/Standby mode.
OSPF is the routing protocol used between DLR & Edge.
If the active Control VM fails, will there be any impact on North-South and East-West communication?
If so, for how many seconds? OSPF is configured with the default timers (10/40).
http://www.routetocloud.com/2014/06/nsx-distributed-logical-router/#DLR_High_Availability
Page 132 of the design guide says that, with ECMP on the Edges, North-South traffic stops when the active Control VM fails.
Does anyone have experience with this, and if so, what downtime did you see with OSPF using the default timers (hello 10, dead 40)?
North-South routing = traffic between the DLR and its upstream router, which is normally an NSX Edge VM.
Do you have NSX Edge as the upstream router of the DLR?
Do you use HA or ECMP?
As per the design guide, the recommendation for ECMP is to tune the hello/dead timers to 1/3 seconds,
and to 30/120 seconds for Edge HA.
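To put rough numbers on those timer pairs, here is a minimal Python sketch. It assumes only the standard OSPF behaviour that a neighbor is declared down once no hello has arrived for the dead interval, so the wait after a failure lies between (dead - hello) and dead seconds. The function name is made up for illustration, and the result is only the adjacency detection window, not the total outage (Control VM HA failover and route reconvergence are not modelled).

```python
from typing import Tuple


def detection_window(hello: float, dead: float) -> Tuple[float, float]:
    """Return (min, max) seconds from neighbor failure to dead-timer expiry.

    The last hello was received at most `hello` seconds before the failure,
    and the dead timer restarts on every hello received.
    """
    if hello <= 0 or dead <= hello:
        raise ValueError("expected 0 < hello < dead")
    return (dead - hello, dead)


# Timer pairs mentioned in this thread (hello, dead), in seconds.
timer_sets = [("default", 10, 40), ("ECMP tuned", 1, 3), ("Edge HA", 30, 120)]

for label, hello, dead in timer_sets:
    shortest, longest = detection_window(hello, dead)
    print(f"{label}: hello={hello}s dead={dead}s -> "
          f"neighbor declared down after {shortest}-{longest}s")
```

With the default 10/40 timers the neighbor would notice the loss somewhere between 30 and 40 seconds after the last hello stops, versus 2 to 3 seconds with 1/3.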
To handle the North-South traffic outage, you can create summary static routes on the upstream routers (the NSX Edges) that summarise all the networks behind the DLR.
The same static route can also be used to advertise a summary route to the physical router.
With these static routes in place, a failure of the active DLR Control VM causes no outage of North-South traffic.
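As a rough illustration of picking such a summary, the Python sketch below finds the smallest single prefix that covers a set of logical networks, using only the standard `ipaddress` module. The subnets and the function name are hypothetical, not taken from this thread; substitute the networks that actually sit behind your DLR.

```python
import ipaddress


def covering_supernet(cidrs):
    """Return the smallest single network that contains every given CIDR."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    if not nets:
        raise ValueError("need at least one network")
    summary = nets[0]
    # Widen the prefix one bit at a time until every network fits inside it.
    while not all(n.subnet_of(summary) for n in nets):
        summary = summary.supernet()
    return summary


# Hypothetical logical switch subnets behind the DLR.
dlr_networks = ["172.16.10.0/24", "172.16.20.0/24", "172.16.30.0/24"]

print(covering_supernet(dlr_networks))  # 172.16.0.0/19
```

Note that a single summary usually covers more address space than the networks themselves (here 172.16.0.0 through 172.16.31.255), so check that nothing else in that range lives outside the DLR before using it as a static route.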
South-North traffic should still be forwarded based on the routing/forwarding tables in the ESXi hosts, regardless of DLR Control VM availability.
All of the screenshots above are taken from the VMware® NSX for vSphere Network Virtualization Design Guide, version 3.0.
Do you have NSX Edge as the upstream router of the DLR? - YES
Do you use HA or ECMP? - The Edge gateways are in ECMP and the DLR Control VMs are in HA.
So, in the above scenario with default OSPF timers, my understanding is that North-South traffic will be impacted when the active Control VM fails.
Let me know if my understanding is right.
Regardless of the routing timer settings, North-South traffic will still be impacted because the dynamic routing adjacency flaps.
Here's what show ip route looks like while the active DLR Control VM is still online
And here's what show ip route looks like when the active DLR Control VM has failed
and here's what it looks like when the DLR Control VM has failed but a static route is in place
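Since the route table screenshots are not reproduced here, the three cases can be sketched with a toy longest-prefix-match lookup in Python. This is only a model with made-up prefixes and next hops, not real show ip route output, and it assumes the Edge drops its OSPF-learned routes when the adjacency to the Control VM goes down.

```python
import ipaddress


def lookup(table, dst):
    """Return the most specific route matching dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in table if addr in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen, default=None)


# Hypothetical Edge routes: (prefix, source, next hop). Addresses are made up.
ospf_route = (ipaddress.ip_network("172.16.10.0/24"), "ospf", "192.168.10.2")
static_summary = (ipaddress.ip_network("172.16.0.0/19"), "static", "192.168.10.2")

control_vm_up = [ospf_route, static_summary]
control_vm_down_with_static = [static_summary]  # OSPF route withdrawn
control_vm_down_no_static = []                  # nothing left for this prefix

for name, table in [("Control VM up", control_vm_up),
                    ("Control VM down, static summary", control_vm_down_with_static),
                    ("Control VM down, no static", control_vm_down_no_static)]:
    route = lookup(table, "172.16.10.5")
    print(name, "->", f"{route[1]} via {route[2]}" if route else "no route")
```

In this model the more specific OSPF route wins while the adjacency is up, and the static summary takes over once it is gone, which is the behaviour the static route is meant to provide.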
You can also take a look at VMworld 2016 session NET8131R - NSX for vSphere Logical Routing Deep Dive
slide: NSX for vSphere Logical Routing Deep Dive