Solved: Re: DLR Control VM - Static Route

rajeevsrikant · ‎01-16-2017

Below is the function of the DLR Control VM.

Control Plane represented by a virtual machine called Logical Router (LR) Control VM. Dynamic routing protocols such as OSPF, BGP, IS-IS run between the Control VM and the upper layer, on NSX represented by the NSX Edge Gateway.

The question is: if static routes are used instead of dynamic routing protocols like OSPF or BGP, what is the role of the DLR Control VM?

The reason for asking is that with OSPF as the routing protocol, when the active DLR Control VM fails there is a downtime of roughly 25 to 30 seconds.

That downtime is the time HA takes to bring up the standby VM plus the time OSPF takes to converge.

25 to 30 seconds of downtime is huge. Is there any way to reduce it?

If not, can static routes address this problem? And with static routes, what would be the role of the DLR Control VM?

hansroeder · ‎01-16-2017

When you use only static routes, your traffic might be blackholed if the router on the other side goes down. To my knowledge there isn't really a way in NSX to detect this. However, when you use a dynamic routing protocol like OSPF or BGP, you don't have the same risks. When a router on the other side goes down, based on the timers you set, routes will be removed from the routing table. Thus, traffic will not be blackholed. Also, when using a dynamic routing protocol, you can use ECMP to load balance your traffic along multiple paths and this also implicitly takes care of your High Availability needs.

Now, regarding static routes, it might be a good idea to use them as well. But you should use them as "floating/backup" static routes, with a higher Administrative Distance than OSPF or BGP. This way, when all dynamic routing goes down (for whatever reason), you still have the floating/backup static route. In this scenario you still run the risk of blackholing your traffic if the router that the static route is pointing to goes down, but I think this is an acceptable risk.
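To make the "floating" idea concrete, here is a minimal Python sketch of how a router picks between two routes to the same prefix by administrative distance. The `Route` class, the `best_routes` helper, the addresses and the distance values (110 and 240) are all made up for illustration; this is not NSX code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str          # destination prefix, e.g. "0.0.0.0/0"
    next_hop: str
    source: str          # "ospf", "static", ...
    admin_distance: int  # lower value is preferred


def best_routes(candidates):
    """Return the candidates for one prefix that share the lowest distance."""
    if not candidates:
        return []
    lowest = min(r.admin_distance for r in candidates)
    return [r for r in candidates if r.admin_distance == lowest]


# Illustrative values only: the same prefix learned two ways.
ospf_default = Route("0.0.0.0/0", "192.0.2.1", "ospf", 110)
floating_static = Route("0.0.0.0/0", "192.0.2.1", "static", 240)

# While OSPF is healthy, the static route stays in the background.
active = best_routes([ospf_default, floating_static])

# Once the OSPF route is withdrawn, the floating static route takes over.
fallback = best_routes([floating_static])

print([r.source for r in active], [r.source for r in fallback])
```

The static route only becomes active when nothing with a lower distance exists for the same prefix, which is why it works as a backup and not as a competitor to the dynamic route.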

My advice would be to use both, but to base your design on dynamic routing and only use static routes for when dynamic routing fails.

cnrz · ‎01-16-2017

One addition: it is possible to use a DLR without a DLR Control VM if only static routes are used. The main purpose of the Control VM is to update the routing tables of the DLR instances on the ESXi hosts through dynamic neighborships with Edge (or physical) routers. No traffic passes through the DLR Control VM.

https://vzealand.com/2016/10/01/vcap6-nv-3v0-643-study-guide-part-7/

Select Deploy Edge Appliance (Control VM) if you want to utilise dynamic routing protocols or the firewall. Otherwise no control VM will be deployed and you will need to use static routes to route L3 traffic between logical switches.

Also DLR does not support IS-IS:

The ESG supports static, OSPF, BGP and IS-IS routing protocols. The DLR supports all with the exception of IS-IS protocol.

http://blog.ipcraft.net/vmware-nsx-routing-topologies/

rajeevsrikant · ‎01-16-2017

@canero

The link you shared makes the point below, which also mentions the firewall. Does this mean that if no Control VM is deployed, no DLF (firewall) can be used? Please clarify.

Select Deploy Edge Appliance (Control VM) if you want to utilise dynamic routing protocols or the firewall. Otherwise no control VM will be deployed and you will need to use static routes to route L3 traffic between logical switches.

hansroeder · ‎01-16-2017

This means that you cannot use the firewall on the DLR, since there is no DLR Control VM. However, it is advised to not use the firewall on the DLR Control VM, so that shouldn't be a problem. You can of course always use the Distributed Firewall, with or without DLR Control VM.

cnrz · ‎01-16-2017

As no traffic passes through the Control VM, the firewall on the DLR Control VM only applies to packets going to the DLR or coming from the DLR itself, such as ICMP or management traffic like telnet or ssh, so it may be important during troubleshooting.

One interesting point for troubleshooting may be testing connectivity with ping from the Control VM if only static routing is used:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=21178...

As pointed out, the dFW is independent of DLR functionality. The Distributed Firewall is configured and managed through vCenter, and it is possible to use this firewall even without VXLAN or logical switches.

DaleCoghlan · ‎01-16-2017

Also keep in mind that if you do decide to deploy a DLR without a Control VM because you're only using static routing, and then at a later stage decide you want to use dynamic routing, it is not possible to just add/deploy the Control VM to your existing DLR. You would need to deploy a new DLR with a Control VM and then migrate everyone to it, or delete your current DLR and re-create it with the Control VM.

Dale

DaleCoghlan · ‎01-16-2017

Also remember that if your next hop upstream from the DLR is an ESG in HA mode and you're running dynamic routing between the ESG and the DLR, you need to increase your protocol timeout timers to be larger than the HA failover time. Doing this should allow you to tolerate a complete failover of the DLR Control VM with no data plane outage. Remember to enable graceful restart on the DLR Control VM.
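As a rough sanity check of the timer rule, the comparison can be written down in a few lines of Python. The function name and the example values are assumptions for illustration: 15 seconds is the HA default mentioned elsewhere in this thread, while 40 and 3 seconds are just sample protocol hold times.

```python
def survives_ha_failover(protocol_hold_seconds, ha_failover_seconds):
    """True if the neighbor keeps the adjacency longer than HA needs to fail over."""
    return protocol_hold_seconds > ha_failover_seconds


# Hypothetical values: HA takes 15 s to declare the active node dead.
HA_FAILOVER_SECONDS = 15

# A 40 s hold time outlasts the failover, so the adjacency can survive it.
relaxed_timers_ok = survives_ha_failover(40, HA_FAILOVER_SECONDS)

# A 3 s hold time expires first, so the neighbor drops the routes mid-failover.
aggressive_timers_ok = survives_ha_failover(3, HA_FAILOVER_SECONDS)

print(relaxed_timers_ok, aggressive_timers_ok)
```

This only captures the inequality itself; actual failover time should be measured in your own environment before choosing timers.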

If you are using ECMP between the upstream ESG and the DLR, then you need to decrease the routing protocol timers, but also remember that you need to keep the ESG advertising the networks so they keep attracting traffic and forwarding it to the protocol forwarding address. Usually this is done with a floating static supernet route which is then redistributed via the ESG.

Check out the Design Guide at the following link as this goes some way to explaining how to cater for DLR Control VM failures.

VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0

Dale

rajeevsrikant · ‎01-18-2017

I read the NSX design guide regarding the Control VM Failure & recovery.

I have the below doubt regarding this.

In my environment I am using ECMP between the DLR and the upstream ESG (OSPF) (ESG in Active/Active)

Below are the steps they have mentioned.

1 - Configure Static routes in the NSX Edge gateways pointing to the DLR with higher Administrative distance than OSPF.

2 - NSX edge to redistribute the static routes into OSPF and advertise to the physical routers.

3 - No static routes are configured in the DLR.

Below are my questions.

Q1 - If the active Control VM fails, how is the static route used for routing?

- What routes will be on the host servers? Will the hosts receive static routes? If so, how?

- How will the DLR learn the static routes when no static routes are configured on the DLR?

Q2 - If the active Control VM fails, with OSPF and static routing, what is the expected amount of downtime?

Will there be downtime, and if so how much?

cnrz · ‎01-18-2017

If no static routes are configured on the DLR Control VM, then the DLR instances on the ESXi hosts have no way of learning them, because the DLR Control VM has the role of pushing routes to the DLR kernel module instances. In most designs a single default route pointing to the inside interface of the Edge Gateway is sufficient. Even if OSPF is used, the next hop would generally still be this interface.

According to the NSX 6.2 Troubleshooting Guide (page 76), the routes on any specific host can be observed with the following command (or with the net-vdr command on the ESXi host CLI, as in the links below):

https://pubs.vmware.com/NSX-62/topic/com.vmware.ICbase/PDF/nsx_62_troubleshooting.pdf

http://chansblog.com/6-nsx-distributed-logical-router/

http://chansblog.com/tag/net-vdr/

nsxmgr# show logical-router host hostID dlr dlrID route

Page 90 of the Troubleshooting Guide explains the scenario where the DLR Control VM is lost: the DLR would start to use the static default route (if tested, the number of routes shown by the net-vdr command would decrease to 1, which is the default route),

while the Edge would start to use the static summary route with higher administrative distance once the OSPF routes are withdrawn, and the upstream physical routers would use the redistributed static summary route pointing to the outside interface of the Edge router.
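A small Python sketch may help picture the Edge-side fallback. It assumes the static summary is a shorter prefix than the OSPF-learned LIF networks, so longest-prefix match prefers the specific routes while they exist (administrative distance would only decide between routes to the identical prefix). The prefixes, labels and the `lookup` helper are hypothetical.

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: return the most specific (network, info) entry."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, info) for net, info in table.items() if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical Edge routing table: one specific route learned from the DLR
# via OSPF, plus a static summary that covers all DLR LIF networks.
specific = ipaddress.ip_network("172.16.10.0/24")
summary = ipaddress.ip_network("172.16.0.0/16")
edge_table = {
    specific: "ospf via DLR",
    summary: "static summary via DLR",
}

before = lookup(edge_table, "172.16.10.5")

# Simulate the OSPF route being withdrawn after the adjacency is lost.
del edge_table[specific]

after = lookup(edge_table, "172.16.10.5")

print(before[1], "->", after[1])
```

In this toy table the destination stays reachable after the withdrawal because the summary still covers it, which is the effect the static summary route is meant to achieve.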

NSX Routing Subsystem Failure Modes and Effects:

This chapter reviews the typical failure scenarios that might affect components of NSX routing subsystem and outlines the effects of these failures.

DLR Control VM is lost or powered off:

- Create, update, and delete operations for this DLR's LIFs and routes fail

- Any dynamic route updates will not be sent to hosts (including withdrawal of prefixes received via now broken adjacencies)

DLR Control VM loses connectivity with the NSX Manager and Controllers:

- Same effects as above

rajeevsrikant · ‎01-18-2017

Thanks.

In my environment I am using the OSPF NSSA Area.

Below is the routing table entry of my DLR.

DLR# show ip route

O IA 0.0.0.0/0 [30/3] via 100.45.90.162

O IA 0.0.0.0/0 [30/3] via 100.45.90.163

O N1 111.15.16.0/26 [110/3] via 100.45.90.162

O N1 111.15.16.0/26 [110/3] via 100.45.90.163

O N1 111.15.16.64/26 [110/3] via 100.45.90.162

2 default routes are injected by OSPF pointing to my 2 NSX Edge gateways (ECMP).
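For anyone who wants to check this kind of output quickly, here is a small Python sketch that groups the next hops per prefix from the text pasted above. It assumes each route line ends in `<prefix> [AD/metric] via <next-hop>` exactly as shown; the `next_hops_by_prefix` helper is made up for this post and is not an NSX tool.

```python
from collections import defaultdict

SAMPLE = """\
O IA 0.0.0.0/0 [30/3] via 100.45.90.162
O IA 0.0.0.0/0 [30/3] via 100.45.90.163
O N1 111.15.16.0/26 [110/3] via 100.45.90.162
O N1 111.15.16.0/26 [110/3] via 100.45.90.163
O N1 111.15.16.64/26 [110/3] via 100.45.90.162
"""


def next_hops_by_prefix(text):
    """Map each prefix to the list of next hops seen for it."""
    table = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        # Expect: <codes...> <prefix> [AD/metric] via <next-hop>
        if len(parts) < 4 or parts[-2] != "via":
            continue
        table[parts[-4]].append(parts[-1])
    return dict(table)


routes = next_hops_by_prefix(SAMPLE)
for prefix, hops in routes.items():
    print(prefix, "ECMP" if len(hops) > 1 else "single path", hops)
```

Prefixes with more than one next hop are the ECMP ones; in the pasted output the last /26 shows only one next hop.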

So in this case, when the active DLR Control VM is down, the default route will remain and will not be flushed from the routing table.

As per the design guide, I will not configure a static route on the DLR.

Let me know if my understanding above is right.

If it is, there will not be any downtime during a DLR Control VM failure, since the default route will remain in the DLR and gets pushed into the ESXi host routing.

rajeevsrikant · ‎01-18-2017

Attached is the diagram of my setup.

Scenario:

- 2 NSX Edge GW devices with OSPF (ECMP) with Physical NW and with the DLR

- DLR to Edge OSPF

- OSPF Area NSSA

- Will Add static route in the NSX Edge to point to the DLR ( No static route in the DLR)

- Default route is injected to the DLR with the Next hop as NSX Edge

- Default route is injected to the NSX Edge with the next hop as Physical NW

Failure: Active Control VM fails.

- OSPF adjacency will get lost between the DLR & the NSX edge.

- DLR will still have the default route generated by OSPF. The default route will not get flushed when the DLR control VM is down.

- Since the default route is there, the south - north traffic will keep flowing & there will be no connectivity loss. If there will be some connectivity loss, let me know for how long.

- In the NSX edge since the static route with high AD is configured there should not be impact to the north south traffic.

Above is my understanding. Correct me if I am wrong.

cnrz · ‎01-18-2017

Although the best way is to avoid the loss of the DLR Control VM with HA and anti-affinity mechanisms, if understood correctly the question is: "What would happen to the routing tables on the DLR instances on the ESXi hosts if both DLR Control VMs are down when OSPF is used?"

If the design is ECMP-based for increased throughput, instead of an active-standby Edge which is a single next hop for the DLR, then dynamic routing protocols are necessary.

The 2 default routes are inserted into the routing table of the DLR by the Edges through OSPF, and are in turn distributed to the DLR instances on the ESXi hosts. Since this is an ECMP scenario, different flows may choose different next hops, pointing either to .162 or .163. Since there are 2 next hops, I am not sure, without testing, whether 2 distinct static default routes with higher administrative distance would achieve a backup mechanism. (It may not be recommended: if a next-hop Edge is lost, about half of the flows may be lost, since there is no mechanism like IP SLA to detect that the next-hop Edge is gone.)
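The "half of the flows" concern can be illustrated with a toy Python model of per-flow ECMP. The hashing here (SHA-256 over the flow tuple) is only a stand-in chosen to be deterministic; how the DLR kernel module actually hashes flows is not described in this thread, and the flow tuples are invented.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Deterministically map a flow tuple onto one of the next hops."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


NEXT_HOPS = ["100.45.90.162", "100.45.90.163"]

# Invented flows: (source IP, destination IP, source port, destination port).
flows = [("10.0.0.%d" % i, "198.51.100.7", 40000 + i, 443) for i in range(1, 101)]

# Suppose one Edge dies but both routes stay installed, as with static routes
# and no next-hop liveness detection.
dead_edge = "100.45.90.163"
blackholed = [f for f in flows if pick_next_hop(f, NEXT_HOPS) == dead_edge]
surviving = [f for f in flows if pick_next_hop(f, NEXT_HOPS) != dead_edge]

print(len(blackholed), "of", len(flows), "flows still point at the dead Edge")
```

Every flow that hashes to the dead next hop keeps being sent there until something removes that route, which is the job the dynamic routing protocol does in the ECMP design.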

The worst case may be configuring a single static route to one of the Edges, but then anti-affinity rules are again needed to minimize the risk of losing the Control VM and this next-hop Edge at the same time. If the ESXi host is lost, the Edge would power on on another ESXi host through VMware vSphere HA, but it should be tested whether this recovery time is acceptable for the applications.

So for ECMP scenarios, following best practices to increase the availability of the DLR Control VM with HA (High Availability), and avoiding the possibility of losing an Edge and the active Control VM at the same time, would reduce OSPF failures to a minimum, which may be sufficient for most cases. The anti-affinity rule for DLR Control VM HA is defined automatically, but it is recommended to separate the datastores of the 2 DLR Control VMs manually with Storage DRS, because that is not automatic as of 6.2.

https://fojta.wordpress.com/tag/ecmp/

"The other consideration is placement of DLR Control VM. If it fails together with one of ECMP Provider Edges the ESXi host vmkernel routes are not updated until DLR Control VM functionality fails over to the passive instance and meanwhile route to the dead Provider Edge is black holing traffic. If we have enough hosts in the Edge Cluster we should deploy DLR Control VMs with anti-affinity to all ECMP Edges. Most likely we will not have enough hosts therefore we would deployed DLR Control VMs to one of the compute clusters. The VMs are very small (512 MB, 1 vCPU) therefore the cluster capacity impact is negligible"

ESG Affinity Rules for SDRS

"The design guide speaks to some of this in table 10 of the 3.0 guide. ESG and DLR Control HA enables anti-affinity automatically but ECMP needs enabled manually."

http://www.routetocloud.com/2014/06/nsx-distributed-logical-router/#DLR_High_Availability

The High Availability (HA) DLR Control VM allows redundancy at the VM level. The HA mode is Active/Passive where the active DLR Control VM holds the IP address, and if the active DLR Control VM fails the passive DLR Control VM will take ownership of the IP address (flip event). The DLR route-instance and the interface of the LIFs and IP address exists on the ESXi host as a kernel module and are not part of this Active/passive mode flip event.

The Active DLR Control VM sync-forwarding table to secondary DLR Control VM, if the active fails, the forwarding table will continue to run on the secondary unit until the secondary DLR will renew the adjacency with the upper router.

http://www.routetocloud.com/2014/12/nsx-edge-and-drs-rules/

An Edge and DLR that belong to the same tenant should not run in the same ESXi host:

rajeevsrikant · ‎01-19-2017

No, my question is only about the failure of the active Control VM. I am not asking about both Control VMs failing.

cnrz · ‎01-19-2017

If the primary Control VM fails, the secondary Control VM (which has no active OSPF neighborships while waiting) would establish OSPF adjacencies and update the routing table of the DLR instances on the ESXi hosts. As in the previous link, this failover detection time is 15 seconds by default and can be decreased to 6 seconds. If no topology change occurs during this time (such as losing an Edge at the same time as the DLR Control VM because both are on the same host), no or minimal packet loss may be expected.

The FIB entries on the hosts may not be persistent the way dFW rules are, for example if the host is rebooted while NSX Manager is unavailable. But this may not be a design requirement, as losing both Control VMs at the same time is very unlikely and may not need to be considered.

DFW Rules on a host

rajeevsrikant · ‎01-19-2017

Thanks.

Could you please help me answer the questions related to my scenario below, which I posted earlier with the diagram for reference?

Scenario:

- 2 NSX Edge GW devices with OSPF (ECMP) with Physical NW and with the DLR

- DLR to Edge OSPF

- OSPF Area NSSA

- Will Add static route in the NSX Edge to point to the DLR ( No static route in the DLR)

- Default route is injected to the DLR with the Next hop as NSX Edge

- Default route is injected to the NSX Edge with the next hop as Physical NW

Failure: Active Control VM fails.

- OSPF adjacency will get lost between the DLR & the NSX edge.

- DLR will still have the default route generated by OSPF. The default route will not get flushed when the DLR control VM is down.

- Since the default route is there, the south - north traffic will keep flowing & there will be no connectivity loss. If there will be some connectivity loss, let me know for how long.

- In the NSX edge since the static route with high AD is configured there should not be impact to the north south traffic.

cnrz · ‎01-20-2017

South --> North Traffic

From the moment the primary Control VM goes down, and through the period in which the secondary Control VM establishes neighborship with the Edges and learns routes, there may not be any traffic loss. This is because the DLR instances on the hosts retain their old routes (including the dynamic OSPF default route learned from the Edges) and shouldn't flush the default route entry.

After the old secondary (now primary) Control VM updates the routing table, if there is no topology change other than the Control VM going down, the next hops remain the same and the traffic continues as before. In that case no traffic loss is expected.

One exception may be if, for example, one of the ESXi hosts goes down and that host carries both the primary Control VM and one of the Edges. (Best practice recommends configuring anti-affinity rules to avoid this.) Since the DLR instances on the remaining ESXi hosts have no way of knowing the Edge is down, they will not change the next hop from this Edge to one of the remaining Edges (up to 7 others). The second Control VM will not be able to establish an OSPF adjacency with the down Edge, so the default route entry pointing to this specific Edge would be flushed once it updates the routing tables of the DLR instances. This period may be about 10-15 seconds (it can be tuned by lowering the HA timer from the default 15 seconds to 6 seconds).
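The behavior described above can be pictured with a toy Python model of one host's DLR routing table. The class and method names are invented for this post; the only behavior it encodes is what the thread describes, namely that entries stay in place when the Control VM disappears and only change when a new update arrives.

```python
class HostRoutingTable:
    """Toy model of the DLR instance routes held on one ESXi host."""

    def __init__(self):
        self.routes = {}

    def apply_update(self, routes):
        # An update pushed on behalf of the active Control VM replaces the table.
        self.routes = {prefix: list(hops) for prefix, hops in routes.items()}

    def control_vm_lost(self):
        # As described in this thread: nothing is flushed, entries stay as they are.
        pass

    def next_hops(self, prefix):
        return self.routes.get(prefix, [])


host = HostRoutingTable()
host.apply_update({"0.0.0.0/0": ["100.45.90.162", "100.45.90.163"]})

# The active Control VM fails; the host keeps forwarding on the old entries.
host.control_vm_lost()
during_failover = host.next_hops("0.0.0.0/0")

# Exception case: Edge .163 went down with the same host, so the new active
# Control VM only learns the default route from .162 and pushes that.
host.apply_update({"0.0.0.0/0": ["100.45.90.162"]})
after_failover = host.next_hops("0.0.0.0/0")

print(during_failover, after_failover)
```

Between the two updates the stale entry for the dead Edge is still in use, which is where the 10-15 second window of partial loss in the exception case comes from.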

This link about HA timer may be helpful:

https://nsxtech.net/2014/09/20/understanding-high-availability-on-the-nsx-edge-services-gateway/

North --> South traffic

The static routes with high administrative distance may be redistributed into OSPF and advertised to the upstream physical routers, so that the routes on the physical network for the DLR LIF networks are not affected by the Control VM going down and by OSPF being re-established between the Edges and the secondary Control VM.

Regards,

rajeevsrikant · ‎01-20-2017

Thank you for the clear explanation.

In my environment both the 2 NSX edges & 2 Control VM are running on different hosts.

So single host failure will not trigger dual failure of NSX edges & the control VM.

So in this case my understanding is that both North -> South & South -> North traffic will not have any impact & there should not be any packet loss.

Also, I would like to understand the logic: when the active Control VM fails, why are the routes not flushed from the hosts?

Sorry for asking so many questions. I need to understand this 100% before I propose it to my team.

cnrz · ‎01-21-2017

Why the routes are not flushed: a non-technical explanation would be that stale routes are better than nothing. The DLR instances don't have a control plane of their own, but rely on the Control VM for learning routes (that is, they use the Control VM as the control plane). So it may be a better choice to continue using these routes than to flush them and drop packets for 15 seconds until new routes are learned from the secondary Control VM. Also, I think there's no signalling mechanism that makes the DLR instances aware of whether the Control VM is up or down. Instead of talking directly to the DLR instances, the Control VM uses the netcpa process, which in turn uses the Controller Cluster to push updates, so it may not be possible for them to learn this.

A more technical explanation would be good, but as far as I understand it is by DLR design and not configurable or changeable.

Some routing protocols such as RIP depend on continuous refresh of the routing table entries by the routing process, but other protocols such as OSPF and BGP need an event that creates a topology change, so entries may stay for a long time (days or months) if the network is stable. (For the DLR, OSPF and BGP are used, and both are event-driven.)

https://supportforums.cisco.com/discussion/11602701/routing-table-time-last-update

"You may notice that for event-driven protocols such as OSPF, IS-IS, EIGRP or BGP, the time may go up to days, weeks, even years if the network is that stable. For timer-driven protocols like RIP, this timer should always show a value lower than the Update interval (30 seconds by default for RIP), as in these protocols, the route is essentially reinstalled into the routing table each time an update arrives that confirms the existence of this network."

http://www.routetocloud.com/2014/06/nsx-distributed-logical-router/#DLR_Control_VM_communications

On this basis the routing update happens in the following manner:

Step (1) DLR Control VM learn new route information (from the dynamic routing as an example) to update the NSX-v controller,
Step (2) the DLR will use the internal channel inside the ESXi01 host called the “Virtual Machine Communication Interface” (VMCI). VMCI will open a socket to transfer learned routes as Routing Information Base (RIB) information to the netcpa service daemon.
Step (3) The netcpa service daemon will send the RIB information to the NSX-v controller. The flow of routing information passes through the Management VMkernel interface of the ESXi host, which means that the NSX-v controllers do not need a new interface to communicate to the DLR control VM. The protocol and port used for this communication is TCP/1234.
Step (4) NSX Controller will forward the DLR RIB to all netcpa service daemons on the ESXi host.
Step (5) netcpa will forward the FIB’s to the DLR route instance.
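The five steps can be sketched as a small Python model. All class and function names here are invented for illustration, steps 1 to 3 are collapsed into one call, and no real VMCI or TCP communication is modeled; it only shows the direction in which routes flow.

```python
class Host:
    """One ESXi host with its netcpa daemon and local DLR instance."""

    def __init__(self, name):
        self.name = name
        self.dlr_routes = {}

    def netcpa_receive(self, rib):
        # Step 5: netcpa programs the forwarding entries into the DLR instance.
        self.dlr_routes = dict(rib)


class Controller:
    """Stand-in for the NSX Controller cluster."""

    def __init__(self, hosts):
        self.hosts = hosts

    def publish(self, rib):
        # Step 4: the controller forwards the RIB to netcpa on every host.
        for host in self.hosts:
            host.netcpa_receive(rib)


def control_vm_learns(controller, rib):
    # Steps 1 to 3 collapsed: the Control VM hands learned routes to the local
    # netcpa daemon, which sends them up to the controller.
    controller.publish(rib)


hosts = [Host("esxi01"), Host("esxi02"), Host("esxi03")]
controller = Controller(hosts)
control_vm_learns(controller, {"0.0.0.0/0": "100.45.90.162"})

print({h.name: h.dlr_routes for h in hosts})
```

Note that the hosts only ever receive pushes in this model. Nothing tells them when the Control VM goes away, which matches the point above about there being no signalling that would trigger a flush.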

rajeevsrikant · ‎01-21-2017

Thanks, it's quite clear to me now.

If anything comes up that needs further clarification, I will get back to you.
