Currently I have 2 NSX Edge Gateways in Active-Active (ECMP with OSPF).
I want to change this setup to HA. Can this be done with a configuration change, or do I need to redeploy the NSX Edge?
Let me know which option is best.
To add to my question: my understanding is that I can delete one of the Edge gateways, so that only one gateway remains.
Is it technically possible to then configure the remaining gateway for HA, or does it need to be redeployed?
I haven't tried this in detail, but you could try the steps below:
- delete the 2nd NSX Edge gateway
- disable ECMP on the 1st NSX edge gateway
- disable ECMP on physical router and DLR
- enable HA on the 1st NSX edge gateway
I have confirmed in the VMware Hands-on Labs that I can disable ECMP on an ECMP-enabled Edge and then enable HA on that Edge.
Thanks.
Where is the option to enable HA on the NSX Edge?
Under Edge > Manage > Settings > Configuration > HA Configuration > Change
See below screenshots
Thanks, got it.
Regarding the management IP that needs to be configured for HA: does it need to be on a closed network (not reachable from any other network)?
What is the usual recommendation for the network used for the management IP (which logical switch or port group should it belong to)?
You don't need to specify/allocate the IP and can just create a new logical switch for Edge HA.
The recommendation in the VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0 is to use VXLAN for HA.
Thanks
You should also make sure that Graceful Restart is enabled when running an ESG in HA mode. When running ECMP, Graceful Restart should be disabled.
Also make sure you adjust your dynamic routing protocol timers accordingly once you move from ECMP back to an Edge HA type deployment. This goes for both the ESG and the DLR.
And if you did ECMP correctly you will also have some floating static routes configured on the ECMP Edges which you should be able to remove once you fix up all the OSPF/BGP timers.
Dale
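The Graceful Restart advice above can be written down as a small sanity check. This is only a sketch: the `EdgeSettings` class and its field names are hypothetical (they are not NSX API objects), and the values are assumed to be filled in by hand from what the Edge shows in the UI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeSettings:
    """Hypothetical hand-filled summary of one ESG (not an NSX API object)."""
    name: str
    ha_enabled: bool
    ecmp_enabled: bool
    graceful_restart: bool


def check_edge(edge: EdgeSettings) -> List[str]:
    """Return warnings based on the Graceful Restart advice in this thread."""
    warnings = []
    if edge.ha_enabled and not edge.graceful_restart:
        warnings.append(f"{edge.name}: HA is on, Graceful Restart should be enabled")
    if edge.ecmp_enabled and edge.graceful_restart:
        warnings.append(f"{edge.name}: ECMP is on, Graceful Restart should be disabled")
    return warnings


if __name__ == "__main__":
    # Example: an ECMP Edge that still has Graceful Restart enabled.
    for line in check_edge(EdgeSettings("esg-01", False, True, True)):
        print(line)
```

Running the same check again after the move to HA should come back with no warnings once Graceful Restart is enabled and ECMP is off.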
Thanks.....
You should also make sure that Graceful Restart is enabled when running an ESG in HA mode. When running ECMP, Graceful Restart should be disabled.
[Reply] - In my current setup I am using ECMP and Graceful Restart is enabled. Could you explain why Graceful Restart should be disabled when using ECMP?
Also make sure you adjust your dynamic routing protocol timers accordingly once you move from ECMP back to an Edge HA type deployment. This goes for both the ESG and the DLR.
[Reply] - Does this mean that the OSPF Hello and Dead intervals should match between Physical Router <-> Edge and Edge <-> DLR?
And if you did ECMP correctly you will also have some floating static routes configured on the ECMP Edges which you should be able to remove once you fix up all the OSPF/BGP timers.
[Reply] - Sorry, I didn't get this point. Could you please explain it in more detail?
bayuwibowo
I have a query regarding your steps below:
- delete the 2nd NSX Edge gateway
- disable ECMP on the 1st NSX edge gateway
[Query] - My Edge gateways have OSPF neighbour relationships with 2 physical L3 switches.
There are 2 OSPF paths between the Edge gateways and the physical L3 switches.
So from my understanding I still need ECMP enabled on the active NSX Edge Gateway. Please clarify.
- disable ECMP on physical router and DLR
- enable HA on the 1st NSX edge gateway
Further to the above, since I am planning to do this in a production environment, I would like to take a backup of the NSX Edge gateways before making any changes.
I would like to know how to do this. If anything goes wrong, I need a backup or some other way to revert to the original settings.
Hi, if you are changing from ECMP to HA, you disable ECMP after changing the Edge to HA.
Those steps are high level; the additional details are the ones DaleCoghlan mentioned above.
For example, with ECMP you may have set aggressive routing timers, such as OSPF hello/dead timers of 1/3 seconds. For HA the recommendation is 30/120.
The same goes for the summarized floating static routes that are normally used to handle a DLR Control VM failure.
The static routes are no longer required in HA because the dynamic routing protocol timers are long enough.
In terms of NSX Edge backup, you can't back up an individual NSX Edge using snapshots or backup software.
VMware NSX for vSphere 6.2 Documentation Center - Back Up NSX Edges
"Taking individual NSX Edge backups is not supported."
The NSX Edge configuration is part of NSX Manager. If you restore an Edge manually from a snapshot or backup software, its config will be out of sync with NSX Manager.
Redeploying the Edge through the vSphere Web Client will restore your NSX Edge to the latest config.
One possible way to back up the existing configuration is through the REST API: get the Edge configuration and save the XML.
To restore, edit the XML and redeploy through the REST API.
Here's a blog on how to do it: NSX Edge Backup and Restore – VMTECHIE
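As a rough illustration of the "get the Edge configuration and save the XML" step, here is a minimal Python sketch using only the standard library. Treat the details as assumptions: the `/api/4.0/edges/<edge-id>` path is what I would expect from the NSX for vSphere API guide, so verify it against your version, and the manager hostname, Edge ID and credentials are placeholders. The function names are made up for this example.

```python
import base64
import urllib.request


def build_edge_get_request(manager, edge_id, user, password):
    """Build (but do not send) a GET for one Edge's configuration as XML.

    The URL path is an assumption; check the API guide for your NSX version.
    """
    url = f"https://{manager}/api/4.0/edges/{edge_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/xml")
    return req


def save_edge_config(req, out_path, context=None):
    """Send the request and write the returned XML to out_path.

    Pass an ssl.SSLContext as `context` if NSX Manager uses a private CA.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(req, context=context) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Usage would look like `save_edge_config(build_edge_get_request("nsxmgr.example.local", "edge-1", "admin", "..."), "edge-1.xml")`, run once per Edge before touching ECMP or HA. Keeping the request builder separate from the network call makes it easy to check the URL and headers without contacting NSX Manager.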
Thanks.
Below are the current OSPF timers on my Edge & DLR.
OSPF Hello Interval – 10 seconds
OSPF Dead Interval – 40 seconds
So as per the recommendation they have to be changed as below.
OSPF Hello Interval – 30 seconds
OSPF Dead Interval – 120 seconds
I will make this change and ensure that the timers are the same on my physical network devices as well.
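Since OSPF neighbours must agree on the hello and dead intervals before an adjacency will form, it can help to list the timers per device and see which ones still need changing. A minimal sketch, assuming the 30/120 target from this thread; the device names and the `devices_to_change` helper are hypothetical, and the values are entered by hand.

```python
from typing import Dict, List, Tuple

# Target (hello, dead) in seconds for the HA deployment, as discussed above.
TARGET: Tuple[int, int] = (30, 120)


def devices_to_change(timers: Dict[str, Tuple[int, int]],
                      target: Tuple[int, int] = TARGET) -> List[str]:
    """Return the sorted names of devices whose (hello, dead) pair differs from target."""
    return sorted(name for name, pair in timers.items() if pair != target)


if __name__ == "__main__":
    # Hypothetical inventory: only the ESG has been updated so far.
    current = {
        "physical-l3-a": (10, 40),
        "physical-l3-b": (10, 40),
        "esg": (30, 120),
        "dlr": (10, 40),
    }
    print(devices_to_change(current))
```

An empty list means every listed device already uses the same target timers.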
Regarding the other question I asked: my NSX Edge Gateway has 2 uplinks, with OSPF routing adjacencies to 2 physical L3 switches.
So there will be 2 paths from the NSX Edge Gateway to the physical network. Given this, does it count as ECMP or not?
FYI, the timers are from the design guide.
Regarding your OSPF question, it depends on your setup.
If you have multiple OSPF links on different networks and you would like to load balance across them, then use ECMP.
But if you want active/standby, or it is a single network connected to two physical routers, you do not need ECMP.
Thanks
My setup is similar to the one in the design guide diagram that you posted.
The active Edge will have OSPF neighbour relationships with 2 physical routers. So from the ESG's perspective, it has 2 equal-cost paths to any network, one via each physical router.
In this case, do I need to enable ECMP on the Edge Gateway?
bayupw
- enable HA on the 1st NSX edge gateway
Does enabling HA on the NSX gateway involve any downtime?
When you have one NSX Edge, the traffic will pass through that Edge.
Once you enable HA on that Edge, a new Edge will be deployed and act as standby, so there should be no downtime involved.