Have been reading the admin guide and cannot find the answer.
I am backing up NSX every hour to FTP.
The guides all say that you cannot restore just an edge services gateway; you have to restore the full NSX configuration.
Let's say I have 10 ESGs all servicing different areas of my business. One of those ESGs needs to be restored due to misconfiguration or some other scenario where we cannot get it back online or passing traffic properly.
If I do the full FTP restore of NSX to NSX Manager, will any of my other ESGs or Logical Switches be affected if they have had no changes made? Basically, will my other areas of business take an outage while the restore is happening?
I have to imagine not. How do service providers do it? It would be unacceptable to take a full outage or even a blip in service while a restore happens for 1 customer.
It would make sense to me that a full restore, which is the only supported restore in NSX, would lose any changes since the last backup and you may need to "resync", but what about an outage during the restore process?
"How do service providers do it?"
The way professionals do it is to maintain a very detailed log of each change: what were the exact settings before the change, after the change, etc. They rarely make several changes at the same time, letting things stabilize first. That way, a misconfiguration (which can always happen, nobody is perfect) can be reverted quickly. And "four eyes" is important: always have two knowledgeable engineers actively involved in the change(s).
So restoring a configuration is a very rare occasion. And if only one thing was changed but cannot be reverted somehow, then deploying a new appliance and importing the config from an hour ago is easy.
But prevention is better than cure. It's 80% thinking, preparing, and testing, and 20% actual execution, I always say.
The ones that get in trouble are mostly "revolver clickers" who click first, and think later 😉
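To make the "detailed change log" idea above a bit more concrete, here is a minimal sketch in Python. It is not tied to NSX in any way: the `ChangeRecord` class, its field names, and the example settings are all hypothetical, and the settings are just plain dictionaries you would fill in by hand or from your own tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeRecord:
    """One logged change: who made it, who reviewed it, and the exact settings."""
    target: str      # hypothetical label for the thing being changed, e.g. an edge name
    engineer: str    # the person making the change
    reviewer: str    # the second pair of eyes ("four eyes")
    before: dict     # settings as they were before the change
    after: dict      # settings as they are after the change
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def changed_keys(self):
        """Names of settings whose value differs between before and after."""
        keys = set(self.before) | set(self.after)
        return sorted(k for k in keys if self.before.get(k) != self.after.get(k))

    def rollback(self):
        """Values to put back, keyed by setting name.

        None means the setting was absent before the change (this sketch
        cannot distinguish that from a setting whose old value was None).
        """
        return {k: self.before.get(k) for k in self.changed_keys()}
```

With a record like this per change, reverting a bad change means applying `rollback()` by hand, which is only practical if each record covers one small change.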
A full NSX Manager restore is something you'd only want to do when the NSX Manager itself is broken beyond repair, not when a single edge needs a configuration change. It impacts a bit more than only the edges: management and the Service Composer service are affected too (so no dynamic changes). That said, there are a couple of things I see with my customers:
- As mentioned, implement a change process which documents the changes being made. If something goes wrong, just roll back those changes.
- Use a configuration monitor like vRealize Network Insight, which can let you see snapshots of the configuration at a certain point and then use that view to restore the proper settings.
- If it's not a configuration issue and the edge is broken somehow (doesn't pass traffic), simply redeploy it. That deploys a clean edge and configures it with the last known state.
All in all, there's not much to the edge configuration, so if someone makes a mistake and the edge goes offline - it's pretty easy & quick to revert those changes to get it back online.
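The snapshot comparison mentioned in the list above can also be approximated with nothing more than a text diff, assuming you keep a known-good copy of the edge settings as text somewhere. This sketch uses Python's standard `difflib`; it does not use any vRealize Network Insight API, and the `config_diff` function and the sample settings are made up for illustration.

```python
import difflib


def config_diff(snapshot_text, current_text):
    """Return unified-diff lines between a saved snapshot and the current config.

    An empty list means the two texts are identical.
    """
    return list(
        difflib.unified_diff(
            snapshot_text.splitlines(),
            current_text.splitlines(),
            fromfile="snapshot",
            tofile="current",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical settings, not real NSX output.
    known_good = "nat: enabled\nfirewall: default-deny\n"
    current = "nat: enabled\nfirewall: default-allow\n"
    for line in config_diff(known_good, current):
        print(line)
```

Lines starting with `-` are what the snapshot had and lines starting with `+` are what is configured now, so the `-` lines are the values to put back.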
The best option in case of a failed edge appliance is to redeploy it. This is the easiest and fastest way when the NSX Manager configuration is intact.
Log in to the vSphere Web Client.
Click Networking & Security, and then click NSX Edges.
Select an NSX Edge instance.
Click the Actions icon and select Redeploy Edge.
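If you would rather script the steps above, NSX for vSphere also exposes a redeploy action over its REST API. As far as I recall from the API guide the call is `POST /api/4.0/edges/{edgeId}?action=redeploy`, but please verify that against the guide for your version. The sketch below is only an assumption-laden example: the host name, edge ID, and credentials are placeholders, and the function names are my own.

```python
import base64
import urllib.request


def build_redeploy_request(manager_host, edge_id, username, password):
    """Build (but do not send) the POST request that asks NSX Manager to redeploy an edge.

    The endpoint path is an assumption based on the NSX for vSphere API guide;
    check it against the documentation for your NSX version.
    """
    url = f"https://{manager_host}/api/4.0/edges/{edge_id}?action=redeploy"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


def redeploy_edge(manager_host, edge_id, username, password):
    """Send the redeploy request and return the HTTP status code.

    NSX Manager often uses a self-signed certificate, in which case urlopen
    raises an SSL error unless that certificate is trusted on this machine.
    """
    req = build_redeploy_request(manager_host, edge_id, username, password)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `redeploy_edge("nsxmgr.example.local", "edge-12", "admin", "secret")` would target one edge only (all four values are placeholders). Try it on a test edge first.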