Hi. We are running NSX v6.4.6 in a cross-vCenter configuration. We have been running into an issue since we upgraded from v6.3.5.
When we do failover testing, all NSX components in the Primary site are shut down: Manager, Controllers, Edges, and UDLR. The Secondary site's NSX Manager is then promoted to Primary, and we create a new controller cluster there.
Then, when we go to fail back, we remove the Primary role from the Secondary NSX Manager so that it takes on the Transit role, and we go to delete the controller cluster. We can delete only 1 controller. On any attempt to delete a 2nd one, the controller powers back on in about 6 min. The only way we seem to be able to delete them is by powering both of the remaining controllers off a couple of times; then the Manager lets us delete them.
Is this some kind of fail-safe?
This did not happen to us in v6.3.5.
We have duplicated this every time in our test lab.
We have also tried deleting the Secondary site controller cluster while the Manager still has the Primary role rather than the Transit role.
So after many hours of playing with this I found something interesting.
For one, I could always delete the last 2 controllers with the REST API.
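A minimal sketch of the REST route mentioned above, in Python. It assumes the NSX-V controller endpoint `DELETE /api/2.0/vdn/controller/{controllerId}` and its `forceRemoval` query parameter as described in the NSX for vSphere API guide, so check both against the guide for your version. The Manager hostname, credentials, and controller ID are placeholders, and the helper name `build_controller_delete` is made up for illustration. The code only builds the request; nothing is sent.

```python
import base64
import urllib.request


def build_controller_delete(manager, controller_id, user, password, force=False):
    """Build (but do not send) a DELETE request for one NSX controller.

    Assumption: the endpoint path and the forceRemoval parameter follow the
    NSX for vSphere API guide; verify both for your NSX version.
    """
    url = f"https://{manager}/api/2.0/vdn/controller/{controller_id}"
    if force:
        # Assumed to be required when removing the last controller in a cluster.
        url += "?forceRemoval=true"
    request = urllib.request.Request(url, method="DELETE")
    # NSX Manager REST calls use HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values, not taken from the original post.
req = build_controller_delete(
    "nsxmgr.example.local", "controller-2", "admin", "secret", force=True
)
print(req.get_method(), req.full_url)
```

To actually send it, the request would be passed to `urllib.request.urlopen`, typically with an SSL context that trusts the Manager's certificate.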
Now, at the secondary site, when removing the controller cluster while the Secondary NSX Manager is in either the Primary or the Transit role, if the first controller I delete has the Master role, the other 2 simply will not delete through the Manager. They power off and then power back on in about 5 min. I noticed that after I delete the first one, if it was the Master, the other 2 controllers show only the remaining 2 as active nodes but still have the 3rd one in their configured cluster list. I can duplicate this every time. The only way to delete them through the Manager is to power them both off and on and let them form a cluster.
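The mismatch described above (a deleted controller still present in the configured cluster list but missing from the active nodes) can be checked with a simple set difference. This is only an illustrative sketch: the controller IDs are made up, and the two lists are assumed to have been collected by hand from the cluster status output on a remaining controller.

```python
def stale_members(configured, active):
    """Return controllers still in the configured list but no longer active."""
    return sorted(set(configured) - set(active))


# Hypothetical IDs: controller-1 was the Master and has already been deleted.
configured = ["controller-1", "controller-2", "controller-3"]
active = ["controller-2", "controller-3"]
print(stale_members(configured, active))  # → ['controller-1']
```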
Now if the last controller I delete is the Master, this works just fine.
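The ordering that worked (Master deleted last) can be written down as a small helper for a runbook script. This is a sketch under assumptions: the function name and controller IDs are hypothetical, and which controller currently holds the Master role has to be determined separately, for example from the controllers themselves.

```python
def deletion_order(controllers, master_id):
    """Return the controllers in an order that removes the Master last."""
    if master_id not in controllers:
        raise ValueError(f"{master_id} is not in the controller list")
    # Non-Master controllers keep their original relative order.
    others = [c for c in controllers if c != master_id]
    return others + [master_id]


# Hypothetical IDs; controller-1 is assumed to hold the Master role.
order = deletion_order(["controller-1", "controller-2", "controller-3"], "controller-1")
print(order)  # → ['controller-2', 'controller-3', 'controller-1']
```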
Again this is for the Secondary site. In the Primary site I can delete any controller I want.
It is bizarre behavior and I have no explanation for it, but at least now I know how to remove the controller cluster at the secondary site whenever we do a failover test.