I've done a clean installation of NSX-T 2.4 in our environment. I've created a management cluster of 3 NSX Managers, but the GUI is still showing "VM Clustering in progress".
When I log on to the NSX Managers via SSH and run the commands get cluster status and get cluster status verbose, everything shows up and running.
All nodes can ping each other, DNS is set correctly, NTP is set correctly.
It seems like a GUI bug, since the cluster status is stable and up via the CLI.
I've also rebooted all three managers.
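As a third cross-check besides the GUI and the SSH CLI, the same state can be read from the NSX-T REST API (GET /api/v1/cluster/status on a manager). This is a minimal offline sketch of evaluating such a response; the field names (mgmt_cluster_status, control_cluster_status) are my assumption based on the 2.4 API and should be verified against your build's API guide:

```python
# Hypothetical example of a (trimmed) response body from
# GET https://<nsx-manager>/api/v1/cluster/status
# Field names are assumed from the NSX-T 2.4 API guide -- verify them
# against your build before relying on this.
SAMPLE_STATUS = {
    "cluster_id": "placeholder-cluster-id",
    "mgmt_cluster_status": {"status": "STABLE"},
    "control_cluster_status": {"status": "STABLE"},
}


def cluster_is_stable(status: dict) -> bool:
    """Return True only when both the management and control planes
    report STABLE in the cluster status payload."""
    return all(
        status.get(key, {}).get("status") == "STABLE"
        for key in ("mgmt_cluster_status", "control_cluster_status")
    )


print(cluster_is_stable(SAMPLE_STATUS))  # prints True
```

If the API also reports STABLE while the GUI still shows "VM Clustering in progress", that would further point to a UI-side rendering or state-tracking glitch rather than a real cluster problem.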
I have the same issue, but in my case I deleted the VM from the vCenter in an attempt to get the NSX manager to fail and let me retry. It never did. I tried multiple reboots and have even gone into the CLI on the managers and confirmed that the node is removed.
No luck. The database seems to be keeping a record of the installation status and the option to terminate and delete is not available.
I was able to deploy a 3rd controller using a different name and IP, since the in-use name and IP will not release from the stateful database.
I am doing this in the NSX-T 2.4 beta lab, and my last resort is a clean installation of everything to see if the issue can be resolved.
Not that it is going to help much, but I'm seeing the same "VM Clustering in progress" state.
To add to Robert's tip: in the initial deployment I did make a couple of mistakes deploying the second manager, which I deleted and redeployed.
This might be a deployment logic glitch, so to speak...
I had the same issue using NSX-T-Manager-2.4.0-12456291 on a very clean installation, during the deployment of the 2nd node.
The 3rd node went fine.
There were no mistakes, no uninstalls, nothing that could point to a configuration error.
Everything looks good on the CLI, but the NSX-T web GUI is not able to follow the node deployment as it goes through the different phases, and gets stuck at that "VM Clustering In Progress" status.
I could remove the "offending" node at the CLI (get cluster config to find its UUID; detach node <UUID>; then manually delete the VM), and the CLI reports good stability and normal status for the remaining nodes.
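For anyone hitting the same state, the detach sequence just described, run from the CLI of a healthy manager node, looks roughly like this (the UUID is a placeholder; confirm the exact syntax against the NSX-T 2.4 CLI reference for your build):

```
nsx-mgr> get cluster config
  (note the UUID of the stuck node in the output)
nsx-mgr> detach node <UUID-of-stuck-node>
  (then manually delete the orphaned VM from vSphere)
```

After the detach, get cluster status should list only the remaining nodes.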
On three full lab deploys from scratch, I experienced this issue in one of them.
The only thing I saw that was different when the node got stuck was that, while I was filling in the deployment fields, the refresh icon for the datastore field kept activating on its own. I didn't want to wait, was still able to choose the desired datastore, and the VM deployed successfully as usual.
Looking at the VM workload from the vSphere Client, it was apparent that the NSX-T Manager was heavily loaded (CPU and RAM).
Because these nodes run on top of virtualized ESXi hosts, there was heavy resource contention at the time.
Maybe this has nothing to do with the issue, but after this I reconfigured the ESXi hosts with more capacity and started a new lab from scratch by cloning the vApp template, and everything went fine.
Could a transaction blocked by some kind of internal timeout setting cause this issue?
That could explain why these nodes are deployed with 100% RAM reservation and a minimum of 4 vCPUs.
One thing is for sure: NSX-T 2.2 was not as heavy a load as this 2.4 is.