SMcT
Enthusiast

VCF 4.0 NSX-T Deployment stuck in endless loop

Hi All

Currently attempting to install VCF 4.0 into my lab.

I'm stuck in an endless loop: the builder deploys the management cluster nodes and then tears them down again...

Once the VMs are online, vcd-bringup-debug.log shows 'Waiting for NSX-T manager to become operational' entries. Eventually it times out with the error 'NSXT_MANAGER_NON_OPERATIONAL NSX-T Manager operation status is false on 10.xx.xx.131'. It then proceeds to delete the VMs.

The IP address it mentions is always the same one (10.xx.xx.131), which belongs to node A.

While the VMs are online, I can successfully log in to them and ping the other nodes, vCenter and the Cloud Builder appliance, so comms look to be OK.
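Ping only shows that ICMP gets through. A minimal sketch of an extra check, assuming the NSX Manager API is served over HTTPS on TCP port 443: it tests whether a TCP connection can be opened during the window the VMs are online. The function name and the example hostname are placeholders, not anything taken from the bring-up logs.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example (placeholder name, replace with the NSX Manager node IP or FQDN):
# print(tcp_reachable("nsx-node-a.example.local"))
```

A False result here while ping still works would point at the service or a firewall rather than basic routing.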

The logs aren't giving me anything else that I find useful.

The only other thing that could be relevant is that the deployment spreadsheet shows the NSX node A IP as valid, while nodes B and C are flagged as invalid (red). I haven't been able to resolve this and assumed it might be a conditional formatting error, but the spreadsheet passes all the validation checks when I load it into the Cloud Builder appliance.

Has anyone else had a similar experience, or can anyone suggest anything?

Blog: stephanmctighe.com Twitter: @vStephanMcTighe
toffaha1
Enthusiast

Hi,

I encountered the same issue, and it seems to have been caused by storage latency: it worked fine after I changed the datastore RAID level in my nested lab.

BR,

Muhammad Toffaha

Technical Consultant

@vtoffaha

sv1984
Contributor

Hi Muhammad,

Can you please elaborate on the exact change that you made? I am using VCF 4.3.1, also in a nested lab.

Thanks!

shank89
Expert

Can you describe your lab and the hardware you are using?

He likely meant changing the underlying datastore RAID level to 0. Storage latency has a huge impact in nested labs, closely followed by memory and CPU resources.

Shashank Mohan

VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
sv1984
Contributor

Thanks for your reply Shashank!

Here are the host hardware specs.

sv1984_0-1632848635651.png

It runs 4 nested ESXi nodes, Cloud Builder, a VyOS router and a jump host, all as VMs.

sv1984_2-1632848822912.png

sv1984_1-1632848719639.png
sv1984
Contributor

I also see the following error in the bringup logs, which seems to be the root cause of the failure: "Error occurred while getting certificate chain for 'nsx01b.tmelab.local'". Any idea how to resolve this?

sv1984_0-1632849110834.png
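One way to narrow down a certificate error like this is to check whether the node completes a TLS handshake at all from a machine on the management network. The sketch below is only an assumption about a useful manual check, not what Cloud Builder does internally. It fetches the server's leaf certificate (not the full chain) with verification disabled, and returns the error text if the name does not resolve or the handshake fails. The function name is hypothetical.

```python
import socket
import ssl


def fetch_server_cert(host: str, port: int = 443, timeout: float = 5.0):
    """Return (pem, None) on success or (None, error_text) on failure."""
    ctx = ssl.create_default_context()
    # Verification is disabled on purpose: lab appliances usually present
    # self-signed certificates, and we only want to see what is served.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        return ssl.DER_cert_to_PEM_cert(der), None
    except OSError as exc:
        # DNS failures, refused connections, timeouts and TLS errors
        # are all subclasses of OSError.
        return None, f"{type(exc).__name__}: {exc}"


# Example, using the FQDN from the error message:
# pem, err = fetch_server_cert("nsx01b.tmelab.local")
# print(err or pem)
```

If this fails with a name resolution error, DNS for that node is the first thing to look at. If it returns a certificate, the problem is more likely on the NSX Manager side, for example the service not being fully up before the timeout.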

 

tenthirtyam
VMware Employee

Does the password used for the NSX Manager admin pass a cracklib-check?

--
Ryan Johnson
Senior Staff Solutions Architect
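For anyone wanting to run that check, here is a minimal sketch that assumes the cracklib-check command-line tool is installed on a Linux machine and prints one "<candidate>: <verdict>" line per password read from stdin, with the verdict "OK" for an acceptable password. The helper names are hypothetical. The password is passed on stdin so it does not appear in the process list.

```python
import subprocess


def parse_cracklib_line(line: str):
    """Split '<candidate>: <verdict>' into (candidate, verdict, passed)."""
    # rpartition splits on the last ': ' so a password that itself
    # contains ': ' is still handled.
    candidate, sep, verdict = line.rstrip("\n").rpartition(": ")
    if not sep:
        raise ValueError(f"unexpected cracklib-check output: {line!r}")
    return candidate, verdict, verdict == "OK"


def cracklib_check(password: str):
    """Run cracklib-check on one password and parse its first output line."""
    result = subprocess.run(
        ["cracklib-check"],
        input=password + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cracklib_line(result.stdout.splitlines()[0])


# Example (requires cracklib-check to be installed):
# print(cracklib_check("your-nsx-admin-password"))
```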
sv1984
Contributor

Hi Ryan,

Yes, it does pass the check.

VasanthanB
Contributor

Is there any solution for the reported issues?
