grimsrue
Enthusiast

NSX-T Edge Node 3.1.3.0 upgrade to 3.2.0 Error message


I hit a crazy error message right from the very start when upgrading the first Edge node to 3.2.0.

This is happening with my Lab NSX-T environment and on the very first Edge Node. Upgrade gets to about 35% and then fails with this crazy error message below. 

I deleted and redeployed my Edge nodes from scratch. I also found that I had incorrect DNS IPs configured on my edges and managers. I corrected the DNS IPs. I am receiving the same exact error before and after the redeploy/DNS IP update. 

Any ideas out there? I'll probably open an SR with VMware support to see if they can figure it out, but I wanted to see if anyone else has run into the same error.

My lab is running some older server hardware, but it is on a fairly recent build of ESXi 6.7. The last round of NSX-T patches, from 3.1.2.1 to 3.1.3.3, went through with no issues.

 

Error message:
Edge 3.2.0.0.0.19067070/Edge/nub/VMware-NSX-edge-3.2.0.0.0.19067089.nub switch OS task failed on edge TransportNode be263b3e-0610-497e-af01-7d994ee0443a: clientType EDGE, target edge fabric node id be263b3e-0610-497e-af01-7d994ee0443a, return status switch_os execution failed with msg: An unexpected exception occurred: CommandFailedError: Command ['chroot', '/os_bak', '/opt/vmware/nsx-edge/bin/config.py', '--update-only'] returned non-zero code 1:

lspci: Unable to load libkmod resources: error -12
lspci: Unable to load libkmod resources: error -12
lspci: Unable to load libkmod resources: error -12
lspci: Unable to load libkmod resources: error -12
lspci: Unable to load libkmod resources: error -12
System has not been booted with systemd as init system (PID 1). Can't operate.
ERROR: Unable to get maintenance mode information
NsxRpcClient encountered an error: [Errno 2] No such file or directory
WARNING: Exception reading InbandMgmtInterfaceMsg from nestdb, Command '['/opt/vmware/nsx-nestdb/bin/nestdb-cli', '--json', '--cmd', 'get', 'InbandMgmtInterfaceMsg']' returned non-zero exit status 1.
ERROR: NSX Edge configuration has failed. 1G hugepage support required
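The last ERROR line is the actual failure: the Edge VM's virtual CPU is not exposing the pdpe1gb flag that 1G huge pages require. A minimal check, assuming a Linux guest with /proc/cpuinfo available (the echo messages are my own wording, not NSX output):

```shell
# Check whether the virtual CPU exposes 1G huge page support (pdpe1gb flag).
# Run inside the Edge VM, or any Linux guest on the same host/EVC baseline.
if grep -q -m1 pdpe1gb /proc/cpuinfo; then
    echo "1G huge pages supported (pdpe1gb present)"
else
    echo "pdpe1gb flag missing: NSX-T 3.2 Edge configuration will fail"
fi
```

If the flag is missing inside the guest but the physical CPU is recent enough, the EVC baseline or VM compatibility level is the usual suspect.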

 

The only other idea I have is that the "VMware-NSX-upgrade-bundle-3.2.0.0.0.19067070.mub" file got corrupted when I was uploading it to the Manager. I am going to re-upload it and give the upgrade another go. I will update this post with whether or not it worked.

 

26 Replies
mackov83
Contributor

@grimsrue 

Thanks for the advice, I will take a look, but this is a fresh build as of about 1 month ago so I suspect that will all be as expected. I deployed on 3.1 and upgraded to 3.1.3.3 last week which went perfectly fine. 3.2 is proving troublesome.

I should add that this is an eval type deployment, so it is the 'limited export' release.

mackov83
Contributor

Just to give an update in case anybody comes across it.

I am not sure why my upgrade was failing, as DNS, NTP, etc. all checked out fine. After rebooting all the managers and attempting the upgrade again, it still failed. I did, however, read the failure message closely, and it mentioned something along the lines of:

  • node03 timeout
  • node02 failed due to timeout on node03

After seeing this I deleted node03 and redeployed it. Once it was online and cluster was healthy I progressed the upgrade and it went through as expected.

Filin_K
VMware Employee
  • If vSphere EVC mode is enabled for the host for the NSX Edge VM, the CPU must be Haswell or later generation.

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/installation/GUID-22F87CA8-01A9-4F2E-B7DB-93...

engyak
Enthusiast

Shouldn't that setting be lifecycled by the NSX Manager?

Also, thank you for responding on Christmas...

grimsrue
Enthusiast

@Filin_K 

It seems like there are conflicting requirements on that page. The top of the page says not to go any lower than Haswell when running EVC mode, which indirectly states that 3.2.0 Edge nodes will only work on Haswell CPUs or newer. Further down, the same page states that Sandy Bridge is supported for 1G huge page support.

The CPUs in my lab ESXi hosts are older Sandy Bridge parts. I suspect it would be rare to find a production NSX-T environment still running Sandy Bridge processors, since they are extremely old, but if VMware states 1G huge pages are supported on Sandy Bridge and later CPU generations, then I would assume Sandy Bridge is a supported processor series. So is Sandy Bridge supported or is it not?

leechunk
Enthusiast

EVC is not enabled in my lab environment (Ivy Bridge), and the 3.2 Edge VM (or any other Linux VM, for that matter) is missing the hugepage CPU features too.

DominicFoley
VMware Employee

I had to do exactly the same. 

I deployed the OVA manually. You do not need to set the NSX-T Manager config (username, password, thumbprint) in the template configuration; you should leave this empty.

It deploys the VM in a powered off state. 

Then add the advanced parameters (see attached) and power on the VM.

Log in as root and run 'cat /proc/cpuinfo | grep pdpe1gb'. You should get some output.

Once you have done this, go back to the nsxcli and run the 'join management-plane' command

Once registered you can then go in and configure the edge in the NSX-T Manager UI.
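The flow above can be sketched as follows. This is only a minimal sketch: the flags check is simulated here against a sample flags line (on the real Edge VM you would grep /proc/cpuinfo directly), and the manager address and thumbprint in the nsxcli comments are placeholders you would substitute with your own values.

```shell
# Step 1 (on the powered-on Edge VM, as root): confirm the guest now sees
# the pdpe1gb CPU flag. Simulated with a sample flags line for illustration.
flags="fpu vme de pse tsc msr pae pdpe1gb rdtscp lm"
case " $flags " in
  *" pdpe1gb "*) echo "pdpe1gb present" ;;
  *)             echo "pdpe1gb missing" ;;
esac

# Step 2 (on an NSX Manager): note the API certificate thumbprint, e.g.
#   get certificate api thumbprint
# Step 3 (on the Edge's nsxcli): join the management plane, e.g.
#   join management-plane <manager-ip> username admin thumbprint <api-thumbprint>
```

Only once the pdpe1gb check passes is it worth joining the Edge back to the management plane and finishing its configuration in the UI.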
