DaIceMan
Enthusiast

No vMotion after ESXi 6.7 U3 -> 7.0 U2 upgrade - possible bug and workaround.

Just wanted to share my experience with this upgrade. We recently upgraded a customer's vSphere from 6.7 to 7. The VCSA upgrade went smoothly (!) without major issues. The 3 hosts, however, were a different story. They were 3 DL380 Gen9 hosts with all the latest HP updates.

The first obstacle was that the first 2 hosts (they weren't all bought together) had a 4GB microSD, and ESXi 7 requires at least 8GB. That was resolved by backing up the config on 6.7 U3, swapping the microSD for a 32GB one, installing the identical build (hey, it's handy to keep those ESXi ISOs saved!) and restoring the configuration from the backup. Good to go. Next obstacle: a dependency problem with the HP ssacli tool, quickly resolved by removing it and rebooting.

Now ready for the upgrade to 7, which had previously been prepped as a baseline on vCenter. The upgrade apparently went smoothly; the host rebooted and after a while reconnected to vCenter in maintenance mode. OK, so let's see: take it out of maintenance and, yes, the HA service is installed and waiting for election and... OK, seems fine. Let's try a vMotion... ah, no go: "A general system error occurred: Invalid fault". Check the logs and... hmm, nothing really revealing. On the receiving side (the newly upgraded host) I just see "Evicting vm .....", strange.

Let's check the SAN: are all datastores accessible? Yes, they browse with no issues. vmkping on all interfaces: fine, no issues. Jumbo frames problem? Nope, not enabled here. Let's try a reboot: nada, no vMotion, same error. The host enters and exits maintenance with no issues. Moved it out of the cluster and back in, no issues. But always that cryptic "A general system error occurred: Invalid fault" error. So we went ahead and reinstalled 6.7, restored the config and checked vMotion: works right away. What the hell? Again, proceed to upgrade to 7.0 U2: nope, same "A general system error occurred: Invalid fault" error. That went on for 2 days.
So I simply started checking the iSCSI initiator settings and datastore targets, and did a rescan of the (software) HBA. Then I rebooted, and when I took the host out of maintenance for the nth time: "installing HA service"... What the hell? Yep, it installed the service (which it already had after the upgrade) and, lo and behold, vMotion works!
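
For anyone who prefers the command line, here is a rough sketch of the same two steps (rescan, then reboot) as a small Python dry-run helper. This is an assumption about an equivalent procedure, not what was actually clicked in the client: the helper names `workaround_commands` and `run` are made up for illustration, the reboot reason text is arbitrary, and the sketch assumes the host is already in maintenance mode (esxcli refuses to reboot otherwise). It does not cover the part where I edited the iSCSI targets.

```python
import shlex
import subprocess


def workaround_commands(reason="Rescan HBAs after 7.0 U2 upgrade"):
    """Return the two steps from the post as argv lists: rescan, then reboot."""
    return [
        # Rescan every storage adapter, which includes the software iSCSI HBA.
        ["esxcli", "storage", "core", "adapter", "rescan", "--all"],
        # Reboot the host; esxcli only allows this while in maintenance mode.
        ["esxcli", "system", "shutdown", "reboot", "--reason", reason],
    ]


def run(commands, dry_run=True):
    """Print each command line; execute it only when dry_run is False."""
    printed = []
    for argv in commands:
        line = " ".join(shlex.quote(arg) for arg in argv)
        print(line)
        printed.append(line)
        if not dry_run:
            # Intended for the ESXi shell; check=True stops at the first failure.
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run(workaround_commands())
```

By default it only prints the commands, so you can paste them into an SSH session one at a time; pass `dry_run=False` only if you run it in the ESXi shell and really want the host to reboot. After the reboot, take the host out of maintenance and check whether the HA agent gets (re)installed before retrying vMotion.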

So on to the next host: same procedure and same problem, no vMotion. Well, now I know which screw to turn: rescanning the HBA and rebooting does the trick ("installing HA...").

Seems to me there is probably a slight bug here, as the first thing it should do (and did) was check whether HA was installed, activate it and then wait for election. It *did* do these steps every time it exited maintenance (the first time it installed HA), but evidently something doesn't go as it should. Why would rescanning the HBA and editing the targets (any) fix this? I don't know, but it did, twice. The third host already had an 8GB microSD, so no swap was necessary, and it upgraded without this problem. Just wanted to point this out in case anyone encounters a cryptic "A general system error occurred: Invalid fault" error after a 6.7 U3 to 7.0 U2 upgrade.

 
