VMware Cloud Community
erikgraa
Enthusiast

vCenter update failed for workload domain upgrading to vCloud Foundation 4.3

I upgraded my test instance to VMware Cloud Foundation (VCF) 4.3 just two weeks ago and was partway through upgrading to VCF 4.3.1 the other night when my workload domain vCenter failed at 80% during the "converting data" post-install step.

 

I only had a snapshot of the workload domain vCenter, which was stupid: the ELM/replication now seems to be read-only on the workload domain vCenter, which made my second update attempt fail at an earlier vmdir step because the state is not "normal".

 

So now I have to first remediate the read-only vmdir before I can figure out why the first update failed.

 

Any pointers? Is VMware Support my only choice now?

4 Replies
jonastro
VMware Employee


Hello,

Always take powered-off snapshots of all the vCenters in VCF before starting the upgrade.

Since you have a snapshot of only the WLD vCenter, and this vCenter's upgrade got to 80% before failing, it will be in a half-patched state: some RPMs have already been upgraded to the latest version and others have not, so further errors are possible because of interdependencies and version mismatches.

During patching, the installer puts replication into read-only/standby mode. This can also be the reason why replication failed after the upgrade failed at 80%.

Next time, if the upgrade fails and you don't have a complete set of snapshots for all the VCs, try this as soon as possible:

  • Power off all linked VCs in VCF
  • Take a current-state snapshot of all VC VMs
  • Power on all VMs and make sure all services are back online and functional
  • Restore only the failed vCenter to its older snapshot and reboot
  • Wait for services to come back online
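
As a rough illustration of the ordering in the steps above, here is a minimal Python sketch that only builds a checklist; it does not talk to vCenter or ESXi. The function name `recovery_plan` and the VM names are hypothetical, and the actual power and snapshot operations would be done with whatever tooling you normally use.

```python
from typing import List


def recovery_plan(vcenters: List[str], failed: str) -> List[str]:
    """Return the recovery steps in order for a set of linked vCenters.

    `vcenters` is every linked vCenter VM (hypothetical names), and
    `failed` is the one whose upgrade failed and that has an older snapshot.
    """
    if failed not in vcenters:
        raise ValueError(f"{failed!r} is not in the list of linked vCenters")

    steps: List[str] = []
    # 1. Power off every linked vCenter before touching snapshots.
    steps += [f"power off {vc}" for vc in vcenters]
    # 2. Snapshot the current state of every vCenter while powered off.
    steps += [f"snapshot {vc} (current state)" for vc in vcenters]
    # 3. Power everything back on and confirm services are healthy.
    steps += [f"power on {vc}" for vc in vcenters]
    steps.append("verify services on all vCenters")
    # 4. Revert only the failed vCenter, then wait for it to come back.
    steps.append(f"revert {failed} to its older snapshot and reboot")
    steps.append(f"wait for services on {failed}")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(recovery_plan(["vc-mgmt", "vc-wld"], "vc-wld"), 1):
        print(f"{number}. {step}")
```

The point of the ordering is that every node gets a consistent powered-off snapshot before the failed vCenter is reverted, so there is something to fall back to if the revert makes replication worse.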

Most of the time, if not many changes are happening in the VCs, restoring the failed vCenter should bring it back to a normal state and it will start replicating with the other VCs again.

You can verify replication using https://kb.vmware.com/s/article/2127057
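
If you want to check the result across several nodes, a small parser can help. The sketch below assumes you have captured the text output of `vdcrepadmin -f showpartnerstatus` on a vCenter and that it resembles the sample in the code; the exact output format may differ between versions, and the hostnames, `parse_partner_status`, and `unhealthy` are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartnerStatus:
    partner: str
    host_available: Optional[bool] = None
    changes_behind: Optional[int] = None


def parse_partner_status(text: str) -> List[PartnerStatus]:
    """Parse captured showpartnerstatus-style text (assumed format)."""
    partners: List[PartnerStatus] = []
    current: Optional[PartnerStatus] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Partner:"):
            # A new partner block starts here.
            current = PartnerStatus(partner=line.split(":", 1)[1].strip())
            partners.append(current)
        elif current is None:
            continue
        elif line.startswith("Host available:"):
            current.host_available = line.split(":", 1)[1].strip().lower() == "yes"
        else:
            match = re.search(r"Partner is (\d+) changes? behind", line)
            if match:
                current.changes_behind = int(match.group(1))
    return partners


def unhealthy(partners: List[PartnerStatus], max_behind: int = 0) -> List[str]:
    """Names of partners that are unreachable, unreported, or lagging."""
    return [
        p.partner
        for p in partners
        if p.host_available is not True
        or p.changes_behind is None
        or p.changes_behind > max_behind
    ]


# Assumed sample output; hostnames and change numbers are invented.
SAMPLE = """\
Partner: vc-mgmt.example.local
Host available:   Yes
Status available: Yes
My last change number:             4512
Partner has seen my change number: 4512
Partner is 0 changes behind.

Partner: vc-wld.example.local
Host available:   No
"""

if __name__ == "__main__":
    print(unhealthy(parse_partner_status(SAMPLE)))
```

A partner that is unavailable or reports no "changes behind" line is treated as unhealthy here, which is a deliberately cautious choice for a post-restore check.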

The problem is time: if you wait too long to restore, the snapshot restore will not work because the VCs will be out of sync and you will start getting replication errors.


How to fix it if snapshots were not taken for all the nodes and it has been quite some time since the upgrade failed:

  1. Restore to the snapshot and then work on the replication issues, or
  2. Work on the failed vCenter to get the missing packages updated, then update the version in the SDDC inventory

    Both options require engaging VMware Support.
    Hope this helps.

 

Regards,

Jonathan

 

VMware Cloud Foundation
erikgraa
Enthusiast

Hi

 

Thanks for responding

 

I have come to the conclusion that since this is a lab environment I will simply delete and recreate the workload domain 🙂

jonastro
VMware Employee

Do keep the steps I shared handy when you perform the upgrade in production.

Have a nice day 🙂 

 

Regards,

Jonathan

erikgraa
Enthusiast

Agreed.

 

The vCenter upgrade section in the VCF Lifecycle Management document does not mention snapshotting all vCenter PSCs at all, though.

 

Perhaps this should be updated?

 

In fact, the only mention of snapshotting is that it is NOT[1] supported?

 

[1] vCenter Server upgrade stages now use APIs instead of CLIs, so snapshots are not supported. Backup the vCenter Server appliance before starting the upgrade.

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.3/vcf-lifecycle/GUID-F9E0A7C2-6C68-45B9-939A-C0...
