dbutch1976
Hot Shot

vCenter 7.0.2 appliance fails to start services after restoring snapshot

Hi all,

We had an upgrade from 7.0.2 to 7.0.3c fail at 80% with the error "Exception occurred in postinstallHook."

I opened a ticket with VMware. The services would not start, and the tech recommended restoring from the snapshot I had taken before the upgrade. The problem was that the services also failed to start after reverting to the snapshot, and there was no other obvious explanation. The tech believed the vCenter may already have been in an unhealthy state before the upgrade began, which would explain both failures. However, our post-mortem raised a lot of questions about how the snapshot was taken.

I took the snapshot with the vCenter running. I deselected "snapshot the virtual machine's memory" and did NOT select the "quiesce virtual machine" option.

My question is: could this have caused the services not to start? The tech said no; taking a snapshot of a running vCenter (particularly a standalone one with a local database) should cause no issue. However, there is some contention here, because I'm also hearing that you should never snapshot a vCenter while it is running and that such a snapshot won't work.

Luckily we were finally able to get the vCenter back online, but for a while there this was a BIG BIG deal.

Is it REQUIRED to power off the vCenter appliance before snapshotting it? If so, has VMware published any best practices outlining this?

Accepted Solutions
Anil0210
Enthusiast

Hello dbutch1976,

Just a few clarifications and suggestions:

- Is your vCenter standalone or in linked mode?

- Taking a snapshot with vCenter powered on should not cause any issue, and as far as I know VMware has no guidance against it. One possibility, though I am not sure: an operation of a running service may have been in progress when the snapshot was taken, and after the restore that service's config file may have been missing.

- Just to be clear, the recommended backup and restore process for the vCenter Appliance is described here: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vcenter.install.doc/GUID-3EAED005-B0A3-40CF... 

(File-based backup and restore also lets you schedule automatic backups.)

- If you want to use snapshots, preferably do so only for a standalone vCenter, not one in linked mode.

- This blog post explains why snapshots can cause issues with linked mode: https://blogs.vmware.com/vsphere/2019/10/considerations-for-vsphere-component-backup-and-restore-par... 

- As a general suggestion, use file-based backups for vCenter, whether it is standalone or in linked mode.

msripada
Virtuoso

Based on the description, it looks like one service may be failing to start, which in turn prevents the services that depend on it from starting.

Could you please share the output of service-control --status --all?
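A minimal Python sketch for reading that output, assuming it groups service names under heading lines such as "Running:" and "Stopped:" with the names listed below each heading (an assumption about the format; compare it with what your appliance actually prints). The function names parse_service_status and unexpected_stopped are hypothetical, and expected_stopped is an allow-list for services you know are stopped by design on your appliance.

```python
def parse_service_status(output: str) -> dict[str, list[str]]:
    """Group service names by the status heading they appear under.

    Assumes (hypothetically) a layout like:
        Running:
         svc-a svc-b
        Stopped:
         svc-c
    """
    groups: dict[str, list[str]] = {}
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # A single word ending in ':' is treated as a status heading.
        if line.endswith(":") and " " not in line:
            current = line[:-1]
            groups.setdefault(current, [])
        elif current is not None:
            # Service names under a heading are whitespace-separated.
            groups[current].extend(line.split())
    return groups


def unexpected_stopped(groups: dict[str, list[str]],
                       expected_stopped: frozenset[str] = frozenset()) -> list[str]:
    """Return stopped services that are not in the allow-list, sorted by name."""
    return sorted(set(groups.get("Stopped", [])) - set(expected_stopped))
```

Any name returned by unexpected_stopped is a candidate for the failing service whose dependents could not start; its logs would be the next place to look.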

Please also check whether any certificates on the VCSA have expired:

https://kb.vmware.com/s/article/82332
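If a certificate has been exported to a PEM file, openssl x509 -in cert.pem -noout -enddate prints its expiry as a line such as notAfter=Jan  5 09:34:43 2018 GMT (cert.pem is a placeholder). The sketch below assumes you already have those lines; how to pull the certificates out of the appliance is not shown here. It uses Python's ssl.cert_time_to_seconds to turn the date into epoch seconds, and treats a certificate as expired once the current time is past notAfter. The helper names are hypothetical.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def not_after_epoch(enddate_line: str) -> int:
    """Parse a 'notAfter=...' line (as printed by openssl x509 -noout -enddate) into epoch seconds."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"expected a line starting with {prefix!r}, got {line!r}")
    # ssl.cert_time_to_seconds expects text like "Jan  5 09:34:43 2018 GMT".
    return ssl.cert_time_to_seconds(line[len(prefix):])


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Days remaining before expiry; negative once the certificate has expired."""
    current = time.time() if now is None else now
    return (not_after_epoch(enddate_line) - current) / SECONDS_PER_DAY


def is_expired(enddate_line: str, now: Optional[float] = None) -> bool:
    """True once the current time is past the certificate's notAfter date."""
    return days_until_expiry(enddate_line, now) < 0
```

Using days_until_expiry rather than only is_expired also flags certificates that are about to expire, which can break services on the next reboot in the same way.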

thanks,

MS

dbutch1976
Hot Shot

Thank you for the reply. VMware support was able to fix the issue. Apparently they found some backtraces in the logs and were able to revert some config files to previous versions.

My real question is: could snapshotting the VC while it was running and then restoring the snapshot cause this issue? Is there an established VMware best practice to power off the vCenter before snapshotting it?

msripada
Virtuoso

My real question is could snapshotting the VC while it was running and then restoring the snapshot cause this issue? Is there an established best practice from VMware to power off the vCenter prior to snapshotting it?

Not really. Your snapshot was taken without memory, so the runtime state was not preserved. When you revert to it, the appliance boots and the services load their configs and start. If a config was already broken before the snapshot was taken, reverting brings back that same broken config. Because the services were already running when the snapshot was taken, a bad config change would not harm or stop them until they were restarted or the appliance was rebooted. So the problem has nothing to do with the snapshot itself. Hope this clears it up.

thanks,

MS

a_p_
Leadership

I'm not sure whether there's an official best practice for that. My personal practice is to snapshot critical VMs, especially those running databases, while they are powered off, so the snapshot captures a clean state.
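For anyone who wants to script the powered-off approach, here is a minimal sketch using pyVmomi (an assumption; nobody in the thread mentions tooling). It assumes vm is a vim.VirtualMachine for the vCenter appliance, obtained from a session to the ESXi host that runs it rather than to the vCenter itself, since that vCenter is about to be shut down. It calls the pyVmomi methods ShutdownGuest(), which needs VMware Tools running in the guest, and CreateSnapshot_Task(name, description, memory, quiesce) with memory and quiesce both off. The helper names cold_snapshot and wait_for_task, and the timeout values, are hypothetical.

```python
import time


def wait_for_task(task, poll_seconds: float = 5.0, timeout_seconds: float = 1800.0):
    """Poll a vSphere task until it succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = task.info.state  # 'queued', 'running', 'success' or 'error'
        if state == "success":
            return task.info.result
        if state == "error":
            raise RuntimeError(f"task failed: {task.info.error}")
        if time.monotonic() > deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(poll_seconds)


def cold_snapshot(vm, name: str, description: str = "",
                  shutdown_timeout_seconds: float = 900.0, poll_seconds: float = 5.0):
    """Cleanly shut down the guest, then snapshot it without memory or quiescing."""
    power_state = vm.runtime.powerState
    if power_state == "poweredOn":
        # Asks VMware Tools for a guest OS shutdown; the call returns before the VM is off.
        vm.ShutdownGuest()
        deadline = time.monotonic() + shutdown_timeout_seconds
        while vm.runtime.powerState != "poweredOff":
            if time.monotonic() > deadline:
                raise TimeoutError("guest did not power off before the timeout")
            time.sleep(poll_seconds)
    elif power_state != "poweredOff":
        raise RuntimeError(f"refusing to snapshot a VM in state {power_state!r}")

    # With the VM off there is no memory to capture and nothing to quiesce.
    task = vm.CreateSnapshot_Task(name=name, description=description,
                                  memory=False, quiesce=False)
    return wait_for_task(task, poll_seconds=poll_seconds)
```

Once the snapshot task succeeds you would power the appliance back on (for example with vm.PowerOnVM_Task()) and let its services start before beginning the upgrade.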

André

dbutch1976
Hot Shot

Nope, it wasn't in linked mode, but I'm at a loss to understand why reverting to the pre-upgrade snapshot didn't work. The services failed to start after the upgrade, and rolling back didn't fix it. At this point I believe there was a pre-existing issue that was going to surface as soon as the VM rebooted, which is why both the upgrade and the snapshot revert failed. The upgrade worked on the second attempt with the same binaries once the boot issue was fixed, which adds some credibility to that theory.

Thanks for your insights.
