VMware Cloud Community
Josh26
Virtuoso

Upgrade issues

Hi,

We are upgrading an ESXi 3.5 U4 server to vSphere.

On our first test, the vSphere Host Update utility got to 3% and then, according to the VI client, asked the host to reboot.

Fifteen minutes later the console was still at a black screen and the server was unpingable. After I power cycled it, the utility carried on and completed the upgrade. Has anyone else seen this issue? Should I have just been more patient?

1 Solution

Accepted Solutions
filbo
Enthusiast

Hi Josh,

This is a known issue with ESXi 3.5. During any shutdown there is a small chance that it will hang. This is a race condition and for some reason the vSphere 4.0 upgrade agents seem to be particularly good at "winning" the race.

In technical terms what's happening is that the CIM daemon (sfcbd) somehow manages to kill the init process. Since the final stages of ESXi shutdown / reboot are fired off by init, if init is dead the system is unable to shut down.

If you flip to the vmkernel log screen on alt-F12 you'll see complaints about init having died. At the moment of death there is a message that ends in "wantCoreDump : init -enabled : 0"; this only appears once and will eventually scroll out of sight. You will also see one or more "UserCartel: CartelCleanup:993 bootstrap/init UW died (0)" messages.

When init dies in this manner, the ESXi host is already sufficiently shut down that it is safe to hit a hardware reset or power switch (manually or through DRAC/iLO/IPMI/RSA remote management).

If it happens during a vSphere 4.0 upgrade, the upgrade should continue from where it left off, unless enough time has passed that the upgrade engine timed out. (I have a feeling this is about 15 minutes, but I'm not very sure.)

Because the issue is caused by sfcbd, you can avoid it by shutting down sfcbd before starting an upgrade. However, be aware that this very act of shutting down sfcbd could cause the problem! On the gripping hand, it's much less trouble for your already-evacuated ESXi host to hang while shutting down sfcbd manually, outside the vSphere upgrade process.

How can we ship with this problem? Catch 22. To fix the problem on your ESXi 3.5 host we would have to ship you a 3.5 patch, for which the system would have to be shut down, possibly invoking the problem! And you wouldn't be too thrilled to be told that you have to first upgrade 3.5 to (e.g.) Update 5 before even considering vSphere.

The problem is self-correcting in that when the box hangs, you eventually kick it, it wakes up and goes on with its business.

Several aspects of this issue will be corrected in future patch work for 3.5 and 4.0. It's a pretty complex interaction, but it only needs to be stopped in one place.

Field remedy: ensure that sfcbd is not running before starting the vSphere upgrade. You can do this by manually shutting it down (if the host hangs, reboot and do it again), or by disabling CIM; there's a nice writeup of the latter available.

In the context of a vSphere upgrade you'll want to:

  1. move or shut down VMs (probably by telling VC to put the host into standby mode);

  2. manually stop sfcbd (optionally by disabling it and rebooting); if the empty host hangs at this point, just power cycle and repeat;

  3. start vSphere upgrade.
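As a rough, hedged sketch of step 2 (this is not a documented VMware procedure): from the unsupported Tech Support Mode console, you could stop sfcbd with the BusyBox `pidof` and `kill` utilities, something like the following. The exact environment on a given ESXi 3.5 build may differ.

```shell
# Hedged sketch: stop the CIM daemon (sfcbd) before launching the upgrade.
# Assumes BusyBox pidof/kill are available on the console; adjust as needed.
if pidof sfcbd > /dev/null 2>&1; then
    kill $(pidof sfcbd)                                      # ask sfcbd to exit cleanly
    sleep 5
    pidof sfcbd > /dev/null 2>&1 && kill -9 $(pidof sfcbd)   # force it if still alive
fi
pidof sfcbd > /dev/null 2>&1 || echo "sfcbd is stopped"
```

If the host hangs at this point (the same race), power cycle it and repeat before starting the upgrade.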

We've found this issue very difficult to reproduce in the lab (*), so it's unlikely you'll need to power cycle even once.

(*) Not impossible. I'm by no means claiming this problem doesn't exist, only that it's not very common.

I apologize for the inconvenience. I didn't cause it, but I was on the team trying to resolve it just before the GA cut, and we did not succeed in finding a foolproof way of avoiding it in all cases.

>Bela<

9 Replies
Dave_Mishchenko
Immortal

I had the same issue during the beta. I don't recall the exact times that it took, but things proceeded OK without any intervention.

imxuk
Contributor

Hey,

How does one get hold of the upgrade-release.zip to use the vSphere Host Update Utility? What are the requirements/restrictions to get it? 😕

Cheers,

Ed

lurch89
Enthusiast

Similarly, anyone know how I can take a running ESXi 3.5 U4 server and upgrade it to ESXi 4.0.0 with no loss of data? Will my local datastores remain intact after the upgrade?

scoop
Enthusiast

Yes, if you use the Host Update Utility, your local VMFS will remain intact (and you can rollback to 3.5 if there's a problem during the install).

Josh26
Virtuoso

Regarding obtaining the .zip: go to vmware.com > Downloads > ESXi 4.0, and it's selectable as an option.

To perform the upgrade without any loss of data or reconfiguration, you need to select the vCenter download option; one of its sub-options is the vSphere Client download, which includes the host update package.

I've just run it on a second server. It worked fine with no data loss, except that, again, the server appeared to black-screen and hang during the process. I power cycled it, and it came up and completed the upgrade.

Josh26
Virtuoso

I'm also noting that there doesn't yet appear to be an edition featuring the HP management agents.

Dave_Mishchenko
Immortal

The OEM versions have typically lagged behind the general release.

Josh26
Virtuoso

Thanks for that, filbo. I'll employ that field remedy, but at least I'm now much more satisfied that my power cycle didn't interrupt some eight-hour process I was supposed to have waited for.
