Update Manager install timed out during remediation of the first host I tried to update.
Now I get a PSOD early in the boot sequence, right after the scheduler is loaded. The server can boot into troubleshooting mode, but I have not had time to investigate yet.
Any suggestions before I try another host?
Server: HP BL495c G5
Ole Thomsen
Ok, so we have tested a lot of things and it seems to be the HP agents .. either uninstall them or just run:
service hp-health stop
service hpsmhd stop
service hp-snmp-agents stop
.. and the updates get installed and work fine..
So just remember to do that when using Update Manager and you'll probably be fine.
//Peter
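The workaround above can be sketched as a small wrapper script: stop the HP agent services before the Update Manager remediation and restart them afterwards. This is only a sketch of what Peter describes; by default it echoes the commands (dry run) so you can review them, and the `RUN=""` switch to actually execute them is my own addition.

```shell
#!/bin/sh
# Sketch of the workaround above: stop the HP agents before an Update
# Manager remediation and restart them afterwards.
# RUN=echo makes this a dry run that only prints the commands;
# set RUN="" on a real host to actually execute them (hypothetical knob,
# not part of the original post).
RUN=${RUN-echo}
HP_SERVICES="hp-health hpsmhd hp-snmp-agents"

stop_hp_agents() {
    for svc in $HP_SERVICES; do
        $RUN service "$svc" stop
    done
}

start_hp_agents() {
    for svc in $HP_SERVICES; do
        $RUN service "$svc" start
    done
}

stop_hp_agents
# ... remediate the host through Update Manager here ...
start_hp_agents
```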
Exactly the same problem here, but on a BL460c G5.
I've updated 7 BL460 G6 hosts without any problems with Update Manager.
//Peter
Maybe check for any BIOS/Firmware updates for your Host?
Same problem here. BL460c G6, update timed out. Reboot caused a PSOD. Had to reinstall from scratch. Second system has the same problem; after 15 minutes it times out. esxupdate.log shows this in the last few lines:
DEBUG: cos.rpm: Cleaning up vmware-esx-iscsi
DEBUG: cos.rpm: Cleaning up bind-utils
DEBUG: cos.rpm: Cleaning up vmware-esx-docs
DEBUG: cos.rpm: Cleaning up vmware-esx-drivers-scsi-mptsas
DEBUG: cos.rpm: /usr/sbin/vmkmod-install.sh
DEBUG: cos.rpm: Cleaning up vmware-esx-drivers-block-cciss
DEBUG: cos.rpm: /usr/sbin/vmkmod-install.sh
DEBUG: cos.rpm: Cleaning up vmware-esx-vmx
DEBUG: cos.rpm: Cleaning up vmware-esx-cim
DEBUG: cos.rpm: Cleaning up vmware-esx-backuptools
DEBUG: cos.rpm: Cleaning up pam_passwdqc
DEBUG: cos.rpm: Cleaning up curl
DEBUG: cos.rpm: Cleaning up vmware-esx-drivers-ata-libata
And it has not moved since then. Over 35 minutes now. On hold with VMware support for > 25 minutes now.
Sigh .. VMware really needs better trained people in support. After 32 minutes waiting for a live tech, the person I got is completely clueless. He wants to just go and kill the still-running esxupdate without really knowing what that will do, followed by a reboot, which I am pretty sure in the current state will lead to the same PSOD.
Ok, I made some progress on my own, not due to help from VMware support.
As shown above, esxupdate was hanging after cleaning up glibc. strace showed it waiting on a futex. Killing esxupdate and trying to rescan the host from Update Manager would do the same. I deleted the /var/lib/rpm/__db* files and ran rpm --rebuilddb. I then ran by hand:
/usr/sbin/vmkmod-install.sh
/usr/sbin/cim-install.sh
esxcfg-boot -b
Now scanning the host from Update Manager reports that all patches are applied, and rebooting the host does not lead to a PSOD.
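For reference, the recovery steps above collected into one sequence. Treat this strictly as a sketch of what worked here, not an official procedure: by default it only echoes the commands (dry run), and `killall esxupdate` is my shorthand for killing the hung process by whatever PID you find on your host.

```shell
#!/bin/sh
# Recovery sketch for a hung esxupdate, following the steps above.
# RUN=echo makes this a dry run; set RUN="" to actually execute on the host.
RUN=${RUN-echo}

recover_esxupdate() {
    # 1. Kill the hung esxupdate process (it was stuck waiting on a futex).
    $RUN killall esxupdate

    # 2. Remove the stale rpm database lock files and rebuild the database.
    $RUN rm -f /var/lib/rpm/__db.001 /var/lib/rpm/__db.002 /var/lib/rpm/__db.003
    $RUN rpm --rebuilddb

    # 3. Re-run the install scripts esxupdate never reached, then rebuild
    #    the boot configuration so the host does not PSOD on reboot.
    $RUN /usr/sbin/vmkmod-install.sh
    $RUN /usr/sbin/cim-install.sh
    $RUN esxcfg-boot -b
}

recover_esxupdate
```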
The G6 hosts that worked fine when I updated them via Update Manager weren't updated at all before Update 1, just newly installed .. ESX 16.. The G5 that had the problem had all the updates before Update 1.. ESX 17..
Haven't had any time to look at the problem any further due to some other problems last night.
Going back, I think I'll be reinstalling all the G5's instead.
I don't think the problem is the platform but when it was installed. The hosts I have had problems with so far were installed shortly after ESX 4 was released. One of the updates between the original release and Update 1, potentially glibc-related, is causing the rpm issue. I have seen similar issues on RedHat EL3 years ago, where rpm would hang on the database; deleting the __db* files and rebuilding the database fixed it.
The first host, which ended up with PSOD, probably due to the esxcfg-boot -b not being run, got reinstalled with ESX 4 (release), then updated to Update 1 (only 16 of 57 patches were applied).
Ulf.
On my 3rd host it started blocking on the rpm database earlier on, when trying to install esx tools. Same problem, futex waiting forever. Killed esxupdate, removed the __db* files, ran rpm --rebuilddb, let Update Manager rescan and restart patching; it continued on at esx tools and finished this time.
Host currently rebooting.
Exact same problem on a DL380 G5. I am a huge moron though: I thought it was just one host, ended up manually rebuilding that one, and just tried the update on a second box, and it looks like it is going to time out as well.
Do any of you use the HP agents, and did you uninstall them before upgrading?
Mine were running during install, and I wonder if they could have anything to do with the mess.
My pre-upgrade check did not report any problem.
Ole Thomsen
(only 16 of 57 patches were applied)....... you might have tried staging them in VUM, which might have resulted in staging 16 out of 57. However, if you remediate them you'll see all 57 as installed.
Refer : http://www.vmware.com/support/vsphere4/doc/vsp_vum_40u1_rel_notes.html
It says :
When you stage a baseline that contains multiple bulletins, some bulletins might be shown as missing
After the successful staging of a baseline that contains multiple bulletins, some bulletins might be shown as Staged and others as Missing. The reason for the discrepancy is that bulletins might contain installation bundles, which differ in their versions, but are for the same component. In such cases, Update Manager stages only the newest versions of the installation bundles. The bulletins that contain the obsolete versions of installation bundles are marked as Missing and are not staged. When you remediate the baseline, Update Manager ignores the old versions of installation bundles and only installs the latest versions. The bulletins marked Missing become Installed because the new versions of the installation bundles are remediated.
Hi gents
Currently I am testing U1 in our test environment and had the same issue on the first host (HP DL385 G2) I updated!
Update Manager timed out; after the reboot the server went to a PSOD.
Just reinstalled the server, applied the pre-U1 patches and then U1 - no issues this time.
Let's see how the next server will update.
CU
We experienced this same issue, on an HP DL380 G5. Got a PSOD using Update Manager and never got the server back. This was a freshly installed ESX 4.0 host that never had a VM on it. I'm wondering if there is something to what was mentioned about the Insight Manager agents. I did not uninstall them prior to the upgrade attempt.
I'm also wondering if there is an issue trying to update an ESX host to 4.0 update 1, being managed by vCenter 4 non-update 1. I'm not able to find any documentation on that topic.
I have seen this rpm database problem on EL3 machines which did not run HP agents (Supermicro servers). And I don't see that the issue is introduced by VUM itself, as VUM just calls esxupdate locally on the host. Unless it is because VUM calls it through an interface which gets updated. Hmmm.
The first host, which timed out and then got rebooted (by hand), experienced the PSOD because the boot config wasn't rebuilt, so the software pieces weren't matching. Rebuilding the boot config from the diagnostic console didn't help. I reinstalled that host with ESX 4.0 (the original release, as I hadn't downloaded the U1 DVD yet), redid all the configuration and installed the HP management agents. Then I patched it through VUM. It staged 16 of 57 patches, which went through fine (with the agents running). The end result was 40 patches applied with 17 being obsolete.
Edit: Actually, let me rephrase the PSOD part. It depends on where esxupdate gets stuck and the host then getting rebooted. On one host, esxupdate got stuck pretty far down the line, where it was just doing clean up, and one of the last commands, which hadn't run yet, was the esxcfg-boot command. So running it by hand made that host OK at reboot time. My first host might have gotten stuck earlier on, i.e. in the middle of installing updates. That is what happened on the 3rd host I attempted: it was stuck installing esxtools. Killing esxupdate, removing the __db* files from /var/lib/rpm, running rpm --rebuilddb, then rescanning/repatching made esxupdate continue at esxtools, and it finished correctly that time.
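The diagnosis step used here can be sketched as a small check: find the esxupdate PID and attach strace to see whether it is blocked in a futex wait. This is just a sketch of the approach described above; `check_esxupdate` is a name I made up, and `pgrep`/`strace` availability on the service console is an assumption.

```shell
#!/bin/sh
# Diagnosis sketch: is esxupdate hung on a futex, as described above?
# check_esxupdate is a hypothetical helper name, not from the original post.
check_esxupdate() {
    pid=$(pgrep -x esxupdate 2>/dev/null | head -n 1)
    if [ -n "$pid" ]; then
        # Attach strace; a "futex(..., FUTEX_WAIT, ...)" line that never
        # returns means the process is hung on the rpm database locks.
        strace -p "$pid"
    else
        echo "esxupdate is not running"
    fi
}

check_esxupdate
```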
I had the exact same issue last week. My server is an HP DL380 G5. The server was upgraded from ESX 3.5 U4 to ESX 4.0 (with no issue). When I applied the 11/19/09 patches along with ESX 4.0 Update 1, that's when all hell broke loose.
I have a support case open with HP/VMware support now. My recommended course of action is to rebuild this server from scratch. However, since I upgraded from ESX 3.5 to 4.0 last week, I have the option to boot back into ESX 3.5. I was able to boot back into ESX 3.5 with no issue, so HP will run this in a lab and determine whether I can "re-upgrade" from my ESX 3.5 again.
For the first 3 hosts, my vCenter server was still on 4.0; for the next 2 hosts it was 4.0 U1. Host number 4 hung again at the end during clean up. Going through the above steps returned all patches as applied, so I ran esxcfg-boot -b manually and rebooted. Host number 5 seemed to hang during the install of a package, but as soon as I attached strace to it, it started moving and finished completely on its own. Maybe I was just too impatient. But all 5 hosts had the HP management agents installed and running. Hosts number 3 and 4 were ESX 3.5 installs which got upgraded to ESX 4. Hosts 1 through 3 were clean installs in May from the ESX 4 release.
>I'm also wondering if there is an issue trying to update an ESX host to 4.0 update 1, being managed by vCenter 4 non-update 1
Not an issue per the compatibility matrix (http://www.vmware.com/pdf/vsphere4/r40/vsp_compatibility_matrix.pdf), page 3