VMware Cloud Community
PXSRB
Contributor

Upgrade from 5.1 U2 to 5.5 U2: hostd and vpxa crashing

Hello,

I'm having a strange issue while upgrading my ESXi 5.1 farm to 5.5.

The farm has 13 hosts, each an HP DL380 G7 with 2x Xeon E5649 and 144 GB of memory. ESXi is installed on a local datastore (on an HP P410i RAID controller), and each host connects to my EMC SAN through 2x QLogic HBAs for the VM datastores.

Latest firmware applied.

All hosts ran perfectly for months (if not years) on ESXi 5.1. I upgraded the boxes to 5.5 U2 using Update Manager. 12 boxes upgraded without any problem and are working fine.

One box refuses to upgrade. Update Manager starts, and on the console I see the box reboot and update. It comes back as 5.5 with the correct build number but does not come online again in vCenter (it stays disconnected).

Ping and SSH are OK, but connecting the vSphere Client directly to the ESXi host does not work, nor does the web client.

Over SSH I checked /var/log/hostd.log and /var/log/vmkernel.log, and it seems to me that hostd and/or vpxa are crashing:

cpu20:33831)User: 2888: wantCoreDump : vpxa-worker -enabled : 1

cpu20:33831)UserDump: 1820: Dumping cartel 33831 (from world 33831) to file /var/core/vpxa-worker-zdump.000 ...
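To see how often this happens and which processes are affected, the core dump entries can be pulled out of vmkernel.log with a small script. This is only a rough sketch: the pattern is modelled on the two lines quoted above, and the function name `find_core_dumps` is made up for illustration. Other builds may format these lines differently.

```python
import re

# Pattern modelled on the vmkernel.log excerpt quoted in this post.
# Assumption: other ESXi builds may word these lines differently.
DUMP_RE = re.compile(
    r"UserDump: \d+: Dumping cartel (?P<cartel>\d+) .*? to file (?P<path>\S+)"
)


def find_core_dumps(lines):
    """Return a (cartel_id, dump_path) tuple for every core dump line found."""
    hits = []
    for line in lines:
        match = DUMP_RE.search(line)
        if match:
            hits.append((int(match.group("cartel")), match.group("path")))
    return hits


if __name__ == "__main__":
    # The two lines from the excerpt above, used as sample input.
    sample = [
        "cpu20:33831)User: 2888: wantCoreDump : vpxa-worker -enabled : 1",
        "cpu20:33831)UserDump: 1820: Dumping cartel 33831 (from world 33831) "
        "to file /var/core/vpxa-worker-zdump.000 ...",
    ]
    for cartel, path in find_core_dumps(sample):
        print(cartel, path)
```

To run it against a real log, pass an open file handle (for example a copy of /var/log/vmkernel.log) instead of the sample list.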

Restarting the management services lets me log in to the host again with the vSphere Client, but I get kicked out after a few seconds, only to find the same errors in the logs again.

I already found some discussions about a faulty iSCSI software initiator provided by ima-be2iscsi that could explain this behaviour. See "vmware 5.5 u1 - iScsi adapter crashes the system".

However, that thread refers to an update done with the 5.5 U1 Driver Rollup ISO, while I'm using the full 5.5 U2 ISO provided by HP. Besides, 12 hosts upgraded fine and only one is giving a problem. I tried the solution provided there, but no luck.

I also tried to install 5.5 U2 from scratch with the ISO, but that failed too, with this message (shown after completing the wizard steps of entering a password and selecting a disk to install to):

esx_upgrade.jpg

It stays there forever. Pressing Enter brings me back to the 'Welcome to the installer' screen.

Upgrading or doing a plain install with the 5.5 U1 ISO shows exactly the same problem, but again only on this host.

After my failed upgrade to 5.5 U2 with Update Manager I rebooted the host, pressed Shift+R during boot to enter recovery mode, and booted 5.1 U2 again. The server is now running as smoothly as before. However, I'm still stuck with a 5.1 U2 host in my 5.5 farm.

Hardware health reports OK during BIOS POST, and there is nothing in the iLO either (which sounds logical to me, as 5.1 is working perfectly). My HP case got closed because of this.

A problem with the HP customized ISO also seems unlikely to me, as it works on all the other hosts.

I hope somebody can help me with this.

Thanks!

3 Replies
FritzBrause
Enthusiast

Since you were able to SSH into the host, did you check /var/log/esxupdate.log for any errors?

Is the BIOS maybe different on this host?

Any other warnings in /var/log/vmkwarning.log or /var/log/vmkernel.log?
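If the logs are long, a quick keyword scan can narrow things down. Below is a minimal sketch that goes through the three log files mentioned above and prints lines containing a few suspicious keywords. The keyword list and the helper name `flag_lines` are my own assumptions for illustration, not anything ESXi defines, so adjust them to whatever you are looking for.

```python
from pathlib import Path

# Log files mentioned in this reply.
LOG_PATHS = [
    "/var/log/esxupdate.log",
    "/var/log/vmkwarning.log",
    "/var/log/vmkernel.log",
]

# Assumed keywords; a simple case-insensitive substring match, nothing more.
KEYWORDS = ("error", "warning", "fail")


def flag_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) for each line containing any keyword."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            flagged.append((number, line.rstrip("\n")))
    return flagged


if __name__ == "__main__":
    for path in LOG_PATHS:
        log = Path(path)
        if not log.is_file():
            continue  # skip logs that are missing on this machine
        with log.open(errors="replace") as handle:
            for number, text in flag_lines(handle):
                print(f"{path}:{number}: {text}")
```

A substring match like this will produce false positives, so treat the output as a list of places to look rather than a diagnosis.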


Alistar
Expert

Hi there,

I guess the simplest way would be to totally wipe your local storage, ideally by breaking and recreating the RAID, and then re-install ESXi fresh from the ISO. It seems something might have gone wrong with the software modules during this ESXi installation's lifetime, and the installer doesn't want to initialize because of that.

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/
PXSRB
Contributor

3 system boards and 2 CPUs later, it turned out to be a hardware issue with one of the CPUs.

After replacing the CPUs, the problem is completely gone.

Really strange that 5.1 was running perfectly on the 'faulty' CPU while 5.5 is having a problem with it...
