VMware Cloud Community
cykVM
Expert

HP Proliant DL380e Gen8, HP OEM VMWare ESXi 5.5 Update 2 keeps crashing (PSOD)

Hello everyone,

I currently maintain a single VMware host for a mid-size charity, running the HP OEM version of vSphere (ESXi) 5.5 Update 2.

The hardware in use:

HP ProLiant DL380e Gen8 (bought brand new in August 2014), HP SmartArray B320i storage controller, HP H222 host bus adapter (with only an HP Ultrium4 tape drive connected to it), HP 4-port Intel NIC 366i, 32GB RAM, 2 quad-core Intel Xeon E5-2407

The box was initially installed and configured in August using the HP OEM vSphere 5.5 Update 1 installation CD. vSphere is installed on the RAID array configured on the B320i controller. A VMware Essentials license is also installed and in use.

It's running 3 Windows 2008 R2 VMs (DC, Exchange 2010 and a backup server with Backup Exec 2010 R3 [I know this is not a recommended/supported configuration, but it worked with 5.5 U1 without issues]) plus 2 Debian Linux VMs.

Two weeks ago, during weekend maintenance, I first installed the latest HP SPP (Service Pack for ProLiant) from Sept. 2014, which provided several firmware updates, e.g. for the B320i and the 366i NIC.

After that I performed an upgrade installation to the HP OEM version of vSphere 5.5 Update 2, which HP also released at the beginning of September.

All those setup/update procedures went through without any issues, error messages or crashes.

The host was running fine for 3 days and suddenly crashed with a PSOD stating: PCPU 0: no heartbeat (2/2 IPIs received) [unfortunately I did not take a screenshot]

I reset/rebooted the host through the iLO 4 console and kept an eye on the server over the following days.

The first PSOD took place during daily (nightly) backup on the connected tape drive.

On the following Friday/Saturday night (about 2 days later) it crashed again with the following PSOD - again with PCPU 0: no heartbeat (2/2 IPIs received):

PSOD1.PNG

So I started investigating this and found some hints here in the VMware communities pointing to recommended BIOS settings for HP ProLiant servers. I checked the actual settings and changed the values to the recommended ones. The server ran fine without glitches for about 16 hours, then crashed again with this PSOD:

PSOD2.PNG

I continued investigating and looked especially closely at the power management settings in the BIOS, in vSphere and in the Windows VMs.

I also checked the installed firmware versions of the storage controllers and the NIC, and the driver versions in use. All OK there (as recommended in the HP VMware recipe of Sept. 2014).

The server ran fine for about a week after the reboot, then there was another PSOD early this morning at about 3 a.m.:

PSOD3.PNG

The server/VMs were mostly idle at this time, no heavy I/O activity.

The first two PSODs happened during backup but not at a consistent time (one at about 10 p.m., the other early in the morning between 2 and 3 a.m.).

I read through tons of hints about faulty NIC drivers/firmware, BIOS configurations etc., but nothing helps, and everything is already configured exactly as in the HP recommendations for vSphere 5.x.

For the BIOS settings I followed this list/table: Recommended BIOS Settings on HP ProLiant DL580 G7 for VMware vSphere | Boerlowie's Blog

vSphere is configured for "High Performance Mode", as are the Windows VMs.

I'm somewhat stuck now, so maybe someone here has a good hint for me?

If you need any further hardware/software/configuration/whatever details, just ask.

Cheers and thanks in advance for any help,

cykVM

122 Replies
iofhua
Contributor

I'm using a ProLiant DL360e Gen8.

I forgot to knock on wood yesterday when I said it only crashes once every couple weeks or so. It didn't make it one day this time. I had to restart it again this morning.

This one has the heating server on it for our school district. It's not a priority to have it online right now, but when winter comes I don't want it crashing over the weekend. So I guess I have to do a fresh install of 5.5 u1?

Is there a way I can back up my VMs in ESXi 5.5? If I could drop them onto an external drive or something and then import them back into the new install of ESXi, that would work, I guess.

They should release an ESXi 5.5 U3 that fixes these problems. This thread has been going on since September last year. It's been almost a year now.

cykVM
Expert

It was the same for me: sometimes the server stayed up for several days, and other times it crashed after 24 to 48 hours.

You may check VMware KB: Backing up and restoring ESXi configuration using the vSphere Command-Line Interface and ... for backing up the host config if you need to do that.

This does not take a backup of the VMs.

Depending on the VMware license in use (free or Essentials/Enterprise license) you may back up the VMs using your favourite VMware-enabled backup solution (Veeam, Backup Exec or similar). With a free license you need to shut down the VMs and copy them over to an external HDD connected to your admin workstation. This might be VERY slow, so be sure to have some maintenance time available.

With an install from scratch the existing datastore is not touched, and you may add your VMs back to the inventory after 5.5 U1 (HP customized) is installed.
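
If you go the manual copy route, it may be worth verifying the copy before you wipe anything. Below is a minimal Python sketch, assuming the powered-off VM's folder has already been downloaded to (or is otherwise reachable as) a normal directory on the admin workstation. The function names `sha256_of` and `copy_vm_folder` are made up for illustration and are not part of any VMware tool; the script only copies files and compares checksums.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not have to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_vm_folder(source: Path, destination: Path) -> List[str]:
    """Copy a VM folder (hypothetical helper) and verify every file.

    Returns the relative paths of files whose checksum differs between
    source and destination; an empty list means the copy verified cleanly.
    The destination directory must not exist yet.
    """
    shutil.copytree(source, destination)
    mismatches = []
    for src_file in sorted(source.rglob("*")):
        if src_file.is_file():
            relative = src_file.relative_to(source)
            if sha256_of(src_file) != sha256_of(destination / relative):
                mismatches.append(relative.as_posix())
    return mismatches
```

You would call `copy_vm_folder` with the local VM folder and a new folder on the external drive, and only proceed with the reinstall if it returns an empty list. Hashing reads every file twice, so this makes an already slow copy slower.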

iofhua
Contributor

Thank you, cykVM. I will take a look at that article and figure out what I need to do. It's good to know that the datastore shouldn't be touched during a fresh install of ESXi. But just to be safe I should probably have a backup of the VMs anyway.
