VMware Cloud Community
cykVM
Expert

HP ProLiant DL380e Gen8, HP OEM VMware ESXi 5.5 Update 2 keeps crashing (PSOD)

Hello everyone,

I currently maintain a single VMware host for a mid-size charity, running the HP OEM version of vSphere (ESXi) 5.5 Update 2.

The hardware in use:

HP ProLiant DL380e Gen8 (bought brand new in August 2014), HP SmartArray B320i storage controller, HP H222 host bus adapter (with only an HP Ultrium4 tape drive connected to it), HP Intel 4-port NIC 366i, 32 GB RAM, 2 quad-core Intel Xeon E5-2407

The box was initially installed and configured in August using the HP OEM vSphere 5.5 Update 1 installation CD. vSphere is installed on the RAID array configured on the B320i controller. A VMware Essentials license is also installed and in use.

It's running 3 Windows 2008 R2 VMs (DC, Exchange 2010 and a backup server with Backup Exec 2010 R3 [I know this is not a recommended/supported configuration, but it worked with 5.5 U1 without issues]) besides 2 Debian Linux VMs.

Two weeks ago, during weekend maintenance, I first installed the latest HP SPP (Service Pack for ProLiant, Sept. 2014), which provided several firmware updates, e.g. for the B320i and the 366i NIC.

After that I performed an upgrade installation to the HP OEM vSphere 5.5 Update 2 version, which HP also released at the beginning of September.

All those setup/update procedures went through without any issues, error messages or crashes.

The host was running fine for 3 days and suddenly crashed with a PSOD stating: PCPU 0: no heartbeat (2/2 IPIs received) [unfortunately I did not take a screenshot]
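
For anyone comparing this message across several crashes, a small Python sketch like the one below could pull the PCPU number and the IPI counts out of the line. The function name is made up for illustration, and the pattern only covers the exact wording quoted above, not other PSOD variants.

```python
import re

# Matches only the wording quoted in this post; other PSOD texts are not covered.
_NO_HEARTBEAT = re.compile(
    r"PCPU\s+(?P<pcpu>\d+):\s+no heartbeat\s+"
    r"\((?P<got>\d+)/(?P<total>\d+) IPIs received\)"
)


def parse_no_heartbeat(line):
    """Return (pcpu, ipis_received, ipis_total), or None if the line does not match."""
    match = _NO_HEARTBEAT.search(line)
    if match is None:
        return None
    return (
        int(match.group("pcpu")),
        int(match.group("got")),
        int(match.group("total")),
    )


if __name__ == "__main__":
    print(parse_no_heartbeat("PCPU 0: no heartbeat (2/2 IPIs received)"))
```

Keeping the three values separate makes it easy to see whether it is always the same PCPU that stops responding.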

I reset/rebooted the host through the iLO 4 console and kept an eye on the server over the next few days.

The first PSOD took place during daily (nightly) backup on the connected tape drive.

On the following Friday/Saturday night (about 2 days later) it crashed again with the following PSOD - again with PCPU 0: no heartbeat (2/2 IPIs received):

PSOD1.PNG

So I started investigating this and found some hints here in the VMware communities pointing to recommended BIOS settings for HP ProLiant servers. I checked the actual settings and changed the values to the recommended ones. The server ran fine without glitches for about 16 hours, then crashed again with this PSOD:

PSOD2.PNG

I continued investigating and took an especially close look at the power management settings in the BIOS, in vSphere and in the Windows VMs.

I also checked the installed firmware versions of the storage controllers and NIC and the driver versions in use. All OK there (as recommended in the HP VMware recipe of Sept. 2014).

After the reboot the server ran fine for about a week, then there was another PSOD early this morning at about 3 a.m.:

PSOD3.PNG

The server/VMs were mostly idle at this time, no heavy I/O activity.

The first two PSODs happened during backup, but not at a fixed time (one at about 10 p.m., the other early in the morning between 2 and 3 a.m.).

I read through tons of hints about faulty NIC drivers/firmware, BIOS configurations etc., but nothing helps, and everything is already configured exactly as in HP's recommendations for vSphere 5.x.

For the BIOS settings I followed this list/table: Recommended BIOS Settings on HP ProLiant DL580 G7 for VMware vSphere | Boerlowie's Blog

vSphere is configured for "High Performance Mode", and so are the Windows VMs.

I'm somewhat stuck now, so maybe someone here has a good hint for me?

If you need any further hardware/software/configuration/whatever details, just ask.

Cheers and thanks in advance for any help,

cykVM

122 Replies
basicfreeze
Contributor

@Tebi

Any luck with 6?

cykVM
Expert

Looks like ESXi 6 causes similar trouble out of the box. At least it was reported for the B120i SmartArray controller with the included -92 hpvsa driver, see Re: Very slow acces to datastores on HP MIcroserver Gen8. Can't edit System Resource Reservation wit...

A downgrade to the -88 driver seems to help. So far it looks like the -90 and -92 driver versions are buggy.

basicfreeze
Contributor

@cykVM

Thanks for the quick response! Like many others in this thread, I've been running 5.5 U1 stable for a few months now (thanks for that!). I'm trying to avoid the jump to 6 if I can (5.5 preferred), but I'm at a point where I would like to start patching ESX.

I'm kind of new to ESX and didn't realize that driver rollback was an option. I have a spare DL360e with a B320i. I'm going to try the latest and greatest from HP (the 3/30 5.5 U2 release), test for the -92 speed issues and, if any show up, use the -88 vib and the rollback instructions in the thread you linked to. Fingers crossed..
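
As a rough outline of what such a rollback usually involves, here is a hedged Python sketch that only builds and prints the esxcli calls; it does not run anything. The VIB name `scsi-hpvsa` and the path `/tmp/hpvsa-88.vib` are assumptions for illustration, so confirm the real name with `esxcli software vib list` first, and treat the instructions in the linked thread as the authority.

```python
import shlex


def hpvsa_rollback_commands(vib_path, vib_name="scsi-hpvsa"):
    """Build (not execute) the esxcli calls for swapping the hpvsa driver VIB.

    vib_name is an assumption; check it against the output of the first command.
    vib_path is a placeholder for wherever the older VIB was uploaded on the host.
    """
    return [
        # 1. See which VIBs (and which hpvsa build) are currently installed.
        ["esxcli", "software", "vib", "list"],
        # 2. Remove the currently installed driver VIB by name.
        ["esxcli", "software", "vib", "remove", "-n", vib_name],
        # 3. Install the older driver VIB from a local file.
        ["esxcli", "software", "vib", "install", "-v", vib_path],
    ]


if __name__ == "__main__":
    for cmd in hpvsa_rollback_commands("/tmp/hpvsa-88.vib"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

VIB changes like these are normally done with the host in maintenance mode and need a reboot to take effect.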

basicfreeze
Contributor

So far so good with 5.5 U2 and the -88 vib. It has been running stable for 6 hours with no issues with VM provisioning or file transfers. I'm going to watch it for a couple of days before I attempt this rollout to production though. I wanted to note that I ran into a 'cannot merge VIBs' error with the Mellanox_bootbank_net-mst_2.0.0.0-10EM.550 vib when updating from U1 to U2. I uninstalled it, rebooted, and re-ran the U2 update with no issues.

While I was at it I thought I'd throw in some more evidence of the B320i speed issue. Here are a few graphs confirming what we all already knew (vSphere Client -> Performance -> disk (default options)). The file transfers that I measured here were done with a 4.28 GB ISO; VM provisions were done with a copy of the same ISO. Almost all of these transfers were done with Explorer, but the results are still pretty obvious (write and read rate min and max).
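
In case it helps someone hitting the same merge error: the short name that `esxcli software vib remove -n` expects can be read off the VIB ID. A minimal Python sketch, assuming the ID follows the usual vendor_bootbank_name_version layout (the helper name is made up):

```python
def split_vib_id(vib_id):
    """Split a 'vendor_bootbank_name_version' VIB ID into its three parts."""
    vendor, marker, rest = vib_id.partition("_bootbank_")
    if not marker:
        raise ValueError("not a bootbank VIB ID: %r" % vib_id)
    # The version is everything after the last underscore; the name may contain hyphens.
    name, _, version = rest.rpartition("_")
    if not name:
        raise ValueError("no name/version separator in: %r" % vib_id)
    return {"vendor": vendor, "name": name, "version": version}


if __name__ == "__main__":
    parts = split_vib_id("Mellanox_bootbank_net-mst_2.0.0.0-10EM.550")
    # Print the removal command for the conflicting VIB; nothing is executed here.
    print("esxcli software vib remove -n " + parts["name"])
```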

5.5U1 -88 hpvsa

5.5U1-88hpvsa.jpg

5.5 U2 -92 hpvsa

5.5U2-92hpvsa.jpg

5.5 U2 -88 hpvsa

5.5U2-88hpvsa.jpg
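
To put a number next to charts like these, an average rate can be worked out from the ISO size and a stopwatch time. A minimal Python sketch; the function name and the 60-second timing are placeholders rather than measurements from these tests, and 1 GB is taken as 1024 MB:

```python
ISO_SIZE_GB = 4.28  # size of the test ISO mentioned above


def throughput_mb_per_s(size_gb, seconds):
    """Average transfer rate in MB/s for a file of size_gb copied in `seconds`."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_gb * 1024 / seconds


if __name__ == "__main__":
    elapsed_s = 60  # hypothetical stopwatch reading, replace with your own
    print("%.1f MB/s" % throughput_mb_per_s(ISO_SIZE_GB, elapsed_s))
```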

cykVM
Expert

Thanks for the feedback and the charts proving the performance hits with the -9x hpvsa driver.

Rymsza
Contributor

I installed ESXi 6 on a DL380e with a B320i using the HP custom image (hpvsa 92) and had the same problem: the server was very slow.
I installed hpvsa 88 and performance is better.
It ran stable for 18 hours and has now crashed with a purple screen.

cykVM
Expert

Do you have a screenshot of the PSOD?

It is not necessarily the hpvsa driver's fault. I also read about some issues with some HP 366i NICs and VMware 6.

My personal opinion from reading several posts in the ESXi 6 community is that version 6 is often pretty unstable.

Rymsza
Contributor

A new driver is available (hpvsa 98). Has anyone installed it already?

Drivers & Software - HP Support Center.

Rymsza
Contributor

See the screen!!

cykVM
Expert

Are your firmware and BIOS also on the latest versions, especially for the iLO?

Rymsza
Contributor

I think so.
I ran SPP 2015.04.

System ROM: P73
System ROM Date: 08/02/2014
Backup System ROM Date: 08/02/2014
Integrated Remote Console: .NET, Java
License Type: iLO 4 Standard
iLO Firmware Version: 2.10 15 Jan 2015

cykVM
Expert

Is some sort of power management activated in the server's BIOS? It looks to me like it's shutting down idle core(s) and dying afterwards.

Try "HP Static High Performance Mode". For further BIOS settings you may read: Recommended BIOS Settings on HP ProLiant DL580 G7 for VMware vSphere | Boerlowie's Blog

Rymsza
Contributor

The BIOS and VMware settings are already on high performance.
Yesterday I tried to install hpvsa 98, but the disks did not appear, and on the next reboot it automatically went back to the old hpvsa version.
After that I installed hpvsa 86, and it has already been working for 24 hours without problems.

Tebi
Contributor

@basicfreeze Hi, I'm sorry I'm late! I have no problems with ESXi 6.0. Do you think I'm going to?

cykVM
Expert

@Tebi What hardware (server type/model, storage controller ...) are you on? Are there really no performance issues with the hpvsa 9x driver on your VMware 6 host? Are you using local or external storage?

Tebi
Contributor

Hi, sorry I'm late. The server is now far away from me, running in production, but of course it's still my problem if it crashes. Anyway, I have a picture of the server's performance. As for what you asked me, I'm using internal storage!

I didn't like it when I saw this, but it's really the worst situation I could see.

Performance VMware Petrocuyo.png

cykVM
Expert

Thanks for the feedback. I think your performance charts are for CPU performance and not disk/datastore performance.

For me the impact was mainly on backups, which took more than twice as long.

Tebi
Contributor

I had those impacts every time I connected remotely via Remote Desktop. The system has been running properly since May, but I can't say it won't crash.

Unfortunately I can't get more performance images right now.

Regards, keep in touch for news!

iofhua
Contributor

Hi, I'm running ESXi 5.5 U2 on an HP ProLiant server. It intermittently crashes with the purple "no heartbeat" screen about once every couple of weeks. Unfortunately I did a fresh install of ESXi 5.5 U2 - I just picked the latest version, as at the time I had no idea there were issues with this version on ProLiant servers. Is there any way I can downgrade to U1 at this point? Or am I screwed? Is there anything I can do to prevent the purple screens?

cykVM
Expert

What type/model of HP ProLiant do you have? And which storage controller is in use?

In general there is no direct downgrade; you basically install the previous version over the installed one.
