VMware Cloud Community
cykVM
Expert

HP Proliant DL380e Gen8, HP OEM VMWare ESXi 5.5 Update 2 keeps crashing (PSOD)

Hello everyone,

I currently maintain a single VMware host running vSphere 5.5 (ESXi) Update 2, HP OEM version, for a mid-size charity.

The hardware in use:

HP ProLiant DL380e Gen8 (bought brand new in August 2014), HP Smart Array B320i storage controller, HP H222 host bus adapter (with only an HP Ultrium 4 tape drive connected to it), HP Intel 4-port NIC 366i, 32 GB RAM, 2 quad-core Intel Xeon E5-2407

The box was initially installed and configured in August using the HP OEM vSphere 5.5 Update 1 installation CD. vSphere is installed on the RAID array configured on the B320i controller. A VMware Essentials license is also installed and in use.

It's running 3 Windows 2008 R2 VMs (DC, Exchange 2010 and a backup server with Backup Exec 2010 R3 [I know this is not a recommended/supported configuration, but it worked with 5.5 U1 without issues]) besides 2 Debian Linux VMs.

2 weeks ago during weekend maintenance I first installed the latest HP SPP (Service Pack for Proliant) Sept. 2014 which provided several firmware updates for e.g. the B320i, the 366i NIC etc.

After that I performed an upgrade installation of the HP OEM version of vSphere 5.5 Update 2, which HP had also released at the beginning of September.

All those setup/update procedures went through without any issues, error messages or crashes.

The host was running fine for 3 days and suddenly crashed with a PSOD stating: PCPU 0: no heartbeat (2/2 IPIs received) [unfortunately I did not take a screenshot]

I reset/rebooted the host through the iLO 4 console and kept an eye on the server over the following days.

The first PSOD took place during daily (nightly) backup on the connected tape drive.

On the following Friday/Saturday night (about 2 days later) it crashed again with the following PSOD - again with PCPU 0: no heartbeat (2/2 IPIs received):

PSOD1.PNG

So I started investigating this, found some hints here in the VMware communities pointing to recommended BIOS settings for HP ProLiant servers, checked the actual settings and changed the values to the recommended ones. The server ran fine without glitches for about 16 hours, then crashed again with this PSOD:

PSOD2.PNG

I continued investigating and took an especially close look at the power management settings in the BIOS, vSphere and the Windows VMs.

I also checked the installed firmware versions of the storage controllers and NIC and the driver versions in use. All OK there (as recommended in the HP VMware recipe Sept. 2014).

The server ran fine for about a week after the reboot, then there was another PSOD early this morning at about 3 a.m.:

PSOD3.PNG

The server/VMs were mostly idle at this time, no heavy I/O activity.

The first two PSODs happened during backup but not at a certain time (one at about 10 p.m. the other early in the morning between 2 and 3 a.m.).

I have read through tons of hints about faulty NIC drivers/firmware, BIOS configurations etc., but nothing has helped; everything is already configured exactly as in HP's recommendations for vSphere 5.x.

For the BIOS settings I followed this list/table: Recommended BIOS Settings on HP ProLiant DL580 G7 for VMware vSphere | Boerlowie's Blog

vSphere is configured for "High Performance Mode", and so are the Windows VMs.

I'm somewhat stuck now, so maybe someone here has a good hint for me?

If you need any further hardware/software/configuration/whatever details, just ask.

Cheers and thanks in advance for any help,

cykVM

122 Replies
cykVM
Expert

OK, you have to ask support for the update; that's why I could not find any download location.

AlbertWT
Virtuoso

OK, so what's the filename of the update that you applied to the server firmware?

sunonfire
Contributor

I upgraded my HP ESXi from 5.5 U2 to ESXi 5.5 U1 this Monday night. Unfortunately, after 4 days it still showed the screen below this morning. Any suggestions would be appreciated.

Here is the HP hardware spec:

HP ML350G8E E5-2407V2(1/2) 16GB 2*1TB SATA-3.5 WS2012 STD

SERIES: ML350E GEN8
PROCESSOR TYPE: Intel Xeon E5-2407 v2 (2.4GHz/4-core/10MB/6.4GT-s QPI/80W, DDR3-1333) Processor
NUMBER OF PROCESSORS: 1 (1 OF 2)
MEMORY AMOUNT: 16GB
MEMORY CONFIG: 2 x 8 GB
MEMORY TYPE: DDR3-1600LV RDIMM
INTERNAL HDD: (2) 1TB 7.2K LFF SATA HDD
INTERNAL DRIVE BAYS: 4 LFF drive bays
COMPATIBLE HDD: LFF HP SATA
DYNAMIC SMART ARRAY B120I/512MB FBWC SATA CONTROLLER (RAID 0/1/1+0/5)
HOT PLUG CAPABLE: YES
CHASSIS TYPE: 5U TOWER
NETWORK INTERFACE: (1) 361I ETHERNET 1GB DUAL PORT
OPERATING SYSTEM: MS WIN SERVER 2012 R2 STANDARD

cykVM
Expert

Hi,

sunonfire wrote:

I upgraded my HP ESXi from 5.5 U2 to ESXi 5.5 U1 this Monday night. Unfortunately, after 4 days it still showed the screen below this morning. Any suggestions would be appreciated.

What do you mean by "upgraded [...] from 5.5 U2 to [...] 5.5 U1"?

Did you use the Shift+R method to boot from the altbootbank?

According to the (PSOD) screenshot you are running ESXi 5.5.0 build 2068190, which is 5.5 U2 (see either VMware KB: Correlating VMware products build numbers to update levels or VMware ESXi 5.5 Update 2 Release Notes).
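To make the build-number check concrete, here is a minimal Python sketch of such a lookup. The three build numbers are the base ESXi 5.5 releases as far as I know them; the table and the function name update_level are my own illustration, not taken from the KB article, so please verify the numbers there. Patch releases carry other build numbers that this sketch does not cover.

```python
# Base-release build numbers for ESXi 5.5, as far as I know them.
# Assumption: verify against the VMware KB before relying on this table.
ESXI_55_BUILDS = {
    1331820: "ESXi 5.5 GA",
    1623387: "ESXi 5.5 Update 1",
    2068190: "ESXi 5.5 Update 2",
}


def update_level(build):
    """Return the update level for a base-release build number."""
    return ESXI_55_BUILDS.get(build, "unknown (patch build or other release)")


if __name__ == "__main__":
    # Compare the build shown on the PSOD screen with the release you expected.
    for build in (1623387, 2068190):
        print(build, "->", update_level(build))
```

If the build on the PSOD screen still maps to Update 2 after a supposed downgrade, the older image never became the active one.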

So your "upgrade", or rather downgrade, went wrong.

It's generally not possible to downgrade to a prior version by running an "Upgrade" installation from the older version's installation CD.

The only options you have to perform a downgrade to a prior version (e.g. from 5.5 U2 to 5.5 U1) are:

Running an "upgrade" from the 5.5 U1 CD won't replace the newer installed version; it should even give you warnings/errors.

cykVM
Expert

AlbertWT wrote:

OK, so what's the filename of the update that you applied to the server firmware?

I just found traces that it was distributed via HP SUM (Smart Update Manager). The update installer for Windows should be cp024537.exe and for Linux it's cp024540.scexe, both containing the bin file ilo4_210.bin.

I really wonder why there is no official download location available. I just found this Python script: python-hpilo · iLO4 firmware 2.10 · 5742aa6

Besides, jawad has a completely different hardware configuration and even had different PSODs on his BL460c Gen8 blade servers. See post 50: Re: HP Proliant DL380e Gen8, HP OEM VMWare ESXi 5.5 Update 2 keeps crashing (PSOD)

jawad
Contributor

Hope it helps :)

sunonfire
Contributor

Thanks for the info.

I installed ESXi 5.5 U2 on the host from the beginning, so I cannot use Shift+R at boot time to roll back.

Looks like the best option for me is to back up the host configuration, then do a fresh install of 5.5 U1 and import the configuration.
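For reference, a rough Python sketch of the ESXi shell commands involved in that backup/restore plan. The vim-cmd calls are the ones I know for this; the helper names and the bundle path /tmp/configBundle.tgz are only illustrative assumptions. One caveat: as far as I know, VMware documents the restore as expecting the same build as the one the backup was taken on, so a bundle from U2 may not restore cleanly onto a fresh U1 install.

```python
def backup_commands():
    """ESXi shell commands to run on the host before the reinstall."""
    return [
        # Flush the running configuration to persistent storage first.
        "vim-cmd hostsvc/firmware/sync_config",
        # Creates the configuration bundle and prints a download URL for it.
        "vim-cmd hostsvc/firmware/backup_config",
    ]


def restore_commands(bundle_path="/tmp/configBundle.tgz"):
    """ESXi shell commands to run after the fresh install.

    bundle_path is a hypothetical location the saved bundle was uploaded to.
    """
    return [
        # The host should be in maintenance mode before restoring.
        "vim-cmd hostsvc/maintenance_mode_enter",
        # Restores the configuration; the host reboots afterwards.
        "vim-cmd hostsvc/firmware/restore_config " + bundle_path,
    ]


if __name__ == "__main__":
    for cmd in backup_commands() + restore_commands():
        print(cmd)
```

The script only prints the commands so they can be reviewed and run by hand; it does not touch the host itself.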

Cheers

siegfriedLH
Contributor

Thank you!

cykVM
Expert

You're welcome. But it would help if you described what exactly your problem was and how you solved it. ;)

vlho
Hot Shot

Hi,

Today HP released a new driver version, 5.5.0.92, for the Smart Array B120i/B320i controller:

http://h20564.www2.hp.com/hpsc/swd/public/detail?sp4ts.oid=5258668&swItemId=MTX_c29cdfe1761443408615...

Try...
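For anyone who wants to check whether a host already has this driver level, a small Python sketch for comparing version strings such as those shown by esxcli software vib list. The sample strings in the comments are hypothetical, and the helper names are made up for illustration; only the first four numeric fields are compared.

```python
import re


def version_tuple(version):
    """Return the first four numeric fields of a version string as integers."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:4])


def is_at_least(installed, wanted="5.5.0.92"):
    """True if the installed driver version is at least the wanted one."""
    return version_tuple(installed) >= version_tuple(wanted)


if __name__ == "__main__":
    # Hypothetical version strings, written both with "-" and "." separators.
    for candidate in ("5.5.0-88", "5.5.0.92-1OEM"):
        print(candidate, is_at_least(candidate))
```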

cykVM
Expert

Thanks for the information, vlho. I will give that a try at the next possible maintenance window.

dospavlos
Contributor

I was able to fix this no heartbeat issue on a BL460c G7. I was receiving the PSOD during ESXi installation; the issue occurred whether I tried to install 5.5 U1 or 5.5 U2. I tried installing ESX 4.1 U2, but the install would stall at 28% while trying to load network drivers. I tried all the BIOS changes suggested in this thread (power settings, VT-d, etc.), resetting the system to defaults through the BIOS, and recreating the local drive array, all without success. My fix was to upgrade the firmware on the embedded FlexFabric Ethernet NICs. I was running NIC firmware version 4.x and upgraded to the latest on HP's website, 10.2.340.22, using the package 'OFFLINE Firmware image (.iso) for HP Emulex Converged Network Adapters and Network Adapters (American, International)'. This upgrade dramatically reduced boot time and fixed the PSOD no heartbeat issue for my server.

cykVM
Expert

Thanks for the information but I'm afraid this does not really help with my DL380e Gen8 and the other non-blade systems mentioned here, because they usually do not have Emulex cards.

cykVM
Expert

In the meantime HP released a new SPP (04/2015). The 2.10 firmware for the iLO 4 card is included and is now also available as a separate download.

There is also a freshly built HP customized VMware 5.5 U2 installation image available that includes the -92 hpvsa driver.

Tebi
Contributor

Hello, I started with this issue a few weeks ago. Unfortunately, I have to solve it really quickly now!

Thanks for all your help. Now I know that I have a chance of running stable on the 5.5.0 U1 version, but I would like to know anyway: have you tested the 2.10 firmware for the iLO 4 card that is included in the SPP and also available as a separate download?

Is there really a freshly built HP customized VMware 5.5 U2 installation image, including the -92 hpvsa driver, that fixes the problem?

Thanks again!

cykVM
Expert

I already installed the iLO 4 2.10 firmware along with some other updates included in SPP 04/2015, but I have had no time so far to test the upgrade to VMware 5.5 Update 2.

The installation image for HP customized 5.5 Update 2 found here: VMware vSphere 5: Private Cloud Computing, Server and Data Center Virtualization includes the hpvsa -92 driver; you can view the contents list by clicking on "Read more".

There's no confirmation yet whether the hpvsa -92 driver fixes the problem.

cykVM
Expert

Just to clarify: since my rolled-back 5.5 Update 1 is running very stably on the production host, I still have concerns about upgrading it to 5.5 U2. I was waiting for someone here to confirm that it runs stably with the new firmware, BIOS and drivers before messing around with it. So far no one has tried the latest updates and confirmed that the PSODs disappeared, so I have just stayed with 5.5 Update 1.

Tebi
Contributor

Ok thanks again!

I'll check the logs carefully before proceeding, and then maybe I will go to U1. Anyway, I'll be waiting for news about that bug.

sunonfire
Contributor

On Sunday, 3 May 2015, I upgraded my HP ProLiant ML350e Gen8 to ESXi 6.0 (Download VMware vSphere). It has been OK so far.

I have installed ESXi 5.5 U2 on many Dell servers and never had the PSOD issue. I feel a bit sorry for HP having to resolve this issue.

Tebi
Contributor

Hello, I hope your server is still working OK. I'm upgrading mine to 6.0 this weekend!

I'll let you know how it works.

Regards!
