VMware Cloud Community
StevenInsight
Contributor

HP ProLiant DL380e Gen8 - ESXi 5.5.0 build 2143827 - PSoD

PSoD, intermittently occurring about once a week. Sometimes a random number of reboots gets it working again.

It first happened 2 weeks ago. A single reboot seemed to fix it, but we upgraded ESXi to the latest patch anyway, just in case.

Then a week ago it PSoD'd again. This time we got HP involved, and they recommended some firmware updates and BIOS configuration changes.

We did all this, which didn't seem to help. We left it running over the weekend, then rebooted it 4 times and it started working again!

Today looks like the same deal: PSoD, with no real rhyme or reason.

I'm thinking of getting HP to replace the motherboard, but only because I can't think of anything else to do. Please help!

Some diagnostic info (I guess?):

  • Looking at the boot process this morning, it seems to fail between "swapobj created successfully" and the start of the "smartdstart" service.
  • Code start is always between 0x418023 and 0x418037.
  • Backtrace always starts with 0x41238a.
  • VMkernel log states: "PCPU 0 didn't have a heartbeat" and "NMI IPI recvd. eip(base):ebp:cs [0xbaca74(0x418037000000):0x41238a35)".
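If you want to see how often the heartbeat and NMI messages show up across crashes and reboots, a small script can tally them from a saved copy of the vmkernel log (on ESXi 5.x usually /var/log/vmkernel.log). This is a minimal sketch, not an HP or VMware tool: the match patterns are assumptions based on the message wording quoted above, and the script name scan_vmkernel.py is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: heartbeat warnings contain text like "PCPU 0 didn't have a heartbeat",
# matching the wording quoted in the post; the apostrophe is treated as optional.
HEARTBEAT_RE = re.compile(r"PCPU (\d+) didn'?t have a heartbeat")
# Assumption: NMI events contain the substring "NMI IPI", as in the post.
NMI_MARKER = "NMI IPI"


def summarize_log(lines):
    """Count heartbeat warnings per physical CPU and collect NMI IPI lines."""
    heartbeats = Counter()
    nmi_lines = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            heartbeats[int(match.group(1))] += 1
        if NMI_MARKER in line:
            nmi_lines.append(line.rstrip("\n"))
    return heartbeats, nmi_lines


# Usage (hypothetical script name): python3 scan_vmkernel.py vmkernel.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        per_cpu, nmis = summarize_log(f)
    for cpu, count in sorted(per_cpu.items()):
        print(f"PCPU {cpu}: {count} heartbeat warning(s)")
    print(f"{len(nmis)} NMI IPI line(s)")
    for line in nmis:
        print("  " + line)
```

If every crash shows the same PCPU and the same NMI line, that at least supports the hardware/firmware (NMI) angle rather than a random software fault.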

12 Replies
jrmunday
Commander

Hi Steven,

Looking at your PSOD, the host appears to be reacting normally (i.e. as expected) to an NMI. I generally disable NMI on my ESXi hosts.

I would certainly update the firmware first, as outdated firmware is often the cause of these types of issues.

With regard to this hardware model, there is an HP advisory describing a PSOD caused by an NMI from iLO:

https://h20566.www2.hp.com/hpsc/doc/public/display?sp4ts.oid=5261094&docId=emr_na-c04332584&docLocal...

The other thing I would do is make sure that you have the current HP management bundles - see http://vibsdepot.hp.com/

You can see your current versions by running "esxcli software vib list" from an SSH session.
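To pick the HP packages (and their driver versions, e.g. the storage controller driver) out of that list, a short Python 3 script run on a workstation against a saved copy of the output can help. This is a minimal sketch that assumes the usual fixed-width table with a row of dashes under the column headers; the script name vib_report.py and the vendor filter string are hypothetical, so check the Vendor column on your own host.

```python
import re
import sys


def parse_vib_list(text):
    """Parse the fixed-width table printed by `esxcli software vib list`.

    Column boundaries are taken from the row of dashes under the header,
    so the parser does not depend on exact column widths.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    sep_index = next(
        (i for i, line in enumerate(lines) if set(line.replace(" ", "")) == {"-"}),
        None,
    )
    if not sep_index:  # None (no dashes row) or 0 (no header above it)
        raise ValueError("could not find the header/dashes rows")
    starts = [m.start() for m in re.finditer(r"-+", lines[sep_index])]

    def split(line):
        # Each column runs from its own start to the start of the next column.
        bounds = starts[1:] + [None]
        return [line[s:e].strip() for s, e in zip(starts, bounds)]

    columns = split(lines[sep_index - 1])
    return [dict(zip(columns, split(line))) for line in lines[sep_index + 1:]]


def vibs_from_vendor(rows, vendor_substring):
    """Return rows whose Vendor column contains vendor_substring (case-insensitive)."""
    needle = vendor_substring.lower()
    return [row for row in rows if needle in row.get("Vendor", "").lower()]


# Usage (hypothetical): esxcli software vib list > vibs.txt, copy vibs.txt off the
# host, then: python3 vib_report.py vibs.txt Hewlett-Packard
if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        rows = parse_vib_list(f.read())
    for row in vibs_from_vendor(rows, sys.argv[2]):
        print(f"{row['Name']:<30} {row['Version']}")
```

Reading the boundaries from the dashes row rather than splitting on whitespace keeps two-word columns such as "Acceptance Level" intact.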

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
henriqueseprol
Contributor

Hmm, as you're using build 2143827, I suppose you're not using the HP custom ISO image, because the latest HP custom ISO build (November 2014) is 2068190.

All Gen8 servers require the customised ISO. This is mandatory, or you'll have tons of problems, starting with NICs not being detected.

You can download it here:

VMware vSphere 5: Private Cloud Computing, Server and Data Center Virtualization

BenLiebowitz
Expert

Hi,

I agree with Jon above, if you didn't build with an HP customized ISO, then you'll need to rebuild with one.

If you did do so with an older one, then you'll need to update the firmware/BIOS/drivers.

Ben Liebowitz, VCP vExpert 2015, 2016, & 2017 If you found my post helpful, please mark it as helpful or answered to award points.
LeslieBNS9
Enthusiast

I also agree with the others: rebuild with the HP custom ISO. We have the exact same server in our environment and received PSODs. After rebuilding to HP's specs for firmware, drivers and the custom ISO image, we have not had any problems.

Here's the recipe we used from HP for our own servers http://vibsdepot.hp.com/hpq/recipes/June2014VMwareRecipe13.0.pdf

StevenInsight
Contributor

Issue has re-appeared!!!

This is after performing a full reinstall of ESXi on the server using the HP image.

We will be contacting HP to follow up and possibly get the hardware replaced.

jrmunday
Commander

Have you also upgraded your firmware, BIOS, etc.? The HP image alone is no good without the correct firmware levels.

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
StevenInsight
Contributor

I have gone through and updated all firmware with HP support.

I have installed the ESXi HP image.

HP have now replaced the hardware and the issue persists.

Does anyone have information on how to disable NMI, please?

d3vnul
Contributor

I had an identical problem a few months ago. In my case the resolution was to update the Smart Array RAID driver to the latest version.

You may also want to read this: New ESXi 5.5 Install threw PSOD, Raid controller driver?

a63b7
Contributor

Have you already upgraded to the latest firmware with the HP SPP ISO?

It might help you out.

http://h17007.www1.hp.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx

Just create a bootable USB drive with this tool:

http://h20566.www2.hp.com/hpsc/swd/public/detail?swItemId=MTX_2aa85604194243afbdb1c29a34

Run it to make sure.

Since the diskdump failed, it seems the server (or the OS) loses contact with the storage.

How is the OS installed?

On what kind of storage hardware, and with what storage config?

cykVM
Expert

Hi,

I had the same issue with random "No heartbeat" PSODs with every build of HP customized VMware ESXi 5.5 Update 2. There has been a long discussion since Sept. 2014 here:

HP Proliant DL380e Gen8, HP OEM VMWare ESXi 5.5 Update 2 keeps crashing (PSOD)

Unfortunately, there was no real resolution. I tested nearly every available driver for the storage controller with no luck. The only thing that worked for me was to go back to HP customized 5.5 Update 1, which has run fine and stable since then.

It looks like the hpvsa driver is somewhat incompatible with the 5.5 U2 kernel.

dospavlos
Contributor

I was able to fix this no-heartbeat issue on a BL460c G7. I was receiving the PSOD during ESXi installation; the issue occurred whether I tried to run 5.5 U1 or 5.5 U2. I tried installing ESX 4.1 U2, but the install would stall at 28% while loading network drivers.

I tried all the BIOS changes suggested in this thread (power settings, VT-d), resetting the system to defaults through the BIOS, and recreating the local drive array, all without success.

My fix was to upgrade the firmware on the FlexFabric Embedded Ethernet NICs. I was running NIC firmware version 4.x and upgraded to the latest on HP's website, 10.2.340.22, using the package 'OFFLINE Firmware image (.iso) for HP Emulex Converged Network Adapters and Network Adapters (American, International)'. This upgrade dramatically reduced boot time and fixed the PSOD/no-heartbeat issue on my server.

Alistar
Expert

Hmm, from what I can tell, each PSOD points to a different symptom:

  • Wakeups found in stack - are you sure all Power Saving functions & features are disabled in BIOS and that the server is set to operate in maximum static performance?
  • Another one seems like a RAID controller error (turn_batteries_off_if_cache_empty)
  • The common denominator in the stack seems to be OS_events_flag_set+0x25 or OS_interrupt_control. Do you have any devices passed through to your guest VM? Or maybe "OS" here refers to the VMkernel itself rather than the guest OS?

As someone noted earlier, this might very well be a RAID controller issue. What hardware exactly did HP replace? However, I think their controllers are integrated on the motherboard. This is a pretty tough nut to crack.

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/