PSoD, occurring intermittently about once a week. A random number of reboots sometimes gets it working again.
It first happened 2 weeks ago, and a single reboot seemed to fix it. I upgraded ESXi to the latest patch anyway, just in case.
Then a week ago it PSoD'd again. This time we got HP involved, and they recommended some firmware updates and BIOS configuration changes.
We did all this, which didn't seem to help. We left it running over the weekend, then rebooted it 4 times and it started working again!
Today looks to be the same deal: PSoD, with no real rhyme or reason.
I'm thinking we get HP to replace the motherboard, but only because I can't think of anything else to do. Please help!
Some diagnostic info (I guess?):
- Looking at the boot process this morning, it looks like it failed between "swapobj created successfully" and the start of the "smartdstart" service.
- Code start is always between 0x418023 and 0x418037.
- Backtrace always starts with 0x41238a.
- VMkernel log states: "PCPU 0 didn't have a heartbeat" and "NMI IPI recvd. eip(base):ebp:cs [0xbaca74(0x418037000000):0x41238a35)"
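To check whether the heartbeat and NMI messages always name the same physical CPU, a short script can tally them from a saved copy of the vmkernel log. This is a minimal sketch, not an HP or VMware tool: the regular expressions are assumptions built only from the message fragments quoted above (real vmkernel wording may differ between builds), and `summarize_psod_messages` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed patterns, built from the message fragments quoted in this post.
# The optional apostrophe accepts both "didnt" and "didn't".
HEARTBEAT_RE = re.compile(r"PCPU (\d+) didn'?t have a heartbeat")
NMI_RE = re.compile(r"NMI IPI recvd")


def summarize_psod_messages(lines):
    """Count heartbeat-loss messages per physical CPU, plus NMI IPI messages."""
    heartbeats = Counter()
    nmi_count = 0
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            heartbeats[int(match.group(1))] += 1
        if NMI_RE.search(line):
            nmi_count += 1
    return heartbeats, nmi_count


if __name__ == "__main__":
    # Hypothetical excerpt: only the two message fragments come from the post.
    excerpt = [
        "PCPU 0 didnt have a heartbeat",
        "NMI IPI recvd. eip(base):ebp:cs",
        "some unrelated log line",
    ]
    heartbeats, nmi_count = summarize_psod_messages(excerpt)
    print(dict(heartbeats), nmi_count)  # {0: 1} 1
```

Passing an open file (for example a copy of /var/log/vmkernel.log) instead of the list works too, since the function only iterates over lines. If the counts always land on PCPU 0 across several crashes, that is worth mentioning to HP when discussing the hardware.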
Hi Steven,
Looking at your PSOD, the host appears to be reacting normally (i.e., as expected) to an NMI. I generally disable NMI on my ESXi hosts.
I would certainly update the firmware first, as outdated firmware is often the cause of these types of issues.
With regard to this hardware model, there is an HP advisory out where iLO causes a PSOD due to an NMI;
The other thing I would do is make sure that you have the current HP management bundles - see http://vibsdepot.hp.com/
You can see your current versions by running "esxcli software vib list" from an SSH session.
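If you want to compare driver versions across hosts, the output of that command can be parsed into rows. This is a sketch under an assumed layout: a header line, a line of dash runs marking each column's width, then one line per VIB. Check that against your own output before relying on it. `parse_vib_table` and `vibs_matching` are hypothetical helper names, not part of any HP or VMware tooling.

```python
import re


def parse_vib_table(text):
    """Parse the fixed-width table printed by `esxcli software vib list`.

    Assumes a header line, a rule line made of dash runs (one run per
    column), then one line per VIB.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header, rule, body = lines[0], lines[1], lines[2:]
    # Each run of dashes in the rule line marks one column's character span.
    spans = [m.span() for m in re.finditer(r"-+", rule)]
    # Let the last column run to the end of the line in case a value overflows.
    spans[-1] = (spans[-1][0], None)
    columns = [header[start:end].strip() for start, end in spans]
    return [
        {col: line[start:end].strip() for col, (start, end) in zip(columns, spans)}
        for line in body
    ]


def vibs_matching(rows, keyword):
    """Return (name, version) pairs for VIBs whose name contains keyword."""
    keyword = keyword.lower()
    return [(r["Name"], r["Version"]) for r in rows if keyword in r["Name"].lower()]
```

For example, saving the output on the host with `esxcli software vib list > vibs.txt`, then calling `vibs_matching(parse_vib_table(open("vibs.txt").read()), "hp")`, would list the HP-related VIBs and their versions. Slicing by the dash-rule spans is used instead of splitting on whitespace because the "Acceptance Level" and "Install Date" headers themselves contain spaces.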
Cheers,
Jon
Hmm, as you're using build 2143827, I suppose you're not using the HP Custom ISO image, because the latest HP Custom ISO build (from November 2014) is 2068190.
All Gen8 servers require the customised ISO. This is mandatory, or you'll have tons of problems, starting with NICs not being detected.
You can download it here:
VMware vSphere 5: Private Cloud Computing, Server and Data Center Virtualization
Hi,
I agree with Jon above: if you didn't build with an HP customized ISO, then you'll need to rebuild with one.
If you did build with an older one, then you'll need to update the firmware/BIOS/drivers.
I also agree with the others about rebuilding with the HP custom ISO. We have the exact same server in our environment and got PSODs. After rebuilding to HP's specs for firmware, drivers, and the custom ISO image, we have not had any problems.
Here's the recipe we used from HP for our own servers http://vibsdepot.hp.com/hpq/recipes/June2014VMwareRecipe13.0.pdf
Have you also upgraded your firmware, BIOS, etc.? The HP image alone is no good without the correct firmware levels.
I have gone through and updated all firmware with HP Support.
I have installed the ESXi HP image.
HP have now replaced the hardware, and the issue persists.
Any information on how to disable NMI, please?
I had an identical problem a few months ago. In my case, the resolution was to update the Smart Array RAID driver to the latest version.
You may also want to read this: New ESXi 5.5 Install threw PSOD, Raid controller driver?
Have you already upgraded to all the recent firmware with the HP SPP ISO?
It might help you out.
http://h17007.www1.hp.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx
Just create a bootable USB with this tool:
http://h20566.www2.hp.com/hpsc/swd/public/detail?swItemId=MTX_2aa85604194243afbdb1c29a34
Run it to make sure.
Since the diskdump failed, it seems like the server (or the OS) loses contact with the storage.
How is the OS installed?
On what kind of storage hardware, and with what storage configuration?
Hi,
I had the same issue with random "No heartbeat" PSODs on every build of HP customized VMware ESXi 5.5 Update 2. There has been a long discussion since Sept. 2014 here:
HP Proliant DL380e Gen8, HP OEM VMWare ESXi 5.5 Update 2 keeps crashing (PSOD)
Unfortunately, there was no real resolution. I tested nearly every available driver for the storage controller, with no luck. The only thing that worked for me was going back to HP customized 5.5 Update 1, which has run fine and stable since then.
It looks like the hpvsa driver is somewhat incompatible with the 5.5 U2 kernel.
I was able to fix this no-heartbeat issue on a BL460c G7. I was getting the PSOD during ESXi installation; the issue occurred whether I tried to install 5.5 U1 or 5.5 U2. I tried installing ESX 4.1 U2, but the install would stall at 28% while loading network drivers. I tried all the BIOS changes suggested in this thread (power settings, VT-d), tried resetting the system to defaults through the BIOS, and tried recreating the local drive array, all without success. My fix was to upgrade the firmware on the embedded FlexFabric Ethernet NICs. I was running NIC firmware version 4.x and upgraded to the latest on HP's website, 10.2.340.22, using the package 'OFFLINE Firmware image (.iso) for HP Emulex Converged Network Adapters and Network Adapters (American, International)'. This upgrade dramatically reduced boot time and fixed the PSOD/no-heartbeat issue on my server.
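When checking many hosts against a firmware level like the 10.2.340.22 mentioned above, dotted version strings need to be compared numerically rather than as text. The following is a minimal sketch assuming purely numeric dot-separated versions; `is_at_least` is a hypothetical helper, and "4.0" below is only a stand-in for the "4.x" version in the post.

```python
def parse_version(text):
    """Split a dotted numeric version such as '10.2.340.22' into integers."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, target):
    """Return True if installed >= target, comparing components in order.

    Shorter versions are padded with zeros, so '10.3' compares as '10.3.0.0'.
    """
    a, b = parse_version(installed), parse_version(target)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # "4.0" is a hypothetical stand-in for the "4.x" firmware in the post.
    print(is_at_least("4.0", "10.2.340.22"))  # False
    # Plain string comparison gets this wrong, because "4" sorts after "1".
    print("4.0" >= "10.2.340.22")  # True
```

The tuple comparison is the design choice that matters here: Python compares tuples element by element, so 10 is correctly treated as greater than 4, which a character-by-character string comparison does not do.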
Hmm, from what I can tell, each PSOD reported here points to a different symptom.
As someone noted earlier, this might very well be a RAID controller issue. What hardware exactly did HP replace? However, I think their controllers are integrated on board. This is a pretty tough nut to crack.