I thought I'd post this little tidbit for anyone who happens to start wandering down the rabbit hole.
We had 2 hosts PSOD with the attached error and, after much consternation, discovered the root cause was bad firmware on the iLO for the Gen8 series. HP recently released a fix. It seems that iLO firmware 1.3x through 1.50 will cause spontaneous NMIs. Ours occurred roughly 10 days after reboot.
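If you have a pile of hosts to check, something like the following could flag the ones in the affected range. This is only a rough sketch: the function names are made up for illustration, and it assumes "1.3x-1.50" means 1.30 through 1.50 inclusive and that versions are written with a two-digit minor part (1.40, 1.51, 2.00).

```python
# Assumption: the affected range "1.3x-1.50" is read as 1.30..1.50 inclusive.
AFFECTED_LOW = (1, 30)
AFFECTED_HIGH = (1, 50)


def parse_version(text):
    """Turn a version string such as '1.40' into a tuple of ints, e.g. (1, 40)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_affected(ilo_version):
    """Return True if the iLO firmware version falls inside the affected range."""
    return AFFECTED_LOW <= parse_version(ilo_version) <= AFFECTED_HIGH


if __name__ == "__main__":
    # Hypothetical inventory, typed in by hand from each host's iLO page.
    for host, version in {"esx01": "1.40", "esx02": "1.51"}.items():
        print(host, version, "AFFECTED" if is_affected(version) else "ok")
```

Where the version strings come from (iLO web page, inventory export, etc.) is up to you; the sketch only does the comparison.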
Now here are the added steps.
After flashing the iLO, which, contrary to what their tech staff will tell you, does require a reboot, you will need to confirm some BIOS changes as well.
Recommended BIOS Changes
a. Set HP Power Profile to "Maximum Performance". (Power Management Options -> HP Power Profile -> Maximum Performance)
b. Set HP Power Regulator to "HP Static High Performance Mode". (Power Management Options -> HP Power Regulator -> HP Static High Performance Mode)
c. Set "Minimum Processor Idle Power Core State" to "No C-states". (Power Management Options -> Advanced Power Management Options -> Minimum Processor Idle Power Core State -> No C-states)
d. Set "Minimum Processor Idle Power Package State" to "No Package State". (Power Management Options -> Advanced Power Management Options -> Minimum Processor Idle Power Package State -> No Package State)
e. Set "Memory Power Savings Mode" to "Maximum Performance". (Power Management Options -> Advanced Power Management Options -> Memory Power Savings Mode -> Maximum Performance)
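To keep track of which hosts still need BIOS changes, the five settings above can be written down as a checklist and compared against what each host currently shows. A minimal sketch, assuming you note the current values by hand from the BIOS setup screens; `bios_mismatches` and the `current` dictionary are hypothetical names, not part of any HP tool.

```python
# Recommended values, taken from the list above.
RECOMMENDED = {
    "HP Power Profile": "Maximum Performance",
    "HP Power Regulator": "HP Static High Performance Mode",
    "Minimum Processor Idle Power Core State": "No C-states",
    "Minimum Processor Idle Power Package State": "No Package State",
    "Memory Power Savings Mode": "Maximum Performance",
}


def bios_mismatches(current):
    """Return {setting: (current_value, recommended_value)} for settings that differ.

    A setting missing from `current` is reported with a current value of None.
    The comparison ignores case and surrounding whitespace.
    """
    diffs = {}
    for name, wanted in RECOMMENDED.items():
        have = current.get(name)
        if have is None or have.strip().lower() != wanted.lower():
            diffs[name] = (have, wanted)
    return diffs


if __name__ == "__main__":
    # Hypothetical values noted from one host.
    current = dict(RECOMMENDED, **{"HP Power Regulator": "HP Dynamic Power Savings Mode"})
    for name, (have, wanted) in bios_mismatches(current).items():
        print(f"{name}: is {have!r}, should be {wanted!r}")
```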
Rise of a new issue...
Now we encountered some weirdness across the board with Hardware Status and sensor issues. Sensors would lock in updating mode, and sensor and log file updates would be unresponsive. The issue would continue after reboots. In some cases neither a reboot nor an IPMI clear would help. Some systems would get stuck in a loop of erroneous "IPMI log full" errors and stalled sensor updates.
This occurred within 20 minutes of doing the iLO update. The general troubleshooting procedures in KBs 2011531 and 1033725 didn't cut it. It seems the iLO update creates a new descriptor and sensor output that the OS and vCenter can't interpret.
On some systems, I found that I could clear the error by following the procedure below, which staved off the issue a little longer.
From the vCenter Host view, select the Hardware Status pane
Select System Event Log view > Reset Event Log (wait for refresh)
Select Alerts and Warnings view > Reset Sensors (wait for refresh)
SSH to the host and log in as root
Run the following commands to clear the IPMI logs and restart the management services
• localcli hardware ipmi sel clear
• services.sh restart
• Wait for the host to come back in vCenter, and give ample time for services to stabilize
• Go to the Hosts view in the Hardware Status pane and select System Event Log view > choose to update
• Errors should now be clear and the sensors should have a few additional entries, including "voltage"
When successful, you will see the sensors change both in number and description on the Hardware Status page.
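If you have to repeat the SSH part on a lot of hosts, the two commands can be wrapped up so they always run in the same order. This is just a sketch, assuming a Python interpreter is available in the host shell; `clear_sel_and_restart` is a made-up name, and the default is a dry run that only lists the commands without touching anything.

```python
import subprocess

# The two commands from the procedure above, in the order they were run.
COMMANDS = [
    ["localcli", "hardware", "ipmi", "sel", "clear"],
    ["services.sh", "restart"],
]


def clear_sel_and_restart(dry_run=True):
    """Run the IPMI SEL clear and then the management services restart.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned as strings so they can be reviewed first.
    """
    executed = []
    for cmd in COMMANDS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    for line in clear_sel_and_restart(dry_run=True):
        print("would run:", line)
```

The vCenter steps before and after (resetting the event log and sensors, waiting for the host to come back, refreshing the view) still have to be done by hand.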
But the long term solution is to update to the new HP ESXi Offline Bundle v 1.7-13.
From end to end, the successful process was:
1) Flash the iLO firmware to 1.51
2) Clear IML
3) Clear Sensor/Hardware Status logs and pane (see above)
4) Change the BIOS performance settings to Max Performance as directed
5) Install updated HP ESXi Offline Bundle for VMware vSphere 5.5 v 1.7-13.zip
6) Remove/re-add from vCenter if needed
Thanks for this post. We had the exact same NMI and PSOD. I updated the iLO4 firmware to 1.51 as well as the HP DL360p Gen8 BIOS ROM to the 2.10.2014 version, and all seems well. Time will tell, but your post was very informative.
Extremely useful post - thanks. Just had one of my 560 Gen8's go out this morning with a 'LINT1' PSOD before realising that whole cluster is still running ILO4 1.40 firmware. Have updated the affected host and just rebooting it now! Need to get the rest done ASAP I suppose!
82 days after last reboot and remediation, experienced the below problem (issue in faulty AMS package)
Yes, it is confirmed as per this article: HP Support document - HP Support Center. So I have updated the HP Blade iLO to v 2.00 with the SPP 2014.09 ISO file.