HP ProLiant DL360e Gen8
Purple diagnostic screen (PSOD): "PCPU Locked Up" occurred or "No Heartbeat" was observed
Already disabled "Collaborative Power Control" in the BIOS.
Latest ESXi version installed (build 1198252).
Latest BIOS and SAS driver installed.
I opened a support ticket with the HP VMware support team under my HP support contract (HP part number UQ629E, under US$300 for 1 year).
The solution is to upgrade to ESXi 5.0 Update 2 or later, where PCC is disabled by default.
I did a forced upgrade using the HP-customized ESXi 5.5 ISO mounted via iLO.
Below is the reply from HP VMware support:
Greetings,
I checked the logs for the PSOD "PCPU locked up, failed to ack TLB invalidate".
The "Failed to ack TLB invalidate" error may be caused by either a hardware or a software issue.
Description: a physical CPU fails to acknowledge a request to invalidate its TLB (its cache of memory page-table translations).
Some HP servers experience a situation where the PCC (Processor Clocking Control or Collaborative Power Control) communication between the VMware ESXi kernel (VMkernel) and the server BIOS does not function correctly.
As a result, one or more PCPUs may remain in SMM (System Management Mode) for many seconds. When the VMkernel notices that a PCPU has been unavailable for an extended period of time, a purple diagnostic screen occurs.
SCOPE
HP ProLiant G5, G6, G7, or Gen8-series servers that include PCC (Processor Clocking Control or Collaborative Power Control)
This issue has been resolved as of ESXi 5.0 Update 2, because PCC is disabled by default in that release.
To work around this issue in versions prior to ESXi 5.0 U2, disable PCC manually.
To disable PCC:
1. Connect to the ESXi host using the vSphere Client.
2. Click the Configuration tab.
3. In the Software menu, click Advanced Settings.
4. Select vmkernel.
5. Deselect the vmkernel.boot.usePCC option.
6. Restart the host for the change to take effect.
Kindly let us know
Regards,
Harish Kumar G
Technical Solutions Engineer - VMware
Multivendor Software Support Solution Center
Hewlett-Packard
Working Hours: 11:30 AM to 8:30 PM EST Thursday - Monday
Contact the following numbers based on your region:
For APJ - Australia – 13 11 47 | India - 1800-425-8080 | Malaysia – | New Zealand - 0800 66 47 47 | Singapore – +65 627 25300
For EMEA - UK & I - +44 179 37 93020 | UAE - 800 4910 | Saudi Arabia - 800 897 1444
For AMS – US & Canada – 1800 633 3600. In the IVR, say "Software" and then "VMware", and enter the Service Agreement ID (SAID).
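HP's reply boils down to a version rule: ESXi 5.0 Update 2 and later ship with PCC disabled by default, while earlier releases (such as the ESXi 4.1 host in this thread) need vmkernel.boot.usePCC deselected by hand and a reboot. The Python sketch below encodes that rule as a quick check. The accepted version-string format ("4.1 U3", "5.0 Update 2") and the helper names `parse_version` and `needs_manual_pcc_disable` are my own assumptions, not an HP or VMware tool, and the script does not connect to a host.

```python
import re
from typing import NamedTuple


class EsxiVersion(NamedTuple):
    """ESXi release as (major, minor, update); update 0 means the GA release."""
    major: int
    minor: int
    update: int


# From HP's reply: PCC is disabled by default starting with ESXi 5.0 Update 2.
PCC_OFF_BY_DEFAULT_FROM = EsxiVersion(5, 0, 2)

# Hypothetical input format: "4.1", "4.1 U3", "5.0 Update 2" (case-insensitive).
_VERSION_RE = re.compile(
    r"^\s*(\d+)\.(\d+)(?:\.\d+)?(?:\s*(?:U|Update)\s*(\d+))?\s*$",
    re.IGNORECASE,
)


def parse_version(text: str) -> EsxiVersion:
    """Parse a version string into an EsxiVersion; raise ValueError if unrecognised."""
    m = _VERSION_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised ESXi version string: {text!r}")
    major, minor, update = m.groups()
    # A missing "U<n>" suffix means the GA release (update 0).
    return EsxiVersion(int(major), int(minor), int(update or 0))


def needs_manual_pcc_disable(version: EsxiVersion) -> bool:
    """True if the release predates ESXi 5.0 U2, so usePCC must be deselected by hand."""
    # NamedTuples compare field by field, exactly like plain tuples.
    return version < PCC_OFF_BY_DEFAULT_FROM


if __name__ == "__main__":
    for v in ("4.1 U3", "5.0 U1", "5.0 Update 2", "5.5"):
        # Prints True for the first two and False for the last two.
        print(v, needs_manual_pcc_disable(parse_version(v)))
```

The check only reflects the default that HP describes; it does not read the host's actual vmkernel.boot.usePCC value, so confirming that setting in Advanced Settings is still worthwhile.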
The PSOD screenshot does not give much information, and no core dump was generated. The stack trace also shows "unknown".
If you can get some details from the logs, it would be easier to figure out the cause.
@marqcd Do you have beacon probing enabled on your VSS or VDS?
I did not have beacon probing enabled. Should I enable it?
I do have 2 out of the 4 network cards connected to 2 different network switches for redundancy.
When the error occurs, it does not leave a core dump file on the diagnostic partition. I already confirmed that the diagnostic partition is configured as noted on
Update:
I'm installing a patch release that came out just 5 days ago to see if it resolves my issue:
VMware ESXi, Patch Release ESXi410-201312001
After installing patch 201312001, I got new errors:
"Recursive panic on same CPU"
"DF Exception 8 IP"