VMware Cloud Community
marqcd
Contributor

HP ProLiant DL360e Gen8 Purple Screen of Death (PSOD) "PCPU Locked Up" Occurred or That "No Heartbeat" Was Observed

My HP ProLiant DL360e Gen8 host crashes with a Purple Screen of Death reporting that "PCPU Locked Up" occurred or that "No Heartbeat" was observed.

I have already disabled "Collaborative Power Control" in the BIOS.

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/topIssuesDisplay/?sp4ts.oid=5249...

The latest ESXi version is installed (Build 1198252), along with the latest BIOS and SAS driver.

[Screenshot: vmware ESXi error - yellow screen6.jpg]


Accepted Solutions
marqcd
Contributor

I opened a support ticket with the HP VMware support team under my HP support contract (HP part number UQ629E, under US$300 for one year).

The solution is to upgrade to ESXi 5.0 Update 2 or later, where PCC is disabled by default.
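The version rule above boils down to one comparison. Below is a minimal Python sketch, assuming that every release at or above 5.0 Update 2 (5.1, 5.5, ...) keeps PCC disabled by default, as "or later" implies. The names `EsxiVersion`, `FIRST_FIXED`, and `needs_manual_pcc_disable` are hypothetical helpers, not part of any VMware or HP tool.

```python
from typing import NamedTuple


class EsxiVersion(NamedTuple):
    """Hypothetical version record: major.minor plus update level (0 = GA)."""
    major: int
    minor: int
    update: int


# Per the HP support reply, PCC is disabled by default from ESXi 5.0 Update 2 on.
# Assumption: all later releases keep that default.
FIRST_FIXED = EsxiVersion(5, 0, 2)


def needs_manual_pcc_disable(version: EsxiVersion) -> bool:
    """Return True if the host predates 5.0 U2, so PCC must be disabled by hand."""
    # NamedTuples compare field by field, so (4, 1, 3) < (5, 0, 2) < (5, 5, 0).
    return version < FIRST_FIXED


if __name__ == "__main__":
    for v in (EsxiVersion(4, 1, 3), EsxiVersion(5, 0, 1),
              EsxiVersion(5, 0, 2), EsxiVersion(5, 5, 0)):
        print(f"ESXi {v.major}.{v.minor} U{v.update}: "
              f"disable PCC manually = {needs_manual_pcc_disable(v)}")
```

For an ESXi 4.1 host like the one in this thread (any 4.1 update level), the check returns True, which is why either the manual workaround or the upgrade to 5.5 was needed.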

I did a forced upgrade using the HP-customized ESXi 5.5 ISO mounted via iLO.

Below is the reply from HP VMware support:

Greetings,

I checked the logs for the PSOD: PCPU locked up, failed to ack TLB.

The "Failed to ack TLB invalidate" error may be caused by either a hardware or a software issue.

Description: physical CPUs fail while trying to flush cached memory page-table entries (TLB invalidation).
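In general terms, a TLB invalidate (a "TLB shootdown") works like this: one CPU asks the others to flush stale address translations and then waits for each of them to acknowledge. If any CPU never answers, the initiator reports that it failed to ack. The toy Python model below illustrates that handshake. The function `tlb_shootdown`, the `ack_delay_s` map, and the 10-second `ACK_TIMEOUT_S` are hypothetical and do not reflect VMkernel internals or its real timeout.

```python
from typing import Dict, List, Optional

# Hypothetical timeout; the thread does not state the VMkernel's real value.
ACK_TIMEOUT_S = 10.0


def tlb_shootdown(targets: List[int],
                  ack_delay_s: Dict[int, Optional[float]],
                  timeout_s: float = ACK_TIMEOUT_S) -> List[int]:
    """Toy model of one TLB-invalidate round.

    targets: ids of the PCPUs asked to flush their TLBs.
    ack_delay_s: seconds each PCPU takes to acknowledge; None (or a missing
        entry) means it never answers, e.g. because firmware holds it in SMM.
    Returns the PCPUs that failed to acknowledge within the timeout.
    """
    missing = []
    for cpu in targets:
        delay = ack_delay_s.get(cpu)
        if delay is None or delay > timeout_s:
            missing.append(cpu)
    return missing


if __name__ == "__main__":
    # PCPU 2 is stuck in SMM and never answers the invalidate request.
    delays = {1: 0.001, 2: None, 3: 0.002}
    late = tlb_shootdown([1, 2, 3], delays)
    if late:
        print(f"Failed to ack TLB invalidate: PCPU(s) {late}")
```

The model shows why the same error can come from hardware or software: anything that keeps a CPU from answering in time, whether a fault or firmware holding it in SMM, looks identical to the initiating CPU.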

Some HP servers experience a situation where the PCC (Processor Clocking Control or Collaborative Power Control) communication between the VMware ESXi kernel (VMkernel) and the server BIOS does not function correctly.

SCOPE

HP ProLiant G5, G6, G7, and Gen8-series servers that include PCC (Processor Clocking Control or Collaborative Power Control).

As a result of this PCC miscommunication, one or more PCPUs may remain in SMM (System Management Mode) for many seconds. When the VMkernel notices that a PCPU has been unavailable for an extended period, it halts the host with a purple diagnostic screen.
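The "No Heartbeat" variant of this failure can be pictured as a watchdog: each PCPU periodically refreshes a timestamp, and a PCPU held in SMM cannot. The Python sketch below simulates that idea. The `Pcpu` class, `check_lockups`, the one-second tick, and the 10-second `HEARTBEAT_TIMEOUT_S` are all hypothetical; they illustrate the mechanism HP describes, not the VMkernel's actual code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold; the real VMkernel timeout is not given in this thread.
HEARTBEAT_TIMEOUT_S = 10.0


@dataclass
class Pcpu:
    cpu_id: int
    last_heartbeat: float = 0.0
    in_smm: bool = False  # True while firmware holds this CPU in System Management Mode

    def tick(self, now: float) -> None:
        # A CPU parked in SMM is not running VMkernel code, so it cannot refresh its heartbeat.
        if not self.in_smm:
            self.last_heartbeat = now


def check_lockups(cpus: List[Pcpu], now: float,
                  timeout_s: float = HEARTBEAT_TIMEOUT_S) -> List[int]:
    """Return ids of PCPUs whose last heartbeat is older than the timeout."""
    return [c.cpu_id for c in cpus if now - c.last_heartbeat > timeout_s]


if __name__ == "__main__":
    cpus = [Pcpu(i) for i in range(4)]
    cpus[2].in_smm = True  # simulate a PCC request that keeps PCPU 2 in SMM
    for t in range(16):
        now = float(t)
        for c in cpus:
            c.tick(now)
        stuck = check_lockups(cpus, now)
        if stuck:
            print(f"t={t}s: PCPU(s) {stuck} locked up -> purple diagnostic screen")
            break
```

The watchdog never inspects the stuck CPU directly; it only notices the missing update. That matches HP's description that the VMkernel reacts to a PCPU being unavailable for an extended period, whatever the underlying cause.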

This issue is resolved as of ESXi 5.0 Update 2, in which PCC is disabled by default.

To work around this issue in versions prior to ESXi 5.0 U2, disable PCC manually.

To disable PCC:

    Connect to the ESXi host using the vSphere Client.

    Click the Configuration tab.

    In the Software menu, click Advanced Settings.

    Select vmkernel.

    Deselect the vmkernel.boot.usePCC option.

    Restart the host for the change to take effect.

Kindly let us know

Regards,

Harish Kumar G

Technical Solutions Engineer - VMware

Multivendor Software Support Solution Center

Hewlett-Packard

Working Hours: 11:30 AM to 8:30 PM EST Thursday - Monday

Scheduled Absence:

Contact the following numbers based on your region:

For APJ - Australia – 13 11 47 | India - 1800-425-8080 | Malaysia – | New Zealand - 0800 66 47 47 | Singapore – +65 627 25300

For EMEA - UK & I - +44 179 37 93020 | UAE - 800 4910 | Saudi Arabia - 800 897 1444

For AMS – US & Canada – 1800 633 3600. In the IVR, say "Software" and then "VMware", and enter the Service Agreement ID (SAID).


7 Replies
john23
Commander

The PSOD screenshot does not give much information, and no disk dump was generated. The stack trace also shows "unknown".

If you can get some details from the logs, it would be easier to figure out the cause.

Thanks -A
Read my blogs: www.openwriteup.com
mrlesmithjr
Enthusiast

@marqcd Do you have beacon probing enabled on your VSS or VDS?

everythingshouldbevirtual.com @mrlesmithjr
marqcd
Contributor

I did not have beacon probing enabled. Should I enable it?

I do have 2 out of the 4 network cards connected to 2 different network switches for redundancy.

marqcd
Contributor

The error occurs but does not leave a coredump file on the diagnostic partition. I have already confirmed that the diagnostic partition is configured as described in:

VMware KB: Configuring an ESX/ESXi 3.0-4.1 host to capture a VMkernel coredump from a purple diagnos...

marqcd
Contributor

Update:

I'm installing a patch released just 5 days ago to see if it resolves my issue:

VMware ESXi, Patch Release ESXi410-201312001

VMware KB: VMware ESXi, Patch Release ESXi410-201312001

marqcd
Contributor

After installing the new patch (ESXi410-201312001), I got a new error:

"Recursive panic on same CPU"

"DF Exception 8 IP"

[Screenshot: vmware ESXi error - yellow screen8-2013-12-11-0700.jpg]
