VMware Cloud Community
max70
Contributor

PSOD: PCPU Locked up failed to ack tlb invalidate

Hi everybody,

I have a production ESXi server (3i, 3.5.0, 110271) on an IBM eServer x3400 with 10 virtual machines on it. The server stopped with a PSOD showing this message:

"PCPU 3 locked up failed to ack tlb invalidate" Panic from another cpu world 2955, machine check exception

This has happened twice in the last 2 weeks (!!). After rebooting the server, the virtual machines started with no problems, and I have found no evidence of hardware problems in the server hardware logs.

Thanks in advance

25 Replies
thingy
Enthusiast

Hi

Has anyone else had a recurrence of this issue?

We're fully patched up and yet, once a month, we get this same PSOD.

SMAR78700, can you let us know what patch VMware support suggested?

regards,

Jinesh

SMAR78700
Contributor

Hello

VMware support recommended that I upgrade to VI 3.5 U4 and, after this upgrade, patch again with Update Manager:

ESX350-200904403-BG - PATCH

This fixed my problem with this PSOD.

regards,

Stephane

PS: sorry for my poor English.

thingy
Enthusiast

Many thanks for the quick response.

salmonj
Contributor

maokaman,

Have you been able to find a solution to this issue? We're on pretty similar hardware (we have a 9690sa; that's the only difference) and face the same issue under I/O load. Many thanks in advance!

PZh

dmhamel21
Contributor

We are having the same issue on our 2 new servers. We migrated to 6.0 U2 and installed an HA cluster. We are also running vSAN with 100 hosts, with four 1.98 TB SSD drives per server. I have an open ticket with SuperMicro and VMware support. We updated to the latest BIOS per support, but the servers are still crashing. About every 4 days, 1 of the 2 new servers goes down.

The only variable is that the original SuperMicro server SVDI has a different revision of processors (v2 vs. v4). Just wondering if anyone out there has had this issue resolved, and what it took to fix it?

See attached images for configuration and crash info....

The HA cluster is configured with 3 hosts; DRS is on, HA is on, and EVC is set to Intel Ivy Bridge generation. Total CPU resources: 155 GHz.

Total memory: 1023.75 GB, total storage: 15.56 TB, total processors: 76, 0 datastore clusters, 1 vSAN datastore. 104 virtual machines, including View and vCenter. View is Windows Server based, and vCenter is Linux. Fully automated migration, with host monitoring. I have the scratch logs going to 1 server for now.

Thanks in advance for any input.

dmhamel21
Contributor

Hello all, the problem was resolved! SuperMicro had to create a new BIOS for both of the 2 new servers to resolve the issue. The new Intel processors were having a problem with multiple versions of VMware.

Manufacturer: "Supermicro"
    Product: "X10DRG-HT"
"New BIOS" X10DRGH7.411

Manufacturer: "Supermicro"
    Product: "SYS-1028GR-TRT"

Processor Info: #128
    Payload length: 0x2a
    Socket: "CPU 1" & "CPU 2"
    Socket Type: 0x2b (Socket LGA2011-3)
    Socket Status: Populated
    Type: 0x03 (CPU)
    Family: 0xb3 (Xeon)
    Manufacturer: "Intel"
    Version: "Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz"
    Processor ID: 0xbfebfbff000406f1
    Status: 0x01 (Enabled)
    External Clock: 100 MHz
    Max. Speed: 3600 MHz
    Current Speed: 2000 MHz
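
For anyone comparing processor revisions between hosts (v2 vs. v4 here), the Processor ID value in the dump above can be decoded into family, model, and stepping. The following is only a rough sketch: it assumes the usual x86 SMBIOS layout, where the low 32 bits hold the CPUID leaf 1 EAX signature and the high 32 bits hold the EDX feature flags. The function name `decode_processor_id` and the dictionary keys are made up for illustration and are not part of any VMware or SuperMicro tool.

```python
def decode_processor_id(processor_id: int) -> dict:
    """Split a 64-bit SMBIOS Processor ID into CPUID signature fields.

    Assumption: low dword = CPUID(1).EAX, high dword = CPUID(1).EDX.
    """
    signature = processor_id & 0xFFFFFFFF           # CPUID(1).EAX
    feature_flags = (processor_id >> 32) & 0xFFFFFFFF  # CPUID(1).EDX

    stepping = signature & 0xF
    base_model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model only applies to base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {
        "signature_eax": signature,
        "feature_flags_edx": feature_flags,
        "family": family,
        "model": model,
        "stepping": stepping,
    }


if __name__ == "__main__":
    # Processor ID taken from the dump in this post.
    info = decode_processor_id(0xBFEBFBFF000406F1)
    print(f"family={info['family']} model={info['model']:#x} "
          f"stepping={info['stepping']}")
```

For the ID posted here this gives family 6, model 0x4f, stepping 1, which is consistent with the E5-2660 v4 listed in the Version field. Running the same decode on the ID reported by the older host would show how the two revisions differ.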
