VMware Cloud Community
wglman
Contributor

VMware ESXi 6.5.0 Update 2 PSOD (purple screen)

Hello.

Please help me find the cause of the error.

screen.jpg

MikeStoica
Expert

Check if you have any hardware issues: CPU or RAM.

wglman
Contributor

That was the first thing I checked. No problems were found.

wglman
Contributor

There are messages in the logs:

2019-05-24T15:35:32.485Z cpu21:68626)@BlueScreen: Machine Check Exception: Fatal MCE on PCPU21 in world 68626:vmm6:Tenef

System has encountered a Hardware Error - Please contact the hardware vendor

2019-05-24T15:35:32.485Z cpu21:68626)Code start: 0x418021000000 VMK uptime: 0:01:24:12.304

2019-05-24T15:35:32.486Z cpu21:68626)0x43912091be50:[0x41802112ec1b]IDT_VMMForwardMCE@vmkernel#nover+0x2b stack: 0x0

2019-05-24T15:35:32.486Z cpu21:68626)0x43912091bf80:[0x418021119987]VMMVMKCall_Call@vmkernel#nover+0x157 stack: 0x43912091bfec

2019-05-24T15:35:32.486Z cpu21:68626)0x43912091bfe0:[0x41802114b8a2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x0

2019-05-24T15:35:32.489Z cpu21:68626)base fs=0x0 gs=0x418045400000 Kgs=0x0

2019-05-24T15:35:32.430Z cpu21:68626)MCA: 201: UC Excp G5 B0 Sb2000000000a0005 A4180210fdfe4 Mef Internal Parity Error.
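For anyone reading along, the MCA line above can be decoded by hand. Below is a minimal Python sketch, assuming the `S...` token in the ESXi line is the raw IA32_MCi_STATUS value in hex and `B<n>` is the bank number (that reading of the log format is my assumption, not something VMware documents here). The bit positions and the simple error codes follow the Intel SDM; the function names are made up for illustration.

```python
import re

# Flag bits of IA32_MCi_STATUS as laid out in the Intel SDM.
FLAG_BITS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57,
}

# A few of the "simple" MCA error codes (bits 15:0) from the Intel SDM.
SIMPLE_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
}


def status_from_log_line(line):
    """Return (bank, status) from an ESXi MCA line, or None if not found.

    Assumption: the token "B<n>" is the bank and the following "S<hex>"
    token is the status register value.
    """
    m = re.search(r"\bB(\d+) S([0-9a-fA-F]+)\b", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16)


def decode_mci_status(status):
    """Split a 64-bit status value into flags and error-code fields."""
    mca_code = status & 0xFFFF
    return {
        "flags": {n: bool((status >> b) & 1) for n, b in FLAG_BITS.items()},
        "mca_code": mca_code,                    # bits 15:0
        "model_code": (status >> 16) & 0xFFFF,   # bits 31:16, model specific
        "meaning": SIMPLE_CODES.get(mca_code, "compound or unknown code"),
    }


line = ("2019-05-24T15:35:32.430Z cpu21:68626)MCA: 201: UC Excp G5 B0 "
        "Sb2000000000a0005 A4180210fdfe4 Mef Internal Parity Error.")
bank, status = status_from_log_line(line)
info = decode_mci_status(status)
print(bank, hex(status), info["meaning"], info["flags"])
```

Under those assumptions the value decodes to a valid, uncorrected error with processor context corrupt (VAL, UC, EN and PCC set) and MCA error code 0x0005, which matches the "Internal Parity Error" text that ESXi already printed.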

MikeStoica
Expert

These errors usually come from either CPU or memory faults. I also see that you are running a GA version of ESXi. You should upgrade it.

pragg12
Hot Shot

Hi,

Please share the underlying hardware specs along with the manufacturer.

wglman
Contributor

Hello.

I updated VMware ESXi to 6.5.0 Update 2 (Build 13635690).
After that there were no failures for a week, but today the host failed again.
Equipment:
Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz x2
128 GB
2 TB SSD

Here is a link to the logs:

Log_vmware / Cloud Mail.Ru

wglman
Contributor

The problem is still there.
Perhaps my case matches this KB article:
https://kb.vmware.com/s/article/2146388
The microcode has been updated and the BIOS is at the latest version.

Server:

Supermicro SYS-6018R-MT

wglman
Contributor

Two processors were installed.
After I removed one of them, the crashes stopped.

Any ideas why the host fails with both processors installed?

daphnissov
Immortal

Well, if you removed a single physical CPU and the crashes stopped, logic dictates that you have a faulty physical CPU that should be replaced. That is supported by the fact that your PSOD names an MCE as the cause, which is almost always related to a hardware failure.
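One way to gather more evidence for a single bad CPU is to tally which PCPU each fatal MCE was reported on across the saved logs. This is only a rough Python sketch: the regular expression is modeled on the one `@BlueScreen` line posted earlier in this thread, the function name is hypothetical, and how PCPU numbers map to physical sockets on this board is not something the logs shown here tell us.

```python
import re
from collections import Counter

# Pattern modeled on the "@BlueScreen" line posted earlier in the thread.
MCE_RE = re.compile(r"Fatal MCE on PCPU(\d+) in world (\d+)")


def tally_mce_pcpus(lines):
    """Count fatal MCE reports per PCPU number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = MCE_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


sample = [
    "2019-05-24T15:35:32.485Z cpu21:68626)@BlueScreen: Machine Check Exception: "
    "Fatal MCE on PCPU21 in world 68626:vmm6:Tenef",
    "2019-05-24T15:35:32.489Z cpu21:68626)base fs=0x0 gs=0x418045400000 Kgs=0x0",
]
counts = tally_mce_pcpus(sample)
print(counts.most_common())
```

If every crash points at PCPUs that belong to the same package, that fits the faulty-CPU explanation; if they are spread over both, the board or memory would deserve a closer look.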

Neutro
Enthusiast

This is a known problem, fixed in the latest June image profile.

VMware ESXi 6.7, Patch Release ESXi670-201906002

This patch updates the following issues:

  • In very rare cases and for environments with more than 96 CPUs and high workload, an ESXi host might fail with a purple diagnostic screen while migrating a virtual machine from one NUMA node to another.