VMware Cloud Community
davidcrowder
Enthusiast

Help diagnosing purple screen

I could use some assistance diagnosing a purple screen. It seems to indicate an issue with physical CPU #30, but beyond that I'm not seeing why.

Capture1.PNG

Any help is greatly appreciated.

Thanks in advance.


Accepted Solutions
daphnissov
Immortal

It won't say why exactly. MCEs almost always indicate physical hardware failure, and most often I see that something on the mainboard has failed. The best way to know for sure is to run hardware diagnostics from your vendor to pinpoint the issue.

8 Replies
asajm
Expert

Hi davidcrowder

Can you check the VMware Knowledge Base?

ASAJM
TheBobkin
Champion

Hello David,

System has encountered a Hardware Error - Please contact the hardware vendor

If you can reproduce the issue readily and/or under load, you may be able to narrow down which component is broken or failing by checking whether you get a consistent backtrace, or whether specific cores are always indicated (e.g. always cores from one CPU but not the other on a dual-socket system). Then again, this could indicate a slot or other board failure, as Chip said above, or potentially even the memory bank local to that socket.

Either way, switching components around would likely be the only way to narrow it down further, e.g. seeing whether the fault always follows a CPU when it is moved. It is probably a good idea to check your out-of-band management and call your hardware vendor before doing any of the above, of course.

Bob
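
To make the "are the same cores always indicated?" check above a bit more concrete, here is a rough Python sketch that maps the PCPU numbers reported by several purple screens onto sockets and checks whether they all land on the same one. It assumes PCPUs are numbered contiguously per socket, which may not match your host, and the helper names and the `pcpus_per_socket` value are made up for illustration. Only PCPU 30 comes from this thread; the other numbers in the usage note are invented sample values.

```python
from collections import Counter


def socket_for_pcpu(pcpu: int, pcpus_per_socket: int) -> int:
    """Return the zero-based socket index for a PCPU number.

    ASSUMPTION: PCPUs are numbered contiguously per socket. Verify this
    against your host's real topology before trusting the result.
    """
    if pcpu < 0 or pcpus_per_socket <= 0:
        raise ValueError("pcpu must be >= 0 and pcpus_per_socket must be > 0")
    return pcpu // pcpus_per_socket


def tally_sockets(pcpus, pcpus_per_socket):
    """Count how many crashes were reported on each socket."""
    return Counter(socket_for_pcpu(p, pcpus_per_socket) for p in pcpus)


def single_suspect_socket(pcpus, pcpus_per_socket):
    """Return the socket index if every crash points at the same socket, else None."""
    counts = tally_sockets(pcpus, pcpus_per_socket)
    if len(counts) == 1:
        return next(iter(counts))
    return None
```

For example, with a hypothetical 16 PCPUs per socket, `single_suspect_socket([30, 22, 27], 16)` returns `1`, i.e. the second socket when counting from zero. A result of `None` means the crashes are spread across sockets, which would point away from a single CPU or its local memory.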

adgate
Enthusiast

What is the hardware build? Have you checked whether it is supported in the VMware Compatibility Guide?

serveradminist2
Contributor

Which hardware are you using?

Arthos
Enthusiast

MCEs point to hardware errors. Besides running diagnostics, I suspect this is related to CPU power states, as mentioned in the trace. To provide more details, I need the hardware/server information.

Thanks.

davidcrowder
Enthusiast

Apologies for the delayed response.  The ESXi crashdumps/logs pointed to various processes & cores, but were consistent in pointing to physical CPU-socket #2.  The system management gave the hardware a clean bill of health... however, further digging in the system management logs showed assert errors in a memory module on CPU #2.  Why that didn't trigger alerts in system management, and subsequently monitoring software... ugh.  At any rate, we had the offending memory module replaced and everything is back up and running.  Thank you all for pointing me in the correct direction.

Edit: It did not take this long to fix it; it was fixed the same day. It took this long to update because, for whatever reason, the VMware forum/community website would not allow me to update the post. Thanks again, all!

serveradminist2
Contributor

This is a memory issue; please replace the module ASAP.
