ZippityZippity
Contributor

Random PSOD

Greetings all, I keep getting a random PSOD once every 7-14 days, and sometimes at other moments, such as during a reboot.

I know it is hardware related, but I am no expert at reading the dump files. At first I thought it was related to some PCI-E SSDs that I am playing around with; they hold no important material. I thought I had pinpointed which one it was and removed it, but that has not resolved anything.

I have attached 3 coredump files, vmkernel.log and a screenshot of the PSOD.

Any help would be appreciated.

Thanks

7 Replies
dekoshal
Hot Shot

Have a look at this:

An ESXi host fails with a purple diagnostic screen with the error: Recursive panic on same CPU (2036...

For a more detailed review, follow the KB article below to extract the log file from the zdump located in /var/core, and upload it here:

Extracting the log file after an ESX or ESXi host fails with a purple screen error

If you found this or any other answer helpful, please consider the use of the Correct or Helpful to award points.

Best Regards,

Deepak Koshal

CNE|CLA|CWMA|VCP4|VCP5|CCAH

ZippityZippity
Contributor

I did upload the dumps. They have already been converted; they are the .1, .2 and .3 files.

ZippityZippity
Contributor

Well, if I am reading the vmkernel logs and dumps correctly, it looks like the PSOD is being caused by my Intel 10 Gbps NIC. Can anyone read the dumps and confirm?
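
For anyone who wants to cross-check a suspicion like this, here is a rough Python sketch that counts how many lines of a log mention each suspect component. The keyword patterns (for example `ixgbe` as the Intel 10 GbE driver name) and the function names are my own assumptions for illustration, not strings taken from the attached logs, so adjust them to whatever your vmkernel.log actually contains.

```python
import re
from collections import Counter

# Hypothetical keyword patterns: assumptions for illustration only,
# not strings confirmed to appear in the attached logs.
SUSPECT_PATTERNS = {
    "nic": re.compile(r"ixgbe|vmnic", re.IGNORECASE),
    "cpu": re.compile(r"machine check|\bmce\b", re.IGNORECASE),
    "storage": re.compile(r"nvme|ahci", re.IGNORECASE),
}


def count_suspects(lines):
    """Count how many log lines mention each suspect component.

    A line is counted at most once per component, even if it
    matches several keywords for that component.
    """
    hits = Counter()
    for line in lines:
        for name, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


def scan_file(path):
    """Run count_suspects over a log file on disk."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as log:
        return count_suspects(log)
```

Calling `scan_file("vmkernel.log")` returns a `Counter` keyed by component. A high count only tells you where to start reading; it does not prove that component caused the PSOD.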

dekoshal
Hot Shot

Are you running VMs on SSD storage?

ZippityZippity
Contributor

Yes, I am running a couple of VMs on SSD storage. Is that a bad idea?

dekoshal
Hot Shot

I am looking at this forum thread, and it looks like they had an issue with running VMs on SSD. Their logs look similar to the ones you are getting. Have a look.

https://www.developpez.net/forums/d1682564/systemes/virtualisation/vmware/sos-crashs-esxi-repetition...

Finikiez
Champion

What HW do you use?

Crash dumps can be fully analyzed only by VMware support. Others can only read the logs and text from a dump file.

Looking at the attached screenshot, it looks like a CPU issue. Can you read the logs on your server's management controller (iLO, BMC or whatever) and check for any hardware errors? Also, if you can, run hardware tests for several hours.
