VMware Cloud Community
em2397a
Contributor

PSOD on ESXi 6.7

Good afternoon. The following PSOD has started to appear:

Could someone please tell me what might be causing this? And how do I decode a PSOD like this one?

Hypervisor: VMware ESXi, 6.7.0, 19195723

Model: ProLiant ML350 G6

Processor Type: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

Lalegre
Virtuoso

Hey there @em2397a,

It looks like there was an issue with vmnic1, judging by the exception. Could you please connect to the ESXi host over SSH and run the following:

  • esxcfg-nics -l
  • esxcli software vib list

Copy both outputs here so I can check which driver you are currently using and which version is installed, as that could be the issue.

Also, please provide the exact ESXi build you are running.

em2397a
Contributor

Thanks for your answer. Here are the results of the commands:

Lalegre
Virtuoso

@em2397a,

Thanks for that. I forgot to ask for this one as well:

vmkchdev -l | grep -i vmnic

Please paste that short output here.

em2397a
Contributor

[root@ESXi-01:~] vmkchdev -l | grep -i vmnic
0000:03:04.0 14e4:1678 103c:703e vmkernel vmnic0
0000:03:04.1 14e4:1678 103c:703e vmkernel vmnic1
0000:14:00.0 15b3:6750 15b3:0021 vmkernel vmnic2
[root@ESXi-01:~]
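Each row of the vmkchdev -l output above lists a PCI address, the vendor:device ID, the subsystem vendor:device ID, the owner and the device name. As a minimal sketch, assuming that five-column layout (it matches the three rows shown), the Python snippet below splits the rows and maps the vendor IDs to names, so the IDs can then be looked up in the VMware Compatibility Guide to check the supported driver versions. The VENDOR_NAMES table, PciDevice and parse_vmkchdev are hypothetical helpers for illustration, not part of any VMware tool.

```python
from dataclasses import dataclass
from typing import List

# Illustrative (not exhaustive) table of PCI-SIG vendor IDs seen in the output above.
VENDOR_NAMES = {
    "14e4": "Broadcom",
    "15b3": "Mellanox",
    "103c": "Hewlett-Packard",
}


@dataclass
class PciDevice:
    address: str       # PCI segment:bus:slot.function
    vendor_id: str
    device_id: str
    subvendor_id: str
    subdevice_id: str
    owner: str         # e.g. "vmkernel"
    name: str          # e.g. "vmnic0"

    @property
    def vendor_name(self) -> str:
        return VENDOR_NAMES.get(self.vendor_id, "unknown")


def parse_vmkchdev(text: str) -> List[PciDevice]:
    """Parse 'vmkchdev -l' rows, assuming five whitespace-separated columns."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip shell prompts, blank lines and rows that do not look like device rows.
        if len(fields) < 5 or ":" not in fields[1] or ":" not in fields[2]:
            continue
        vendor_id, device_id = fields[1].split(":")
        subvendor_id, subdevice_id = fields[2].split(":")
        devices.append(PciDevice(fields[0], vendor_id, device_id,
                                 subvendor_id, subdevice_id, fields[3], fields[4]))
    return devices


sample = """\
0000:03:04.0 14e4:1678 103c:703e vmkernel vmnic0
0000:03:04.1 14e4:1678 103c:703e vmkernel vmnic1
0000:14:00.0 15b3:6750 15b3:0021 vmkernel vmnic2"""

for dev in parse_vmkchdev(sample):
    print(f"{dev.name}: {dev.vendor_name} {dev.vendor_id}:{dev.device_id} "
          f"(subsystem {dev.subvendor_id}:{dev.subdevice_id})")
```

With these rows, vmnic0 and vmnic1 resolve to Broadcom (with an HP subsystem ID) and vmnic2 to Mellanox, which is the information needed to search the compatibility guide by VID/DID/SVID/SSID.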

em2397a
Contributor

Another PSOD just appeared (after about 4 hours of operation), after I started downloading a large file in the browser on the virtual machine "PC01-Win10Pro".

[Screenshot of the new PSOD]

Lalegre
Virtuoso

Interesting. Now I see something different: the trace points to the VM itself, whose process was running on PCPU0 at that moment, and there is a hardware alert at the very top: "LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem; please contact your hardware vendor."

Do you see any alerts in the hardware management console (iLO, iDRAC, etc.)?

Also, a quick one: could you please run the following?

tail -n 1000 /var/log/vmkernel.log

Copy the output into a file and attach it here.
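Before attaching the file, it can help to pull out the lines that mention the suspect devices or the NMI. A minimal Python sketch, assuming the tail output has been saved locally as a text file; the keyword list and the function names are hypothetical, and the code does not rely on any particular vmkernel.log format, it only performs a literal, case-sensitive substring match.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical keywords: the NMI alert and the NIC named in the first PSOD.
DEFAULT_KEYWORDS = ("NMI", "vmnic1")


def filter_log(lines: Iterable[str],
               keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines containing any keyword (literal, case-sensitive match)."""
    if not keywords:
        return []  # an empty pattern would otherwise match every line
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def filter_log_file(path: str,
                    keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Read a saved log file and return only the matching lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return filter_log(f, keywords)
```

For example, filter_log_file("vmkernel-tail.txt") (a hypothetical file name) returns just the matching lines, which keeps the attachment short while the full file remains available if needed.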

em2397a
Contributor

I took drastic measures: I migrated all virtual machines to another host, completely reinstalled ESXi, installed all updates, and migrated the machines back. It has now been 48 hours without a PSOD. Thanks for the help!

maksym007
Expert
Expert

As always, start with the iLO/iDRAC updates, then the BIOS and the network card firmware/drivers, and finally update ESXi to the latest version.

That should be OK.

em2397a
Contributor

Thanks for the advice. All the latest firmware and updates are installed now.
The problem is still not resolved and the PSODs kept recurring, but we managed to figure out the pattern:
I have a "NAS" virtual machine running TrueNAS that originally had one LSI HBA controller attached to it via passthrough. A few weeks ago I added a second, identical LSI controller, and the PSODs appear when both LSI controllers are under load at the same time. I reproduced it 3 times: I start copying, and after 3-5 minutes the host PSODs, with a different screen each time. I have now ordered a SAS expander so I can remove one of the LSI controllers and connect all the disks to a single controller, since only one controller was installed before and there were never any PSODs during that time (6-7 months).
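To confirm which controllers are currently handed to the TrueNAS VM (for example before and after the switch to a single HBA behind the SAS expander), the passthrough rows of vmkchdev -l can be listed. A minimal sketch, assuming the owner column (fourth field) reads passthru for devices assigned to passthrough and using 1000, the PCI vendor ID of LSI Logic (now part of Broadcom); passthrough_devices and count_lsi are hypothetical helpers, and the input is whatever vmkchdev -l prints on the host.

```python
from typing import List, Tuple

LSI_VENDOR_ID = "1000"  # PCI vendor ID of LSI Logic / Symbios


def passthrough_devices(vmkchdev_output: str) -> List[Tuple[str, str]]:
    """Return (PCI address, vendor:device) for rows whose owner is 'passthru'.

    Assumes the 'vmkchdev -l' column layout: address, vendor:device,
    subvendor:subdevice, owner, and an optional device name.
    """
    result = []
    for line in vmkchdev_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "passthru":
            result.append((fields[0], fields[1]))
    return result


def count_lsi(devices: List[Tuple[str, str]]) -> int:
    """Count passthrough devices whose vendor ID is LSI's."""
    return sum(1 for _, ids in devices if ids.split(":")[0] == LSI_VENDOR_ID)
```

If count_lsi reports two devices, both HBAs are still passed through; after the change it should report one.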

em2397a
Contributor

No other ideas yet...
