Hey there @em2397a,
It seems there was an issue with vmnic1, as per the exception. Could you please connect to the ESXi host over SSH and run the following:
Copy both outputs here so I can check which driver you are currently using and which version is installed, as that could be the issue.
Also, please provide the exact ESXi build you are running.
Thanks for that. I forgot to ask for this:
vmkchdev -l | grep -i vmnic
Copy that small output here.
[root@ESXi-01:~] vmkchdev -l | grep -i vmnic
0000:03:04.0 14e4:1678 103c:703e vmkernel vmnic0
0000:03:04.1 14e4:1678 103c:703e vmkernel vmnic1
0000:14:00.0 15b3:6750 15b3:0021 vmkernel vmnic2
[root@ESXi-01:~]
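For anyone reading along, here is a rough Python sketch of how the output above can be split into fields to see which vendor each NIC comes from. The column meaning (PCI address, vendor:device, subvendor:subdevice, owner, alias) is how I read this output, and the function name `parse_vmkchdev` and the small vendor table are my own illustration, not part of any VMware tool. The vendor names are taken from the public PCI ID list.

```python
# Hypothetical helper: parse `vmkchdev -l | grep -i vmnic` output pasted as text.
# Assumed column layout: PCI address, vendor:device, subvendor:subdevice, owner, alias.

# PCI vendor IDs that appear in the output above (public PCI ID list).
KNOWN_VENDORS = {
    "14e4": "Broadcom",
    "15b3": "Mellanox",
    "103c": "Hewlett-Packard",
}


def parse_vmkchdev(text):
    """Return one dict per NIC line; lines that do not fit the layout are skipped."""
    nics = []
    for line in text.splitlines():
        parts = line.split()
        # Shell prompt lines and blank lines do not have exactly five fields.
        if len(parts) != 5:
            continue
        address, ids, sub_ids, owner, alias = parts
        if ids.count(":") != 1 or sub_ids.count(":") != 1:
            continue
        vendor_id, device_id = ids.split(":")
        sub_vendor_id, sub_device_id = sub_ids.split(":")
        nics.append({
            "address": address,
            "vendor_id": vendor_id,
            "device_id": device_id,
            "sub_vendor_id": sub_vendor_id,
            "sub_device_id": sub_device_id,
            "owner": owner,
            "alias": alias,
            # Falls back to "unknown" for IDs not in the small table above.
            "vendor": KNOWN_VENDORS.get(vendor_id.lower(), "unknown"),
        })
    return nics


# Output copied from the post above.
sample = """[root@ESXi-01:~] vmkchdev -l | grep -i vmnic
0000:03:04.0 14e4:1678 103c:703e vmkernel vmnic0
0000:03:04.1 14e4:1678 103c:703e vmkernel vmnic1
0000:14:00.0 15b3:6750 15b3:0021 vmkernel vmnic2
[root@ESXi-01:~]"""

for nic in parse_vmkchdev(sample):
    print(nic["alias"], nic["vendor"], nic["vendor_id"] + ":" + nic["device_id"])
```

The vendor:device pair is what you would normally look up in the compatibility guide to find the matching driver, so vmnic0 and vmnic1 (same IDs) share one driver and vmnic2 uses a different one.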
Another PSOD just appeared, after 4 hours of running. It happened when I started downloading a large file in the browser on the virtual machine "PC01-Win10Pro".
Interesting. Now I see something different: the exception is related to the VM itself, on PCPU0, which had the process allocated at that moment, and there is a hardware alert at the very top: "LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem; please contact your hardware vendor."
Do you have any alerts in the hardware management console (iDRAC, iLO, etc.)?
Also, a quick one: could you please run: tail -n 1000 /var/log/vmkernel.log
Copy the output into a file and attach it.
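If you want to skim the saved file yourself before attaching it, a minimal Python sketch like the one below can pull out the lines worth a first look. The function name `find_suspect_lines` is hypothetical, and the default keywords are simply the terms from the PSOD message quoted above ("LINT1", "NMI"); it is a plain case-insensitive substring match, so it makes no assumptions about the vmkernel.log line format and may return some noise.

```python
# Hypothetical helper: flag lines in a saved log excerpt that mention given keywords.
# The default keywords come from the PSOD text in this thread; adjust as needed.

def find_suspect_lines(lines, keywords=("lint1", "nmi")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        # Simple substring match, case-insensitive.
        if any(k in lowered for k in wanted):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Usage would be along the lines of opening the saved file, passing the file object to `find_suspect_lines`, and printing each returned pair. The full file is still the thing to attach, since the context around a hit usually matters more than the hit itself.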
I went to extreme measures: migrated all virtual machines to another host, completely reinstalled ESXi, installed all updates, and migrated the machines back. 48 hours without a PSOD so far. Thanks for the help.
As always, start with the iLO/iDRAC updates, then the BIOS and the firmware/drivers of the network cards, and finally update ESXi to the latest version.
It should be OK after that.
Thanks for the advice. At the moment, all the latest firmware and updates are installed.
The problem is still unresolved and the PSODs have come back, but we managed to figure out the pattern:
I have a "NAS" virtual machine running TrueNAS that originally had one LSI HBA controller attached to it via passthrough. A few weeks ago I added a second, identical LSI controller, and the PSODs appear when both LSI controllers are under load at the same time. I checked it 3 times: I start copying and a PSOD follows after 3-5 minutes, and it is different each time. For now, I have ordered a SAS expander so I can remove one of the LSI controllers and connect all the disks to a single controller, since with only one controller installed (for 6-7 months) PSODs never happened.
No other ideas yet...
