I am so thankful that there is such a detailed thread on this issue. We just had our first PSOD on a Dell R740 with Intel Corporation I350 Gigabit network cards. While waiting for VMware support to call me back, I started digging around and landed here.
We are running VMware ESXi 6.7.0, build 10764712, which is Update 1, so I've got a long way to go to get to the latest 6.7, plus a bunch of firmware updates. Still, it looks promising that the latest ESXi build with the latest I350 firmware should take care of this. With such a big jump (Update 1 to Update 3, plus a bunch of firmware), hopefully I don't run into new bugs.
The insane part of all this is that our host had been running stable for 242 days and then hit this PSOD with no recent change to the infrastructure.
- Firmware Version: 1.67.0:0x80000d38:18.3.6
- Version: 1.4.7
PSOD posted for reference; it looks pretty much identical to the OP's screenshot.
From what I have gathered from this thread, updating to the latest ESXi does not fix the issue; instead, we need to apply the latest igbn driver, version 1.4.10.
The driver's change log specifically states:
- Fixed intermittent TX hang due to race condition between start and stop of TX queue.
- Fixed duplicate nic reset due to race condition between uplink reset and watchdog threads.
That sounds reassuring.
I suspect we need to update both the driver within ESXi and the firmware on the network cards to the latest versions so that they stay compatible with each other?
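For anyone checking more than one host, here is a minimal sketch of how the installed igbn version could be compared against 1.4.10. The one catch worth showing is that the comparison has to be numeric per component, because as plain strings "1.4.10" sorts before "1.4.7". The function names are my own for illustration and are not part of any VMware or Intel tooling; the version strings are assumed to be simple dotted numbers like the ones quoted above.

```python
def parse_version(text):
    """Turn a dotted version such as '1.4.7' into a tuple of ints: (1, 4, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_driver_update(installed, minimum="1.4.10"):
    """Return True when the installed version is older than the minimum.

    The default minimum is the igbn version mentioned in this thread.
    """
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    print(needs_driver_update("1.4.7"))   # True: 1.4.7 is older than 1.4.10
    print(needs_driver_update("1.4.10"))  # False: already at the fixed version
    # Plain string comparison gets this wrong, since '7' sorts after '1':
    print("1.4.7" < "1.4.10")             # False
```

The tuple comparison is what makes 1.4.10 count as newer than 1.4.7; feeding it the version reported by each host is left out here since that depends on how you collect it.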
Still working fine with no issues.