Hey guys,
We've encountered some issues with the igbn driver (version 1.4.7) on Dell hosts running ESXi 6.7 Update 1. We raised this with the vendors, and both Dell and VMware came back advising that there might be a known issue in this driver. One of our customers' hosts running this driver encountered a PSOD.
Dell has in fact reached out to their vendor (Intel), who confirmed that this is impacting the Intel I350 cards. Both VMware and Dell have advised us to disable the igbn driver and use the older igb driver instead.
Has anyone else encountered similar issues with this driver? If so, has anyone disabled it and switched to the igb driver instead? Is it a stable workaround?
Looking forward to some feedback from those who have switched over to the igb driver.
Cheers, Onil
We raised a ticket, and it was confirmed to us that 1.4.10 should resolve the problem and that a KB article on it would follow at some point.
So far we haven't seen this problem again after applying 1.4.10; uptime is now 38 days.
That's awesome, mweissen.
Are there any entries in the host's events indicating any kind of disconnects or port group recovery events?
6.7 U2 here. I applied 1.4.10 11 days ago and haven't seen any further crashes either. My max uptime on 1.4.7 was 31 days, so it's still early, but seeing others report success gives me hope that the driver is the fix.
Hello oneilv!
At the moment my events are mostly flooded with "Sensor -1 type" events (VMware Knowledge Base).
I haven't seen any disconnects or recovery events, either on the ESXi side or on the switch side (Cisco Event Log).
Still running fine with no reported or observed issues.
That's good to know, mweissen. Hope things stay stable.
Thanks for reporting back, JasonEde.
So far we are getting positive feedback on the new driver from users which is a great result.
Is there going to be an official KB article on this soon?
Thanks,
I can confirm that everything has been rock-solid for 60+ days now.
Hi JasonEde
Sorry, I'm not sure if or when VMware will publish an official KB article on this issue, since nothing has come out so far (at least, I haven't seen anything yet).
Excellent result, mweissen!
I am so thankful that there is such a detailed thread on this issue. We just had our first PSOD on a Dell R740 with Intel Corporation I350 Gigabit Network cards. While waiting for VMware support to call me back, I started digging around and landed here.
We are running VMware ESXi 6.7.0, build 10764712, which is Update 1, so I've got a long way to go to get to the latest 6.7 plus a bunch of firmware updates. Still, this looks promising: the latest ESXi build with the latest I350 firmware should take care of this. With such a big jump (Update 1 to Update 3, plus a bunch of firmware), hopefully I don't run into new bugs.
The insane part of all this is that our host had been running stable for 242 days before this PSOD, with no recent changes to the infrastructure.
Driver Info:
Driver: igbn
PSOD posted for reference; it looks practically identical to the OP's screenshot.
From what I have gathered on this thread, updating to the latest ESXi build does not fix the issue by itself; instead, we need to apply the latest igbn driver, version 1.4.10.
The driver's changelog specifically states:
- Fixed intermittent TX hang due to race condition between start and stop of TX queue.
- Fixed duplicate nic reset due to race condition between uplink reset and watchdog threads.
That sounds reassuring.
I suspect we need to update both the driver in ESXi and the firmware on the network cards to the latest versions to keep them compatible?
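Since the thread points to igbn 1.4.10 as the release that fixes the PSOD, a quick check is whether a host's installed igbn version is older than that. Below is a minimal Python sketch under that assumption; the function names are hypothetical, and the version-string shape it expects (e.g. `1.4.7-1OEM.670.0.0.8169922`, roughly what `esxcli software vib list` reports) is an assumption, so compare it with your own host's output first.

```python
import re
from typing import Tuple

# Assumption: 1.4.10 is the first fixed igbn release, as reported in this thread.
FIXED_VERSION = (1, 4, 10)


def parse_driver_version(vib_version: str) -> Tuple[int, ...]:
    """Extract the leading numeric driver version from a VIB version string.

    Assumes strings shaped like '1.4.7-1OEM.670.0.0.8169922', where the
    dotted numbers before the first '-' are the driver's own version.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", vib_version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {vib_version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_igbn_update(vib_version: str) -> bool:
    """Return True if the installed igbn driver predates the fixed release."""
    # Tuples compare element by element, so (1, 4, 7) < (1, 4, 10),
    # unlike a plain string comparison where "1.4.7" > "1.4.10".
    return parse_driver_version(vib_version) < FIXED_VERSION


if __name__ == "__main__":
    for version in ("1.4.7-1OEM.670.0.0.8169922", "1.4.10"):
        print(version, "needs update:", needs_igbn_update(version))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.4.7" sorts after "1.4.10", which would wrongly report the affected driver as up to date.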
Hi guys,
I had the same issue. This should be fixed in the 1.4.10 driver.
Please see the KB article from VMware: VMware Knowledge Base
Thanks for sharing the link to the KB, vkernelblog.
Thanks, exxoid. Please note the official KB from VMware about this issue.
Hi JasonEde
Please find the KB article from VMware regarding this Intel driver issue below.
Cheers
Onil Varghese
Cheers,
Still working fine with no issues.