VMware Cloud Community
oneilv
Enthusiast

igbn driver issue on ESXi 6.7 Update1

Hey guys,

We've encountered some issues with the igbn driver (version 1.4.7) on Dell hosts running ESXi 6.7 Update 1. This has been raised with the vendors, and both Dell and VMware have come back advising that there might be a known issue in this driver. One of our customers' hosts running this driver encountered a PSOD.

In fact, Dell has reached out to their vendor (Intel), who confirmed that this is impacting the Intel i350 cards. Both VMware and Dell have advised disabling the native igbn driver and using the legacy igb driver instead.

Has anyone else encountered similar issues with this driver? If so, has anyone disabled it and switched to the igb driver instead? Is it a stable workaround?

Looking forward to some feedback from those who have switched over to the igb driver.

Cheers, Onil

37 Replies
JasonEde
Contributor

We raised a ticket and it was confirmed to us that 1.4.10 should resolve the problem and that a KB article is following in this regard at some point.

mweissen
Contributor

So far we haven't seen this problem again after applying 1.4.10; uptime is now 38 days.

oneilv
Enthusiast

That's awesome, mweissen.

Are there any entries in the events for the host indicating any kind of disconnects or port group recovery events?

cruz878
Contributor

6.7 U2 here. I applied 1.4.10 eleven days ago and have not seen any further crashes either. My max uptime on 1.4.7 was 31 days, but seeing others report success gives me hope that the new driver is the fix.

mweissen
Contributor

Hello

At the moment my events are flooded mostly with "Sensor -1 type" events (VMware Knowledge Base).

I haven't seen any disconnects or recovery events, neither on the ESXi side nor on the switch side (Cisco event log).

JasonEde
Contributor

Still running fine with no reported or observed issues.

oneilv
Enthusiast

That's good to know, mweissen. Hope things are still stable.

oneilv
Enthusiast

Thanks for reporting back, JasonEde.

So far we are getting positive feedback from users on the new driver, which is a great result.

JasonEde
Contributor

Is there going to be an official KB article on this soon?

mweissen
Contributor

Thanks,

I can confirm that everything has been rock-solid for 60+ days now.

oneilv
Enthusiast

Hi JasonEde

Sorry, I am not sure if or when VMware will publish an official KB article on this issue, since nothing has come out so far (at least I haven't seen anything yet).

oneilv
Enthusiast

Excellent result, mweissen.

exxoid
Contributor

I am so thankful that there is such a detailed thread on this issue. We just had our first PSOD on a Dell R740 with Intel Corporation I350 Gigabit Network cards. While waiting for VMware support to call me back I started digging around and landed here.

We are running VMware ESXi 6.7.0 build 10764712, which is Update 1, so I've got a long way to go to get to the latest 6.7 plus a bunch of firmware updates, but it looks promising that the latest ESXi build with the latest I350 firmware should take care of this. With such a big jump (Update 1 to Update 3 and a bunch of firmware), hopefully I don't run into new bugs.

The insane part of all this is that our host had been running stable for 242 days, and then this PSOD hit with no recent change to the infrastructure.

Driver Info:

Driver: igbn

  • Firmware Version: 1.67.0:0x80000d38:18.3.6
  • Version: 1.4.7

PSOD posted for reference; it looks pretty much identical to the OP's screenshot.

error.png

From what I have gathered on this thread, updating to the latest ESXi does not fix the issue by itself; instead we need to apply the latest igbn driver version, 1.4.10.
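For anyone who wants to script that check across hosts, here is a minimal Python sketch. It assumes you have already captured the driver details as plain "Key: Value" text shaped like the Driver Info block above; the function names are made up for illustration, and the 1.4.10 threshold is simply the version reported as fixed in this thread. It also assumes the version string is purely dotted numbers, so an OEM suffix would need extra handling. The point of converting to tuples is that a plain string comparison would wrongly rank "1.4.7" above "1.4.10".

```python
from typing import Dict, Tuple

# Version reported as fixed in this thread (an assumption, not an official constant).
FIXED_IGBN_VERSION = "1.4.10"


def parse_driver_info(text: str) -> Dict[str, str]:
    """Parse 'Key: Value' lines into a dict, splitting on the first colon only."""
    info: Dict[str, str] = {}
    for raw in text.splitlines():
        # Drop indentation and any leading bullet character.
        line = raw.strip().lstrip("•").strip()
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.4.7' into (1, 4, 7) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_affected(info: Dict[str, str], fixed: str = FIXED_IGBN_VERSION) -> bool:
    """True if the NIC uses igbn at a version older than the fixed one."""
    if info.get("Driver") != "igbn":
        return False
    return version_tuple(info["Version"]) < version_tuple(fixed)


SAMPLE = """\
Driver: igbn
  • Firmware Version: 1.67.0:0x80000d38:18.3.6
  • Version: 1.4.7
"""

info = parse_driver_info(SAMPLE)
print(info["Driver"], info["Version"], is_affected(info))
```

With the values from my host this flags the NIC as still on an affected driver; a host already on 1.4.10 would come back as not affected.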

The driver's change log specifically states:

- Fixed intermittent TX hang due to race condition between start and stop of TX queue.

- Fixed duplicate nic reset due to race condition between uplink reset and watchdog threads.

That sounds reassuring.

I suspect we need to update both the driver within ESXi and the firmware on the network cards to the latest versions so that the two stay compatible?

vkernelblog
Contributor

Hi guys,

I had the same issue. This should be fixed in the 1.4.10 driver.

Please see the KB article from VMware: VMware Knowledge Base

oneilv
Enthusiast

Thanks for sharing the link to the KB, vkernelblog.

oneilv
Enthusiast

Thanks, exxoid. Please note the official KB from VMware about this issue:

VMware Knowledge Base

oneilv
Enthusiast

Hi JasonEde

Please find the KB article from VMware regarding this Intel driver issue below:

VMware Knowledge Base

Cheers

Onil Varghese

JasonEde
Contributor

Cheers,

Still working fine with no issues.