VMware Cloud Community
oneilv
Enthusiast

igbn driver issue on ESXi 6.7 Update 1

Hey guys,

We've encountered some issues with the igbn driver (version 1.4.7) on Dell hosts running ESXi 6.7 Update 1. This has been raised with the vendors, and Dell and VMware have both come back advising that there might be a known issue in this driver. One of our customer's hosts running this driver encountered a PSOD.

Dell has in fact reached out to their vendor (Intel), who confirmed that this is impacting the Intel i350 cards. Both VMware and Dell have advised us to disable the igbn driver (the native driver) and use the legacy igb driver instead.

Has anyone else encountered similar issues with this driver? If so, has anyone disabled it and switched to the igb driver instead? Is that a stable workaround?

Looking forward to feedback from those who have switched over to the igb driver.

Cheers, Onil

37 Replies
oneilv
Enthusiast

Further to the above, VMware recommended disabling the igbn driver and using the legacy igb driver.

We made this change on one host, and that host then lost connectivity almost every night: the management network went down and we had to restart it manually each time to restore connectivity to vCenter.

We raised this with VMware, who still don't have an ETA for a fix. Along with Dell, they have advised that a test driver is now available, but it has not yet been released for production.

Because of the long delay in resolving this, we have decided to roll back to the old igbn driver version 4.1, as it seems more stable than version 1.4.7.

Cheers, Onil

vm7user
Enthusiast

We are also seeing a PSOD with the I350, on a host installed from VMware-VMvisor-Installer-6.7.0.update02-13006603.x86_64-DellEMC_Customized-A00.iso.

oneilv
Enthusiast

Are you running the cards with the igbn driver? If so, which version?
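
For anyone checking this across several hosts, here is a rough Python sketch that picks out the NICs bound to igbn from the text output of `esxcli network nic list`. It assumes the first three whitespace-separated columns are Name, PCI Device and Driver; the function name and the sample output are made up for illustration and were not captured from a real host.

```python
def nics_using_driver(nic_list_output, driver="igbn"):
    """Return the names of NICs whose Driver column equals `driver`.

    Assumption: the first three whitespace-separated columns of
    `esxcli network nic list` are Name, PCI Device and Driver.
    """
    names = []
    for line in nic_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(fields) < 3 or not fields[0].startswith("vmnic"):
            continue
        if fields[2] == driver:
            names.append(fields[0])
    return names


# Illustrative sample only, NOT real output from any host in this thread.
SAMPLE = """\
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:01:00.0  ntg3    Up            Up           1000   Full    00:00:00:00:00:01  1500  Broadcom NIC
vmnic1  0000:04:00.0  igbn    Up            Up           1000   Full    00:00:00:00:00:02  1500  Intel I350
vmnic2  0000:04:00.1  igbn    Up            Down         0      Half    00:00:00:00:00:03  1500  Intel I350
"""

if __name__ == "__main__":
    print(nics_using_driver(SAMPLE))
```

The driver version for a given NIC should then show up in the driver info section of `esxcli network nic get -n vmnic1`.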

anvanster
Enthusiast

Can you please post a screenshot of the PSOD? It might be helpful.

oneilv
Enthusiast

vm7user, can you please share a screenshot of the PSOD? It seems you are running ESXi 6.7 Update 2, while we are seeing this issue on 6.7 Update 1.

vm7user
Enthusiast

Image_esxi_dell.png

oneilv
Enthusiast

Thanks vm7user

Did VMware GSS get back to you as to what caused the PSOD? Or did they confirm it was the same igbn driver?

oneilv
Enthusiast

Hi All,

Please note that VMware have come back with the below.

Summary

PSOD on boot on multiple hosts

Cause

The Intel igbn driver in combination with VMware's internal network handling has resulted in this scenario.

Resolution

The code-level fix has been released in 6.5 U3 and is slated for 6.7 U3, which is pending release shortly.

vm7user - maybe this is impacting you too. Have you heard back from VMware at all regarding the PSODs?

Cheers, Onil

danf201110141
Contributor

I have recently migrated two servers from 5.5->6.5->6.7. The two are near identical, except that one has an Intel I350 adapter. The server with the I350 adapter became very unstable, and any serious sustained network traffic would crash the management interface.

This was troublesome as vCenter was also on this server (it's been moved to the more stable server) and everything became unresponsive. Restarting the management network via console did not help.

One thing that would always trigger it is a full backup (not an incremental backup), as it transfers a few TB off the host. It would always crash about 4-5 hours into the job, necessitating a host restart because the management network would not reset.

I think part of the issue in our case is that the onboard networking requires the igb driver while the I350 wanted the igbn driver. I've blacklisted the igbn driver and now ESXi is using the igb driver for everything.
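
As a rough sketch of what that can look like when scripted: the Python below only builds the `esxcli system module set` command line for disabling a module, and runs it only when `dry_run` is turned off. It assumes "blacklisting" here means disabling the igbn module this way, which is a guess since the exact steps are not given above; the helper names are hypothetical. The change should only take effect after a host reboot.

```python
import subprocess


def module_toggle_cmd(module, enabled):
    """Build the esxcli argv that enables or disables a kernel module."""
    flag = "true" if enabled else "false"
    return ["esxcli", "system", "module", "set",
            "--enabled=" + flag, "--module=" + module]


def disable_module(module="igbn", dry_run=True):
    """Return the command line; execute it only when dry_run is False.

    Hypothetical helper: on a real host, run it from the ESXi shell
    and reboot afterwards so the other driver can claim the NICs.
    """
    cmd = module_toggle_cmd(module, enabled=False)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run: just show what would be executed.
    print(disable_module("igbn"))
```

Re-enabling the module later (for example to retest igbn after an update) is the same command with `--enabled=true`, which `module_toggle_cmd("igbn", True)` produces.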

I'm currently monitoring the backup but it's now in the verifying stage so I don't believe it will crash the management interface any more.

Hopefully 6.7u3 will be released soon - I will try unblocking the igbn driver then to see if the problem is resolved.

oneilv
Enthusiast

Hi danf201110141

Intel have released a new driver for igbn, please see the below link for it. Read through the release notes and see if you can test this on one of your ESXi hosts.

You will also need to upgrade the firmware on the Intel cards to ensure interop with this driver version.

Give this a crack and hopefully it fixes your issue. Otherwise, wait for vSphere 6.7 Update 3, as VMware are making changes to the way networking is handled.

Intel i350 Driver igbn 1.4.10 https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI67-INTEL-IGBN-1410&productId=742

Keep us posted with results if you do try the above driver.

Cheers,

Onil

mweissen
Contributor

We have seen the same PSOD about 3-4 times since April on a Dell PowerEdge R740xd. Just applied the update and will keep you posted.

cruz878
Contributor

esxi_crash.jpg

Different HW than OP, but I am also seeing something similar on 6.7u2. I would like to be kept abreast of any developments or fixes.

I posted full logs and additional detail of my error on Reddit: https://www.reddit.com/r/vmware/comments/dan3k3/pf_exception_14_in_world_2481950vmnic00tx_ip/

oneilv
Enthusiast

Hi mweissen,

Has the host network been stable since you updated the driver to the latest version?

Any issues or feedback you can provide to us on this?

Cheers, Onil

mweissen
Contributor

Hello oneilv!

The system has not PSOD'ed since we applied the patch, for 20 days straight now. But in the past we had already seen uptimes of up to 40 days before it crashed. So it's looking good so far, but I cannot say with 100% certainty that it's fixed for good.

cruz878
Contributor

Did you apply 6.7u3 or igbn 1.4.10? I was also up for 31 days before the first PSOD.

mweissen
Contributor

First I applied only 6.7u3, which still ended in a PSOD after about a week. Then I also applied igbn 1.4.10. Since then we have been running for 20 days straight.

cruz878
Contributor

Thanks for the information. I am holding off on patching for now, as there has seemingly been no confirmation that either one corrects the problem. I would appreciate it if I could ping you in another 3 weeks or so to get another update.

JasonEde
Contributor

We've seen this on 6.7 Update 3. The igbn driver on our I350 is version 1.4.7. We will be updating to 1.4.10 and will report back on whether that fixes it.
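
One small trap when scripting this kind of check: compared as plain strings, "1.4.7" sorts after "1.4.10". A minimal Python sketch of a numeric comparison follows; the function names are hypothetical, and it assumes any build suffix after a "-" in the reported version can simply be ignored.

```python
def parse_version(text):
    """Turn '1.4.10' into (1, 4, 10) so versions compare numerically.

    Assumption: anything after the first '-' is a build suffix and
    does not matter for this comparison.
    """
    base = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def needs_update(installed, target="1.4.10"):
    """True when the installed driver version is older than the target."""
    return parse_version(installed) < parse_version(target)


if __name__ == "__main__":
    for version in ("1.4.7", "1.4.10"):
        print(version, "needs update:", needs_update(version))
```

Tuples of integers compare element by element, so 7 is correctly treated as smaller than 10.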

oneilv
Enthusiast

Hi JasonEde,

Please keep us posted if you still encounter issues on the Intel i350 NICs after upgrading the driver to igbn version 1.4.10.

Cheers,

Onil
