Hi,
I'm running a homelab with ESXi 6.7 (build 13006603). I have three NICs in my host: two onboard and one Intel ET 82576 dual-port PCIe card. All NICs are assigned to the same vSwitch; at the moment only one is connected to the physical switch.
When I'm using one of the 82576 NICs and put heavy load on it (e.g. backing up VMs via Nakivo B&R), the NIC stops working after a while and is dead/not responding. Only a reboot of the host or (much easier) physically reconnecting the NIC (cable out, cable in) solves the problem.
I guessed it was a driver issue, so I updated to the latest driver from Intel:
[root@esxi:~] /usr/sbin/esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:04:00.0 ne1000 Down 0Mbps Half 00:25:90:a7:65:dc 1500 Intel Corporation 82574L Gigabit Network Connection
vmnic1 0000:00:19.0 ne1000 Up 1000Mbps Full 00:25:90:a7:65:dd 1500 Intel Corporation 82579LM Gigabit Network Connection
vmnic2 0000:01:00.0 igb Down 0Mbps Half 90:e2:ba:1e:4d:c6 1500 Intel Corporation 82576 Gigabit Network Connection
vmnic3 0000:01:00.1 igb Down 0Mbps Half 90:e2:ba:1e:4d:c7 1500 Intel Corporation 82576 Gigabit Network Connection
[root@esxi:~] esxcli software vib list|grep igb
net-igb 5.2.5-1OEM.550.0.0.1331820 Intel VMwareCertified 2019-06-16
igbn 0.1.1.0-4vmw.670.2.48.13006603 VMW VMwareCertified 2019-06-07
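Note that both the legacy net-igb VIB and the native igbn VIB show up as installed, while esxcfg-nics reports the 82576 ports as claimed by the legacy igb module. If you want to check or switch which module claims the card, something like this should work (standard esxcli commands; a reboot is required for the change to take effect):

```shell
# Show which driver module currently claims each NIC (third column).
esxcfg-nics -l

# Check whether the native igbn module is enabled.
esxcli system module list | grep igbn

# To make sure the legacy net-igb driver is used, disable the native
# module and reboot:
esxcli system module set --enabled=false --module=igbn
reboot
```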
Unfortunately this didn't solve the problem.
However ... this behaviour doesn't occur when I'm using one of the NICs that use the ne1000 driver.
Any idea how to solve the issue?
(... or at least dig down to its root?)
Thanks a lot in advance.
Regards
Chris
PS: I found another thread which might be related to my problem: Stopping I/O on vmnic0. Same system behaviour, same driver.
I now have this lovely issue as well.
igb 5.3.3.
The host had 12 days of uptime after an upgrade from the ancient version 6.0, and then the issue started.
I have now downgraded the driver to 5.3.2; so far it's fine. We'll see how it goes over the next 12 days.
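For reference, a driver downgrade on ESXi is just a VIB swap. A rough sketch (the bundle path and filename below are placeholders for whatever 5.3.2 package you actually downloaded):

```shell
# Put the host into maintenance mode before swapping drivers.
esxcli system maintenanceMode set --enable true

# Remove the newer driver VIB, then install the older offline bundle.
# (bundle path/name is a placeholder -- use the file you downloaded)
esxcli software vib remove -n net-igb
esxcli software vib install -d /vmfs/volumes/datastore1/net-igb-5.3.2-offline_bundle.zip

# Reboot so the replacement module loads.
reboot
```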
How is it working for everyone else?
Intel itself seems to have a 5.3.6, but I don't see it ported to ESXi as a VIB.
thx
max
I've still been good on 5.3.2 as of today. I've been running constant pings across multiple paths and large transfers (10GB downloads and 30GB uploads each day) without issue.
;( It happened again with the 5.3.2 driver on my side. It's impressive how a working setup just breaks with newer versions. I need to replace the card - senseless.
@virtualslam: Which FW are you on?
I have 1.2, and the issue seems related to it.
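For anyone else comparing notes: the NIC firmware version can be read directly from the host (vmnic2 here is a placeholder for whichever uplink is on the 82576):

```shell
# Print driver and firmware details for a given uplink.
# The output includes a "Firmware Version:" line alongside the
# driver name and version.
esxcli network nic get -n vmnic2
```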
Best regards
Max
I have a 4-port Dell version with FW 1.77 and a 2-port Supermicro version with FW 1.13.1. I am running the tests across each card, though, and with the same port configuration I had before switching to 5.3.2. Sorry to hear it didn't help you. I'm still skeptical of the reliability of these NICs in ESXi. But it's just for a lab, and I will need newer NICs one day when I switch to ESXi 7, since these cards aren't supported there anymore anyway.
Well, there we go. I pushed it just that much harder with a template deployment and the NIC crashed. So 5.3.2 does not make it stable enough. I guess it's time to shop for some new NICs.
I wanted to thank everyone for this thread. I am having the same issue. I upgraded one host to 6.5U3 (15256549) and thought everything was fine; I ran it for a couple of weeks and didn't notice any issue. Then I upgraded three more hosts to the same build, and the VMs just started dropping off the network left and right. After banging my head against the wall, I finally found this thread. Sure enough, I moved my traffic to an onboard Broadcom and the problem stopped. I have eight Intel Gigabit ET quad-port cards that show up as "Intel Corporation 82576 Gigabit Network Connection" - two quad-port cards per machine. We have a four-port port channel for production traffic that was having fits. The driver was igb 5.3.3. We were thinking about downgrading to 5.3.2, but after reading all the comments here, it seems like pretty much none of the versions work once you go to 6.5U3.
We just purchased eight new Intel I350-T4 cards to replace the Intel 82576s. We have replaced them on one host, and so far it seems to have fixed the issue. We were able to recreate the issue by copying a 50GB file off one of our VMs - it would pretty much take down the network each time with the 82576s. We tested:
Distributed vSwitch - 4-port port channel using Intel 82576s - copy file: high packet loss, and eventually the VMs would lose connectivity.
Distributed vSwitch - 1-port trunk using Intel 82576 - copy file: same problem.
Standard vSwitch - 1-port trunk using Intel 82576 - copy file: same problem, lost connectivity.
Distributed vSwitch - 1-port trunk using Broadcom - copy file: works.
Standard vSwitch - 1-port trunk using Broadcom - copy file: works.
And finally:
Distributed vSwitch - 4-port port channel using Intel I350-T4s - copy file: works.
So... it definitely seems to be related to 6.5U3 or the igb driver. Those NICs were working fine before the upgrade. I hated wasting the money on eight new quad-port NICs, but my VMware support case has gone nowhere so far, and I had production equipment down.
Thank you all for the info you posted. I still have a case open with VMware, but now that I am swapping the NICs to resolve the issue, they will probably just end up closing it.
Just as a follow-up to my situation: I got Intel I350 NICs, and they have worked well in every metric I tested before. pfSense can now use vmxnet3. Storage vMotions use both of the vNICs I have given them and no longer lose the connection like before. And lastly, backups that used to cause lost connections are working without issue as well.
Thank you - it's been a long time coming. Following this method, two Dell C2100 servers running VMware ESXi 6.5.0 build 10719125 have been operating normally for 19 days.