Over the last couple of weeks, our ESXi 6.5 U2 hosts have started becoming unresponsive after a time: they drop out of Virtual Center and the DCUI becomes unresponsive. The only method of access is through SSH, which is in turn slow. The VMs on these hosts continue to run for a time, and then have performance issues themselves.
VMkernel logging shows the following:
/var/log/vmkernel.log -- pattern
2018-09-28T13:55:10.431Z cpu34:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic7)]Removing mac:00:50:56:a1:57:00, vlan_id:0x0, from fp:1, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:10.431Z cpu34:65677)[qedentv_multictx_set_rx_rule:1139(vmnic7)]Applying 00:50:56:a1:57:00 filter, vlan_id:0xffff, fp_id:0, hw_fn:0.
2018-09-28T13:55:10.431Z cpu34:65677)[qedentv_multictx_q_free:5068(vmnic7)]fp:1, is_last:0, qtype:RX, hw_fn:0
2018-09-28T13:55:20.431Z cpu25:65677)[qedentv_multictx_q_alloc:4641(vmnic7)]fp:1, feat:0x0, qtype:RX, hw_fn:0
2018-09-28T13:55:20.458Z cpu43:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic7)]Removing mac:00:50:56:a1:57:00, vlan_id:0x0, from fp:0, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:20.458Z cpu43:65677)[qedentv_multictx_set_rx_rule:1139(vmnic7)]Applying 00:50:56:a1:57:00 filter, vlan_id:0xffff, fp_id:1, hw_fn:0.
2018-09-28T13:55:24.431Z cpu45:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic6)]Removing mac:00:50:56:a1:4d:77, vlan_id:0x0, from fp:1, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:24.431Z cpu45:65677)[qedentv_multictx_set_rx_rule:1139(vmnic6)]Applying 00:50:56:a1:4d:77 filter, vlan_id:0xffff, fp_id:0, hw_fn:0.
2018-09-28T13:55:24.432Z cpu45:65677)[qedentv_multictx_q_free:5068(vmnic6)]fp:1, is_last:0, qtype:RX, hw_fn:0
2018-09-28T13:55:29.432Z cpu32:65677)[qedentv_multictx_q_alloc:4641(vmnic6)]fp:1, feat:0x0, qtype:RX, hw_fn:0
2018-09-28T13:55:29.457Z cpu24:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic6)]Removing mac:00:50:56:a1:4d:77, vlan_id:0x0, from fp:0, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:29.457Z cpu24:65677)[qedentv_multictx_set_rx_rule:1139(vmnic6)]Applying 00:50:56:a1:4d:77 filter, vlan_id:0xffff, fp_id:1, hw_fn:0.
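To judge how often each uplink is cycling its RX filters, the lines above can be tallied per vmnic and per driver function. This is only a rough sketch: the regular expression is derived from the excerpt above rather than from any documented log format, and `count_qedentv_events` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt above, e.g.
# "...)[qedentv_multictx_remove_rx_rule:1534(vmnic7)]Removing mac:..."
# It is an assumption, not a documented vmkernel.log format.
LINE_RE = re.compile(r"\[(qedentv_\w+):\d+\((vmnic\d+)\)\]")


def count_qedentv_events(lines):
    """Return a Counter keyed by (vmnic, driver function name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            func, nic = match.groups()
            counts[(nic, func)] += 1
    return counts
```

Passing an open file handle for a copy of /var/log/vmkernel.log as `lines` and printing the sorted counter items gives one total per vmnic and function. Comparing those totals between an affected and a healthy host may show whether the message rate actually lines up with the disconnects.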
localcli
# localcli network nic list
Name    PCI Device    Driver   Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
-------------------------------------------------------------------------------------------------------------
vmnic0  0000:18:00.0  igbn     Up            Down         0      Half    24:6e:96:b7:e9:44  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1  0000:18:00.1  igbn     Up            Down         0      Half    24:6e:96:b7:e9:45  1500  Intel Corporation I350 Gigabit Network Connection
vmnic2  0000:18:00.2  igbn     Up            Down         0      Half    24:6e:96:b7:e9:46  1500  Intel Corporation I350 Gigabit Network Connection
vmnic3  0000:18:00.3  igbn     Up            Down         0      Half    24:6e:96:b7:e9:47  1500  Intel Corporation I350 Gigabit Network Connection
vmnic4  0000:3b:00.0  qedentv  Up            Up           10000  Full    f4:e9:d4:77:99:dc  9000  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic5  0000:3b:00.1  qedentv  Up            Up           10000  Full    f4:e9:d4:77:99:dd  9000  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic6  0000:5e:00.0  qedentv  Up            Up           10000  Full    f4:e9:d4:73:5c:4e  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic7  0000:5e:00.1  qedentv  Up            Up           10000  Full    f4:e9:d4:73:5c:4f  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
I have been informed this could be a firmware/driver issue; however, I want to see if anyone in the community has seen the same issue.
Thanks
Tristan
Based off some of those messages, there was something similar mentioned to be fixed in the version below:
VMware ESXi 6.5 qedentv 3.9.17.1 NIC Driver for QLogic FastLinQ QL45xxx, QL41xxx Ethernet Controller
Are you running a version prior to this release for those NICs?
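To answer that, the installed driver version (as reported on the host by something like `esxcli software vib list` or `esxcli network nic get -n vmnic6`) can be compared with 3.9.17.1. The sketch below assumes the version is a dotted numeric string, optionally followed by a `-` build suffix. The helper names and the suffix format are illustrative assumptions, not taken from this thread.

```python
FIXED_IN = (3, 9, 17, 1)  # qedentv release named in this reply


def parse_driver_version(text):
    """Turn '3.9.17.1' (or '3.9.17.1-<suffix>') into a tuple of ints."""
    numeric = text.strip().split("-", 1)[0]  # drop any build suffix
    return tuple(int(part) for part in numeric.split("."))


def is_older_than_fix(installed, fixed=FIXED_IN):
    """True if the installed driver predates the release named above."""
    return parse_driver_version(installed) < fixed
```

Comparing integer tuples rather than raw strings matters here: as text, "3.10.23.1" sorts before "3.9.17.1", although it is the newer driver.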
We are also seeing this issue on some new hosts and currently have calls open with VMware and Dell.
I would be keen to know whether the driver linked here resolved your issue and stopped these messages in the log.
In our case it appears to be affecting the management network: the host disconnects from vCenter and then reconnects shortly after. The NIC is a QL41164HMRJ; we have installed driver 3.10.23.1 and the latest firmware available from Dell, 8.24.46.0, and are currently monitoring.
Are you still experiencing the issue with driver 3.10.23.1 and the Dell firmware 8.24.46.0?
regards!
RMV
Hi - yes, unfortunately the issue is ongoing. We initially closed the case(s) but have recently observed further disconnections. We were advised that the above messages are informational and 'log spew'.
Hey,
We are currently facing the same issue as described in this post. Did you manage to fix it, and if so, how?
We initially installed updated firmware and drivers for the QLogic NIC:
qedentv device firmware mfw 8.24.46.0 storm 8.37.9.0 (vNetwork_Firmware_25XD6_WN64_14.07.50_A00-00.EXE)
qedentv driver 3.10.23.1 (QLG-qed-ESXi6.5-offline_bundle-11329876.zip)
This improved the situation greatly; however, we still experienced a disconnect on one host (the disconnects seem to manifest after some uptime). We then also installed the BIOS and iDRAC/Lifecycle Controller firmware:
BIOS_YJXXX_WN64_1.6.13.EXE
iDRAC-with-Lifecycle-Controller_Firmware_G6W0W_WN64_3.30.30.30_A00.EXE
So far we have not observed any further disconnections, but we are monitoring the situation closely. The support calls were closed today, as the issue is no longer presenting itself for further troubleshooting.