VMware Cloud Community
tris179
Contributor

ESXi 6.5 U2 hosts become unresponsive - vmkernel.log errors

Over the last couple of weeks, our ESXi 6.5 U2 hosts have started becoming unresponsive after running for a while: they drop off from Virtual Center and the DCUI becomes unresponsive. The only remaining method of access is SSH, which is itself slow. The VMs on these hosts continue to run for a time, but then start to suffer performance issues themselves.

The VMkernel log shows the following:

/var/log/vmkernel.log  -- pattern
2018-09-28T13:55:10.431Z cpu34:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic7)]Removing mac:00:50:56:a1:57:00, vlan_id:0x0, from fp:1, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:10.431Z cpu34:65677)[qedentv_multictx_set_rx_rule:1139(vmnic7)]Applying 00:50:56:a1:57:00 filter, vlan_id:0xffff, fp_id:0, hw_fn:0.
2018-09-28T13:55:10.431Z cpu34:65677)[qedentv_multictx_q_free:5068(vmnic7)]fp:1, is_last:0, qtype:RX, hw_fn:0
2018-09-28T13:55:20.431Z cpu25:65677)[qedentv_multictx_q_alloc:4641(vmnic7)]fp:1, feat:0x0, qtype:RX, hw_fn:0
2018-09-28T13:55:20.458Z cpu43:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic7)]Removing mac:00:50:56:a1:57:00, vlan_id:0x0, from fp:0, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:20.458Z cpu43:65677)[qedentv_multictx_set_rx_rule:1139(vmnic7)]Applying 00:50:56:a1:57:00 filter, vlan_id:0xffff, fp_id:1, hw_fn:0.
2018-09-28T13:55:24.431Z cpu45:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic6)]Removing mac:00:50:56:a1:4d:77, vlan_id:0x0, from fp:1, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:24.431Z cpu45:65677)[qedentv_multictx_set_rx_rule:1139(vmnic6)]Applying 00:50:56:a1:4d:77 filter, vlan_id:0xffff, fp_id:0, hw_fn:0.
2018-09-28T13:55:24.432Z cpu45:65677)[qedentv_multictx_q_free:5068(vmnic6)]fp:1, is_last:0, qtype:RX, hw_fn:0
2018-09-28T13:55:29.432Z cpu32:65677)[qedentv_multictx_q_alloc:4641(vmnic6)]fp:1, feat:0x0, qtype:RX, hw_fn:0
2018-09-28T13:55:29.457Z cpu24:65677)[qedentv_multictx_remove_rx_rule:1534(vmnic6)]Removing mac:00:50:56:a1:4d:77, vlan_id:0x0, from fp:0, op:MAC_DEL, hw_fn:0
2018-09-28T13:55:29.457Z cpu24:65677)[qedentv_multictx_set_rx_rule:1139(vmnic6)]Applying 00:50:56:a1:4d:77 filter, vlan_id:0xffff, fp_id:1, hw_fn:0.
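To put numbers on how often this queue churn happens on each uplink, a small script can be run against a copy of the log. This is a minimal Python sketch, assuming every relevant line has the shape seen in the excerpt above (ISO timestamp, `cpuN:world)`, then `[qedentv_<function>:<line>(vmnicX)]`); the `summarize` name is hypothetical. It counts events per vmnic and function, and measures the time between each RX queue free (`qedentv_multictx_q_free`) and the next allocation (`qedentv_multictx_q_alloc`) on the same vmnic.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed line shape, based only on the excerpt in this post:
# <timestamp>Z cpuN:WORLD)[qedentv_<function>:<src line>(vmnicX)]<message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+Z) cpu\d+:\d+\)\[(?P<func>qedentv_\w+):\d+\((?P<nic>vmnic\d+)\)\]"
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def summarize(lines):
    """Return (event counts per (vmnic, function), free->alloc gaps in seconds per vmnic)."""
    counts = Counter()
    last_free = {}            # vmnic -> timestamp of the most recent RX queue free
    gaps = defaultdict(list)  # vmnic -> seconds until the queue was allocated again
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a qedentv line in the assumed format
        nic, func = m.group("nic"), m.group("func")
        counts[(nic, func)] += 1
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if func == "qedentv_multictx_q_free":
            last_free[nic] = ts
        elif func == "qedentv_multictx_q_alloc" and nic in last_free:
            gaps[nic].append((ts - last_free.pop(nic)).total_seconds())
    return counts, dict(gaps)
```

Usage would be along the lines of `with open("vmkernel.log") as log: counts, gaps = summarize(log)`. For the excerpt above it finds one free/alloc cycle per vmnic, 10 seconds apart on vmnic7 and 5 seconds on vmnic6.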

localcli


# localcli network nic list
Name    PCI Device    Driver   Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
-------------------------------------------------------------------------------------------------------------
vmnic0  0000:18:00.0  igbn     Up            Down             0  Half    24:6e:96:b7:e9:44  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1  0000:18:00.1  igbn     Up            Down             0  Half    24:6e:96:b7:e9:45  1500  Intel Corporation I350 Gigabit Network Connection
vmnic2  0000:18:00.2  igbn     Up            Down             0  Half    24:6e:96:b7:e9:46  1500  Intel Corporation I350 Gigabit Network Connection
vmnic3  0000:18:00.3  igbn     Up            Down             0  Half    24:6e:96:b7:e9:47  1500  Intel Corporation I350 Gigabit Network Connection
vmnic4  0000:3b:00.0  qedentv  Up            Up           10000  Full    f4:e9:d4:77:99:dc  9000  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic5  0000:3b:00.1  qedentv  Up            Up           10000  Full    f4:e9:d4:77:99:dd  9000  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic6  0000:5e:00.0  qedentv  Up            Up           10000  Full    f4:e9:d4:73:5c:4e  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic7  0000:5e:00.1  qedentv  Up            Up           10000  Full    f4:e9:d4:73:5c:4f  1500  QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter

I have been told this could be a firmware/driver issue; however, I wanted to check whether anyone in the community has seen the same problem.

Thanks

Tristan

Accepted Solutions
TotesHagopes
VMware Employee

Based on some of those messages, something similar is listed as fixed in the following release:

VMware ESXi 6.5 qedentv 3.9.17.1 NIC Driver for QLogic FastLinQ QL45xxx, QL41xxx Ethernet Controller

Download VMware vSphere

Are you running a version prior to this release for those NICs?
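When checking this, compare driver versions numerically rather than as text: as strings, "3.10.23.1" sorts before "3.9.17.1". A minimal Python sketch, assuming a purely numeric dotted version optionally followed by a "-" build suffix, as ESXi VIB versions usually are; the helper names are hypothetical. The installed version can be read, for example, from the Driver Info section of `esxcli network nic get -n vmnic4`.

```python
from typing import Tuple

FIXED_IN = "3.9.17.1"  # qedentv release referenced in this reply


def parse_version(version: str) -> Tuple[int, ...]:
    """'3.9.17.1-1OEM...' -> (3, 9, 17, 1); anything after the first '-' is ignored."""
    base = version.strip().split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def predates_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if the installed driver is older than the release carrying the fix."""
    return parse_version(installed) < parse_version(fixed_in)
```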

danpcc
Contributor

We are also seeing this issue on some new hosts and currently have calls open with VMware and Dell.

I would be keen to know whether the driver linked here resolved your issue and stopped these messages appearing in the log.

In our case it appears to affect the management network: the host disconnects from vCenter and then reconnects shortly afterwards. The NIC is a QL41164HMRJ; we have installed driver 3.10.23.1 and the latest firmware available from Dell (8.24.46.0) and are currently monitoring.

reymtz
Contributor

Are you still experiencing the issue with driver 3.10.23.1 and Dell firmware 8.24.46.0?

Regards!

RMV

danpcc
Contributor

Hi - yes, unfortunately the issue is ongoing. We initially closed the case(s) but have recently observed further disconnections. We were advised that the above messages are informational 'log spew'.
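If those qedentv_multictx_* messages really are just informational, one way to look past them is to filter them out before reading the rest of vmkernel.log. A minimal Python sketch, assuming the line format quoted in the first post; `split_spew` is a hypothetical helper, and the pattern deliberately matches only the multictx messages quoted there, so other qedentv messages are kept.

```python
import re
from typing import Iterable, List, Tuple

# Matches only the qedentv multictx messages quoted earlier in the thread.
SPEW_RE = re.compile(r"\)\[qedentv_multictx_\w+:\d+\(vmnic\d+\)\]")


def split_spew(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (lines that are not multictx spew, number of spew lines dropped)."""
    kept, dropped = [], 0
    for line in lines:
        if SPEW_RE.search(line):
            dropped += 1
        else:
            kept.append(line)
    return kept, dropped
```

On the host itself, `grep -v qedentv_multictx /var/log/vmkernel.log` does much the same; the script is mainly useful when you also want the dropped count.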

t1mm1t84
Contributor

Hey,

We are currently facing the same issue you described in this post. Did you manage to fix it, and if so, how?

danpcc
Contributor

We initially installed updated firmware and drivers for the QLogic NIC:

qedentv device firmware mfw 8.24.46.0 storm 8.37.9.0 (vNetwork_Firmware_25XD6_WN64_14.07.50_A00-00.EXE)

qedentv driver 3.10.23.1 (QLG-qed-ESXi6.5-offline_bundle-11329876.zip)

This improved the situation greatly; however, we still experienced a disconnect on one host (the disconnects seem to manifest after some uptime). We then also installed BIOS and LCC firmware:

BIOS_YJXXX_WN64_1.6.13.EXE

iDRAC-with-Lifecycle-Controller_Firmware_G6W0W_WN64_3.30.30.30_A00.EXE

Since then we have not observed any further disconnections, but we are monitoring the situation closely. The support calls were closed today, as the issue is no longer presenting itself for further troubleshooting.
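For anyone tracking several hosts, the combination that worked here can be written down as a baseline and each host's reported versions compared against it. A minimal Python sketch: the component keys are hypothetical labels, the target versions are the ones listed in this post, and how the per-host versions are collected is left open.

```python
from typing import Dict, Optional, Tuple

# Versions from this post; the key names are illustrative only.
BASELINE: Dict[str, str] = {
    "qedentv_driver": "3.10.23.1",
    "qedentv_mfw": "8.24.46.0",
    "qedentv_storm": "8.37.9.0",
    "bios": "1.6.13",
    "idrac_lc": "3.30.30.30",
}


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric comparison key; ignores anything after the first '-'."""
    return tuple(int(p) for p in version.strip().split("-", 1)[0].split("."))


def below_baseline(host_versions: Dict[str, str],
                   baseline: Dict[str, str] = BASELINE
                   ) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {component: (current, target)} for components missing or older than baseline."""
    behind = {}
    for component, target in baseline.items():
        current = host_versions.get(component)
        if current is None or version_key(current) < version_key(target):
            behind[component] = (current, target)
    return behind
```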
