VMware Cloud Community
parsdade
Contributor

Virtual machine network goes dead

 

Hi

For several days, the virtual machines' network has been going down at certain hours of the day, especially at night, while the ESXi host itself stays reachable.
The only fix is to run shutdown and then no shutdown on the server's ports on the Cisco switch/router.

The server has two network cards (82576 Gigabit Network Connection), and both are connected.

ESXi version: 6.7.0 Update 3 (Build 14320388)

Driver: igb

 

Kernel logs:

2021-03-21T00:26:15.071Z cpu8:2099804)NetPort: 1580: disabled port 0x2000007
2021-03-21T00:26:15.073Z cpu8:2099804)Vmxnet3: 18576: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x2000007
2021-03-21T00:26:15.073Z cpu8:2099804)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000007
2021-03-21T00:26:15.073Z cpu8:2099804)NetPort: 1359: enabled port 0x2000007 with mac 00:50:56:b4:f9:62
2021-03-21T00:28:30.490Z cpu1:2100113)Vmxnet3: 24930: Machine-Hosting,00:50:56:b4:10:4b, portID(33554440): Hang detected,numHangQ: 1, enableGen: 751
2021-03-21T00:28:30.490Z cpu1:2100113)Vmxnet3: 24939: portID:33554440, QID: 0, next2TX: 15, next2Comp: 17, lastNext2TX: 29, next2Write:17, ringSize: 512 inFlight: 29, delay(ms): 18720,txStopped: 0
2021-03-21T00:28:30.490Z cpu1:2100113)Vmxnet3: 24943: portID: 33554440, sop: 17 eop: 17 enableGen: 0 qid: 751, pkt: 0x459a73ef58c0
2021-03-21T00:28:30.490Z cpu1:2100113)NetSched: 717: 0x2000002: received a force quiesce for port 0x2000008, dropped 540 pkts
2021-03-21T00:28:30.493Z cpu1:2100113)NetPort: 1580: disabled port 0x2000008
2021-03-21T00:28:30.495Z cpu1:2100113)Vmxnet3: 18576: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x2000008
2021-03-21T00:28:30.495Z cpu1:2100113)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000008
2021-03-21T00:28:30.495Z cpu1:2100113)NetPort: 1359: enabled port 0x2000008 with mac 00:50:56:b4:10:4b
2021-03-21T00:29:00.058Z cpu8:2099804)Vmxnet3: 24930: HostMachine,00:50:56:b4:f9:62, portID(33554439): Hang detected,numHangQ: 1, enableGen: 753
2021-03-21T00:29:00.058Z cpu8:2099804)Vmxnet3: 24939: portID:33554439, QID: 0, next2TX: 15, next2Comp: 17, lastNext2TX: 26, next2Write:17, ringSize: 512 inFlight: 29, delay(ms): 33508,txStopped: 0
2021-03-21T00:29:00.058Z cpu8:2099804)Vmxnet3: 24943: portID: 33554439, sop: 17 eop: 17 enableGen: 0 qid: 753, pkt: 0x459a49a27c00



2021-03-21T01:11:50.492Z cpu12:2100113)NetPort: 1580: disabled port 0x2000008
2021-03-21T01:11:50.494Z cpu12:2100113)Vmxnet3: 18576: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x2000008
2021-03-21T01:11:50.494Z cpu12:2100113)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000008
2021-03-21T01:11:50.494Z cpu12:2100113)NetPort: 1359: enabled port 0x2000008 with mac 00:50:56:b4:10:4b
2021-03-21T01:14:00.059Z cpu2:2099804)Vmxnet3: 24930: HostMachine,00:50:56:b4:f9:62, portID(33554439): Hang detected,numHangQ: 1, enableGen: 821
2021-03-21T01:14:00.059Z cpu2:2099804)Vmxnet3: 24939: portID:33554439, QID: 0, next2TX: 15, next2Comp: 17, lastNext2TX: 33, next2Write:17, ringSize: 512 inFlight: 29, delay(ms): 28312,txStopped: 0
2021-03-21T01:14:00.059Z cpu2:2099804)Vmxnet3: 24943: portID: 33554439, sop: 17 eop: 17 enableGen: 0 qid: 821, pkt: 0x459a73e36940
2021-03-21T01:14:00.059Z cpu2:2099804)NetSched: 717: 0x2000002: received a force quiesce for port 0x2000007, dropped 544 pkts
2021-03-21T01:14:00.062Z cpu2:2099804)NetPort: 1580: disabled port 0x2000007
2021-03-21T01:14:00.063Z cpu2:2099804)Vmxnet3: 18576: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x2000007
2021-03-21T01:14:00.063Z cpu2:2099804)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000007
2021-03-21T01:14:00.063Z cpu2:2099804)NetPort: 1359: enabled port 0x2000007 with mac 00:50:56:b4:f9:62
2021-03-21T01:14:30.489Z cpu14:2100113)Vmxnet3: 24930: Machine-Hosting,00:50:56:b4:10:4b, portID(33554440): Hang detected,numHangQ: 1, enableGen: 823
2021-03-21T01:14:30.489Z cpu14:2100113)Vmxnet3: 24939: portID:33554440, QID: 0, next2TX: 15, next2Comp: 17, lastNext2TX: 25, next2Write:17, ringSize: 512 inFlight: 29, delay(ms): 28720,txStopped: 0
2021-03-21T01:14:30.489Z cpu14:2100113)Vmxnet3: 24943: portID: 33554440, sop: 17 eop: 17 enableGen: 0 qid: 823, pkt: 0x459a73e67840
2021-03-21T01:14:30.489Z cpu14:2100113)NetSched: 717: 0x2000002: received a force quiesce for port 0x2000008, dropped 542 pkts



2021-03-21T05:48:55.482Z cpu23:2100113)NetSched: 717: 0x2000002: received a force quiesce for port 0x2000008, dropped 541 pkts
2021-03-21T05:48:55.485Z cpu23:2100113)NetPort: 1580: disabled port 0x2000008
2021-03-21T05:48:55.487Z cpu23:2100113)Vmxnet3: 18576: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x2000008
2021-03-21T05:48:55.487Z cpu23:2100113)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000008
2021-03-21T05:48:55.487Z cpu23:2100113)NetPort: 1359: enabled port 0x2000008 with mac 00:50:56:b4:10:4b
2021-03-21T05:49:05.076Z cpu19:2099804)Vmxnet3: 24930: HostMachine,00:50:56:b4:f9:62, portID(33554439): Hang detected,numHangQ: 1, enableGen: 1243
2021-03-21T05:49:05.076Z cpu19:2099804)Vmxnet3: 24939: portID:33554439, QID: 0, next2TX: 15, next2Comp: 17, lastNext2TX: 35, next2Write:17, ringSize: 512 inFlight: 29, delay(ms): 18519,txStopped: 0
2021-03-21T05:49:05.076Z cpu19:2099804)Vmxnet3: 24943: portID: 33554439, sop: 17 eop: 17 enableGen: 0 qid: 1243, pkt: 0x459a7420cf40
2021-03-21T05:49:05.076Z cpu19:2099804)NetSched: 717: 0x2000002: received a force quiesce for port 0x2000007, dropped 539 pkts
2021-03-21T05:49:05.079Z cpu19:2099804)NetPort: 1580: disabled port 0x2000007
2021-03-21T05:49:05.081Z cpu19:2099804)Vmxnet3: 18576: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x2000007
2021-03-21T05:49:05.081Z cpu19:2099804)Vmxnet3: 18817: Using default queue delivery for vmxnet3 for port 0x2000007
2021-03-21T05:49:05.081Z cpu19:2099804)NetPort: 1359: enabled port 0x2000007 with mac 00:50:56:b4:f9:62
2021-03-21T05:50:02.467Z cpu17:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2021-03-21T05:50:37.765Z cpu2:2097769)<6>igb: vmnic0 NIC Link is Down
2021-03-21T05:50:54.401Z cpu6:2097769)<6>igb: vmnic0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
2021-03-21T05:50:56.467Z cpu4:2097268)NetqueueBal: 5032: vmnic0: device Up notification, reset logical space needed
2021-03-21T05:50:56.467Z cpu4:2097268)NetPort: 1580: disabled port 0x2000002
2021-03-21T05:50:56.467Z cpu0:2109984)NetSched: 654: vmnic0-0-tx: worldID = 2109984 exits
2021-03-21T05:50:56.467Z cpu4:2097268)Uplink: 11689: enabled port 0x2000002 with mac 60:eb:69:20:b8:84
2021-03-21T05:50:56.467Z cpu4:2097268)NetPort: 1580: disabled port 0x2000002
2021-03-21T05:50:56.467Z cpu13:2118447)NetSched: 654: vmnic0-0-tx: worldID = 2118447 exits
2021-03-21T05:50:56.467Z cpu4:2097268)Uplink: 11689: enabled port 0x2000002 with mac 60:eb:69:20:b8:84
2021-03-21T05:52:07.108Z cpu6:2097769)<6>igb: vmnic1 NIC Link is Down
2021-03-21T05:52:23.102Z cpu6:2097769)<6>igb: vmnic1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
2021-03-21T05:52:26.465Z cpu2:2097268)NetqueueBal: 5032: vmnic1: device Up notification, reset logical space needed
2021-03-21T05:52:26.465Z cpu2:2097268)NetPort: 1580: disabled port 0x2000004
2021-03-21T05:52:26.465Z cpu22:2110005)NetSched: 654: vmnic1-0-tx: worldID = 2110005 exits
2021-03-21T05:52:26.465Z cpu2:2097268)Uplink: 11689: enabled port 0x2000004 with mac 60:eb:69:20:b8:85
2021-03-21T05:52:26.465Z cpu2:2097268)NetPort: 1580: disabled port 0x2000004
2021-03-21T05:52:26.465Z cpu2:2097268)Uplink: 11689: enabled port 0x2000004 with mac 60:eb:69:20:b8:85
2021-03-21T05:52:26.465Z cpu5:2118455)NetSched: 654: vmnic1-0-tx: worldID = 2118455 exits
2021-03-21T06:00:02.459Z cpu17:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2021-03-21T06:10:02.454Z cpu17:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2021-03-21T06:20:02.446Z cpu17:2097865)DVFilter: 5963: Checking disconnected filters for timeouts
2021-03-21T06:21:56.333Z cpu12:2118699)J6: 2651: '101-1': Exiting async journal replay manager world
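
The hang messages above give the port as a decimal portID (for example 33554440), while the NetPort and NetSched lines give it in hex (0x2000008). Below is a minimal Python sketch, assuming the line formats seen in this log excerpt, that pairs each hang with its VM name, hex port, and stall delay. The function name `parse_hangs` and the regular expressions are illustrative only and are not part of any VMware tool.

```python
import re

# Matches the "Hang detected" line format seen in the log excerpt above.
HANG_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)Vmxnet3: \d+: "
    r"(?P<vm>[^,]+),(?P<mac>[0-9a-f:]{17}), "
    r"portID\((?P<port>\d+)\): Hang detected"
)
# Matches the follow-up queue line that carries the stall delay in ms.
DELAY_RE = re.compile(
    r"portID:(?P<port>\d+), QID: \d+,.*delay\(ms\): (?P<delay>\d+)"
)

# Two lines copied from the log in this thread.
SAMPLE = [
    "2021-03-21T00:28:30.490Z cpu1:2100113)Vmxnet3: 24930: Machine-Hosting,"
    "00:50:56:b4:10:4b, portID(33554440): Hang detected,numHangQ: 1, enableGen: 751",
    "2021-03-21T00:28:30.490Z cpu1:2100113)Vmxnet3: 24939: portID:33554440, QID: 0, "
    "next2TX: 15, next2Comp: 17, lastNext2TX: 29, next2Write:17, ringSize: 512 "
    "inFlight: 29, delay(ms): 18720,txStopped: 0",
]


def parse_hangs(lines):
    """Return one dict per hang event, with the port ID rendered in hex."""
    events = []
    for line in lines:
        hang = HANG_RE.search(line)
        if hang:
            events.append({
                "time": hang.group("ts"),
                "vm": hang.group("vm"),
                "mac": hang.group("mac"),
                "port_hex": hex(int(hang.group("port"))),
                "delay_ms": None,  # filled in by the next matching queue line
            })
            continue
        delay = DELAY_RE.search(line)
        if delay and events:
            # Attach the delay only if it belongs to the most recent hang.
            if events[-1]["port_hex"] == hex(int(delay.group("port"))):
                events[-1]["delay_ms"] = int(delay.group("delay"))
    return events


if __name__ == "__main__":
    for event in parse_hangs(SAMPLE):
        print(event)
```

Run against a full vmkernel.log, this lists which VM ports stall and for how long, so the events can be lined up against backup windows or other load.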

8 Replies
DavoudTeimouri
Virtuoso

Are any security features enabled on the network devices?

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/
parsdade
Contributor

Hi

No, there is nothing like that on the switch/router.

Most importantly, if the problem were with the switch/router, the ESXi host itself would also be unreachable. Instead, only access to the virtual machines is lost, and I have to restart the ESXi host or disable and re-enable the LAN port on the switch.

 

Thanks

 

parsdade
Contributor

Hi,

I updated ESXi to VMware ESXi 6.7.0, build 17700523, but that did not solve the problem.

Any update?

parsdade
Contributor

Any update or help?

 

snapfriend
Contributor

I am facing a similar issue, with the same messages in the logs. These are the keywords:

- Hang detected,numHangQ: 1, enableGen:

- received a force quiesce for port

 

snapfriend
Contributor

I found similar keywords in a VMware KB article, but it concerns UCS hardware, where the issue can occur because of an incorrectly configured network adapter policy. See whether it is relevant to you.

Article - https://kb.vmware.com/s/article/81574

 

snapfriend
Contributor

Hello,

Do you see entries like the following in your vmkernel logs? I just want to confirm.

2019-06-09T21:42:34.863Z cpu14:69176)MemSched: 14635: Admission failure in path: smx/sfcb-ProviderMa.69170/uw.69170

 

Please see the following article for that issue:

Document - Advisory: HPE Integrity Superdome X - HPE SMX WBEM Providers Cause The VMware ESXi Operat...

iwayCR
Contributor

Just to give you guys (or anyone else stumbling on this thread in the future) closure on this problem:

There is no solution except tossing that network card and using one that runs with a different driver (e1000 or igbn).

We looked for a solution for a very long time (including paid VMware support cases), but in the end there was no way around replacing the NICs, even though they ran fine with older ESXi 6.0, 6.5, and 6.7 up to a certain patch level.

We experienced the problem on dozens of our servers with (slightly) different NICs (PCIe and onboard) and different chipsets.

If they were using the igb driver, they were affected. How badly depended on the network load: with management traffic only, a host could run for weeks, while during backups it often took only minutes to disconnect.
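
For anyone who wants to see how often an igb uplink drops on their own host, here is a minimal Python sketch that pairs the "NIC Link is Down" and "NIC Link is Up" messages from the first post and reports how long each outage lasted. It assumes the line format from that log excerpt; the names `link_outages` and `parse_ts` are illustrative. Note that a link flap may also come from a manual shutdown / no shutdown on the switch port, so the result needs to be read alongside the hang messages.

```python
import re
from datetime import datetime

# Matches the igb link state lines as they appear in the log in the first post.
LINK_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)<\d+>igb: "
    r"(?P<nic>vmnic\d+) NIC Link is (?P<state>Down|Up)"
)

# Four lines copied from the log in this thread.
SAMPLE = [
    "2021-03-21T05:50:37.765Z cpu2:2097769)<6>igb: vmnic0 NIC Link is Down",
    "2021-03-21T05:50:54.401Z cpu6:2097769)<6>igb: vmnic0 NIC Link is Up "
    "1000 Mbps Full Duplex, Flow Control: None",
    "2021-03-21T05:52:07.108Z cpu6:2097769)<6>igb: vmnic1 NIC Link is Down",
    "2021-03-21T05:52:23.102Z cpu6:2097769)<6>igb: vmnic1 NIC Link is Up "
    "1000 Mbps Full Duplex, Flow Control: None",
]


def parse_ts(ts):
    """Parse the UTC timestamp format used at the start of each log line."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def link_outages(lines):
    """Return (nic, seconds_down) for each Down/Up pair, in log order."""
    down_since = {}
    outages = []
    for line in lines:
        match = LINK_RE.search(line)
        if not match:
            continue
        nic = match.group("nic")
        when = parse_ts(match.group("ts"))
        if match.group("state") == "Down":
            down_since[nic] = when
        elif nic in down_since:
            # An Up without a recorded Down is ignored.
            outages.append((nic, (when - down_since.pop(nic)).total_seconds()))
    return outages


if __name__ == "__main__":
    for nic, seconds in link_outages(SAMPLE):
        print(f"{nic} was down for {seconds:.3f} s")
```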
