Dear all,
We are encountering a connectivity issue on a newly installed ESXi 6.5 host on a PE R430.
Whenever we perform a file copy (30-40 GB files) from one of the VM guests to another machine through the network (100MB), it ends with a connectivity issue.
Both the VM guest and the host become inaccessible through the network.
The ESXi console can still be accessed. We tried "Restart Management Network" and "Test Management Network", and the test reported OK for the default gateway and other machines.
But we are still unable to access either the guest or the host through the network.
Connectivity returns to normal after shutting down / restarting the ESXi host.
The VM guest OS event log indicates that the guest was still running during the issue and that it followed the auto shutdown/startup settings of the ESXi host.
Looking into the log, we found the following warning repeated throughout the copy task:
cpu## :67701)WARNING: ntg3-throttled: Ntg3XmitPktList:372: vmnic0:TX ring full (0)
Does anyone have a suggestion on this issue? Thanks.
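For anyone who wants to check how often the "TX ring full" warning quoted above fires on each uplink, here is a minimal Python sketch. It assumes the log lines look like the single warning shown in this post; the regular expression and the function name `count_tx_ring_full` are illustrative guesses, not part of any VMware tool.

```python
import re
from collections import Counter

# Pattern inferred from the one warning line quoted in this thread (an assumption):
# "...WARNING: ntg3-throttled: Ntg3XmitPktList:372: vmnic0:TX ring full (0)"
TX_RING_FULL = re.compile(
    r"WARNING: ntg3-throttled: .*?:\s*(vmnic\d+):\s*TX ring full"
)


def count_tx_ring_full(lines):
    """Return a Counter mapping each uplink name (e.g. 'vmnic0') to its warning count."""
    counts = Counter()
    for line in lines:
        match = TX_RING_FULL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the warning from this post plus one unrelated line.
    sample = [
        "cpu## :67701)WARNING: ntg3-throttled: Ntg3XmitPktList:372: vmnic0:TX ring full (0)",
        "cpu15:65629)NetPort: 1879: disabled port 0x3000006",
    ]
    for nic, count in count_tx_ring_full(sample).items():
        print(nic, count)
```

To run it against a real host, the lines could be read from a copy of the kernel log instead of the sample list. The message is tagged "throttled", so the raw count may understate how often the ring actually filled.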
Hi Franc,
Could you contact me using the email address in my public profile? I would like to take a look at the full kernel log and perhaps ask you for more diagnostic information. Alternatively, you could file support with VMware and let me know the SR number.
Thanks,
Bo
Hi,
I have the same issue, using the latest ntg3 driver on a Dell R630 with Broadcom BCM5719 NICs for iSCSI traffic (MTU 9000).
I have opened Support Request #17474802805, but here is some information in case you want it. The vm-support results will be sent to support.
After setting "vsish -e set /system/modules/ntg3/loglevels/ntg3 1", I get the following:
2017-05-31T09:06:00.250Z cpu4:66109)vmnic0:STB:3-182 IDX R:452-452-0 T:189-189 SUM R:12740-0 T:585-0-0-0 I:12716
2017-05-31T09:06:00.259Z cpu23:66106)vmnic3:STB:1-150 IDX R:455-455-0 T:490-490 SUM R:405959-0 T:74786-0-0-0 I:80417
2017-05-31T09:06:00.260Z cpu2:66103)vmnic2:STB:3-78 IDX R:608-608-0 T:148-148 SUM R:393824-0 T:73536-0-0-0 I:78304
2017-05-31T09:06:00.276Z cpu12:66112)vmnic1:STB:1-192 IDX R:3008-960-0 T:0-0 SUM R:3008-0 T:0-0-0-0 I:3008
2017-05-31T09:06:00.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)
2017-05-31T09:06:01.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000006
2017-05-31T09:06:01.092Z cpu16:71357)NetSched: 628: vmnic6-0-tx: worldID = 71357 exits
2017-05-31T09:06:01.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000006 with mac 00:0a:f7:a5:f4:32
2017-05-31T09:06:01.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down
2017-05-31T09:06:02.428Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)
2017-05-31T09:06:03.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000008
2017-05-31T09:06:03.092Z cpu18:71361)NetSched: 628: vmnic7-0-tx: worldID = 71361 exits
2017-05-31T09:06:03.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000008 with mac 00:0a:f7:a5:f4:33
2017-05-31T09:06:03.428Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:404:link down
2017-05-31T09:06:05.581Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)
2017-05-31T09:06:06.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000006
2017-05-31T09:06:06.092Z cpu19:71368)NetSched: 628: vmnic6-0-tx: worldID = 71368 exits
2017-05-31T09:06:06.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000006 with mac 00:0a:f7:a5:f4:32
2017-05-31T09:06:06.429Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)
2017-05-31T09:06:06.581Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down
2017-05-31T09:06:06.911Z cpu23:66091)vmnic4:STB:3-122 IDX R:138-138-0 T:0-0 SUM R:138-0 T:0-0-0-0 I:121
2017-05-31T09:06:06.911Z cpu23:66094)vmnic5:STB:3-204 IDX R:3031-983-0 T:0-0 SUM R:3031-0 T:0-0-0-0 I:3020
2017-05-31T09:06:07.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000008
2017-05-31T09:06:07.092Z cpu17:71375)NetSched: 628: vmnic7-0-tx: worldID = 71375 exits
2017-05-31T09:06:07.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000008 with mac 00:0a:f7:a5:f4:33
2017-05-31T09:06:07.429Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:404:link down
2017-05-31T09:06:07.429Z cpu12:66100)vmnic7:STB:3-176 IDX R:0-0-0 T:256-256 SUM R:0-0 T:1277-0-0-0 I:1200
2017-05-31T09:06:07.582Z cpu20:66097)vmnic6:STB:3-158 IDX R:0-0-0 T:253-253 SUM R:0-0 T:1267-0-0-0 I:1182
The network links go up and down at startup, though not every time. I can run some tests, as the server is not yet in production.
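To put a number on the flapping in the ntg3 log excerpt above, the "link down" messages can be grouped per uplink. The following is a rough Python sketch; the regular expression is inferred only from the lines pasted in this post, and `link_down_events` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Inferred from lines such as (an assumption about the general format):
# "2017-05-31T09:06:01.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down"
LINK_DOWN = re.compile(
    r"^(\S+) cpu\d+:\d+\)ntg3:(vmnic\d+):Ntg3PhyStateGet:\d+:link down"
)


def link_down_events(lines):
    """Return {uplink: [timestamps]} for every 'link down' message found."""
    events = defaultdict(list)
    for line in lines:
        match = LINK_DOWN.match(line.strip())
        if match:
            timestamp, nic = match.group(1), match.group(2)
            events[nic].append(timestamp)
    return dict(events)


if __name__ == "__main__":
    # A few lines copied from the excerpt above.
    sample = [
        "2017-05-31T09:06:00.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)",
        "2017-05-31T09:06:01.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down",
        "2017-05-31T09:06:03.428Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:404:link down",
        "2017-05-31T09:06:06.581Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down",
    ]
    for nic, stamps in sorted(link_down_events(sample).items()):
        print(nic, len(stamps), stamps)
```

Only the "link down" lines are counted; the "link state changed" lines are ignored because the excerpt does not say which direction they report.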
Any updates on this? I just upgraded from 5.5 to 6.5 U1 on two Dell R620s with Broadcom 5719 and 5720 NICs. They have the ntg3 4.1.2.0 driver and the latest firmware (bc 1.39 ncsi 1.4.5.0). Now each of them flaps one of its interfaces, often several times an hour. It's always the same one: slot 2 port 1 in one server and slot 3 port 1 in the other. I opened a ticket with Dell, but they didn't really know and suggested I contact VMware, which I'm going to do soon.
Basically the port goes down and the iDRAC sends me an email:
"The NIC Slot 3 Port 1 network link is down."
"Detailed Description: The network link is down. Either the network cable is not connected or the network device is not working.
Recommended Action: Verify that the network port is enabled and if the port has Activity/Speed LEDs, that they are lit. Check the network cable, network cable connections, and the attached network switch.
Message ID: NIC100"
Then about 5-20 seconds later I get another email
"The NIC Slot 3 Port 1 network link is started"
"Detailed Description: The transition from network link not started (down) to network link started (up) has been detected on the NIC controller port identified in the message.
Recommended Action: No response action is required.
Message ID: NIC101"
So I know these ports are flapping for some reason. Dell homed in on the driver; they feel the driver is causing the issue. Then I found this thread.
We fixed this for our R620 servers with Broadcom 5719/5720 NICs by disabling the ntg3 driver and enabling the tg3 driver. We also updated tg3 to the latest version. VMware support was well aware of this issue and stated they are seeing lots of problems with the ntg3 driver. All problems ceased immediately after changing the drivers.
I have the same problem with the Dell R430. Hoping for a fast solution.
Hi,
Chnb, I sent you an email because I still have problems with 6.5 U1. Thanks for your response.
I believe I am having a similar issue.
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.734Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.734Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.734Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
2017-11-03T09:41:12.738Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.738Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136
2017-11-03T09:41:12.738Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
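The excerpt above repeats just two distinct messages many times over. When posting or reading logs like this, it can help to collapse the repeats into counts. Here is a small Python sketch, assuming every line has the "timestamp cpuN:world)message" shape seen in this thread; `summarize_messages` is an illustrative name, not an existing utility.

```python
import re
from collections import Counter

# Splits "<timestamp> cpu<N>:<worldID>)<message>" as seen in the excerpt
# (the general log format is an assumption based on those lines).
LOG_LINE = re.compile(r"^\S+ cpu\d+:\d+\)(.*)$")


def summarize_messages(lines):
    """Return (message, count) pairs, most frequent first, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Three lines copied from the excerpt above.
    sample = [
        "2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136",
        "2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136",
        "2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136",
    ]
    for message, count in summarize_messages(sample):
        print(count, "x", message)
```

Grouping on the message text alone throws away the timing, so it suits a quick overview rather than correlating events across log sources.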
I am hitting a similar issue.
My only workaround is to use the Intel ports only and avoid the Broadcom ports for now, until we can figure something out. It seems to work fine if I use an Intel port for the VM Network.
Has anyone tried the 4.1.3.0 driver, ESXi650-201712407-BG? https://kb.vmware.com/s/article/2151313
For what it's worth, I've got two hosts (DL380p Gen8) on the 4.1.0.0 driver, and they haven't had any issues with the ntg3 driver.
Unfortunately, the 4.1.3.0 driver shipped with ESXi 6.7 has the same problem (on an HP Gen10 server).
6.5.0 Update 2 (Build 8294253)
Broadcom Corp. NetXtreme BCM5719 GB eth
-Firmware version: BC 1.45 ncsi 1.4.14.0
-Version 4.1.3.0
These are kernel logs from the firewall (the only guest OS on the host).
2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: intr type 3, mode 0, 9 vectors allocated
2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: NIC Link is Up 10000 Mbps
2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: resetting
2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: intr type 3, mode 0, 9 vectors allocated
Disabling the eth interfaces from VMware and turning them back on fixed the issue. I see that 4.1.3.0 has come out since the start of this thread. Has anyone had any luck with it?