Kennylcf
Contributor

ESXi 6.5 connectivity issue on PowerEdge R430

Dear all,

We are encountering a connectivity issue on a newly installed ESXi 6.5 host on a PowerEdge R430.

Whenever we perform a file copy (30-40 GB files) from one of the VM guests to another machine over the network (100MB), it ends in a connectivity issue.

Both the VM guest and the host become inaccessible over the network.

The ESXi console can still be accessed. We tried "Restart Management Network" and then "Test Management Network", which reported OK for the default gateway and for other machines.

But we are still unable to reach either the guest or the host over the network.

Connectivity returns to normal after shutting down and restarting the ESXi host.

The VM guest's OS event log indicates that the guest was still running during the issue and then followed the ESXi host's automatic shutdown/startup settings.

Looking into the log, we found the following warning logged repeatedly while the copy task was running:

cpu## :67701)WARNING: ntg3-throttled: Ntg3XmitPktList:372: vmnic0:TX ring full (0)
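
To see how often this warning fires and on which uplink, the log can be tallied per NIC. The following is only a minimal sketch: the function name and the sample lines are made up for illustration, and it assumes the warnings have the same shape as the line quoted above.

```python
import re
from collections import Counter

# Matches the ntg3 warning quoted above, e.g.
# "... Ntg3XmitPktList:372: vmnic0:TX ring full (0)"
TX_RING_FULL = re.compile(r"(vmnic\d+):\s*TX ring full")


def count_tx_ring_full(lines):
    """Return a Counter mapping vmnic name -> number of 'TX ring full' warnings."""
    counts = Counter()
    for line in lines:
        match = TX_RING_FULL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input only; in practice, pass the lines of a saved kernel log.
sample = [
    "cpu## :67701)WARNING: ntg3-throttled: Ntg3XmitPktList:372: vmnic0:TX ring full (0)",
    "some unrelated log line",
    "cpu## :67701)WARNING: ntg3-throttled: Ntg3XmitPktList:372: vmnic0:TX ring full (0)",
]

for nic, hits in count_tx_ring_full(sample).most_common():
    print(f"{nic}: {hits} TX ring full warnings")
```

Since the message is tagged "throttled", the count is probably a lower bound on how often the ring actually filled up.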

Does anyone have a suggestion for this issue? Thanks.

50 Replies
chnb
VMware Employee

Hi Franc,

Could you contact me using the email address in my public profile? I would like to take a look at the full kernel log and perhaps ask you for more diagnostic information. Alternatively, you could file a support request with VMware and let me know the SR number.

Thanks,

Bo

rverchere
Contributor

Hi,

I have the same issue, using the latest ntg3 driver on Dell R630 with Broadcom BCM5719 for iSCSI traffic (MTU 9000).

Support Request #17474802805 has been created, but here is some information in case it is useful. The vm-support results will be sent to support.

After setting "vsish -e set /system/modules/ntg3/loglevels/ntg3 1", I get the following:

2017-05-31T09:06:00.250Z cpu4:66109)vmnic0:STB:3-182 IDX R:452-452-0 T:189-189 SUM R:12740-0 T:585-0-0-0 I:12716

2017-05-31T09:06:00.259Z cpu23:66106)vmnic3:STB:1-150 IDX R:455-455-0 T:490-490 SUM R:405959-0 T:74786-0-0-0 I:80417

2017-05-31T09:06:00.260Z cpu2:66103)vmnic2:STB:3-78 IDX R:608-608-0 T:148-148 SUM R:393824-0 T:73536-0-0-0 I:78304

2017-05-31T09:06:00.276Z cpu12:66112)vmnic1:STB:1-192 IDX R:3008-960-0 T:0-0 SUM R:3008-0 T:0-0-0-0 I:3008

2017-05-31T09:06:00.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)

2017-05-31T09:06:01.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000006

2017-05-31T09:06:01.092Z cpu16:71357)NetSched: 628: vmnic6-0-tx: worldID = 71357 exits

2017-05-31T09:06:01.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000006 with mac 00:0a:f7:a5:f4:32

2017-05-31T09:06:01.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down

2017-05-31T09:06:02.428Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)

2017-05-31T09:06:03.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000008

2017-05-31T09:06:03.092Z cpu18:71361)NetSched: 628: vmnic7-0-tx: worldID = 71361 exits

2017-05-31T09:06:03.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000008 with mac 00:0a:f7:a5:f4:33

2017-05-31T09:06:03.428Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:404:link down

2017-05-31T09:06:05.581Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)

2017-05-31T09:06:06.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000006

2017-05-31T09:06:06.092Z cpu19:71368)NetSched: 628: vmnic6-0-tx: worldID = 71368 exits

2017-05-31T09:06:06.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000006 with mac 00:0a:f7:a5:f4:32

2017-05-31T09:06:06.429Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)

2017-05-31T09:06:06.581Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down

2017-05-31T09:06:06.911Z cpu23:66091)vmnic4:STB:3-122 IDX R:138-138-0 T:0-0 SUM R:138-0 T:0-0-0-0 I:121

2017-05-31T09:06:06.911Z cpu23:66094)vmnic5:STB:3-204 IDX R:3031-983-0 T:0-0 SUM R:3031-0 T:0-0-0-0 I:3020

2017-05-31T09:06:07.092Z cpu15:65629)NetPort: 1879: disabled port 0x3000008

2017-05-31T09:06:07.092Z cpu17:71375)NetSched: 628: vmnic7-0-tx: worldID = 71375 exits

2017-05-31T09:06:07.092Z cpu15:65629)Uplink: 9893: enabled port 0x3000008 with mac 00:0a:f7:a5:f4:33

2017-05-31T09:06:07.429Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:404:link down

2017-05-31T09:06:07.429Z cpu12:66100)vmnic7:STB:3-176 IDX R:0-0-0 T:256-256 SUM R:0-0 T:1277-0-0-0 I:1200

2017-05-31T09:06:07.582Z cpu20:66097)vmnic6:STB:3-158 IDX R:0-0-0 T:253-253 SUM R:0-0 T:1267-0-0-0 I:1182

The network card goes up and down at startup, but not every time. I can run some tests, as the server is not yet in production.
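
To quantify the flapping, the "link down" events in a log like the one above can be grouped per uplink. This is a rough sketch, not an official tool: the function name is hypothetical, the regular expression assumes the exact ntg3 message format shown in this post, and the sample lines are copied from it.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes lines shaped like:
# "2017-05-31T09:06:01.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down"
LINK_DOWN = re.compile(r"^(\S+Z)\s.*ntg3:(vmnic\d+):Ntg3PhyStateGet:\d+:link down")


def link_down_events(lines):
    """Return {vmnic: [datetime, ...]} for every 'link down' message found."""
    events = defaultdict(list)
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
            events[match.group(2)].append(stamp)
    return dict(events)


sample = [
    "2017-05-31T09:06:00.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:426:link state changed (auxSTS: 0x871f)",
    "2017-05-31T09:06:01.578Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down",
    "2017-05-31T09:06:03.428Z cpu12:66100)ntg3:vmnic7:Ntg3PhyStateGet:404:link down",
    "2017-05-31T09:06:06.581Z cpu20:66097)ntg3:vmnic6:Ntg3PhyStateGet:404:link down",
]

for nic, stamps in sorted(link_down_events(sample).items()):
    print(f"{nic}: {len(stamps)} link-down events, first at {stamps[0].time()}")
```

Only the "link down" lines are counted; the "link state changed" lines are ignored so that each flap is counted once.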

vertices
Contributor

Any updates on this? I just upgraded from 5.5 to 6.5 U1 on two Dell R620s with Broadcom 5719 and 5720 NICs. They have the ntg3 4.1.2.0 driver and the latest firmware (bc 1.39 ncsi 1.4.5.0). Now each of them flaps one of its interfaces, often several times an hour. It's always the same one: slot 2 port 1 in one server and slot 3 port 1 in the other. I opened a ticket with Dell, but they didn't really know and suggested I contact VMware, which I'm going to do soon.

Basically, the port goes down and the iDRAC sends me an email:

"The NIC Slot 3 Port 1 network link is down."

"Detailed Description: The network link is down. Either the network cable is not connected or the network device is not working.

Recommended Action: Verify that the network port is enabled and if the port has Activity/Speed LEDs, that they are lit. Check the network cable, network cable connections, and the attached network switch.

Message ID: NIC100"

Then, about 5-20 seconds later, I get another email:

"The NIC Slot 3 Port 1 network link is started"

"Detailed Description: The transition from network link not started (down) to network link started (up) has been detected on the NIC controller port identified in the message.

Recommended Action: No response action is required.

Message ID: NIC101"

So I know these ports are flapping for some reason. Dell homed in on the driver; they feel the driver is causing the issue. Then I found this thread.

vertices
Contributor

We fixed this for our R620 servers with Broadcom 5719/5720 NICs by disabling the ntg3 driver and enabling the tg3 driver. We also updated tg3 to the latest version. VMware support was well aware of this issue and stated they are seeing lots of problems with the ntg3 driver. All problems ceased immediately after changing the drivers.
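
For anyone wanting to try the same switch: kernel modules on ESXi are normally enabled and disabled with `esxcli system module set`, and the change typically takes effect after a host reboot. The sketch below only builds the command strings and does not run anything. The helper name is hypothetical, and the commands are an assumption about how the switch described above would be done, so check them against VMware's documentation for your build before using them.

```python
def driver_switch_commands(disable="ntg3", enable="tg3"):
    """Build (but do not run) esxcli commands that swap one NIC driver module for another."""
    return [
        f"esxcli system module set --enabled=false --module={disable}",
        f"esxcli system module set --enabled=true --module={enable}",
    ]


# Print the commands so they can be reviewed before being typed into an ESXi shell.
for command in driver_switch_commands():
    print(command)
```

Keeping this as plain strings is deliberate: the commands are meant to be reviewed and run by hand on the host, ideally in a maintenance window.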

Kampfwurst12
Contributor

I have the same problem with the Dell R430. Hoping for a fast solution.

goyer
Enthusiast

Hi,

Chnb, I sent you an email because I still have problems with 6.5 U1. Thanks for your response.

Aschwarzer
Contributor

I believe I am having a similar issue.

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.733Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.734Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.734Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.734Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.735Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.736Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.737Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136

2017-11-03T09:41:12.738Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.738Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136

2017-11-03T09:41:12.738Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136
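
A flood like this is easier to read once identical messages are collapsed. As a small sketch (the function name is made up, and the prefix pattern assumes the "timestamp cpuN:worldID)" layout seen in the lines above), the messages can be counted with the prefix stripped:

```python
import re
from collections import Counter

# Strip the leading "2017-11-03T09:41:12.732Z cpu10:4535094)" part so that
# identical messages compare equal regardless of when they were logged.
PREFIX = re.compile(r"^\S+Z\s+cpu\d+:\d+\)")


def summarize(lines):
    """Return a Counter of log messages with the timestamp/CPU prefix removed."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[PREFIX.sub("", line)] += 1
    return counts


sample = [
    "2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136",
    "2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 129: get connection pkt trace failed with error code 195887136",
    "2017-11-03T09:41:12.732Z cpu10:4535094)Tcpip_Vmk: 96: get connection stats failed with error code 195887136",
]

for message, hits in summarize(sample).most_common():
    print(f"{hits:4d} x {message}")
```

Run over the excerpt above, this reduces the whole paste to two distinct messages, both carrying the same error code.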

Aschwarzer
Contributor

I am hitting a similar issue.

My only workaround is to use the Intel ports only and avoid the Broadcom ports for now, until we can figure something out. It seems to be working fine if I use an Intel port for the VM Network.

anthonybailey
Contributor

Has anyone tried the 4.1.3.0 driver, ESXi650-201712407-BG? https://kb.vmware.com/s/article/2151313

For what it's worth, I've got two hosts (DL380p Gen8) on the 4.1.0.0 driver, and they haven't had any issues with the ntg3 driver.

Schaedle
Enthusiast

Unfortunately, the 4.1.3.0 driver, shipped with ESXi 6.7, has the same problem (on an HP Gen10 server) :(

JeffDurga
Contributor

6.5.0 Update 2 (Build 8294253)

Broadcom Corp. NetXtreme BCM5719 GB eth

-Firmware version: BC 1.45 ncsi 1.4.14.0

-Version 4.1.3.0

These are kernel logs from the firewall (the only guest OS on the host).

2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: intr type 3, mode 0, 9 vectors allocated

2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: NIC Link is Up 10000 Mbps

2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: resetting

2018:11:09-12:02:33 vmxnet3 0000:13:00.0 eth2: intr type 3, mode 0, 9 vectors allocated

Disabling the eth interfaces from VMware and turning them back on fixed the issue. I see that 4.1.3.0 has come out since the start of this thread. Has anyone had any luck with this?
