VMware Cloud Community
stonent
Contributor

Pings dropped on BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller

We're having an issue with our new Dell R740 hosts running ESXi 7.0 (tried U1 and U2): pings to the management interface drop periodically. This causes random alerts from our monitoring software, which thinks the host has dropped out.

We're running BCM57416 NICs over CAT6A. We also have R740s from a previous order that have Intel 10G NICs, and they don't have this issue. We've tried newer drivers and firmware for the BCM57416 cards, but that hasn't resolved the issue.

The strange part is that it only affects the host management IP; I can ping one of the VMs running on the host with no issues. In one case, about 870,000 continuous pings were sent to one of our VMs running on that host without a single drop, but the management IP still drops.
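
For anyone who wants to run the same side-by-side comparison, here is a rough sketch in Python (not from the original post) that groups consecutive failed probes into bursts, so the drops on the management IP can be lined up against the VM IP by timestamp. The function name `loss_bursts` and the `(timestamp, ok)` sample format are assumptions made for illustration; the samples would come from whatever ping tool or monitoring export you already use.

```python
from typing import Iterable, List, Tuple

# Hypothetical helper: each sample is (timestamp_in_seconds, reply_received).
def loss_bursts(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float, int]]:
    """Group consecutive failed probes into (first_ts, last_ts, count) bursts."""
    bursts: List[Tuple[float, float, int]] = []
    start = last = 0.0
    count = 0
    for ts, ok in samples:
        if not ok:
            if count == 0:
                start = ts          # first lost probe of a new burst
            last = ts
            count += 1
        elif count:
            bursts.append((start, last, count))  # burst ended by a good reply
            count = 0
    if count:
        bursts.append((start, last, count))      # capture still ended mid-burst
    return bursts


# Made-up samples purely to show the shape of the data.
mgmt_samples = [(0, True), (1, False), (2, False), (3, True), (4, False)]
vm_samples = [(0, True), (1, True), (2, True), (3, True), (4, True)]

print("management IP bursts:", loss_bursts(mgmt_samples))
print("VM IP bursts:", loss_bursts(vm_samples))
```

If the management IP shows bursts while the VM IP list stays empty over the same window, that matches the behaviour described above and points away from the physical link as a whole.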

Also, if I SSH into the host, the SSH session will sometimes stall and recover repeatedly.

On one host I installed the QLogic 10G cards pulled from one of our old hosts (running the qfle3 driver that was famously prone to PSODs in 6.7 until a new driver came out) and the problem seemed to go away. So it appears to be specific to the BCM57416 cards, and I'd rather not run 4-year-old cards in my new servers as a workaround.

Anyone else dealing with this issue and/or have a fix for it?

1 Reply
MCEnforcer
Contributor

We have a very similar issue with Dell R650 servers that have two of these dual-port cards installed.

We are running VMware ESXi 7.0.3, build 20328353, using the Dell/EMC image.

The symptom we are seeing is that with all four ports enabled, every port starts flapping. With one port shut down, everything is fine. We also see the same symptom if we connect two servers but only use two ports per server.

We are using Cisco Nexus 5K switches and it is definitely not a spanning tree issue. It doesn't matter how the vSwitch is configured; it's always a function of enabling the fourth NIC.

We have a case open with Dell, as these are brand-new servers. They advised the latest firmware and beta NIC drivers, but there was no change.

We only have one make/model of 10-Gig switch, but it seems that when the servers are connected to a 1-Gig switch, everything is fine.
