Jmorrison_GY
Contributor

Ports Flapping every 4 Hours

I have two four-host HP servers running ESXi with dual 10GbE NIC cards (HP 535T). They are connected to a stack of Cisco 9300s with 10GbE copper ports, and the two servers are in separate rooms. The server I'm having issues with houses hosts 1-4. The 10GbE connections to the hosts flap every 4 hours. I tried wiping the switches and every troubleshooting step I could think of on them, and the ports always flap on the exact same schedule.

I hooked up a different stack of 9300s with 1Gb SFPs and had zero issues. I moved the stack to the second room and it had zero issues when connected to hosts 5-8. So it appears to be an issue with either the HP hardware or VMware. The iLO ports never flap, and they are the only 1Gb connections.

There is nothing in any logs that I can find that sheds light on the issue. The ports just go down and come back a few seconds later. I currently have hosts 1-3 down as I was trying to isolate the problem to an individual host, so in the attached picture you'll only see the connections for host 4. I also tried capturing traffic using SPAN from one of the host connections and it didn't show any issues. Any help with this issue would be greatly appreciated.

Lalegre
Virtuoso

Hello @Jmorrison_GY,

So, as I understand it, you are running 8 ESXi hosts split across two chassis? Could you draw a quick diagram to help me understand your current topology?

Jmorrison_GY
Contributor

Jmorrison_GY_0-1680123034481.png

Only the 10Gb ports flap, and only every four hours. They don't flap when set to 1Gb. The switch stack didn't flap at 10Gb when I temporarily moved it to the second room with hosts 5-8.
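
A strict four-hour cadence is worth confirming numerically, since a fixed period points toward something scheduled rather than a marginal cable or port. The following is a minimal sketch, assuming the link-down times have already been collected by hand from the switch or host logs; the timestamps, the function names `flap_intervals` and `is_periodic`, and the one-minute tolerance are all made up for illustration.

```python
from datetime import datetime, timedelta


def flap_intervals(timestamps):
    """Return the gaps between consecutive link-down events."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_periodic(intervals, period, tolerance):
    """True if there is at least one gap and every gap is within tolerance of period."""
    return bool(intervals) and all(abs(gap - period) <= tolerance for gap in intervals)


# Hypothetical link-down times, not taken from the real logs.
events = [
    datetime(2023, 3, 29, 0, 0, 5),
    datetime(2023, 3, 29, 4, 0, 7),
    datetime(2023, 3, 29, 8, 0, 4),
]

gaps = flap_intervals(events)
print(is_periodic(gaps, timedelta(hours=4), timedelta(minutes=1)))
```

If the gaps stay within a few seconds of four hours over several days, a timer in the firmware, driver, or a management agent becomes a more likely suspect than the physical layer.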

Jmorrison_GY
Contributor

@Lalegre I made a typo. Each host has 5 connections to the switch stack: 2 for ESXi, 2 for vSAN, and 1 for iLO.

Lalegre
Virtuoso

@Jmorrison_GY,

Got it now, but hosts 5-8 in the second room do not flap, right? So the issue is only with hosts 1-4 in the first room, isn't it?

Jmorrison_GY
Contributor

@Lalegre Yes, the issue is only in the first room.

Lalegre
Virtuoso

@Jmorrison_GY,

I recommend comparing the show run output from the 9300s and from the chassis switches to see whether anything differs between the rooms.

Now, from the ESXi perspective, is there anything different in the configuration, e.g. VDS?
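
For the switch-side comparison, eyeballing two long configs is error-prone, so a line-by-line diff of the saved outputs can help. This is a minimal sketch using Python's standard `difflib`; it assumes each room's `show run` output has been saved as text, and the config fragments and the function name `diff_configs` are hypothetical.

```python
import difflib


def diff_configs(room1_lines, room2_lines):
    """Return unified-diff lines between two saved 'show run' captures."""
    # Strip trailing whitespace so cosmetic differences are not reported.
    a = [line.rstrip() for line in room1_lines]
    b = [line.rstrip() for line in room2_lines]
    return list(
        difflib.unified_diff(a, b, fromfile="room1", tofile="room2", lineterm="")
    )


# Hypothetical config fragments, for illustration only.
room1 = [
    "interface TenGigabitEthernet1/0/1",
    " switchport mode trunk",
    " spanning-tree portfast trunk",
]
room2 = [
    "interface TenGigabitEthernet1/0/1",
    " switchport mode trunk",
]

for line in diff_configs(room1, room2):
    print(line)
```

Lines prefixed with `-` exist only in the first capture and lines prefixed with `+` only in the second; an empty result means the two captures match apart from trailing whitespace.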

Jmorrison_GY
Contributor

@Lalegre I've compared the running configs on the switches and the configs on the servers. I can't find anything that is different.

Lalegre
Virtuoso

Could you please attach /var/log/vmkernel.log from one of the ESXi hosts 1 to 4?

Jmorrison_GY
Contributor

Here are the requested logs.

Lalegre
Virtuoso

Hey @Jmorrison_GY,

Sorry for the delay.

I was checking the logs and I see there are a few warnings from the driver your NIC is using: bnxtnet. Maybe there is an issue with the driver version you are using. Could you please connect to Host 1 and Host 5 and run:

esxcfg-nics -l

esxcli software vib list

Copy both outputs here.
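
Once both outputs are saved, the driver version can be pulled out of the VIB list programmatically instead of by eye. This is a minimal sketch, assuming the `esxcli software vib list` output is whitespace-separated with the columns Name, Version, Vendor, Acceptance Level, and Install Date; the helper name `vib_version` is hypothetical.

```python
def vib_version(vib_list_output, name="bnxtnet"):
    """Return the Version column for the named VIB, or None if it is not listed."""
    for line in vib_list_output.splitlines():
        fields = line.split()
        # The first column is the VIB name, the second is its version.
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


# Sample text in the assumed column layout.
sample = """\
Name     Version                            Vendor  Acceptance Level  Install Date
-------  ---------------------------------  ------  ----------------  ------------
bnxtnet  223.0.152.0-1OEM.700.1.0.15843807  BCM     VMwareCertified   2023-03-29
"""

print(vib_version(sample))
```

Running the same helper against the saved output from Host 1 and Host 5 gives two strings that can be compared directly.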

Jmorrison_GY
Contributor

@Lalegre Those alarms didn't appear until recently. I updated the firmware on hosts 1-4 to see if it would fix the issue I was having.

Lalegre
Virtuoso

Let's see what happens. If it still fails, compare the driver versions using the second command I shared with you.

Jmorrison_GY
Contributor

@Lalegre I also just noticed that one connection, to vmnic3 on host 3, never flaps. There is nothing different that I can find in the settings. It's the only 10GbE connection to hosts 1-4 that doesn't flap.

Lalegre
Virtuoso

That is interesting because it is the same NIC. However, I found the first difference:

Name    PCI           Driver   Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0  0000:b2:00.0  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:6c:f8:80  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic1  0000:b2:00.1  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:6c:f8:88  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic2  0000:61:00.0  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:70:53:90  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic3  0000:61:00.1  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:70:53:98  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller

Name     Version                            Vendor  Acceptance Level  Install Date
-------  ---------------------------------  ------  ----------------  ------------
bnxtnet  222.0.118.0-1OEM.700.1.0.15843807

Name    PCI           Driver   Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0  0000:b2:00.0  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:6d:0f:60  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic1  0000:b2:00.1  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:6d:0f:68  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic2  0000:61:00.0  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:6f:a0:40  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic3  0000:61:00.1  bnxtnet  Up    10000Mbps  Full    5c:ba:2c:6f:a0:48  9000  Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller

Name     Version                            Vendor  Acceptance Level  Install Date
-------  ---------------------------------  ------  ----------------  ------------
bnxtnet  223.0.152.0-1OEM.700.1.0.15843807  BCM     VMwareCertified   2023-03-29

The driver versions differ between the hosts. Both versions are supported, but only with a matching firmware version, as shown here: https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=43270&deviceCa...
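
To make the mismatch explicit, the two bnxtnet version strings from the outputs above can be split into a numeric driver version and a build suffix and then compared. This is a minimal sketch; the helper name `parse_driver_version` and the assumption that the part before the first hyphen is a dotted numeric version are mine, not something the VIB format guarantees.

```python
def parse_driver_version(version):
    """Split e.g. '222.0.118.0-1OEM.700...' into ((222, 0, 118, 0), '1OEM.700...')."""
    numeric, _, suffix = version.partition("-")
    return tuple(int(part) for part in numeric.split(".")), suffix


# Version strings as they appear in the two VIB listings in this thread.
version_a = "222.0.118.0-1OEM.700.1.0.15843807"
version_b = "223.0.152.0-1OEM.700.1.0.15843807"

num_a, build_a = parse_driver_version(version_a)
num_b, build_b = parse_driver_version(version_b)

if num_a != num_b:
    newer = version_b if num_b > num_a else version_a
    print(f"Driver mismatch: {version_a} vs {version_b}; newer is {newer}")
else:
    print("Driver versions match")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, where for example "9" would sort after "10".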

Version 222.0.118.0-1OEM.700.1.0.15843807 is what you are running on hosts 5-8. Try downloading that same version from here and downgrading ESXi hosts 1-4 to it: https://customerconnect.vmware.com/en/downloads/details?downloadGroup=DT-ESXI70-BROADCOM-BNXT-NET-RO...

Here are the steps to perform the downgrade, which is essentially an uninstall followed by a reinstall: https://kb.vmware.com/s/article/2079279