Ports Flapping every 4 Hours
I have two HP servers, each housing four hosts running ESXi with dual 10GbE NIC cards (HP 535T). The servers are in separate rooms and are connected to a stack of Cisco 9300s with 10GbE copper ports. The server I'm having issues with houses hosts 1-4: the 10GbE connections to those hosts flap every 4 hours.

I have wiped the switches and tried every troubleshooting step I could think of, and the ports always flap on the exact same schedule. I hooked up a different stack of 9300s using 1Gb SFPs and had zero issues. I also moved the 10GbE stack to the second room, and it had zero issues when connected to hosts 5-8. So it appears to be an issue with either the HP hardware or VMware. The iLO ports never flap, and they are the only 1Gb connections.

I can't find anything in any of the logs that sheds light on the issue. The ports just go down and come back up a few seconds later. I also tried capturing traffic with SPAN on one of the host connections, and the capture didn't show any issues.

I currently have hosts 1-3 down because I was trying to isolate the problem to an individual host, so the attached picture shows only the connections for host 4. Any help with this issue would be greatly appreciated.
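One quick way to confirm that the flaps really follow a fixed period is to collect the link-down times from the switch log (on IOS-XE these appear as %LINK-3-UPDOWN "changed state to down" messages) and look at the gaps between them. A minimal Python sketch of that check; the timestamps are hypothetical placeholders, not data from this thread, and the 2-minute tolerance is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def flap_intervals(timestamps):
    """Return the gaps between consecutive link-down events, oldest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_periodic(intervals, period, tolerance):
    """True if there is at least one gap and every gap is within `tolerance` of `period`."""
    return bool(intervals) and all(abs(gap - period) <= tolerance for gap in intervals)


# Hypothetical link-down times typed in by hand from the switch log (not real data).
downs = [
    datetime(2023, 3, 29, 2, 0, 5),
    datetime(2023, 3, 29, 6, 0, 7),
    datetime(2023, 3, 29, 10, 0, 4),
]

gaps = flap_intervals(downs)
print([str(gap) for gap in gaps])  # ['4:00:02', '3:59:57']
print(is_periodic(gaps, timedelta(hours=4), timedelta(minutes=2)))  # True
```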
Hello @Jmorrison_GY,
So, as I understand it, you are running 8 ESXi hosts split across two chassis? Could you draw a simple diagram to help me understand your current topology?
Only the 10Gb ports flap. They don't flap when set to 1Gb, and at 10Gb they flap only once every four hours. The switch stack didn't flap at 10Gb when I temporarily moved it to the second room and connected it to hosts 5-8.
@Lalegre I made a typo. Each host has 5 connections to the switch stack: 2 for ESXi, 2 for vSAN, and 1 for iLO.
Got it. Hosts 5-8 in the second room do not flap, right? So the issue is only with hosts 1-4 in the first room?
@Lalegre Yes, the issue is only in the first room.
I recommend comparing the show run output from the 9300s and from the chassis switches to see whether there is any difference between the rooms.
From the ESXi perspective, is there anything different in the configuration, e.g. the VDS?
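Long running configs are easier to compare with a diff than side by side. A minimal sketch using Python's standard-library difflib; the two interface excerpts are hypothetical, and in practice you would paste in the full show run output from each room's switches. Lines that are expected to differ (hostname, descriptions) will also show up in the diff.

```python
import difflib


def config_diff(config_a, config_b, name_a="room1", name_b="room2"):
    """Return unified-diff lines between two 'show running-config' captures."""
    return list(difflib.unified_diff(
        config_a.splitlines(),
        config_b.splitlines(),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",  # splitlines() already removed the newlines
    ))


# Hypothetical excerpts, not the configs from this thread.
room1 = """interface TenGigabitEthernet1/0/1
 switchport trunk allowed vlan 10,20
 switchport mode trunk"""
room2 = """interface TenGigabitEthernet1/0/1
 switchport trunk allowed vlan 10,20,30
 switchport mode trunk"""

for line in config_diff(room1, room2):
    print(line)
```

Identical configs produce an empty list, so "no output" means no textual difference.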
@Lalegre I've compared the running configs on the switches and the configs on the servers. I can't find anything different.
Could you please attach /var/log/vmkernel.log from one of the ESXi hosts 1-4?
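vmkernel.log gets large quickly, so it can help to pull out only the lines that mention the NIC driver at warning level before reading them. A minimal sketch; the two sample lines are hypothetical and only mimic the rough shape of a vmkernel.log entry, and bnxtnet as the default driver name comes from later in this thread.

```python
def driver_warnings(lines, driver="bnxtnet"):
    """Yield log lines that mention `driver` and contain 'WARNING' (case-insensitive)."""
    for line in lines:
        if driver in line and "WARNING" in line.upper():
            yield line.rstrip("\n")


# Hypothetical sample lines; the message text is made up for illustration.
sample = [
    "2023-03-29T10:00:04.512Z cpu3:2097512)WARNING: bnxtnet: example warning text\n",
    "2023-03-29T10:00:04.600Z cpu3:2097512)Vmxnet3: example informational line\n",
]

for hit in driver_warnings(sample):
    print(hit)

# On a copied log file:
#     with open("vmkernel.log") as f:
#         hits = list(driver_warnings(f))
```

Comparing the timestamps of these hits with the switch's link-down times shows whether the driver warnings line up with the flaps.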
Here are the requested logs.
Hey @Jmorrison_GY,
Sorry for the delay.
I checked the logs and see a few warnings from the driver your NIC is using, bnxtnet. There may be an issue with the driver version. Could you please connect to Host 1 and Host 5 and run:
esxcfg-nics -l
esxcli software vib list
Copy both outputs here.
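Once both outputs are available, the bnxtnet version can be extracted mechanically instead of by eye. A minimal sketch, assuming the default whitespace-separated layout of esxcli software vib list (Name in the first column, Version in the second); the sample text is the bnxtnet row from each host as posted further down in this thread. Matching the name exactly avoids picking up related VIBs with a similar prefix.

```python
def vib_version(vib_list_output, name="bnxtnet"):
    """Return the Version column for `name` from 'esxcli software vib list' text, or None."""
    for line in vib_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


host5_vibs = """Name Version Vendor Acceptance Level Install Date
----------------------------- ------------------------------------ ------ ---------------- ------------
bnxtnet 222.0.118.0-1OEM.700.1.0.15843807"""

host1_vibs = """Name Version Vendor Acceptance Level Install Date
----------------------------- ------------------------------------ ------ ---------------- ------------
bnxtnet 223.0.152.0-1OEM.700.1.0.15843807 BCM VMwareCertified 2023-03-29"""

v5, v1 = vib_version(host5_vibs), vib_version(host1_vibs)
if v5 != v1:
    print(f"bnxtnet differs: host 5 {v5}, host 1 {v1}")
```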
@Lalegre Those alarms didn't appear until recently. I updated the firmware on hosts 1-4 to see if it would fix the issue I was having.
Let's see what happens. If it still fails, compare the driver versions using the second command I shared (esxcli software vib list).
@Lalegre I also just noticed that one connection, vmnic3 on host 3, never flaps. I can't find anything different in its settings. It's the only 10GbE connection to hosts 1-4 that doesn't flap.
That is interesting, because it is the same NIC. However, I found the first difference:
Host 5:
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:b2:00.0 bnxtnet Up 10000Mbps Full 5c:ba:2c:6c:f8:80 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic1 0000:b2:00.1 bnxtnet Up 10000Mbps Full 5c:ba:2c:6c:f8:88 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic2 0000:61:00.0 bnxtnet Up 10000Mbps Full 5c:ba:2c:70:53:90 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic3 0000:61:00.1 bnxtnet Up 10000Mbps Full 5c:ba:2c:70:53:98 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
Name Version Vendor Acceptance Level Install Date
----------------------------- ------------------------------------ ------ ---------------- ------------
bnxtnet 222.0.118.0-1OEM.700.1.0.15843807
Host 1:
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:b2:00.0 bnxtnet Up 10000Mbps Full 5c:ba:2c:6d:0f:60 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic1 0000:b2:00.1 bnxtnet Up 10000Mbps Full 5c:ba:2c:6d:0f:68 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic2 0000:61:00.0 bnxtnet Up 10000Mbps Full 5c:ba:2c:6f:a0:40 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic3 0000:61:00.1 bnxtnet Up 10000Mbps Full 5c:ba:2c:6f:a0:48 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
Name Version Vendor Acceptance Level Install Date
----------------------------- ------------------------------------ ------ ---------------- ------------
bnxtnet 223.0.152.0-1OEM.700.1.0.15843807 BCM VMwareCertified 2023-03-29
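The NIC tables above can also be compared column by column to confirm that nothing besides the per-card MAC address differs, which leaves the driver VIB version as the remaining difference. A minimal sketch, assuming the esxcfg-nics -l layout shown above (eight single-word columns followed by a free-text Description); the sample rows are vmnic0 and vmnic1 from the two outputs in this post.

```python
COLUMNS = ["pci", "driver", "link", "speed", "duplex", "mac", "mtu", "description"]


def parse_nics(text):
    """Map vmnic name -> dict of columns from 'esxcfg-nics -l' output."""
    nics = {}
    for line in text.splitlines():
        fields = line.split(maxsplit=8)  # the Description column may contain spaces
        if len(fields) == 9 and fields[0].startswith("vmnic"):
            nics[fields[0]] = dict(zip(COLUMNS, fields[1:]))
    return nics


def nic_differences(a, b, ignore=("mac",)):
    """List (vmnic, column, value_a, value_b) for columns that differ; MACs ignored by default."""
    diffs = []
    for nic in sorted(set(a) | set(b)):
        row_a, row_b = a.get(nic, {}), b.get(nic, {})
        for key in sorted(set(row_a) | set(row_b)):
            if key not in ignore and row_a.get(key) != row_b.get(key):
                diffs.append((nic, key, row_a.get(key), row_b.get(key)))
    return diffs


host5_nics = """Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:b2:00.0 bnxtnet Up 10000Mbps Full 5c:ba:2c:6c:f8:80 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic1 0000:b2:00.1 bnxtnet Up 10000Mbps Full 5c:ba:2c:6c:f8:88 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller"""

host1_nics = """Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:b2:00.0 bnxtnet Up 10000Mbps Full 5c:ba:2c:6d:0f:60 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller
vmnic1 0000:b2:00.1 bnxtnet Up 10000Mbps Full 5c:ba:2c:6d:0f:68 9000 Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller"""

print(nic_differences(parse_nics(host5_nics), parse_nics(host1_nics)))  # []
```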
The two hosts are running different driver versions. Both versions are supported, but each only with its matching firmware version, as shown here: https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=43270&deviceCa...
Version 222.0.118.0-1OEM.700.1.0.15843807 is what you are running on hosts 5-8. Try downloading that same version from here and downgrading ESXi hosts 1-4 to it: https://customerconnect.vmware.com/en/downloads/details?downloadGroup=DT-ESXI70-BROADCOM-BNXT-NET-RO...
Here are the steps to perform the downgrade, which essentially means uninstalling the driver and installing it again: https://kb.vmware.com/s/article/2079279
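Before following the KB, it can be worth double-checking which direction the change goes. A minimal sketch that compares only the numeric part of the two version strings; it assumes the part before the first hyphen is a dotted list of integers and deliberately ignores the OEM/build suffix, which is identical on both hosts here.

```python
def driver_base_version(vib_version):
    """Turn '222.0.118.0-1OEM.700.1.0.15843807' into (222, 0, 118, 0)."""
    base = vib_version.split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def needs_downgrade(current, target):
    """True if `current` is numerically newer than `target` (build suffix ignored)."""
    return driver_base_version(current) > driver_base_version(target)


hosts_1_to_4 = "223.0.152.0-1OEM.700.1.0.15843807"
hosts_5_to_8 = "222.0.118.0-1OEM.700.1.0.15843807"
print(needs_downgrade(hosts_1_to_4, hosts_5_to_8))  # True
```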