Looking for help with the following issue:
Issue: When I team two or more physical NICs on the vSwitch that contains the management vmk of my ESXi 5.5 U1 host, pinging the management vmk IP address from other devices drops packets.
Here is some information that I believe may be helpful to others in diagnosing my issue:
- The packet loss seems to be between 5% and 10% when pinging the mgmt vmk from another device on the network.
- When I ping from the ESXi host out to other network devices, I get NO packet loss.
- As soon as I remove the teamed NIC from the vSwitch and go back to using just one physical NIC the problem disappears.
- I am using "Route based on the originating virtual port ID" as the load-balancing algorithm on both the vSwitch and the vmk port itself. No corresponding configuration has been made on the physical switches.
- All of the host's physical NICs connect to a stack of 6 Dell PowerConnect M8024k switches. All 6 physical switches are cabled together and configured as one logical switch stack. The M8024k switches are all I/O modules in the back of the blade chassis.
- All physical NICs are dual-port Broadcom 57810 CNAs. There are a total of 6 ports for each host, spread across 3 CNAs.
- All physical NICs run NPAR to split each physical NIC port into 4 virtual NICs. This results in a total of 24 vmnics for the host (6 ports * 4 NPAR functions per port = 24).
- All NIC ports have a native untagged VLAN that the mgmt vmk uses and also carry additional tagged VLANs. The connecting ports on the M8024k switches run in General (hybrid) mode with the mgmt vmk VLAN as the untagged VLAN.
- Server hardware is Dell PowerEdge M620 Blade server.
Any help would be greatly appreciated. I have seen this issue on multiple hosts in this particular blade chassis. On the other hosts the problem seemed to magically disappear at some point, but I can't pin down which configuration change resolved it for them.
Thanks,
Mike
It's best practice to configure the management NICs as active/standby rather than active/active.
I have never used active/passive on the mgmt vmk in any setup I have done in the past, and I have not had this issue with other deployments. It seems unlikely to me that this would be the root cause. I can add some additional information, though:
- Removed the first NIC from the team and left only the 2nd NIC attached to the vSwitch. Still had the problem.
- Power-cycled the system with only the 2nd NIC attached to the vSwitch. Still had the problem.
- Re-added the first NIC to the vSwitch and removed the 2nd NIC from the vSwitch. The problem goes away. This seems to indicate something with the second NIC. I checked the configs of the 2 switch ports that connect to these NICs, and they are identical, so I don't think it's a switch config issue. Further, I had this same issue on other blades in this chassis in the past. Their ports are configured the same, yet at some point the issue cleared up on its own for them. That makes me think it's not a switch config or hardware issue either.
What's the output of esxcli network nic list?
~ # esxcli network nic list
Name     PCI Device     Driver  Link  Speed  Duplex  MAC Address        MTU   Description
-------  -------------  ------  ----  -----  ------  -----------------  ----  -----------------------------------------------------------------------------
vmnic0   0000:001:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:e9  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic1   0000:001:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ec  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic10  0000:004:00.2  bnx2x   Up    10000  Full    90:b1:1c:be:fc:85  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic11  0000:004:00.3  bnx2x   Up    10000  Full    90:b1:1c:be:fc:88  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic12  0000:004:00.4  bnx2x   Up    10000  Full    90:b1:1c:be:fc:8b  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic13  0000:004:00.5  bnx2x   Up    10000  Full    90:b1:1c:be:fc:8e  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic14  0000:004:00.6  bnx2x   Up    10000  Full    90:b1:1c:be:fc:91  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic15  0000:004:00.7  bnx2x   Up    10000  Full    90:b1:1c:be:fc:94  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic16  0000:003:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ed  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic17  0000:003:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:f0  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic18  0000:003:00.2  bnx2x   Up    10000  Full    90:b1:1c:be:fc:49  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic19  0000:003:00.3  bnx2x   Up    10000  Full    90:b1:1c:be:fc:4c  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic2   0000:001:00.2  bnx2x   Up    10000  Full    90:b1:1c:be:fc:0d  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic20  0000:003:00.4  bnx2x   Up    10000  Full    90:b1:1c:be:fc:4f  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic21  0000:003:00.5  bnx2x   Up    10000  Full    90:b1:1c:be:fc:52  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic22  0000:003:00.6  bnx2x   Up    10000  Full    90:b1:1c:be:fc:55  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic23  0000:003:00.7  bnx2x   Up    10000  Full    90:b1:1c:be:fc:58  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic3   0000:001:00.3  bnx2x   Up    10000  Full    90:b1:1c:be:fc:10  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic4   0000:001:00.4  bnx2x   Up    10000  Full    90:b1:1c:be:fc:13  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic5   0000:001:00.5  bnx2x   Up    10000  Full    90:b1:1c:be:fc:16  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic6   0000:001:00.6  bnx2x   Up    10000  Full    90:b1:1c:be:fc:19  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic7   0000:001:00.7  bnx2x   Up    10000  Full    90:b1:1c:be:fc:1c  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic8   0000:004:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:f1  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic9   0000:004:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:f4  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
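With 24 vmnics it is easy to lose track of which NPAR functions sit on the same CNA. A minimal Python sketch, assuming the `esxcli network nic list` output is pasted in as text with the column order shown above, that groups vmnics by PCI domain:bus. For this host it should report 3 adapters with 8 functions each (2 ports * 4 NPAR partitions), matching the 24 vmnics described earlier. Which function belongs to which physical port is not derived here.

```python
from collections import defaultdict

def parse_nic_list(text):
    """Parse `esxcli network nic list` output into one dict per vmnic.

    Assumes the columns: Name, PCI Device, Driver, Link, Speed, Duplex,
    MAC Address, MTU, then a free-text Description.
    """
    nics = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if not fields or not fields[0].startswith("vmnic"):
            continue
        name, pci, driver, link, speed, duplex, mac, mtu = fields[:8]
        nics.append({"name": name, "pci": pci, "driver": driver,
                     "link": link, "speed": int(speed), "duplex": duplex,
                     "mac": mac.lower(), "mtu": int(mtu)})
    return nics

def group_by_adapter(nics):
    """Group vmnics by PCI domain:bus (one group per physical CNA),
    sorted by PCI function number inside each group."""
    groups = defaultdict(list)
    for nic in nics:
        bus, _, dev_fn = nic["pci"].rpartition(":")  # "0000:001", "00.0"
        function = int(dev_fn.split(".")[1])
        groups[bus].append((function, nic["name"], nic["mtu"]))
    return {bus: sorted(entries) for bus, entries in sorted(groups.items())}

# Three rows copied from the listing above.
SAMPLE = """\
Name     PCI Device     Driver  Link  Speed  Duplex  MAC Address        MTU   Description
vmnic0   0000:001:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:e9  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic1   0000:001:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ec  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic16  0000:003:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ed  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
"""

for bus, entries in group_by_adapter(parse_nic_list(SAMPLE)).items():
    print(bus, entries)
```

Pasting the full 24-row listing instead of the sample gives one line per CNA, which also makes the MTU split between the 1500 and 9000 partitions easy to see.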
It must be some setting with the NIC.
Do you notice anything different if you run the below command on the working nic and the "faulty" one?
esxcli network nic get -n "nic name"
Also network nic stats get -n
Both are identical (see below):
~ # esxcli network nic get -n vmnic0
Advertised Auto Negotiation: true
Advertised Link Modes: 1000baseT/Full, 10000baseT/Full
Auto Negotiation: true
Cable Type: FIBRE
Current Message Level: 0
Driver Info:
Bus Info: 0000:01:00.0
Driver: bnx2x
Firmware Version: FFV7.8.53 bc 7.8.82
Version: 1.78.80.v55.3
Link Detected: true
Link Status: Up
Name: vmnic0
PHYAddress: 1
Pause Autonegotiate: true
Pause RX: true
Pause TX: true
Supported Ports: FIBRE
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver: internal
Wakeon: MagicPacket(tm)
~ # esxcli network nic get -n vmnic1
Advertised Auto Negotiation: true
Advertised Link Modes: 1000baseT/Full, 10000baseT/Full
Auto Negotiation: true
Cable Type: FIBRE
Current Message Level: 0
Driver Info:
Bus Info: 0000:01:00.1
Driver: bnx2x
Firmware Version: FFV7.8.53 bc 7.8.82
Version: 1.78.80.v55.3
Link Detected: true
Link Status: Up
Name: vmnic1
PHYAddress: 1
Pause Autonegotiate: true
Pause RX: true
Pause TX: true
Supported Ports: FIBRE
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver: internal
Wakeon: MagicPacket(tm)
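Comparing two `esxcli network nic get` outputs by eye works for two NICs but gets error-prone across many hosts. A minimal Python sketch, assuming each output is captured as text, that flattens the `Key: value` lines (including the ones under `Driver Info:`) and reports fields that differ. `Name` and `Bus Info` are ignored because they always differ between ports; that ignore list is an assumption you may want to adjust.

```python
def parse_nic_get(text):
    """Turn `esxcli network nic get` output into a {field: value} dict.
    Fields nested under "Driver Info:" are flattened into the same dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:  # prompt lines and other non "Key: value" lines
            continue
        props[key.strip()] = value.strip()
    return props

# Fields expected to differ between two ports of the same adapter.
EXPECTED_TO_DIFFER = {"Name", "Bus Info"}

def diff_nics(text_a, text_b, ignore=EXPECTED_TO_DIFFER):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = parse_nic_get(text_a), parse_nic_get(text_b)
    return {key: (a.get(key), b.get(key))
            for key in sorted(a.keys() | b.keys())
            if key not in ignore and a.get(key) != b.get(key)}

# Excerpts of the two outputs above.
VMNIC0 = """\
Driver Info:
   Bus Info: 0000:01:00.0
   Driver: bnx2x
   Firmware Version: FFV7.8.53 bc 7.8.82
   Version: 1.78.80.v55.3
Link Status: Up
Name: vmnic0
Pause RX: true
Pause TX: true
"""
VMNIC1 = """\
Driver Info:
   Bus Info: 0000:01:00.1
   Driver: bnx2x
   Firmware Version: FFV7.8.53 bc 7.8.82
   Version: 1.78.80.v55.3
Link Status: Up
Name: vmnic1
Pause RX: true
Pause TX: true
"""

print(diff_nics(VMNIC0, VMNIC1))  # {} : no differences outside Name/Bus Info
```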
Could you run the stats command too? Hopefully it will show a bit more detail on errors and help narrow it down.
Sorry, but can you provide the full CLI command, please? I'm not sure what you meant by "stats".
Sure, I put it above in a previous post.
It's esxcli network nic stats get -n "nic name"
Things look pretty normal to me:
~ # esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
Packets received: 9611
Packets sent: 0
Bytes received: 718312
Bytes sent: 0
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
~ # esxcli network nic stats get -n vmnic1
NIC statistics for vmnic1
Packets received: 11821
Packets sent: 6278
Bytes received: 1504285
Bytes sent: 7820509
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
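These counters are cumulative, so a single snapshot mixes old and new traffic. A minimal Python sketch, assuming you capture `esxcli network nic stats get -n <vmnic>` as text before and after a ping run from another device, that computes the per-counter increase and flags any non-zero drop or error counters. The second snapshot below is hypothetical (100 extra packets each way); the first is an excerpt of the vmnic1 output above. If the deltas stay clean while pings are being lost, that suggests the frames are not being discarded by the NIC itself.

```python
def parse_stats(text):
    """Parse `esxcli network nic stats get` output into {counter: int}."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the "NIC statistics for ..." line
            stats[key.strip()] = int(value)
    return stats

def delta(before, after):
    """Increase of each counter between two snapshots."""
    return {k: after[k] - before.get(k, 0) for k in after}

def suspicious(stats):
    """Non-zero drop/error counters."""
    return {k: v for k, v in stats.items()
            if v and ("error" in k.lower() or "dropped" in k.lower())}

VMNIC1_BEFORE = """\
NIC statistics for vmnic1
   Packets received: 11821
   Packets sent: 6278
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 0
   Receive CRC errors: 0
"""
# Hypothetical snapshot taken after a 100-packet ping test.
VMNIC1_AFTER = """\
NIC statistics for vmnic1
   Packets received: 11921
   Packets sent: 6378
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 0
   Receive CRC errors: 0
"""

d = delta(parse_stats(VMNIC1_BEFORE), parse_stats(VMNIC1_AFTER))
print(d["Packets received"], suspicious(d))  # 100 {}
```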
vmnic0 hasn't sent a single packet. I'm not saying this is the cause, but it seems a bit odd considering you ran tests using this NIC alone.
I don't consider that traffic pattern odd, given the selected load-balancing method.
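The reasoning about the load-balancing method can be illustrated with a toy model. With "Route based on the originating virtual port ID", each virtual port is pinned to one active uplink until a failover happens, so a vSwitch whose only port is vmk0 transmits on just one uplink. The sketch assumes a simple "port ID modulo number of active uplinks" mapping; ESXi's real selection function may differ, and the port ID 5 is hypothetical. It only models transmit: received counters are not modeled, since broadcast or flooded frames can arrive on every uplink.

```python
def pick_uplink(port_id, active_uplinks):
    """Toy model (assumption): the uplink is chosen from the port ID alone,
    so a given virtual port always lands on the same active uplink."""
    if not active_uplinks:
        raise ValueError("no active uplinks")
    return active_uplinks[port_id % len(active_uplinks)]

def tx_frames_per_uplink(ports, active_uplinks):
    """ports maps virtual port ID -> frames transmitted by that port."""
    counts = dict.fromkeys(active_uplinks, 0)
    for port_id, frames in ports.items():
        counts[pick_uplink(port_id, active_uplinks)] += frames
    return counts

# Hypothetical: vmk0 is the only port on the vSwitch, with port ID 5.
print(tx_frames_per_uplink({5: 100}, ["vmnic0", "vmnic1"]))  # {'vmnic0': 0, 'vmnic1': 100}
# If vmnic1 fails, the same port moves to the remaining active uplink.
print(tx_frames_per_uplink({5: 100}, ["vmnic0"]))            # {'vmnic0': 100}
```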
I think I have resolved the issue, though. I deleted and recreated the management vmk and the problem seems to have cleared. I am not sure why this resolved it, but I think it might have something to do with the Dell FlexAddressing feature on the blade chassis. When I first looked at the MAC assigned to vmk0, before deleting it, it was the FlexAddressing-assigned MAC used by the vmnic0 adapter (90:b1:xx:xx:f5:a8), which always worked correctly. The problems occurred when I removed vmnic0 and told vmk0 to use vmnic1, or told it to use both vmnic0 and vmnic1. When I deleted vmk0 and recreated it, the new vmk was created with a 00:50:xx:xx:xx:xx based MAC. Perhaps using the FlexAddressing-assigned MAC originally created some kind of duplicate MAC issue? I don't see any sign of that in vobd.log, though, so I'm still a bit perplexed by all this.
While I'm happy it's resolved, I'd like to better understand what was really happening here. I suspect it self-resolved on the other servers in the past because applying a host profile deleted and recreated their vmk0 adapter, thereby fixing the issue the same way. I hadn't applied a host profile to this server, so its vmk0 was still in its original default configuration.
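The MAC theory above can at least be checked mechanically. A minimal Python sketch, assuming you copy the vmk's MAC address (for example from the MAC Address field of `esxcli network ip interface list`) and the vmnic MACs from `esxcli network nic list`, that flags a vmk reusing a physical NIC's hardware MAC or sitting outside VMware's 00:50:56 OUI. The two vmk MACs in the example are hypothetical (the real original one is only partially quoted above); the vmnic MACs come from the listing in this thread. A finding here does not prove it causes the packet loss.

```python
VMWARE_OUI = "00:50:56"  # OUI VMware uses for MACs it generates

def normalize(mac):
    return mac.strip().lower().replace("-", ":")

def check_vmk_mac(vmk_mac, vmnic_macs):
    """Return human-readable findings; an empty list means nothing stood out."""
    mac = normalize(vmk_mac)
    findings = []
    owners = sorted(name for name, m in vmnic_macs.items() if normalize(m) == mac)
    if owners:
        findings.append(f"{mac} is also the hardware MAC of {', '.join(owners)}")
    if not mac.startswith(VMWARE_OUI):
        findings.append(f"{mac} is not in the VMware OUI {VMWARE_OUI}")
    return findings

# vmnic MACs from the `esxcli network nic list` output in this thread.
VMNICS = {"vmnic0": "90:b1:1c:be:f5:e9", "vmnic1": "90:b1:1c:be:f5:ec"}

print(check_vmk_mac("90:B1:1C:BE:F5:E9", VMNICS))  # hypothetical vmk0 reusing vmnic0's MAC
print(check_vmk_mac("00:50:56:6a:01:02", VMNICS))  # hypothetical recreated vmk0
```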
I thought you did a test using just vmnic0 to both send and receive traffic, removing vmnic1 from the equation, but I may have just made that up.
I was going to suggest a possible MAC-related issue, but then I'd expect 100% packet loss if there were stale data in the MAC table. I'm no network guru, though. My next stop would have been Wireshark, but it's resolved.
Hi, did you ever resolve this issue? I have a Dell VRTX with two M620 blade servers that use the same NICs, and I have the same problem whether both NICs are active on the vmk or one is in standby mode.
I have an open call with the Dell VMware team but need an answer quickly.
Thanks in advance.
Yes, this issue has continued to crop up. The only reliable fix seems to be setting one of the NICs used by the mgmt vmk to Standby in the NIC teaming section for the vmk interface. I don't consider that a real resolution, because it should work with both NICs in the Active section.
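The Standby workaround can also be applied from the ESXi shell with `esxcli network vswitch standard portgroup policy failover set`. A minimal Python sketch that only builds the command (nothing is executed); "Management Network" is ESXi's default management port group name and vmnic0/vmnic1 are the uplinks from this thread, so adjust both for your host.

```python
import shlex

def failover_command(portgroup, active, standby):
    """Build (but do not run) an esxcli command that sets an explicit
    active/standby uplink order on a standard vSwitch port group."""
    if set(active) & set(standby):
        raise ValueError("an uplink cannot be both active and standby")
    args = ["esxcli", "network", "vswitch", "standard", "portgroup",
            "policy", "failover", "set",
            f"--portgroup-name={portgroup}",
            f"--active-uplinks={','.join(active)}"]
    if standby:
        args.append(f"--standby-uplinks={','.join(standby)}")
    return args

# Print a shell-quoted command to review before running it on the host.
print(shlex.join(failover_command("Management Network", ["vmnic0"], ["vmnic1"])))
```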
Mike
Hi Mike,
I have come across the exact same issue you are facing. The resolution was to recreate the management vSwitch. If you look at the management vmk adapter, you will notice it has a Dell MAC address rather than a VMware MAC address. That by itself is normal, but we have seen issues when the MAC address was a Dell MAC address. The steps for resolution are:
1. Migrate all port groups from the current mgmt vSwitch to another vSwitch.
2. Create a new mgmt port on that vSwitch.
3. Connect to vSphere using the new mgmt port.
4. Remove the old vSwitch0.
5. Recreate vSwitch0.
6. Create a mgmt port on the new vSwitch0.
7. Migrate all the port groups back to vSwitch0.
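Parts of the steps above map onto esxcli commands. A minimal Python sketch that only prints a dry-run plan, numbering the steps in the order listed above. It assumes the other vSwitch already exists with uplinks carrying the management VLAN; the names vSwitch1, Mgmt-Temp and vmk9, the 192.0.2.x addresses and the uplink list are all hypothetical. Moving VM port groups and enabling management traffic on the temporary vmk are left as manual steps, and removing vmk0 drops any session that is still using it, so review the plan before running anything.

```python
import shlex

def esxcli(*args):
    """Render one esxcli invocation as a shell-safe string (not executed)."""
    return shlex.join(["esxcli", *args])

def rebuild_plan(temp_vs, temp_pg, temp_vmk, temp_ip, mgmt_ip, netmask,
                 uplinks, mgmt_pg="Management Network", old_vs="vSwitch0"):
    """Dry-run command list for rebuilding the management vSwitch."""
    vs = ("network", "vswitch", "standard")
    ipif = ("network", "ip", "interface")
    # Step 1 (manual): move VM port groups off old_vs.
    plan = [
        # Step 2: temporary management port on the other vSwitch.
        esxcli(*vs, "portgroup", "add", f"--portgroup-name={temp_pg}", f"--vswitch-name={temp_vs}"),
        esxcli(*ipif, "add", f"--interface-name={temp_vmk}", f"--portgroup-name={temp_pg}"),
        esxcli(*ipif, "ipv4", "set", f"--interface-name={temp_vmk}", f"--ipv4={temp_ip}",
               f"--netmask={netmask}", "--type=static"),
        # Step 3 (manual): enable management traffic on temp_vmk, reconnect to temp_ip.
        # Step 4: remove the old management vmk and vSwitch.
        esxcli(*ipif, "remove", "--interface-name=vmk0"),
        esxcli(*vs, "remove", f"--vswitch-name={old_vs}"),
        # Step 5: recreate the vSwitch and reattach its uplinks.
        esxcli(*vs, "add", f"--vswitch-name={old_vs}"),
    ]
    plan += [esxcli(*vs, "uplink", "add", f"--uplink-name={u}", f"--vswitch-name={old_vs}")
             for u in uplinks]
    plan += [
        # Step 6: new management port group and vmk0 on the recreated vSwitch.
        esxcli(*vs, "portgroup", "add", f"--portgroup-name={mgmt_pg}", f"--vswitch-name={old_vs}"),
        esxcli(*ipif, "add", "--interface-name=vmk0", f"--portgroup-name={mgmt_pg}"),
        esxcli(*ipif, "ipv4", "set", "--interface-name=vmk0", f"--ipv4={mgmt_ip}",
               f"--netmask={netmask}", "--type=static"),
    ]
    # Step 7 (manual): move port groups back, then remove temp_vmk and temp_pg.
    return plan

for cmd in rebuild_plan("vSwitch1", "Mgmt-Temp", "vmk9", "192.0.2.20", "192.0.2.10",
                        "255.255.255.0", ["vmnic0", "vmnic1"]):
    print(cmd)
```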
I can't logically explain why this fixes it, but it has fixed every case I have encountered.
Thanks,
Are you using a VSS or a VDS? Are you using LACP?