michaelcprice
Enthusiast

Management vmk dropping packets when vmnics are teamed

Looking for help with the following issue:

Issue: When I team two or more physical NICs on the vSwitch containing the management vmk on my ESXi 5.5 U1 host, pings from other devices to the host's management vmk IP address drop packets.

Here is some information that I believe may be helpful to others in diagnosing my issue:

- The packet drop rate seems to be between 5% and 10% when pinging the mgmt vmk from another device on the network.

- When I ping from the ESXi host out to other network devices I get NO packet loss in my outbound pings.

- As soon as I remove the second NIC from the team and go back to using just one physical NIC, the problem disappears.

- I am using "Route based on the originating virtual port ID" as the load balancing algorithm on both the vSwitch and the vmk port itself. Nothing specific has been configured on the physical switches for this.

- All the host's physical NICs are connected to a stack of 6 Dell PowerConnect M8024k switches. All 6 physical switches are cabled together and configured as one logical switch stack. The M8024k switches are I/O modules in the back of the blade chassis.

- All physical NICs are dual-port Broadcom 57810 CNAs. There are a total of 6 ports for each host, spread across 3 CNAs.

- All physical NICs are running NPAR to split each physical NIC port into 4 virtual NICs. This results in a total of 24 vmnics for the host (6 ports * 4 NPAR functions per port = 24).

- All NIC ports carry a native untagged VLAN, which the mgmt vmk uses, plus additional tagged VLANs. The connecting ports on the M8024k switches run in General (hybrid) mode with the mgmt vmk VLAN as the untagged VLAN.

- Server hardware is Dell PowerEdge M620 Blade server.
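To compare the single-NIC and teamed configurations on the same footing, it helps to compute the loss rate the same way every time. The following is a minimal Python sketch that reads the summary line printed by an iputils- or BusyBox-style ping and returns the loss percentage. The regular expression is an assumption about that summary format (Windows ping prints a different one), and the sample lines are made up for illustration, not taken from this thread.

```python
import re

# Assumed summary formats, e.g.
#   "100 packets transmitted, 93 received, 7% packet loss, time 99123ms"  (iputils)
#   "100 packets transmitted, 93 packets received, 7% packet loss"        (BusyBox)
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output: str) -> float:
    """Return packet loss in percent, computed from the sent/received counts."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = (int(group) for group in match.groups())
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # Hypothetical sample output, not taken from the thread.
    teamed = "100 packets transmitted, 93 received, 7% packet loss, time 99123ms"
    single = "100 packets transmitted, 100 packets received, 0% packet loss"
    print(f"teamed: {loss_percent(teamed):.1f}%  single NIC: {loss_percent(single):.1f}%")
```

Computing the figure from the raw counts, rather than trusting the rounded percentage ping prints, keeps short and long test runs comparable.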

Any help would be greatly appreciated. I have seen this issue on multiple hosts in this particular blade chassis. On other hosts the problem seemed to magically disappear at some point, but I can't pin down which configuration change resolved it on those hosts.

Thanks,

Mike

Titanomachia
Enthusiast

It's best practice to have the management NICs configured as active/standby rather than active/active.

michaelcprice
Enthusiast

I have never used active/passive on the mgmt vmk in any setup I have done in the past, and I have not had this issue with other deployments. It seems unlikely to me that this would be the root cause. I can add some more information, though:

- Removed the first NIC from the team, leaving only the second NIC attached to the vSwitch. The problem continued.

     - Power-cycled the system with only the second NIC attached to the vSwitch. The problem continued.

- Re-added the first NIC to the vSwitch and removed the second NIC from the vSwitch. The problem goes away. Maybe this indicates something with the second NIC? I checked the configurations of the two switch ports that connect to these NICs, and they are identical, which makes me think it's not a switch configuration issue. Further, I had this same issue on other blades in this chassis in the past, and their ports are configured the same way, yet at some point the issue magically cleared up for them. Because the issue somehow resolved itself on other blades that showed the same behaviour, I don't think it's a switch configuration or hardware issue.

Titanomachia
Enthusiast

What's the output of esxcli network nic list?

michaelcprice
Enthusiast

~ # esxcli network nic list
Name     PCI Device     Driver  Link  Speed  Duplex  MAC Address        MTU   Description
-------  -------------  ------  ----  -----  ------  -----------------  ----  -----------------------------------------------------------------------------
vmnic0   0000:001:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:e9  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic1   0000:001:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ec  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic10  0000:004:00.2  bnx2x   Up    10000  Full    90:b1:1c:be:fc:85  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic11  0000:004:00.3  bnx2x   Up    10000  Full    90:b1:1c:be:fc:88  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic12  0000:004:00.4  bnx2x   Up    10000  Full    90:b1:1c:be:fc:8b  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic13  0000:004:00.5  bnx2x   Up    10000  Full    90:b1:1c:be:fc:8e  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic14  0000:004:00.6  bnx2x   Up    10000  Full    90:b1:1c:be:fc:91  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic15  0000:004:00.7  bnx2x   Up    10000  Full    90:b1:1c:be:fc:94  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic16  0000:003:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ed  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic17  0000:003:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:f0  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic18  0000:003:00.2  bnx2x   Up    10000  Full    90:b1:1c:be:fc:49  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic19  0000:003:00.3  bnx2x   Up    10000  Full    90:b1:1c:be:fc:4c  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic2   0000:001:00.2  bnx2x   Up    10000  Full    90:b1:1c:be:fc:0d  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic20  0000:003:00.4  bnx2x   Up    10000  Full    90:b1:1c:be:fc:4f  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic21  0000:003:00.5  bnx2x   Up    10000  Full    90:b1:1c:be:fc:52  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic22  0000:003:00.6  bnx2x   Up    10000  Full    90:b1:1c:be:fc:55  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic23  0000:003:00.7  bnx2x   Up    10000  Full    90:b1:1c:be:fc:58  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic3   0000:001:00.3  bnx2x   Up    10000  Full    90:b1:1c:be:fc:10  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic4   0000:001:00.4  bnx2x   Up    10000  Full    90:b1:1c:be:fc:13  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic5   0000:001:00.5  bnx2x   Up    10000  Full    90:b1:1c:be:fc:16  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic6   0000:001:00.6  bnx2x   Up    10000  Full    90:b1:1c:be:fc:19  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic7   0000:001:00.7  bnx2x   Up    10000  Full    90:b1:1c:be:fc:1c  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic8   0000:004:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:f1  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic9   0000:004:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:f4  9000  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function

Titanomachia
Enthusiast

It must be some setting with the NIC.

Do you notice any difference if you run the command below on the working NIC and on the "faulty" one?

esxcli network nic get -n "nic name"

Also esxcli network nic stats get -n "nic name"

michaelcprice
Enthusiast

Both are identical (see below):

~ # esxcli network nic get -n vmnic0
   Advertised Auto Negotiation: true
   Advertised Link Modes: 1000baseT/Full, 10000baseT/Full
   Auto Negotiation: true
   Cable Type: FIBRE
   Current Message Level: 0
   Driver Info:
         Bus Info: 0000:01:00.0
         Driver: bnx2x
         Firmware Version: FFV7.8.53 bc 7.8.82
         Version: 1.78.80.v55.3
   Link Detected: true
   Link Status: Up
   Name: vmnic0
   PHYAddress: 1
   Pause Autonegotiate: true
   Pause RX: true
   Pause TX: true
   Supported Ports: FIBRE
   Supports Auto Negotiation: true
   Supports Pause: true
   Supports Wakeon: true
   Transceiver: internal
   Wakeon: MagicPacket(tm)

~ # esxcli network nic get -n vmnic1
   Advertised Auto Negotiation: true
   Advertised Link Modes: 1000baseT/Full, 10000baseT/Full
   Auto Negotiation: true
   Cable Type: FIBRE
   Current Message Level: 0
   Driver Info:
         Bus Info: 0000:01:00.1
         Driver: bnx2x
         Firmware Version: FFV7.8.53 bc 7.8.82
         Version: 1.78.80.v55.3
   Link Detected: true
   Link Status: Up
   Name: vmnic1
   PHYAddress: 1
   Pause Autonegotiate: true
   Pause RX: true
   Pause TX: true
   Supported Ports: FIBRE
   Supports Auto Negotiation: true
   Supports Pause: true
   Supports Wakeon: true
   Transceiver: internal
   Wakeon: MagicPacket(tm)

Titanomachia
Enthusiast

Could you run the stats command too? Hopefully it will show more detail on errors and help narrow it down.

michaelcprice
Enthusiast

Sorry, but can you provide the full CLI command, please? I'm not sure what you meant by "stats".

Titanomachia
Enthusiast

Sure, I put it in a previous post above.

It's esxcli network nic stats get -n "nic name"

michaelcprice
Enthusiast

Things look pretty normal to me:

~ # esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
   Packets received: 9611
   Packets sent: 0
   Bytes received: 718312
   Bytes sent: 0
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0

~ # esxcli network nic stats get -n vmnic1
NIC statistics for vmnic1
   Packets received: 11821
   Packets sent: 6278
   Bytes received: 1504285
   Bytes sent: 7820509
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0
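When comparing two NICs' counters, a small script can pull the esxcli network nic stats get output apart and flag only the non-zero error and drop counters. This is a minimal sketch with hypothetical helper names (parse_stats, problem_counters); it assumes the "Counter name: value" layout shown above and treats any counter whose name contains "error" or "dropped" as a problem counter.

```python
def parse_stats(text: str) -> dict[str, int]:
    """Parse 'Counter name: 123' lines from esxcli network nic stats get output."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the shell prompt and the "NIC statistics for ..." title line.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def problem_counters(stats: dict[str, int]) -> dict[str, int]:
    """Keep only non-zero counters whose names mention errors or drops."""
    return {
        name: count
        for name, count in stats.items()
        if count and ("error" in name.lower() or "dropped" in name.lower())
    }


if __name__ == "__main__":
    # Excerpt of the vmnic1 output posted above.
    vmnic1 = """NIC statistics for vmnic1
   Packets received: 11821
   Packets sent: 6278
   Receive packets dropped: 0
   Receive CRC errors: 0
   Total transmit errors: 0"""
    stats = parse_stats(vmnic1)
    print(stats["Packets sent"], problem_counters(stats))
```

Run against the two outputs above, problem_counters would return an empty dict for both NICs, consistent with the counters looking normal.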

Titanomachia
Enthusiast

vmnic0 hasn't sent a single packet. I'm not saying this is the cause, but it seems a bit odd considering you ran tests using this NIC alone.

michaelcprice
Enthusiast

I don't consider that traffic pattern odd, given the selected load balancing method.
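Why one uplink can show zero packets sent: with "Route based on the originating virtual port ID", each virtual port (including a vmk) is pinned to a single active uplink, so a vSwitch whose only port is vmk0 (an assumption here) sends all of its traffic out one vmnic, while broadcasts and other flooded frames still arrive on every uplink. The Python below is a toy model of that pinning; the modulo mapping and the port ID value are assumptions for illustration, not VMware's actual selection algorithm.

```python
def pick_uplink(port_id: int, active_uplinks: list[str]) -> str:
    """Toy model: pin a virtual port to exactly one active uplink (assumed modulo mapping)."""
    if not active_uplinks:
        raise ValueError("no active uplinks")
    return active_uplinks[port_id % len(active_uplinks)]


def tx_per_uplink(port_ids: list[int], active_uplinks: list[str]) -> dict[str, int]:
    """Count how many virtual ports transmit through each uplink."""
    counts = {uplink: 0 for uplink in active_uplinks}
    for port_id in port_ids:
        counts[pick_uplink(port_id, active_uplinks)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical port ID for vmk0; the real value differs per host.
    vmk0_port = 33554437
    print(tx_per_uplink([vmk0_port], ["vmnic0", "vmnic1"]))
```

With a single virtual port, one uplink always ends up with a transmit count of zero in this model, whichever uplink the mapping picks.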

I think I have resolved the issue, though. I deleted and recreated the management vmk, and the problem seems to have cleared. I am not sure why this resolved it, but I think it might have something to do with the Dell FlexAddressing feature on the blade chassis. Before I deleted vmk0, its MAC was the FlexAddressing-assigned MAC used by the vmnic0 adapter (90:b1:xx:xx:f5:a8), and with vmnic0 alone it always worked correctly. When I removed vmnic0 and had vmk0 use vmnic1, or had it use both vmnic0 and vmnic1, the problems appeared. When I deleted vmk0 and recreated it, the new vmk was created with a 00:50:xx:xx:xx:xx MAC. Perhaps using the FlexAddressing-assigned MAC originally created some kind of duplicate MAC issue? I don't see any signs of that in vobd.log, though, so I'm still a bit perplexed by it all.

While I'm happy it's resolved, I'd like to better understand what was really happening here. I suspect the issue self-resolved on the other servers in the past because applying a host profile caused them to delete and recreate the vmk0 adapter, thereby fixing the issue the same way. I hadn't applied any host profile to this server, so its vmk0 was still in its original default configuration.
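A quick way to check for the condition described above is to compare the vmk's MAC address against the MACs in the esxcli network nic list output. The sketch below uses hypothetical helper names; it assumes the column layout posted earlier in the thread (MAC in the seventh column) and that the vmk MAC has been copied from the host, for example from the vSphere Client or from esxcli network ip interface list. The 00:50:56 prefix is VMware's OUI.

```python
VMWARE_OUI = "00:50:56"  # VMware's OUI; generated vmk MACs normally start with it


def nic_macs(nic_list_text: str) -> dict[str, str]:
    """Map vmnic name -> MAC from 'esxcli network nic list' style output."""
    macs: dict[str, str] = {}
    for line in nic_list_text.splitlines():
        fields = line.split()
        # Data rows start with the vmnic name; the MAC is the seventh column.
        if len(fields) > 6 and fields[0].startswith("vmnic"):
            macs[fields[0]] = fields[6].lower()
    return macs


def vmk_mac_report(vmk_mac: str, nic_list_text: str) -> str:
    """Say whether a vmk MAC duplicates a vmnic MAC or looks VMware-generated."""
    mac = vmk_mac.lower()
    owners = sorted(name for name, nic_mac in nic_macs(nic_list_text).items() if nic_mac == mac)
    if owners:
        return "vmk shares its MAC with " + ", ".join(owners)
    if mac.startswith(VMWARE_OUI):
        return "vmk uses a VMware-generated MAC"
    return "vmk MAC does not match any listed vmnic"


if __name__ == "__main__":
    # Two rows from the nic list output posted earlier in the thread.
    rows = """vmnic0   0000:001:00.0  bnx2x   Up    10000  Full    90:b1:1c:be:f5:e9  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function
vmnic1   0000:001:00.1  bnx2x   Up    10000  Full    90:b1:1c:be:f5:ec  1500  Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet Multi Function"""
    print(vmk_mac_report("90:b1:1c:be:f5:e9", rows))
```

The check only detects the shared-MAC condition; whether that condition actually causes the packet loss is still the open question in this thread.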

Titanomachia
Enthusiast

I thought you did a test using just vmnic0 to both send and receive traffic by removing vmnic1 from the equation, but I may have just made that up.

I was going to suggest a possible MAC-related issue, but then I'd expect 100% packet loss if there were stale data in the MAC table, though I'm no network guru. My next stop would have been Wireshark, but it's resolved.

MFGCORP
Contributor

Hi, did you ever resolve this issue? I have a Dell VRTX with two M620 blade servers that use the same NICs, and I have the same problem whether both NICs are active on the vmk or one is in standby mode.

I have an open call with Dell's VMware team but need an answer quickly.

Thanks in advance.

michaelcprice
Enthusiast

Yes, this issue has continued to crop up. The only reliable fix seems to be to set one of the NICs used by the mgmt vmk to Standby in the NIC teaming section for the vmk interface. I don't consider that a real resolution, because the NICs should work with both placed in the Active section.
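The Standby workaround changes which uplink the vmk is allowed to use. As a rough sketch, here is a toy Python model of active/standby selection for one virtual port: traffic stays on an active uplink while one has link, and moves to a standby uplink only when none does. The function name is hypothetical, link state is reduced to a boolean, the choice among several active uplinks reuses an assumed modulo mapping rather than VMware's actual algorithm, and falling back to standby only when no active uplink is up is a simplification that matches the one-active/one-standby case discussed here.

```python
from typing import Optional


def uplink_for_port(
    port_id: int,
    active: list[str],
    standby: list[str],
    link_up: dict[str, bool],
) -> Optional[str]:
    """Toy model of active/standby uplink selection for one virtual port."""
    # Prefer active uplinks that currently have link.
    up_active = [uplink for uplink in active if link_up.get(uplink, False)]
    if up_active:
        # Assumed modulo pinning among the usable active uplinks.
        return up_active[port_id % len(up_active)]
    # Simplification: fall back to standby only when no active uplink is up.
    for uplink in standby:
        if link_up.get(uplink, False):
            return uplink
    return None  # no usable uplink at all


if __name__ == "__main__":
    active, standby = ["vmnic0"], ["vmnic1"]
    print(uplink_for_port(7, active, standby, {"vmnic0": True, "vmnic1": True}))
    print(uplink_for_port(7, active, standby, {"vmnic0": False, "vmnic1": True}))
```

With vmnic0 active and vmnic1 on standby, the vmk stays on vmnic0 until vmnic0 loses link, so in normal operation its traffic never leaves through the second NIC.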

Mike

Chandk
Contributor

Hi Mike,

I have come across the exact same issue you are facing. The resolution was to recreate the management vSwitch. If you check the management vmk adapter, you will see it has a Dell MAC address rather than a VMware MAC address. That is normal, but we have seen issues when the vmk uses the Dell MAC address. The steps for resolution are:

1. Migrate all port groups from the current mgmt vSwitch to another vSwitch.

2. Create a new mgmt port on the new vSwitch.

3. Connect to vSphere using the new mgmt port.

4. Remove the old vSwitch0.

5. Recreate vSwitch0.

6. Create a mgmt port on the new vSwitch0.

7. Migrate all the port groups back to vSwitch0.

I can't logically explain why this fixes it, but it has fixed every case I have encountered.

Thanks,

ricardo22
Contributor

Are you using a VSS or a VDS? Are you using LACP?
