VMware Cloud Community
ryanwilliams83
Contributor

NIC Teaming causing ESXi to re-transmit ethernet frames received.

I have an HP ProLiant DL380 G5 running ESXi 5.1.0 connected to a Cisco 3750 switch stack.

While running tcpdump on an unrelated host, I discovered that it was being bombarded with Ethernet frames destined for the MAC address of a VM located on a host on the other side of the data centre. After some brief troubleshooting, I found that I could stop these erroneous frames by disabling NIC teaming on my VMware host.

I believe that, in my situation, when ESXi is configured to use two network adapters it takes all frames received on vmnic0 and re-transmits them on vmnic1, and vice versa. I have seen these symptoms when ESXi is configured for failover with one adapter active and one on standby, and I have also seen the same symptoms in the load-balancing configuration described below.


Symptoms: A short time after enabling load-balancing, all Ethernet Frames destined for the MAC address of a VM on the affected host are broadcast out every switch port in the entire data centre.

Steps to Reproduce:
1) Implement the configuration below.
2) Disconnect the Ethernet cable running between vmnic1 and switch1 port gi2/0/4.
3) Run tcpdump -q -n -e host 1.1.1.1 on any physical Linux machine in the data centre (it doesn't even need to be connected to switch1 directly).
Confirm that no packets with a destination IP of 1.1.1.1 are seen.
4) Reconnect the Ethernet cable between vmnic1 and switch1 port gi2/0/4.
5) Wait 60-120 seconds.
6) Observe a flurry of Ethernet frames with the destination MAC address of the VM (that owns 1.1.1.1) in the tcpdump output.
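
Step 6 can be made less subjective by counting the stray frames rather than eyeballing them. The following is a minimal sketch, assuming the capture was saved as text from tcpdump -e and that each line contains a "source MAC > destination MAC" pair. The function names, the sample lines and the local MAC address are made up for illustration; only 02:bf:cb:50:a2:76 comes from this thread.

```python
import re
from collections import Counter

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
# tcpdump -e prints "src_mac > dst_mac" near the start of each line.
FRAME_RE = re.compile(rf"({MAC}) > ({MAC})", re.IGNORECASE)


def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast MACs (I/G bit set in the first octet)."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def stray_unicast(lines, local_macs):
    """Count unicast destination MACs that do not belong to this machine."""
    local = {m.lower() for m in local_macs}
    counts = Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if not match:
            continue  # not a frame line, or an unexpected format
        dst = match.group(2).lower()
        if dst not in local and not is_group_address(dst):
            counts[dst] += 1
    return counts


# Hypothetical capture lines; the real output format may differ slightly.
sample = [
    "12:00:00.000001 00:11:22:33:44:55 > 02:bf:cb:50:a2:76, IPv4, length 74: 10.0.0.5.40000 > 1.1.1.1.80: tcp 0",
    "12:00:00.000002 00:11:22:33:44:55 > ff:ff:ff:ff:ff:ff, ARP, length 60: Request who-has 1.1.1.1 tell 10.0.0.5, length 46",
    "12:00:00.000003 00:11:22:33:44:55 > aa:bb:cc:dd:ee:01, IPv4, length 74: 10.0.0.5.40001 > 10.0.0.9.22: tcp 0",
]

if __name__ == "__main__":
    # aa:bb:cc:dd:ee:01 stands in for the capturing machine's own NIC.
    for mac, n in stray_unicast(sample, ["aa:bb:cc:dd:ee:01"]).items():
        print(mac, n)
```

Any unicast destination that shows up here on a switched network is a frame the switch flooded because it had no entry for that MAC.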

Cisco Configuration:

hostname switch1
!
port-channel load-balance src-dst-ip
!
interface GigabitEthernet1/0/4
 description vmnic0.host0 (NIC 1)
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 4 mode on
 spanning-tree portfast trunk
end
!
interface GigabitEthernet2/0/4
 description vmnic1.host0 (NIC 2)
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 4 mode on
 spanning-tree portfast trunk
end
!
interface Port-channel4
 description host0
 switchport trunk encapsulation dot1q
 switchport mode trunk
 spanning-tree portfast trunk
end


ESXi Configuration

vSwitch0
NIC Teaming / Load Balancing: Route based on IP Hash
NIC Teaming / Network Failover Detection: Link status only
NIC Teaming / Notify Switches: Yes
NIC Teaming / Failback: Yes
NIC Teaming / Active Adapters: vmnic0, vmnic1
NIC Teaming / Standby Adapters: Nil
NIC Teaming / Unused Adapters: Nil
Security / Promiscuous Mode: Reject
Security / MAC Address Changes: Accept
Security / Forged Transmits: Accept

Virtual Machine Port Group #1
Network Label: "Public"
VLAN: 27
NIC Teaming: All Unchecked (Inherited)

Virtual Machine Port Group #2
Network Label: "Trunk"
VLAN: 4095
NIC Teaming: All Unchecked (Inherited)

VM Kernel Port #1
Network Label: "Management"
VLAN: 2
NIC Teaming: All Unchecked (Inherited)

VM #1
OS: Windows Server 2003
NIC 1 / Adapter: Flexible
NIC 1 / Network Label: "Public"
IP Address: 1.1.1.1/24
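
For reference, the "Route based on IP Hash" policy listed above is usually described as taking the XOR of the source and destination IPv4 addresses modulo the number of active uplinks. The sketch below illustrates that description only; it is an assumption about the policy, not code taken from ESXi, and the function name and the destination addresses are made up.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: list) -> str:
    """Pick an uplink as (src XOR dst) mod N, per the commonly cited description."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]


if __name__ == "__main__":
    uplinks = ["vmnic0", "vmnic1"]
    # Hypothetical destinations talking to the VM at 1.1.1.1.
    for dst in ("10.0.0.2", "10.0.0.3"):
        print(dst, "->", ip_hash_uplink("1.1.1.1", dst, uplinks))
```

Because the choice depends on both addresses, one VM's traffic is spread over both uplinks. This is why the policy needs a static port channel on the switch side, such as the channel-group 4 mode on configuration shown earlier.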


Accepted Solutions
ryanwilliams83
Contributor

I apologise for the delay in my reply.

By physically unplugging the cable to NIC 2 on the host and configuring port mirroring on the switchport facing NIC 1 of the host, I was able to confirm that the VMware host is transmitting frames out of NIC 1 that have a destination MAC address of one of its own VMs.

After carefully reviewing the captured data, I noticed that the destination MAC address of the erroneous frames, '02:bf:cb:50:a2:76', actually belonged to 'Local Area Connection 1' of one of my Windows 2003 server VMs housed on the suspect host.

Looking at the VM itself, I discovered that this virtual NIC is presented to the operating system as
Name: Local Area Connection 1
Type: VMware Accelerated AMD PCNet Adapter
MAC: 02-BF-CB-50-A2-76

But the configuration of the VM in vCenter is
Name: Network adapter 1
Type: Flexible
MAC: 00:0c:29:fa:e8:83

Note the different MAC addresses.

Digging a little deeper, I discovered that the Windows 2003 server has Network Load Balancing configured on "Local Area Connection 1", and NLB is the cause of the altered MAC address.

Rather than investigate further, I have simply placed a Linux load balancer in front of the Windows 2003 server cluster and disabled NLB.

However, I suspect there is still a fundamental problem in the VMware virtual switch with the way it learns the MAC addresses of its VMs using Flexible vNICs, in particular for any VM that uses some form of MAC address override like that used by NLB.
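
The two MAC addresses above can be told apart mechanically. The 02-BF prefix matches the scheme Microsoft documents for NLB in unicast mode, where the cluster MAC is 02-BF followed by the four octets of the cluster IP address. The sketch below assumes unicast mode, which the post does not state, and its function names are made up for illustration.

```python
from typing import Optional


def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:..' or 'AA-BB-..' notation into six raw octets."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return octets


def is_locally_administered(mac: str) -> bool:
    """True when the U/L bit (0x02 in the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def nlb_unicast_cluster_ip(mac: str) -> Optional[str]:
    """Return the cluster IP encoded in an 02-BF-w-x-y-z MAC, else None."""
    octets = parse_mac(mac)
    if octets[0] == 0x02 and octets[1] == 0xBF:
        return ".".join(str(b) for b in octets[2:])
    return None


if __name__ == "__main__":
    for mac in ("02-BF-CB-50-A2-76", "00:0c:29:fa:e8:83"):
        print(mac, is_locally_administered(mac), nlb_unicast_cluster_ip(mac))
```

The address configured in vCenter has the U/L bit clear and sits under a vendor prefix, while the address seen inside Windows is locally administered. That is consistent with NLB having overridden it in the guest.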

4 Replies
rickardnobel
Champion

The configuration looks fine, I think.

It seems like your physical switch is doing some kind of flooding of the frames for some reason.

Two things:

1. Are you totally sure the cables are correctly attached? If the cables are in some way mismatched between the ESXi host and the physical switch, this could be extremely confusing for the switches, with MAC addresses appearing "everywhere". Double-check and, if possible, enable CDP on your vSwitches and verify on the Cisco CLI.

2. Have you checked the logfiles of your physical switch? You might get some clues from any potential issues like MAC flapping and similar.

My VMware blog: www.rickardnobel.se
MKguy
Virtuoso

"Symptoms: A short time after enabling load-balancing, all Ethernet Frames destined for the MAC address of a VM on the affected host are broadcast out every switch port in the entire data centre."

Even if the vSwitch did forward every received frame out of the other physical vmnic, it would not change the destination MAC, so the physical switch should just forward the frame back into the other vmnic again. Your entire network would then quickly melt down after the first broadcast or unknown unicast, because of a loop that spanning tree does not prevent.

"... that it was being bombarded with Ethernet frames destined for the MAC address of a VM located on a host on the other side of the data centre"

This also hints at an issue in the layer 2 forwarding on the physical network that this unrelated system is connected to. If a system sees unicast frames destined for some other MAC, then the physical switch may, for whatever reason, not have learned that MAC through the usual source-MAC learning.
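
The point about source-MAC learning can be illustrated with a toy model of a learning switch. This is a simplified sketch, not Cisco's implementation: it has no VLANs, no ageing and no port channels, and the class name, the port names, the client MAC and the "masked" source MAC are all hypothetical. It shows that a destination MAC that never appears as a source address keeps being flooded.

```python
class LearningSwitch:
    """Toy layer 2 switch: learn source MACs, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        self.table[src] = in_port          # learn from the source address only
        out = self.table.get(dst)
        if out is None:                    # unknown unicast: flood
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]


CLIENT = "00:11:22:33:44:55"    # hypothetical client
CLUSTER = "02:bf:cb:50:a2:76"   # destination MAC reported in this thread
MASKED = "02:01:cb:50:a2:76"    # hypothetical source MAC used in replies
VM_MAC = "00:0c:29:fa:e8:83"    # MAC configured for the vNIC

if __name__ == "__main__":
    sw = LearningSwitch(["gi1/0/1", "gi1/0/2", "gi1/0/4"])
    print(sw.receive("gi1/0/1", CLIENT, CLUSTER))  # flooded
    print(sw.receive("gi1/0/4", MASKED, CLIENT))   # reply uses another source
    print(sw.receive("gi1/0/1", CLIENT, CLUSTER))  # still flooded
```

If replies had carried the destination's own MAC as their source, the third call would have returned a single port instead of flooding.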

Also, it's not a multicast MAC, and multicasting isn't involved here at all, right?

In any case, ESXi/the vSwitch itself does not forward layer 2 frames like this. This article explains pretty well how it works:

http://blog.ioshints.info/2010/11/vmware-virtual-switch-no-need-for-stp.html

Split-horizon forwarding

Packets received through one of the uplinks are never forwarded to other uplinks. This rule prevents forwarding loops through the virtual switch.
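
The split-horizon rule quoted above can be sketched as a small forwarding function. This is a toy model under stated assumptions, not VMware's code: it ignores broadcasts, VLANs and promiscuous mode, it assumes the vSwitch knows its VMs' MAC addresses from configuration, and it simply picks the first uplink where a real teaming policy would choose one. All names are made up for illustration.

```python
def vswitch_forward(in_port, dst_mac, vm_ports, uplinks):
    """Return the ports a unicast frame is delivered to.

    vm_ports maps a VM MAC address to its virtual port name;
    uplinks is the list of physical NIC names.
    """
    if dst_mac in vm_ports:
        target = vm_ports[dst_mac]
        return [] if target == in_port else [target]
    if in_port in uplinks:
        # Split horizon: a frame that arrived on an uplink is never
        # sent out of another uplink, so it is dropped here.
        return []
    # Frame from a local VM to an unknown MAC: send it out of one uplink.
    return [uplinks[0]]


if __name__ == "__main__":
    vm_ports = {"00:0c:29:fa:e8:83": "vm1"}
    uplinks = ["vmnic0", "vmnic1"]
    print(vswitch_forward("vmnic0", "00:0c:29:fa:e8:83", vm_ports, uplinks))
    print(vswitch_forward("vmnic0", "02:bf:cb:50:a2:76", vm_ports, uplinks))
    print(vswitch_forward("vm1", "00:11:22:33:44:55", vm_ports, uplinks))
```

In this model a frame can only leave through an uplink if it entered from a virtual port, which is why the virtual switch does not create a loop between vmnic0 and vmnic1.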

Also have a look at this article and compare the config it describes for EtherChannel (non-LACP) with yours.

http://kb.vmware.com/kb/1004048

Virtual Machine Port Group #2
Network Label: "Trunk"
VLAN: 4095
NIC Teaming: All Unchecked (Inherited)

What exactly is this port group used for? Are you trunking all VLANs to some VM to have it tag/untag frames itself? Do you do anything fancy like bridging VLANs with a multi-vNIC VM?

-- http://alpacapowered.wordpress.com
ryanwilliams83
Contributor

Thanks for your suggestions.

1) Yes, I am sure the cables are correctly connected. In fact, the same problem occurs if I physically unplug the cable connected to port 2 of the host.

2) My switches are configured correctly and the logs do not show any MAC flapping.
