VMware Cloud Community
RSEngineer
Enthusiast

ESX NIC TEAMING Part DEUX

VMware Experts:

Why is it the case that with IP Hash-based NIC Teaming, the uplinks need to be terminated on the same upstream switch or two separate switches that are stacked or have a Cisco vPC between them? It implies, of course, that the uplinks must be part of a LAG, so ipso fatso (as Archie Bunker would have said), they must terminate on the same switch or separate switches that are virtualized.

But I don't see why - why can't a vNIC simply be pinned to both uplinks, have the flows salt and peppered across them, and have those uplinks go to 2 separate switches without having to be part of a LAG?

HELP! :smileyconfused:

16 Replies
jjkrueger
VMware Employee

Virtual switches are relatively simple Layer 2 network constructs.

A VM's virtual NIC will attach to a vSwitch, and the VMkernel will assign it a vSwitch port. When the VM sends traffic, the vSwitch will do what any other L2 switch will do - inspect the Ethernet header for a destination MAC address, and forward the frame appropriately to the next hop.

If the vSwitch has 2 uplinks attached to two different switches, and the virtual NIC is sending traffic through both uplinks, my L2 network outside the host will get a bit confused - is the VM's MAC address on the other end of pSwitch 1/port 2, or at the other end of pSwitch 2/port2? In other words, from a L2 perspective, which port is the appropriate next hop to get traffic to the VM?

802.3ad provides for a mechanism to keep that information somewhat managed by presenting the physical switch a logical switch port. But even with that, the vSwitch can only work with static link aggregation, such that a flow will leave the ESXi host through one uplink, and that flow must return by that uplink. We can't deal with a flow leaving one uplink, and returning on another.

From a capabilities perspective, we can't think of vSwitches in the same way we think of our nice Cisco or Brocade gear that we're using in our enterprise datacenters. The vSwitches in vSphere are not the same caliber of intelligent, feature rich gear. This is why Cisco and IBM have stepped up and provided their own vSwitches that overlay VMware's Distributed vSwitch layer.

RSEngineer
Enthusiast

J:

"If the vSwitch has 2 uplinks attached to two different switches, and the virtual NIC is sending traffic through both uplinks, my L2 network outside the host will get a bit confused - is the VM's MAC address on the other end of pSwitch 1/port 2, or at the other end of pSwitch 2/port2? In other words, from a L2 perspective, which port is the appropriate next hop to get traffic to the VM?"

This is the paragraph of interest for me (I knew the other stuff). So, I suspected this is the answer, but I am trying to make sense out of it. I am trying to create a scenario in my head and break it, and I'm not sure I can. What exactly is the problem with two separate physical switches having a path to the same MAC address? If pSwitch 1 receives the packet, it forwards it - the same for pSwitch 2. I'm not saying I don't agree with you, I'm just playing devil's advocate so I can understand this.

kjb007
Immortal

In a non LAG/channel port config, traffic has to go in and out of the same interface.  If traffic leaves through 1 switch port, and tries to come back in from another, that traffic will get dropped, and you will have network disconnection.  If it does not get dropped, you have a loop.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
RSEngineer
Enthusiast

"If traffic leaves through 1 switch port, and tries to come back in from  another, that traffic will get dropped, and you will have network  disconnection."

That's not accurate. No traffic gets dropped with asymmetric switching. The only exception is with stateful forwarding, as with a FW.

kjb007
Immortal

It will get dropped by ESX.  Since ESX does not participate in STP, one of its loop avoidance techniques is the one I described.

-KjB

RSEngineer
Enthusiast

K...can you please elaborate further...didn't know that about VMware...make it dummy-proof 🙂

SteveFuller2011
Enthusiast

Remember that a switch learns which port a MAC address lives on from the source MAC address of frames seen on its ports, and forwards traffic on the basis of that table. Also remember that IEEE 802.3ad does not specify how traffic is load balanced across a LAG.

To get good distribution of load, most LAG implementations will load balance traffic on the basis of a conversation, e.g., MAC 1 - MAC 2 via one link of the LAG, and MAC 1 - MAC 3 via another, etc.

If the two uplinks are not part of a LAG, i.e., they connect to different physical switches, the MAC 1 address would first be seen on a port of the 1st switch, then a port of the 2nd switch, then a port of the 1st switch... and so on.

It's obviously not desirable to have a MAC address thrashing around like this as traffic is likely to get dropped.
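The thrashing can be sketched with a toy model. This is a minimal Python sketch, not a model of any real switch: it assumes two physical switches joined by an inter-switch link, one host uplink on each, and that every frame from the VM is seen by both switches (for example because it is flooded or crosses the inter-switch link). The class name, port numbers, and MAC value are made up for illustration.

```python
class LearningSwitch:
    """Toy L2 switch that learns source MACs and counts MAC moves."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}   # MAC address -> port it was last seen on
        self.moves = 0        # how often a known MAC changed port

    def learn(self, src_mac, in_port):
        previous = self.mac_table.get(src_mac)
        if previous is not None and previous != in_port:
            self.moves += 1   # the MAC "flapped" to a different port
        self.mac_table[src_mac] = in_port


HOST_PORT = 1        # hypothetical port facing the ESX host uplink
INTERLINK_PORT = 24  # hypothetical port facing the other physical switch


def simulate(uplink_choices, vm_mac="00:50:56:aa:bb:01"):
    """Send one frame per entry; each entry names the switch ("A" or "B")
    that the chosen uplink lands on. Returns (moves on A, moves on B)."""
    switches = {"A": LearningSwitch("A"), "B": LearningSwitch("B")}
    for first in uplink_choices:
        other = "B" if first == "A" else "A"
        # The switch attached to the chosen uplink sees the MAC on its host port.
        switches[first].learn(vm_mac, HOST_PORT)
        # Assumption: the frame also reaches the other switch via the interlink.
        switches[other].learn(vm_mac, INTERLINK_PORT)
    return switches["A"].moves, switches["B"].moves


if __name__ == "__main__":
    print("alternating uplinks:", simulate(["A", "B"] * 3))
    print("pinned to one uplink:", simulate(["A"] * 6))
```

With the uplinks alternating, each switch has to relearn the MAC on nearly every frame; with the VM pinned to one uplink, the tables stay stable.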

Regards

kjb007
Immortal

Here's a Cisco doc describing some of the vSwitch features. It's old, but still valid: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/vmware/VMware.html

Since ESX is not talking STP, it can't negotiate and turn off redundant links, so it uses the avoidance feature I described earlier.  This can lead to some interesting scenarios where in certain cases, you can have vm's in adjacent hosts with multiple NICs, and depending on the way switches are trunked together, two adjacent vm's, one each on an adjacent host, can end up in a configuration where they can not talk to each other.

-KjB

RSEngineer
Enthusiast

K:...Please describe what you say you described earlier....how does that loop prevention mechanism work....please give me details on that specific loop prevention mechanism..thanks

kjb007
Immortal

ESX will not allow traffic to enter a physical NIC that it did not leave out of. Depending on the load balancing mechanism, a virtual port will be used for vm traffic, which will ultimately route down a certain physical NIC, and end up at a switch. That traffic must return the same way, or ESX will drop the packets so as to prevent this loop.

It will not forward that traffic back to the switch to come back in from the physical NIC it was expecting; it will drop or not accept those packets.
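As a rough illustration of the rule described here, the following Python sketch models a vSwitch that only accepts inbound frames on the uplink a VM's port is pinned to. It is a simplification under stated assumptions, not VMware's implementation: the class, method names, and the MAC value are hypothetical, and the model also assumes frames for MACs the vSwitch does not own are simply not accepted.

```python
class ToyVSwitch:
    """Hypothetical model: each VM MAC is pinned to exactly one uplink."""

    def __init__(self, pinning):
        # pinning maps a VM MAC address to the uplink (vmnic) it uses.
        self.pinning = dict(pinning)

    def accept_from_uplink(self, dst_mac, in_uplink):
        """Return True if a frame for dst_mac arriving on in_uplink is
        delivered to the VM, False if it is dropped."""
        expected = self.pinning.get(dst_mac)
        if expected is None:
            return False  # assumption: unknown destinations are not accepted
        # The frame is never forwarded back out of another uplink;
        # it is either delivered or dropped.
        return expected == in_uplink


if __name__ == "__main__":
    vswitch = ToyVSwitch({"00:50:56:aa:bb:01": "vmnic0"})
    print(vswitch.accept_from_uplink("00:50:56:aa:bb:01", "vmnic0"))  # expected uplink
    print(vswitch.accept_from_uplink("00:50:56:aa:bb:01", "vmnic1"))  # wrong uplink
```

Because the check ends in a drop rather than a re-forward, the vSwitch never bridges one uplink to another in this model.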

This is shown in that VLAN Provision section of the link I posted earlier.

-KjB

RSEngineer
Enthusiast

K:

I read the paper and I am not sure that you and the paper are saying the same thing. The paper says that the vswitch will refuse traffic destined for a vNIC if it receives that traffic on a vmnic other than the one it is pinned to. It doesn't say that the vswitch is maintaining an elaborate stateful forwarding table, like a FW, and will refuse return traffic on a vmnic if another vmnic was used for sending.

And, in the case of IP hashing NIC teaming, the vNIC is pinned to BOTH vmnics - it uses both to send, after all. So, it should accept traffic from both.

kjb007
Immortal

Well, not exactly. With IP hash, a vmnic is chosen for each "conversation": the conversation hashes out to one vmnic, and traffic is expected to return on that vmnic. That algorithm is supposed to match up with the config on the switch, which would be in a channel, and will allow that MAC to live on that "interface"; otherwise, without the channel, the MAC will live on one port or the other. So when traffic for that hash is returned to an incorrect vmnic, that traffic will not be allowed. You can see this happen even with a channel, if the switch does not use src-dst-ip hash but uses some other technique like mac-src-dst hashing.

This doesn't work across physical switches, because the vmnic and the physical NICs aren't using the same algorithm, and so the switch ports will have MAC addresses on two different ports because both switches will learn that mac address individually.  Not only will you have intermittent network connectivity to the vm, but you also create a network loop and can initiate a pretty large broadcast storm depending on how many vm's are on the host.
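To make the per-conversation selection concrete, here is a minimal Python sketch. It assumes the commonly described scheme of XORing the source and destination IPv4 addresses and taking the result modulo the number of uplinks; the exact hash ESX uses is not stated in this thread, so treat the function, its name, and the example addresses as illustration only.

```python
import ipaddress


def ip_hash_uplink(src_ip, dst_ip, uplinks):
    """Pick an uplink for a src/dst IPv4 pair (assumed XOR-mod scheme)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    # XOR is symmetric, so both directions of a conversation hash alike,
    # provided the switch on the other end uses the same inputs.
    return uplinks[(src ^ dst) % len(uplinks)]


if __name__ == "__main__":
    uplinks = ["vmnic0", "vmnic1"]
    vm_ip = "10.0.0.10"
    for peer in ("10.0.0.20", "10.0.0.21"):
        print(vm_ip, "->", peer, "uses", ip_hash_uplink(vm_ip, peer, uplinks))
```

In this sketch the same VM lands on different uplinks for different peers, so its single MAC address shows up on both links. A channel on the physical side lets the switch treat both links as one logical port; two independent switches would each learn the MAC separately.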

-KjB

RSEngineer
Enthusiast

K:

Thank you. The first paragraph makes sense....you lost me on this one.


"This doesn't work across physical switches, because the vmnic and the  physical NICs aren't using the same algorithm,

Isn't a vmnic what VMware calls a physical NIC? They are the same thing, no?


and so the switch ports will have MAC addresses on two different ports because both switches will learn that mac address individually. Not only will you have intermittent network connectivity to the vm, but you also create a network loop and can initiate a pretty large broadcast storm depending on how many vm's are on the host."

It's true that the tell-tale sign of a loop is seeing the same MAC on 2 different switch ports, but the reverse is not true. Seeing the same MAC on 2 different switch ports does not cause a loop and a broadcast storm. Intermittent connectivity for the VM? Yes. Broadcast storm? No.

kjb007
Immortal

1. I meant vmnic and the physical switch.

For #2, OK, what will happen when return traffic encounters two separate paths to the same MAC? Again, there's no STP, so both paths are valid and forwarding.

-KjB

RSEngineer
Enthusiast

[Attached diagram: 2025092.png]

If server traffic to D2 goes through A1 (green), and then the return traffic to the server goes through A2 (red), given the rule that you are telling me about the vswitch, then yes, the vswitch will drop the traffic from A2 and the VM will have intermittent connectivity. But there is no loop. Where is the loop? Remember that the vswitch in the hypervisor will never forward traffic from one uplink to the other.

kjb007
Immortal

The difference in this diagram is the L2 connection between the switches whereby you're trunking the VLANs presented.  In this design, you prevent the loop, but you introduce connectivity problems using IP hash.

Without the L2 connection, you end up with a loop by using IP hash.

Ultimately, you still end up with connectivity problems in both scenarios, but a loop in one.

-KjB
