VMware Cloud Community
hxman101
Contributor

Network Teaming Load Balancing

We are seeing packet loss in our Cisco UCS environment running ESXi 5.1. What is the recommended teaming policy if the upstream switching environment is running Nexus switches and vPC?

I was under the impression that when two uplinks are running active/active, the recommended load balancing policy was "Route based on source MAC hash". Is that correct?

Would we benefit from moving to ESXi 5.5 and its LACP enhancements?

Thanks Tom

5 Replies
f10
Expert

Hi,

I have used "Route based on IP hash" when EtherChannel is configured on the physical switch ports.

-f10

Regards, Arun Pandey VCP 3,4,5 | VCAP-DCA | NCDA | HPUX-CSA | http://highoncloud.blogspot.in/ If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
VirtuallyMikeB

Good day,

Route based on originating virtual port ID works just fine in most cases.  MAC hash would work fine as well, perhaps with more evenly distributed traffic flows.  The ESXi host does not know about the vPC configuration from the Fabric Interconnects to the Nexus switches, so configuring IP hash for an ESXi host in a blade server does not make sense in most cases.  If you have Enterprise Plus licensing, Load-Based Teaming may be a better option.  If you're required to separate traffic across links, IP storage traffic for instance, you can use static uplink pinning.
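Before changing anything, it can help to see what's actually in effect. Below is a minimal pyVmomi sketch (the vCenter name and credentials are placeholders, and it assumes standard vSwitches) that prints the effective teaming policy for every port group on every host:

```python
# Minimal sketch, not production code: report the effective NIC teaming policy
# for every standard-vSwitch port group on every host. Connection details are
# placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; skips certificate checks
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True).view
    for host in hosts:
        print(host.name)
        for pg in host.config.network.portgroup:
            # computedPolicy is the policy actually in effect (inherited from
            # the vSwitch unless the port group overrides it).
            teaming = pg.computedPolicy.nicTeaming
            policy = teaming.policy if teaming else "unknown"
            print("  %-32s %s" % (pg.spec.name, policy))
finally:
    Disconnect(si)
```

Typical values are loadbalance_srcid (originating virtual port ID), loadbalance_srcmac, loadbalance_ip, and failover_explicit.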

You should configure your Chassis Discovery Policy to use a port-channel if at all possible.  This applies to the links from the chassis I/O module to the Fabric Interconnects, as you know, and comes with all the greatness that is LACP: a single logical link for load balancing and failover, although admittedly UCS does a fine job of basic load balancing across the available links.  If you don't use a port-channel here and also don't configure a vNIC for fabric failover, a failure of that vNIC's path from the chassis I/O module to the Fabric Interconnect will result in a loss of connectivity for that vNIC.  It will not get re-pinned to another link in the fabric and it won't fail over to the other fabric.  Use a port-channel.

[Image: chassis-discovery.jpg]

Remember, your ESXi hosts will see the Fabric Interconnects as their directly connected switches, even in End-Host mode.  From here, you essentially treat the ESXi host connectivity as if each host were connected to two separate switches, something like a pair of Cisco 4500-series switches without multi-chassis link aggregation.  As far as the ESXi host is concerned, that's how it sees the Fabric Interconnects: two separate switches and two separate fabrics.  It's not like connecting a rack-mounted ESXi host directly to the Nexus 5Ks, in which case you'd have the reasonable option of using LACP and vPCs.

[Image: FI-upstream.jpg]

As a design decision, you can choose to let VMware do the failover (as you would traditionally with a rack server) or you can configure UCS to fail over vNICs (by enabling fabric failover on the vNICs you create).  I usually let VMware perform the failover because it does a fine job.

[Image: fabric-failover.jpg]

Grab a whiteboard and trace out the traffic flows should you have a fabric failure for various configuration scenarios.

-----------------------------------------

Please consider marking this answer "correct" or "helpful" if you found it useful.

Mike Brown

VMware, Cisco Data Center, and NetApp dude

Consulting Engineer

michael.b.brown3@gmail.com

Twitter: @VirtuallyMikeB

Blog: http://VirtuallyMikeBrown.com

LinkedIn: http://LinkedIn.com/in/michaelbbrown

hxman101
Contributor

And this was the reply from VMware:

The choice of teaming policy is subject to your requirements; in short, the recommended policy depends on what you want to achieve. If you specifically need traffic from VMs going out of the teamed physical uplinks to be distributed across the uplinks rather than through only one, you need the load balancing policy "Route based on IP hash". This requires the physical switch ports connected to the uplinks to be configured with link aggregation (EtherChannel or LACP, depending on the switch vendor).

If you do not need load balancing based on IP hash or MAC hash, use the default load balancing policy, "Route based on originating virtual port ID". In that case the physical switch ports do not need any special link aggregation configured.

The load balancing policy on the VDS is not affected by the vendor of the upstream switch. However, the server hardware and its I/O devices should be compatible with the ESXi version. As long as the physical switch ports are configured as needed and the link status of the ports is up end to end, it should be fine.

VirtuallyMikeB

This is a generic answer that basically says "it depends," then gives examples of several teaming policies without taking blade server networking into consideration at all.  While generally correct, it doesn't help move you forward.

IP hash will not affect traffic north of the Fabric Interconnects - ESXi doesn't "see" the vPC.  Depending on source-destination pairs, IP hash will simply alternate between sending packets to Fabric Interconnect A and Fabric Interconnect B.  If you use this configuration, it's possible that UCS could dynamically pin traffic to the same Nexus!  So there's no point in using IP hash in this case.  With proper planning (and an eye toward keeping complexity to a minimum), you can use Quality of Service configurations and Active/Passive teaming on the ESXi host to purposefully place traffic on each fabric and, from there, let UCS dynamically pin traffic north to the Nexus switches.
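If you go the Active/Passive route, here's a rough pyVmomi sketch of what that might look like; the port group name ("IP-Storage"), the vmnic names, and the connection details are all assumptions for illustration, not anything from this thread:

```python
# Sketch only: pin a port group to the fabric-A uplink and keep the fabric-B
# uplink as standby, using an explicit failover order. Names are assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="esxi01.example.local", user="root",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    host = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True).view[0]
    netsys = host.configManager.networkSystem

    # Reuse the existing port group spec so only the teaming policy changes.
    pg = next(p for p in host.config.network.portgroup
              if p.spec.name == "IP-Storage")           # assumed port group name
    spec = pg.spec
    spec.policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy(
        policy="failover_explicit",                     # explicit failover order
        nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
            activeNic=["vmnic2"],                       # fabric A uplink (assumed)
            standbyNic=["vmnic3"]))                     # fabric B uplink (assumed)
    netsys.UpdatePortGroup(pgName="IP-Storage", portgrp=spec)
finally:
    Disconnect(si)
```

A second port group (vMotion, for example) could mirror this with the active and standby NICs reversed, so each fabric carries a known traffic type in steady state and everything still fails over.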

Check out this excellent post from VMware's Brad Hedlund (it's an oldie but a goodie) that speaks to using this method:

http://bradhedlund.com/2010/09/15/vmware-10ge-qos-designs-cisco-ucs-nexus/

All the best,

Mike

-----------------------------------------

Please consider marking this answer "correct" or "helpful" if you found it useful.

Mike Brown

VMware, Cisco Data Center, and NetApp dude

Consulting Engineer

michael.b.brown3@gmail.com

Twitter: @VirtuallyMikeB

Blog: http://VirtuallyMikeBrown.com

LinkedIn: http://LinkedIn.com/in/michaelbbrown


AnkitCP
Contributor

You experience this issue when running ESXi/ESX on Cisco UCS B200 M1/M2 blade servers configured to use the "Route based on IP hash" NIC teaming policy.

When enabled, this NIC teaming policy uses a team of at least two NICs and selects an uplink based on a hash of the source and destination IP addresses of each packet.
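To illustrate the idea only, here is a simplified Python sketch of hash-based uplink selection; it is a conceptual illustration, not ESXi's actual algorithm:

```python
# Simplified illustration of hash-based uplink selection; not ESXi's actual algorithm.
import ipaddress

def select_uplink(src_ip, dst_ip, uplinks):
    """Pick an uplink from a hash of the source and destination IP addresses."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return uplinks[key % len(uplinks)]

# The same source/destination pair always lands on the same uplink;
# different pairs spread across the team.
print(select_uplink("10.0.0.11", "10.0.1.5", ["vmnic0", "vmnic1"]))
```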

Host network performance degradation can occur when using the "Route based on IP hash" NIC teaming policy because cross-stack link aggregation, that is, grouping physical ports across UCS 6100 Series Fabric Interconnects deployed as a redundant pair, is not a supported configuration.

As a result of the degraded network performance, you may see:

  • Intermittent packet loss.
  • The vSphere Client or vCenter Server may lose its connection to the ESXi/ESX host.

To resolve this issue, change the NIC teaming policy to Route based on originating virtual port ID.
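If you'd rather make that change programmatically than through the client, here is a hedged pyVmomi sketch; the host name, credentials, and vSwitch name are placeholders, and it assumes the vSwitch already has a teaming policy defined:

```python
# Sketch: set a standard vSwitch's teaming policy back to
# "Route based on originating virtual port ID" (loadbalance_srcid).
# Connection details and the vSwitch name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="esxi01.example.local", user="root",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    host = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True).view[0]
    netsys = host.configManager.networkSystem

    # Reuse the existing vSwitch spec and change only the load balancing policy.
    vsw = next(v for v in host.config.network.vswitch if v.name == "vSwitch0")
    spec = vsw.spec
    spec.policy.nicTeaming.policy = "loadbalance_srcid"  # originating virtual port ID
    netsys.UpdateVirtualSwitch(vswitchName="vSwitch0", spec=spec)
finally:
    Disconnect(si)
```

Port groups that explicitly override the vSwitch policy keep their own setting, so check those separately.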

Route based on IP hash is not supported with Cisco UCS B200 M1/M2 blade servers that use UCS 6100 Se...

UCS B-series Teaming, Bonding Options with the Cisco VIC Card - Cisco

Operating system: VMware ESXi

Supported:
  1. Route Based on Originating Port ID
  2. Route Based on Source MAC Hash

Not supported:
  1. Route Based on IP Hash
  2. Route Based on Physical NIC Load
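For reference, this is roughly how the names in that table map to the identifiers the vSphere API uses (the last entry applies to distributed switches only):

```python
# Mapping of the teaming policy names above to their vSphere API identifiers,
# annotated with the support status from the Cisco table.
TEAMING_POLICY_IDS = {
    "Route Based on Originating Port ID": "loadbalance_srcid",      # supported
    "Route Based on Source MAC Hash":     "loadbalance_srcmac",     # supported
    "Route Based on IP Hash":             "loadbalance_ip",         # not supported
    "Route Based on Physical NIC Load":   "loadbalance_loadbased",  # not supported
}
```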
Thanks, Ankit Mehrotra