VMware Cloud Community
elaguni
Enthusiast

Help Needed for Network Load Sharing (vSwitch or DVS)

Hi everybody,

Hoping to get some advice about network load balancing using a vSwitch or Distributed Switch (DVS - not sure what the proper abbreviation is for it now). Some info about the environment first:

  • New deployment of vSphere 6.5 with Enterprise Plus license
  • Several servers (same specs) with multiple 10Gb ports
  • Stacked 2x Dell S4048-ON switches

Part of the requirements is to use active-active multi-NIC for all network traffic. We're not expecting link aggregation in the sense of 2x 10Gb links = one 20Gb link, but we do want to see traffic coming from multiple sources flow through different ports in and out of the ESXi hosts.

I configured several distributed switches with 2x 10Gb ports each for some vSphere services (NFS, vMotion, Prod, etc.) but observed the following in esxtop:

  1. DVS + LACP (1 LAG with 2 uplinks; load balancing based on IP, TCP/UDP port, virtual port) + Switch LACP = only 1 port gets network traffic (port failover works)
  2. DVS + LACP (same LAG as #1) + NO switch LACP/portchannel (plain switchport) = connection lost (cannot even ping)
  3. DVS + 2 Active Uplinks (route based on IP hash; not LACP LAG) + Switch LACP = only 1 port gets network traffic (port failover works)
  4. DVS + 2 Active Uplinks (route based on IP hash; not LACP LAG) + NO switch LACP/portchannel (plain switchport) = 2 ports used (port failover works)

Need some configuration advice on how to get active-active multi-NIC on the virtualization hosts.

9 Replies
daphnissov
Immortal

Use the vDS and set both uplinks in the active state. Change the teaming policy to route based on physical NIC load. Keep in mind if you're testing this that this is more of a load sharing mechanism. A given VM will only send traffic out of one or the other uplink, not both simultaneously. It is also possible that, if the combined traffic of all VMs stays below 75% utilization of one of your 10GbE uplinks, you may not see any traffic on the other at all, and this is normal.
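
If it helps, this is roughly what that looks like in PowerCLI - a sketch only, assuming the VMware.VimAutomation.Vds module; the vCenter address, switch name, and portgroup name below are placeholders:

  # Sketch: switch the portgroup teaming policy to Route Based on Physical NIC Load (LBT)
  # "vcenter.example.local", "Prod-vDS" and "Prod-PG" are placeholder names; both uplinks
  # are assumed to already be in the Active list for the portgroup
  Connect-VIServer -Server vcenter.example.local

  Get-VDSwitch -Name "Prod-vDS" |
      Get-VDPortgroup -Name "Prod-PG" |
      Get-VDUplinkTeamingPolicy |
      Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceLoadBased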

elaguni
Enthusiast

daphnissov

I didn't choose Route based on physical NIC load because I understand there is a big chance that all traffic will go to one NIC unless this mechanism deems it necessary to move some sessions to the other NIC(s). I tried it, and it took a while plus massive workloads to see it move anything to another NIC.

What we want is, say, a Prod vDS with vmnic0 and vmnic1 as uplinks, where 5 VMs use vmnic0 and the other 5 use vmnic1. It might not be as balanced as that, but at least multiple vmnics carry traffic for multiple sessions (I understand a single session will only flow through one vmnic).

vDS + 2x Active Uplinks (Route based on IP hash) + NO Physical Switch LACP/Portchannel (plain switchport) is the configuration that gives us that kind of load sharing.

Some concerns:

1. I thought Route based on IP hash needs LACP/port channel on the physical switch, regardless of whether it's a vSwitch or vDS.

2. Any advice on improving the config?

Nick_Andreev
Expert

You are correct that Route Based on IP Hash requires a static port channel (not LACP) on the physical switch. You will need LACP if you decide to use the LAG feature in the vDS.

Taking one step back, though. Why does the default Route Based on Originating Virtual Port not satisfy your requirements? If you have multiple VMs, network traffic will be balanced across both uplinks.

If it didn't in your testing, let us know how you are testing it. Because it absolutely has to.
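
If you want to double-check what each portgroup is actually set to, something like this in PowerCLI should do it - a sketch only, with "Prod-vDS" as a placeholder name (LoadBalanceSrcId corresponds to Route Based on Originating Virtual Port):

  # Sketch: report the teaming policy and active uplinks for every portgroup on the vDS
  # "Prod-vDS" is a placeholder name
  Get-VDSwitch -Name "Prod-vDS" |
      Get-VDPortgroup |
      Get-VDUplinkTeamingPolicy |
      Select-Object VDPortgroup, LoadBalancingPolicy, ActiveUplinkPort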

---
If you found my answers helpful please consider marking them as helpful or correct.
VCIX-DCV, VCIX-NV, VCAP-CMA | vExpert '16, '17, '18
Blog: http://niktips.wordpress.com | Twitter: @nick_andreev_au
daphnissov
Immortal

As Nick said, the default of route based on originating virtual port ID should accomplish similar behavior. ESXi will assign a VM to a given uplink but, unlike LBT, will not re-evaluate its placement based on the load on that NIC.

I'm just curious as to why neither of these teaming configurations would satisfy your requirement to "see the network traffic coming from multiple sources flow through different ports in and out of the ESXi hosts." Can you elaborate?

elaguni
Enthusiast

For a second you got me thinking why in the world I didn't try that, until I looked at my notes - I did test Route based on originating virtual port. Attached are screenshots, one using a standard vSwitch and the other using a vDS with 2x vmnics/uplinks (both active):

vSwitch

VirtualPortID_vSwitch.jpg

vDS

VirtualPortID_vDS.jpg

Early in the deployment I tested using a couple of small Linux VMs I keep on a thumb drive, gave each a unique IP address (same subnet as the ESXi management network), then started pings (with max packet size) between these VMs and two physical machines. In the screenshots you'll notice that both VMs were pinned to a single vmnic. Needless to say, network traffic flowed through that port alone. Anything wrong with what I did?
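
In case anyone wants to cross-check the same thing, the uplink a VM is pinned to can also be read from the ESXi shell (the world ID below is just an example):

  # List the networking world IDs of the running VMs
  esxcli network vm list

  # Show the VM's port details, including the "Team Uplink" column (the vmnic it is pinned to)
  esxcli network vm port list -w 69895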

I will give it a try again next week with proper VMs + vNetwork with Route based on originating virtual port. Thank you very much for the inputs.

daphnissov
Immortal

With only two VMs you don't have a large enough sample set to see any of these algorithms function properly. It just so happened that ESXi selected the same uplink for both VMs. If you were using a vDS with LBT, it is also possible the same thing would occur, at least until one vmnic became more than 75% saturated. So, again, if this is unwanted, then what is the source of your requirement that both uplinks be utilized at all times?

elaguni
Enthusiast

Hi Nick_Andreev,

Let me correct myself with regard to the requirement: we need a single VM to use multiple ports when talking to several physical clients with different IP addresses. I reckon Route based on originating virtual port won't cut it.

I'll check with the network team to make sure that they configured a static EtherChannel.

Why not LACP LAG? Thank you for your help again.

Nick_Andreev
Expert

In that case you're right. The Port ID policy will pin a VM to one uplink. So you do need the IP Hash policy with an EtherChannel (static LAG) on the switch if you want to load balance based on source/destination IPs.

You can also use the LACP feature of the vDS. Just make sure to use LACP (dynamic LAG) on the switch as well.
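
For the first option, the portgroup side would look something like this in PowerCLI - a sketch only, with "Prod-PG" as a placeholder name, and with the switch ports facing that host already in a static port channel:

  # Sketch: Route Based on IP Hash on the portgroup
  # "Prod-PG" is a placeholder name; requires a static EtherChannel on the switch side
  Get-VDPortgroup -Name "Prod-PG" |
      Get-VDUplinkTeamingPolicy |
      Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceIP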

---
If you found my answers helpful please consider marking them as helpful or correct.
VCIX-DCV, VCIX-NV, VCAP-CMA | vExpert '16, '17, '18
Blog: http://niktips.wordpress.com | Twitter: @nick_andreev_au
elaguni
Enthusiast

The configuration that finally worked for us:

Distributed Switch + 1 LACP LAG with 4 uplinks set to Active + Route based on Source and Destination IP, TCP/UDP port + physical switch ports set to Passive LACP

All this time I've been using LACP Active on both sides (virtual switch and physical switch), but this part of the guide recommends otherwise:

Create a Link Aggregation Group
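
For reference, the switch side of that ends up looking roughly like this on the S4048-ON (Dell OS9). Treat it as a sketch rather than our exact config - the port-channel and interface numbers are just examples and the VLAN tagging is omitted:

  ! Sketch only - port-channel and interface numbers are examples; VLAN config omitted
  interface Port-channel 10
   switchport
   no shutdown
  !
  interface TenGigabitEthernet 1/1
   no ip address
   port-channel-protocol LACP
    port-channel 10 mode passive
   no shutdown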
