Imagine I have a host with:
On startup of the host and VMs, the VMs will get roughly evenly distributed between both pNICs, right?
If a pNIC is lost for some reason, all traffic will divert to the remaining NIC.
If the failed NIC returns, am I right in saying that all traffic will continue to traverse the link that didn't fail?
On startup of the host and VMs, the VMs will get roughly evenly distributed between both pNICs, right?
"Evenly" cannot be guaranteed, but VM ports will be distributed amongst vmnics.
If a pNIC is lost for some reason, all traffic will divert to the remaining NIC.
Yes, to the remaining vmnic (uplink).
If the failed NIC returns, am I right in saying that all traffic will continue to traverse the link that didn't fail?
Yes, based on how you have the vswitch configured with failback=no.
Also, you say you have Notify Switches = No. You should probably set that to "Yes" in almost all cases, unless you have a very specific use case that requires "No".
The use case is some VMs running Microsoft Windows Server Failover Clustering, so Notify Switches has to stay "No".
I am trying to get my head around some of the settings on this environment that my predecessor left behind for me. I think the "Failback: No" is a mistake. My assumption is that with it set to No, even in an active:active setup (i.e. no pNICs in the "Standby" list), load balancing won't rebalance when the failed pNIC/link is recovered.
There might be some confusion as to how that setting works. It basically sends a gratuitous ARP upstream to inform the physical switches that a VM has powered on or vMotioned (among other events). This article does a pretty good job in explaining that.
Nope, no confusion about "Notify Switches", see VMware Knowledge Base.
This is really about "Failback". Some descriptions of it suggest it only works when you have pNICs in the "Standby" list within the "Failover Order"; in fact, the most common descriptions only discuss it in that context. Since I have only 2 pNICs and both are in the "Active" portion of the list, I want to be clear about its function: does it do nothing in this case, or, if set to "Yes", will my teamed link rebalance?
No, the Failback option works when all vmnics are in an Active state as well. I'll illustrate this for you in my lab; the screenshots below progress chronologically from the first step to the last.
Start with a pretty simple setup: two vmnic uplinks, both in an Active state, with Failback set to "Yes". The port group "vSS-Network-3" is in use here and consumed by three different VMs (Photon-02, -03, and -04).
We confirm the physical NIC selection in esxtop.
Now let's fail vmnic3.
Check again which uplinks are selected after failover.
All are now forced to use vmnic0.
Let's return vmnic3 to service.
Check again the teaming status.
You'll notice Photon-02 and -04 return to using vmnic3 as their preferred uplink.
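The sequence above can be summarized in a small toy model. This is purely illustrative, not ESXi's actual code path; the VM and vmnic names mirror the lab, and the Failback=Yes behavior matches what the screenshots show.

```python
# Toy model of the lab sequence: two active uplinks, three VMs,
# fail one uplink, then return it to service with Failback = Yes.

class Team:
    def __init__(self, uplinks, failback=True):
        self.uplinks = list(uplinks)   # uplinks currently in service
        self.failback = failback
        self.preferred = {}            # vm -> originally selected uplink
        self.current = {}              # vm -> uplink in use right now

    def attach(self, vm, uplink):
        self.preferred[vm] = uplink
        self.current[vm] = uplink

    def fail(self, uplink):
        self.uplinks.remove(uplink)
        for vm, up in self.current.items():
            if up == uplink:           # divert traffic to a surviving uplink
                self.current[vm] = self.uplinks[0]

    def recover(self, uplink):
        self.uplinks.append(uplink)
        if self.failback:              # Failback = Yes: VMs move back
            for vm, pref in self.preferred.items():
                if pref == uplink:
                    self.current[vm] = pref

team = Team(["vmnic0", "vmnic3"], failback=True)
team.attach("Photon-02", "vmnic3")
team.attach("Photon-03", "vmnic0")
team.attach("Photon-04", "vmnic3")

team.fail("vmnic3")
print(sorted(set(team.current.values())))  # ['vmnic0'] - all on vmnic0

team.recover("vmnic3")
print(team.current["Photon-02"], team.current["Photon-04"])  # vmnic3 vmnic3
```

With `failback=False` the model would leave all three VMs on vmnic0 after recovery, which is the case discussed next.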
And with "Failback" set to "False" the last esxtop status would have shown all the VMs still on vmnic0, yes?
Yes
Thanks.
What's after warrior? "Community Lieutenant", "Community General"?
Probably "Community Potwasher"
Ha! Ha!
Good luck with that!
One more thing please...
'And with "Failback" set to "False" the last esxtop status would have shown all the VMs still on vmnic0, yes?'. From the PortGroup uplink view, does that mean...
Is uplink vmnic3 moved into "Standby uplinks" on the port group?
Once vmnic3 is returned to service, with both vmnic0 and vmnic3 originally in "Active uplinks" (no "Standby" or "Unused"), does it go back into "Active uplinks" automatically rather than into "Standby"?
I'm hoping vmnic3 goes back to 'Active uplinks' group automatically.
vmnic3 will come back in its previous state: if it was in the Active list before the failure, it will be restored to the Active list.
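The point above can be shown with a minimal sketch: the failover-order lists (Active/Standby/Unused) are configuration and do not change when a link fails; only the link state does. Names and structure here are illustrative assumptions, not ESXi internals.

```python
# Failover order is configuration; link failure only changes link state.

class Uplink:
    def __init__(self, name):
        self.name = name
        self.link_up = True

active_order = [Uplink("vmnic0"), Uplink("vmnic3")]  # configured "Active uplinks"
standby_order = []                                    # no standby uplinks here

def usable(order):
    # Uplinks that can actually carry traffic right now.
    return [u.name for u in order if u.link_up]

active_order[1].link_up = False          # vmnic3 loses link
# The configured order is untouched; vmnic3 never moves to "Standby uplinks".
assert [u.name for u in active_order] == ["vmnic0", "vmnic3"]
print(usable(active_order))              # ['vmnic0']

active_order[1].link_up = True           # vmnic3 returns to service
print(usable(active_order))              # ['vmnic0', 'vmnic3']
```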
In "Route based on the originating virtual port", when a VM is powered on, the virtual switch chooses an uplink based on a hash of the virtual port ID and the number of uplinks on the virtual switch. Note: this is NOT round-robin amongst the uplinks.
For a VM that is still powered on, this hash is recalculated only when an uplink is added to or removed from the NIC team. Whenever an active uplink fails, it is effectively "removed" from the team.
In your situation with no standby uplinks, when vmnic3 recovers it rejoins the team, so the hash is recalculated. That is why those VMs move back to vmnic3. This "moving back" would occur even if Failback had been set to No.
The Failback setting is relevant only when there is a standby uplink. It determines whether the failed adapter, once it recovers, immediately becomes active again, displacing the standby that took its place [Failback = Yes], or remains unused, at least until another active adapter fails [Failback = No].
The Failback setting has nothing to do with the two VMs "failing back" to vmnic3.
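The recalculation described above can be sketched as follows. The actual hash function is internal to ESXi; this sketch assumes a simple port-ID-modulo-uplink-count stand-in (an assumption, not the real algorithm), which is enough to show why the selection changes whenever an uplink joins or leaves the team.

```python
# Sketch of "Route based on originating virtual port" selection.
# The modulo is a stand-in for ESXi's internal hash.

def select_uplink(port_id, uplinks):
    # Not round-robin: the choice depends only on the port ID
    # and the current number of uplinks in the team.
    return uplinks[port_id % len(uplinks)]

team = ["vmnic0", "vmnic3"]
ports = {"Photon-02": 5, "Photon-03": 6, "Photon-04": 7}  # illustrative port IDs

before = {vm: select_uplink(p, team) for vm, p in ports.items()}

# vmnic3 fails: it is "removed" from the team, the hash is
# recalculated, and every VM lands on the surviving uplink.
team.remove("vmnic3")
during = {vm: select_uplink(p, team) for vm, p in ports.items()}

# vmnic3 recovers and rejoins the team: recalculating with two
# uplinks again sends the same VMs back, regardless of Failback.
team.append("vmnic3")
after = {vm: select_uplink(p, team) for vm, p in ports.items()}

print(before)           # Photon-02 and -04 on vmnic3, Photon-03 on vmnic0
print(during)           # everyone on vmnic0
print(after == before)  # True: selections are restored
```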