VMware Cloud Community
StephenMoll
Expert

Teamed NICs and Failback

Imagine I have a host with:

  • 2 pNICs
  • Both pNICs attached to a vSwitch:
    • Route based on Originating Port ID
    • Failover Detection : Link Status Only
    • Notify Switches : No
    • Failback : No
    • Failover Order : vmnic0, vmnic1 : Active

On startup of the host and VMs, the VMs will get roughly evenly distributed between both pNICs, right?

If a pNIC is lost for some reason, all traffic will divert to the remaining NIC.

If the failed NIC returns, am I right in saying that all traffic will continue to traverse the link that didn't fail?

13 Replies
daphnissov
Immortal

"On startup of the host and VMs, the VMs will get roughly evenly distributed between both pNICs, right?"

"Evenly" cannot be guaranteed, but VM ports will be distributed amongst the vmnics.

"If a pNIC is lost for some reason, all traffic will divert to the remaining NIC."

Yes, to the remaining vmnic (uplink).

"If the failed NIC returns, am I right in saying that all traffic will continue to traverse the link that didn't fail?"

Yes, based on how you have the vSwitch configured with Failback = No.

Also, you say you have Notify Switches = No. You should probably set that to Yes in almost all cases, unless you have a very specific use case that requires No.

StephenMoll
Expert

The use case is some VMs running Microsoft Server Failover Clustering. So notify switch has to be "No".

I am trying to get my head around some of the settings on this environment that my predecessor left behind for me. I think the "Failback : No" is a mistake. I assume that if it is yes, even in an active:active setup, i.e. no pNICs in the "standby" list, load balancing won't rebalance when the failed pNIC/link is recovered.

daphnissov
Immortal

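There might be some confusion as to how that setting works. When it is enabled, the host sends a notification frame upstream (ESXi uses a RARP broadcast for this rather than a gratuitous ARP) so the physical switches learn which uplink a VM's MAC address now sits behind, for example after a power-on, a vMotion, or a failover. This article does a pretty good job of explaining that.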
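For anyone curious what such a notification looks like on the wire, here is a rough sketch of a broadcast RARP request built by hand. It follows the generic RARP layout (EtherType 0x8035, opcode 3); whether ESXi fills every field exactly this way is an assumption on my part, and the function name and example MAC are made up for illustration.

```python
import struct

def build_rarp_notification(vm_mac: str) -> bytes:
    """Build a broadcast RARP request frame announcing vm_mac (illustrative only)."""
    mac = bytes.fromhex(vm_mac.replace(":", ""))
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x8035 (RARP).
    eth_header = broadcast + mac + struct.pack("!H", 0x8035)
    # RARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, oper=3 (reverse request).
    payload = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 3)
    payload += mac + b"\x00" * 4  # sender hardware address, sender IP left as zeros
    payload += mac + b"\x00" * 4  # target hardware address, target IP left as zeros
    return eth_header + payload

# Hypothetical VM MAC address, used only for the demonstration.
frame = build_rarp_notification("00:50:56:aa:bb:cc")
print(len(frame), frame.hex())
```

The point is only that the frame carries the VM's MAC as its source address, which is what lets the physical switch update its MAC table for the port the frame arrived on.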

StephenMoll
Expert

Nope, no confusion about "Notify Switches"; see the VMware Knowledge Base.

This is really about "Failback". Some descriptions of it suggest it only works when you have pNICs in the "Standby" list within the "Failover Order"; in fact, the most common descriptions only talk about it in this context. Since I have only 2 pNICs and both are in the "Active" portion of the list, I want to be clear about its function. Does it do nothing either way, or if set to "Yes" will my teamed links rebalance?

daphnissov
Immortal

No, the failback option works when all vmnics are in an Active state as well. I'll illustrate this for you in my lab. All screenshots progress in a linear fashion from first step to last time-wise.

Start with a pretty simple setup. Two vmnic uplinks both in an active state with failback set to "Yes". The port group "vSS-Network-3" is in use here and consumed by 3 different VMs (Photon-02, -03, and -04).

pastedImage_0.png

We confirm the physical NIC selection in esxtop.

pastedImage_1.png

Now let's fail vmnic3.

pastedImage_2.png

Check again which uplinks are selected after failover.

pastedImage_3.png

All are now forced to use vmnic0.

Let's return vmnic3 to service.

pastedImage_4.png

Check again the teaming status.

pastedImage_5.png

You'll notice Photon-02 and -04 return to vmnic3, the uplink they were using before the failure.
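The sequence in the screenshots can be mimicked with a small toy model. This is only a sketch of the behaviour described in this post, not ESXi's actual algorithm. The `rebalance` function is made up for illustration, and Photon-03 starting on vmnic0 is inferred from the fact that only Photon-02 and -04 move back.

```python
def rebalance(current, preferred, link_up, failback):
    """Return a new {vm: uplink} mapping after a link-state change (toy model)."""
    healthy = [u for u, up in link_up.items() if up]
    if not healthy:
        raise RuntimeError("no healthy uplinks left")
    new = {}
    for vm, uplink in current.items():
        home = preferred[vm]
        if not link_up[uplink]:
            # The uplink in use has failed: move to a healthy one.
            new[vm] = home if link_up[home] else healthy[0]
        elif failback and link_up[home]:
            # Failback = Yes: return to the original uplink once it is healthy.
            new[vm] = home
        else:
            # Failback = No: stay where failover left the VM.
            new[vm] = uplink
    return new

# Initial placement, partly inferred from the post (see the note above).
PREFERRED = {"Photon-02": "vmnic3", "Photon-03": "vmnic0", "Photon-04": "vmnic3"}

after_failure = rebalance(dict(PREFERRED), PREFERRED,
                          {"vmnic0": True, "vmnic3": False}, failback=True)
both_up = {"vmnic0": True, "vmnic3": True}
print("after failure:     ", after_failure)
print("restored, failback:", rebalance(after_failure, PREFERRED, both_up, failback=True))
print("restored, no fback:", rebalance(after_failure, PREFERRED, both_up, failback=False))
```

With `failback=False` the model leaves all three VMs on vmnic0 after vmnic3 recovers, which matches the earlier answer about the Failback = No configuration.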

StephenMoll
Expert

And with "Failback" set to "False" the last esxtop status would have shown all the VMs still on vmnic0, yes?

daphnissov
Immortal

StephenMoll
Expert

Thanks.

What's after warrior? "Community Lieutenant", "Community General"?

daphnissov
Immortal

Probably "Community Potwasher"

StephenMoll
Expert

Ha! Ha!

Good luck with that!

sspikent
Enthusiast

One more thing please...

'And with "Failback" set to "False" the last esxtop status would have shown all the VMs still on vmnic0, yes?'. From the PortGroup uplink view, does that mean...

Uplink vmnic3 is moved into 'Standby uplinks' on PG?

Once vmnic3 is returned to service, with both vmnic0 and vmnic3 originally in 'Active uplinks' (no 'Standby' or 'Unused'), does it go back into 'Active uplinks' automatically rather than into 'Standby'?

I'm hoping vmnic3 goes back to 'Active uplinks' group automatically.

pastedImage_1.png

StephenMoll
Expert

vmnic3 will return to its previous state, so if it was Active before the failure it will be restored to the Active list.

Virtual0
Contributor

With "Route based on the originating virtual port", when a VM is powered on, the virtual switch chooses an uplink based on a hash of the virtual port ID and the number of uplinks on the virtual switch. Note: this is NOT round-robin amongst the uplinks.

For a VM that stays powered on, this hash is recalculated only when an uplink is added to or removed from the NIC team. Whenever an active uplink fails, it is "removed" from the team.

In your situation, with no standby uplinks, when vmnic3 recovers it rejoins the team, so the hash is recalculated. This is why those VMs move back to vmnic3. This "moving back" will occur even if Failback had been set to No.
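To make the argument above concrete, here is a toy version of that selection. The real hash is not spelled out in this thread, so a plain modulo over the team size stands in for it, and the port IDs are invented for the example.

```python
def pick_uplink(port_id, team):
    """Toy stand-in for the port-ID hash: index into the current team."""
    if not team:
        raise ValueError("NIC team is empty")
    return team[port_id % len(team)]

def placements(port_ids, team):
    """Map each VM's virtual port to an uplink for the given team membership."""
    return {vm: pick_uplink(pid, team) for vm, pid in port_ids.items()}

# Hypothetical virtual port IDs, chosen only for illustration.
PORT_IDS = {"Photon-02": 11, "Photon-03": 10, "Photon-04": 13}

before = placements(PORT_IDS, ["vmnic0", "vmnic3"])
during = placements(PORT_IDS, ["vmnic0"])            # vmnic3 dropped from the team
after = placements(PORT_IDS, ["vmnic0", "vmnic3"])   # vmnic3 rejoined the team

print(before)
print(during)
print(after)
```

Because the result depends only on the port ID and the current team membership, the same team always yields the same placement. Under this model the VMs land back on vmnic3 as soon as it rejoins, without any Failback flag being consulted.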

The Failback setting is relevant only if there is a standby uplink. It determines whether the failed adapter, once it recovers, becomes active immediately and displaces the standby that took its place [Failback = Yes], or remains unused, at least until another active adapter fails [Failback = No].

The Failback setting has nothing to do with the 2 VMs "failing back" to vmnic3.
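The active/standby case described above can be sketched the same way. Again, this is a toy model of the description in this post rather than the real teaming code; the `Team` class and the use of vmnic1 as a standby uplink are assumptions made for the example.

```python
class Team:
    """Toy NIC team with an explicit failover order (active list, then standby list)."""

    def __init__(self, active, standby, failback):
        self.configured_active = list(active)
        self.standby = list(standby)
        self.failback = failback
        self.in_use = list(active)  # uplinks currently carrying traffic
        self.up = {u: True for u in self.configured_active + self.standby}

    def link_down(self, uplink):
        self.up[uplink] = False
        if uplink in self.in_use:
            self.in_use.remove(uplink)
            # Bring in the first healthy uplink that is not already carrying traffic.
            for candidate in self.configured_active + self.standby:
                if self.up[candidate] and candidate not in self.in_use:
                    self.in_use.append(candidate)
                    break

    def link_up(self, uplink):
        self.up[uplink] = True
        if uplink in self.in_use or uplink not in self.configured_active:
            return  # a recovered standby just becomes available again
        if self.failback or not self.in_use:
            # Failback = Yes: displace a standby that was promoted in the meantime.
            promoted = [u for u in self.in_use if u in self.standby]
            if promoted:
                self.in_use.remove(promoted[0])
            self.in_use.append(uplink)
        # Failback = No: the recovered adapter stays unused for now.

for failback in (True, False):
    team = Team(active=["vmnic0"], standby=["vmnic1"], failback=failback)
    team.link_down("vmnic0")
    team.link_up("vmnic0")
    print("failback =", failback, "-> in use:", team.in_use)
```

In the Failback = No run, vmnic1 keeps carrying traffic after vmnic0 recovers, and vmnic0 is only brought back in if vmnic1 later fails.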
