Imagine I have a host with:
On startup of the host and VMs, the VMs will get roughly evenly distributed between both pNICs, right?
If a pNIC is lost for some reason, all traffic will divert to the remaining NIC.
If the failed NIC returns, am I right in saying that all traffic will continue to traverse the link that didn't fail?
On startup of the host and VMs, the VMs will get roughly evenly distributed between both pNICs, right?
"Evenly" cannot be guaranteed, but VM ports will be distributed amongst vmnics.
If a pNIC is lost for some reason, all traffic will divert to the remaining NIC.
Yes, to the remaining vmnic (uplink).
If the failed NIC returns, am I right in saying that all traffic will continue to traverse the link that didn't fail?
Yes, based on how you have the vswitch configured with failback=no.
Also, you say you have Notify Switches = No. You should probably set that to "Yes" in almost all cases, unless you have a very specific use case that requires "No".
The use case is some VMs running Microsoft Windows Server Failover Clustering, so Notify Switches has to stay "No".
I am trying to get my head around some of the settings on this environment that my predecessor left behind for me. I think the "Failback: No" is a mistake. My assumption is that with it set to No, even in an active:active setup (i.e. no pNICs in the "Standby" list), load balancing won't rebalance when the failed pNIC/link is recovered.
There might be some confusion as to how that setting works. It basically sends a gratuitous ARP upstream to inform the physical switches that a VM has powered on or vMotioned (among other events). This article does a pretty good job in explaining that.
Nope, no confusion about "Notify Switches", see VMware Knowledge Base.
This is really about "Failback". Some descriptions of it suggest it only works when you have pNICs in the "Standby" list within the "Failover Order"; in fact, the most common descriptions only discuss it in that context. Since I have only 2 pNICs and both are in the "Active" portion of the list, I want to be clear about its function: does it do nothing in this case, or, if set to "Yes", will my teamed link rebalance?
No, the Failback option works when all vmnics are in an Active state as well. I'll illustrate this for you in my lab; the screenshots below progress chronologically from the first step to the last.
Start with a pretty simple setup: two vmnic uplinks, both in an Active state, with Failback set to "Yes". The port group "vSS-Network-3" is in use here and consumed by three different VMs (Photon-02, -03, and -04).
We confirm the physical NIC selection in esxtop.
Now let's fail vmnic3.
Check again which uplinks are selected after failover.
All are now forced to use vmnic0.
Let's return vmnic3 to service.
Check again the teaming status.
You'll notice Photon-02 and -04 return to using vmnic3 as their preferred uplink.
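The sequence above can be summarized in a small toy model. This is purely illustrative, not ESXi's actual code path; the VM and vmnic names mirror the lab, and the Failback=Yes behavior matches what the screenshots show.

```python
# Toy model of the lab sequence: two active uplinks, three VMs,
# fail one uplink, then return it to service with Failback = Yes.

class Team:
    def __init__(self, uplinks, failback=True):
        self.uplinks = list(uplinks)   # uplinks currently in service
        self.failback = failback
        self.preferred = {}            # vm -> originally selected uplink
        self.current = {}              # vm -> uplink in use right now

    def attach(self, vm, uplink):
        self.preferred[vm] = uplink
        self.current[vm] = uplink

    def fail(self, uplink):
        self.uplinks.remove(uplink)
        for vm, up in self.current.items():
            if up == uplink:           # divert traffic to a surviving uplink
                self.current[vm] = self.uplinks[0]

    def recover(self, uplink):
        self.uplinks.append(uplink)
        if self.failback:              # Failback = Yes: VMs move back
            for vm, pref in self.preferred.items():
                if pref == uplink:
                    self.current[vm] = pref

team = Team(["vmnic0", "vmnic3"], failback=True)
team.attach("Photon-02", "vmnic3")
team.attach("Photon-03", "vmnic0")
team.attach("Photon-04", "vmnic3")

team.fail("vmnic3")
print(sorted(set(team.current.values())))  # ['vmnic0'] - all on vmnic0

team.recover("vmnic3")
print(team.current["Photon-02"], team.current["Photon-04"])  # vmnic3 vmnic3
```

With `failback=False` the model would leave all three VMs on vmnic0 after recovery, which is the case discussed next.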
And with "Failback" set to "False" the last esxtop status would have shown all the VMs still on vmnic0, yes?
Yes
Thanks.
What's after warrior? "Community Lieutenant", "Community General"?
Probably "Community Potwasher"
Ha! Ha!
Good luck with that!
One more thing please...
'And with "Failback" set to "False" the last esxtop status would have shown all the VMs still on vmnic0, yes?'. From the PortGroup uplink view, does that mean...
Is uplink vmnic3 moved into "Standby uplinks" on the port group?
Once vmnic3 is returned to service, with both vmnic0 and vmnic3 originally in "Active uplinks" (no "Standby" or "Unused"), does it go back into "Active uplinks" automatically rather than into "Standby"?
I'm hoping vmnic3 goes back to 'Active uplinks' group automatically.
vmnic3 will come back in its previous state: if it was in the Active list before the failure, it will be restored to the Active list.
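The point above can be shown with a minimal sketch: the failover-order lists (Active/Standby/Unused) are configuration and do not change when a link fails; only the link state does. Names and structure here are illustrative assumptions, not ESXi internals.

```python
# Failover order is configuration; link failure only changes link state.

class Uplink:
    def __init__(self, name):
        self.name = name
        self.link_up = True

active_order = [Uplink("vmnic0"), Uplink("vmnic3")]  # configured "Active uplinks"
standby_order = []                                    # no standby uplinks here

def usable(order):
    # Uplinks that can actually carry traffic right now.
    return [u.name for u in order if u.link_up]

active_order[1].link_up = False          # vmnic3 loses link
# The configured order is untouched; vmnic3 never moves to "Standby uplinks".
assert [u.name for u in active_order] == ["vmnic0", "vmnic3"]
print(usable(active_order))              # ['vmnic0']

active_order[1].link_up = True           # vmnic3 returns to service
print(usable(active_order))              # ['vmnic0', 'vmnic3']
```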
In "Route based on the originating virtual port", when a VM is powered on, the virtual switch chooses an uplink based on a hash of the virtual port ID and the number of uplinks on the virtual switch. Note: this is NOT round-robin amongst the uplinks.
For a VM that is still powered on, this hash is recalculated only when an uplink is added to or removed from the NIC team. Whenever an active uplink fails, it is effectively "removed" from the team.
In your situation with no standby uplinks, when vmnic3 recovers it rejoins the team, so the hash is recalculated. That is why those VMs move back to vmnic3. This "moving back" would occur even if Failback had been set to No.
The Failback setting is relevant only when there is a standby uplink. It determines whether the failed adapter, once it recovers, immediately becomes active again, displacing the standby that took its place [Failback = Yes], or remains unused, at least until another active adapter fails [Failback = No].
The Failback setting has nothing to do with the two VMs "failing back" to vmnic3.
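The recalculation described above can be sketched as follows. The actual hash function is internal to ESXi; this sketch assumes a simple port-ID-modulo-uplink-count stand-in (an assumption, not the real algorithm), which is enough to show why the selection changes whenever an uplink joins or leaves the team.

```python
# Sketch of "Route based on originating virtual port" selection.
# The modulo is a stand-in for ESXi's internal hash.

def select_uplink(port_id, uplinks):
    # Not round-robin: the choice depends only on the port ID
    # and the current number of uplinks in the team.
    return uplinks[port_id % len(uplinks)]

team = ["vmnic0", "vmnic3"]
ports = {"Photon-02": 5, "Photon-03": 6, "Photon-04": 7}  # illustrative port IDs

before = {vm: select_uplink(p, team) for vm, p in ports.items()}

# vmnic3 fails: it is "removed" from the team, the hash is
# recalculated, and every VM lands on the surviving uplink.
team.remove("vmnic3")
during = {vm: select_uplink(p, team) for vm, p in ports.items()}

# vmnic3 recovers and rejoins the team: recalculating with two
# uplinks again sends the same VMs back, regardless of Failback.
team.append("vmnic3")
after = {vm: select_uplink(p, team) for vm, p in ports.items()}

print(before)           # Photon-02 and -04 on vmnic3, Photon-03 on vmnic0
print(during)           # everyone on vmnic0
print(after == before)  # True: selections are restored
```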