ManuelDB
Enthusiast

Strange Active/Standby behaviour

Hi,

I cannot explain this behaviour and need help with it.

Environment: 2 Hosts  VMware ESXi, 7.0.3, 20328353

Network: vmnic0/1 1Gbps (Huawei Switch)
                vmnic2/3 10Gbps (Cisco Switch)
                vmnic4/5 disconnected

VSwitch0 configuration

ManuelDB_0-1692866567792.png


VM Portgroup WITHOUT override

VM vmnic allocation:

ManuelDB_1-1692866745891.png

What's happening here? A lot of VMs are running on vmnic0/1, which are configured as standby.
All vmnics are up.
Both hosts show the same problem.

Thanks
Manuel

 

scott28tt
VMware Employee

What is the failover order on the port groups within the vSwitch that the VM network connections are configured for?

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
ManuelDB
Enthusiast

On the port groups the failover order is not overridden, so it's the same as on the switch.

I have now vMotioned all the VMs between the hosts (back and forth more than once) and all the vmnic assignments are correct again (a single vMotion back and forth was not enough for some VMs).

Could this be because vmnic0 and vmnic1 were active before (for example, vmnic0-5 all active in the failover order on vSwitch0), so the VMs were bound to these NICs, and then the failover order was changed to make these NICs standby without triggering a port change on the VMs?
I don't normally look after this infrastructure (only these days, because the owner is on holiday), so I don't know what happened in the past, but I'm genuinely curious about what could trigger this behaviour.

Kinnison
Expert

Hello,


I don't find anything strange about it. If the purpose of a "standby" VMnic is to step in when an "active" VMnic fails for some reason, then the failover mechanism is just doing its job. You have configured four VMnics as "active" but only two are connected, so you effectively have two "failed" active VMnics and, at the same time, just as many "standby" VMnics ready to kick in, which is what happened.
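To make this reasoning concrete, here is a minimal Python sketch of the behaviour as described in this reply: each link-down "active" VMnic causes one healthy "standby" VMnic to be pulled in. It is a simplified model and not ESXi's actual implementation. The failover order used in the example (vmnic2-5 active, vmnic0/1 standby) is an assumption inferred from the thread, since the screenshot is not available, and the function name `effective_uplinks` is hypothetical.

```python
def effective_uplinks(active, standby, link_up):
    """Return the uplinks that carry traffic under the simplified model.

    Healthy active uplinks are always used. For every active uplink that is
    link-down, one healthy standby uplink is promoted, in list order.
    """
    healthy_active = [nic for nic in active if link_up.get(nic, False)]
    failed_count = len(active) - len(healthy_active)
    healthy_standby = [nic for nic in standby if link_up.get(nic, False)]
    promoted = healthy_standby[:failed_count]
    return healthy_active + promoted


# Assumed layout: vmnic4/5 are listed as active but are not cabled.
link_up = {
    "vmnic0": True, "vmnic1": True,    # 1 Gbps, standby
    "vmnic2": True, "vmnic3": True,    # 10 Gbps, active
    "vmnic4": False, "vmnic5": False,  # disconnected, but still "active"
}
in_use = effective_uplinks(
    active=["vmnic2", "vmnic3", "vmnic4", "vmnic5"],
    standby=["vmnic0", "vmnic1"],
    link_up=link_up,
)
print(in_use)  # ['vmnic2', 'vmnic3', 'vmnic0', 'vmnic1']
```

Under this model the two unplugged active VMnics are enough to bring both standby VMnics into use, which matches VMs appearing on vmnic0/1.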


I've just read that it's not your vSphere infrastructure, so my dispassionate advice is not to touch anything and to "leave the problem to its owner".


Regards,
Ferdinando

ManuelDB
Enthusiast

Hmm... that is a good point. I was expecting the disconnected interfaces to be ignored by the failover policy, or at least that, as long as at least one active interface was connected, all the VMs would be placed on it, and only when the last active interface failed would there be a switch to the standby interfaces. But your point could indeed explain the behaviour I was seeing.

So this can be explained by your point, but from my point of view that's not the best/clearest way to behave.

In that case, how can I get the behaviour I was expecting (only the active ports are used until none of them is connected)?
I don't think I can use hash-based load balancing and create two Eth-Trunks (one for the active links on the Cisco switch and one for the standby links on the Huawei switch), because the vSwitch would probably try to negotiate the Eth-Trunk with all the ports together. So is it not possible?
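One possible direction, which follows from the explanation given earlier in this thread rather than from anything confirmed here by VMware, is to keep only cabled uplinks in the active list so that no standby uplink is pulled in while the 10 Gbps links are healthy. The Python sketch below only counts how many standby promotions the simplified "one standby per link-down active" model would produce before and after such a change. The failover order shown is an assumption inferred from the thread, and the helper names are hypothetical.

```python
def split_active_by_link(active, link_up):
    """Split the configured active uplinks into cabled and link-down ones."""
    cabled = [nic for nic in active if link_up.get(nic, False)]
    link_down = [nic for nic in active if not link_up.get(nic, False)]
    return cabled, link_down


def standby_promotions(active, link_up):
    """Count standby uplinks pulled in: one per link-down active uplink.

    This is the simplified model discussed in the thread, not ESXi code.
    """
    return sum(1 for nic in active if not link_up.get(nic, False))


link_up = {
    "vmnic2": True, "vmnic3": True,    # 10 Gbps, cabled
    "vmnic4": False, "vmnic5": False,  # never connected
}
configured_active = ["vmnic2", "vmnic3", "vmnic4", "vmnic5"]  # assumed

cabled, link_down = split_active_by_link(configured_active, link_up)
promotions_before = standby_promotions(configured_active, link_up)
promotions_after = standby_promotions(cabled, link_up)

print("move to unused:", link_down)          # ['vmnic4', 'vmnic5']
print("promotions before:", promotions_before)  # 2
print("promotions after:", promotions_after)    # 0
```

With vmnic4/5 out of the active list, the model promotes a standby uplink only when vmnic2 or vmnic3 actually loses link. It still promotes one standby per failed active uplink, so it does not wait for the last active uplink to fail.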

Kinnison
Expert

Hello,


And how would the system distinguish whether an intended "active" VMnic is in a "link down" state because it has never been connected, because of a malfunction, or for some other reason? From a practical point of view it is exactly the same condition: a VMnic that I have listed among the "active" VMnics in a load-balancing context, all of which are supposed to be in use and functional, is unavailable, and so one (or more) of the available "standby" VMnics takes over the role of the non-functional one(s). Don't you think that any other behaviour would be somewhat inconsistent?


Also take into account the settings of each individual port group on the same vSwitch, because each one's settings, if not inherited, can result in a different use of each VMnic. In general terms, though, the failover behaviour does not change: when an "active" VMnic goes down, the configured "standby" VMnic kicks in (explicit failover order is a separate story).


Take a look here, which is a little clearer:
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-D34B1ADD-B8A7-4...


Regards,
Ferdinando

ManuelDB
Enthusiast

I understand your point of view, but I was thinking of active/standby more like two Eth-Trunks with the standby one disabled by STP: as long as there is at least one active connection, the active Eth-Trunk keeps working, and only when no active link is left does traffic switch to the standby links.

Good to learn the correct behaviour, so thanks! :thumbs_up:

Kinnison
Expert

Hello ManuelDB,


Here is a rather concrete example, a vSwitch configured as follows:

Standard load balancing based on "ID" and everything else as default
VMNIC0, VMNIC1, VMNIC2 all as "Active".
VMNIC3 as "Standby".


After detaching any one of the "active" VMnics (it doesn't matter which), the "standby" VMnic took over the role as expected and the VMs were "rebalanced" onto it. There are a number of other combinations that produce results that are, so to speak, (very little) "amusing", but that's another story. What we can assume, however, is that it doesn't work at all like a barely decent physical switch configured with the same "logic".
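The test above can be pictured with a small Python toy model. How ESXi actually maps virtual port IDs to uplinks is not stated in this thread, so the modulo mapping below is purely an assumption for illustration, as are the function names. The only behaviour taken from the post is that the "standby" VMnic takes the place of the detached "active" one.

```python
def teaming_slots(active, standby, link_up):
    """Substitute each link-down active uplink with the next healthy standby."""
    spares = [nic for nic in standby if link_up.get(nic, False)]
    slots = []
    for nic in active:
        if link_up.get(nic, False):
            slots.append(nic)
        elif spares:
            slots.append(spares.pop(0))
        # If no healthy standby is left, the slot is simply dropped.
    return slots


def uplink_for_port(port_id, slots):
    """Assumed mapping: virtual port ID modulo the number of usable slots."""
    return slots[port_id % len(slots)]


active = ["vmnic0", "vmnic1", "vmnic2"]
standby = ["vmnic3"]
all_up = {nic: True for nic in active + standby}
vmnic1_down = dict(all_up, vmnic1=False)  # detach one active uplink

before = {p: uplink_for_port(p, teaming_slots(active, standby, all_up))
          for p in range(6)}
after = {p: uplink_for_port(p, teaming_slots(active, standby, vmnic1_down))
         for p in range(6)}
moved = [p for p in before if before[p] != after[p]]

print(moved)                 # [1, 4]
print(after[1], after[4])    # vmnic3 vmnic3
```

In this toy model only the ports that were on the detached VMnic move, and they all land on the standby VMnic.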


Regards,
Ferdinando
