In our environment we have six ESX hosts in one cluster on a set
of switches. Each vSwitch has an uplink to both physical switches,
giving redundancy for all networks including management / service console. In the
last four weeks we have had an issue where one of the UPSes to which one of our
switches connects has tripped. This causes our entire cluster to become
unavailable, and while some VMs remain online, others do not. This looks to be
attributable to the heartbeat interval and the default policy to shut down VMs
when the host connection is lost, since it will hit the timeout interval while the
We have redundant service console connections on each
server: vmnic0 goes to switch 1 and vmnic1 goes to switch 2. I currently have
the service console/VMotion vSwitch configured in active/active mode with both
NICs. I have been reviewing the HA documents, and they talk about using an
active/passive configuration with a rolling failover policy. So my question is: if I have both
NICs in active/active going to two different switches, is that possibly why I
am losing service console/management access when one of the switches reboots? I
am confused because I know for a fact that if I pull one of the cables from the
SAN or network vSwitches, the VMs automatically fail over to the other NICs in the
vSwitch team and service continues uninterrupted; however, this has not proven to
be the case with the service console connections. As such, I do not understand
why the second active connection is not keeping access running when one of the
switches reboots.
Any thoughts or ideas on this would be greatly appreciated.
I'm still hoping someone can weigh in on this. I am looking for a definitive answer as to whether the service console NICs must be configured in an active/passive state for proper failover, or whether that is simply a best practice.
Secondly, how are others configuring their environments to avoid these types of outages? Are you increasing the heartbeat interval? I am still confused as to why, with a switch going offline, I lost management access once switch 1 rebooted, even though the second service console NIC was connected to switch 2 and was active.
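
To make the timing question concrete, here is a rough sketch of how I am thinking about it. It is only a simplified model and not VMware's actual HA logic: it assumes a host treats itself as isolated once its management network has been unreachable for longer than some detection timeout. The function names and all of the numbers (heartbeat interval, timeout values, switch reboot time) are placeholders I made up for illustration, not values taken from our cluster or from the HA documentation.

```python
# Simplified model of the timing question, NOT VMware's real HA algorithm.
# Assumption: a host considers itself isolated once the management network
# has been unreachable for longer than the detection timeout.

def host_declares_isolation(outage_seconds, detection_timeout_seconds):
    """Return True if the outage outlasts the detection timeout."""
    return outage_seconds > detection_timeout_seconds


def missed_heartbeats(outage_seconds, heartbeat_interval_seconds):
    """Count whole heartbeats that would be missed during the outage."""
    return int(outage_seconds // heartbeat_interval_seconds)


# All values below are assumed for illustration only.
HEARTBEAT_INTERVAL = 1      # seconds between heartbeats (assumed)
SWITCH_REBOOT_TIME = 120    # seconds the switch is unavailable (assumed)

if __name__ == "__main__":
    print("missed heartbeats:",
          missed_heartbeats(SWITCH_REBOOT_TIME, HEARTBEAT_INTERVAL))
    for timeout in (15, 60, 180):  # candidate timeouts in seconds (assumed)
        isolated = host_declares_isolation(SWITCH_REBOOT_TIME, timeout)
        print(f"timeout={timeout}s -> isolation declared: {isolated}")
```

In this model, raising the timeout only helps if it is longer than the whole switch outage, which is why I would rather understand why the second NIC is not carrying the service console traffic in the first place.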