VMware Cloud Community
vmproteau
Enthusiast

Unintended HA failover events

In our environment, network changes or host misconfiguration appear more likely to trigger an HA event than an actual host failure.

The things I've done to reduce these events:

  • Always have two physical NICs (pNICs) on the Service Console vSwitch.

  • Each NIC goes to a separate physical switch.

  • Spanning Tree Protocol (STP): disable STP on physical network interfaces connected to the ESX Server host. For Cisco-based networks, enable PortFast mode for access interfaces or PortFast trunk mode for trunk interfaces (saves about 30 seconds during initialization of the physical switch port).

  • EtherChannel negotiation, such as PAgP or LACP: disabled, because it is not supported.

  • Trunking negotiation: disabled (saves about four seconds).
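
To show why the switch-port settings in this list matter, here is a rough sketch of the arithmetic in Python. It assumes the two port delays simply add up and compares the result with an assumed 15-second failure detection window; the function and constant names are made up for illustration, and the delay figures are only the approximate savings quoted above.

```python
# Rough arithmetic only: how long a physical switch port may stay unusable
# after a link event, compared with an HA failure detection window.
# Assumption: the delays simply add up. All names here are hypothetical.

STP_DELAY_S = 30          # approx. delay avoided by PortFast / disabling STP
TRUNK_NEGOTIATION_S = 4   # approx. delay avoided by disabling trunk negotiation


def port_outage_seconds(stp_enabled: bool, trunk_negotiation: bool) -> int:
    """Approximate seconds a port stays unusable after the link comes up."""
    delay = 0
    if stp_enabled:
        delay += STP_DELAY_S
    if trunk_negotiation:
        delay += TRUNK_NEGOTIATION_S
    return delay


def exceeds_detection_window(outage_s: int, detection_s: int) -> bool:
    """True if the outage lasts at least as long as the detection window."""
    return outage_s >= detection_s


if __name__ == "__main__":
    detection_s = 15  # assumed detection window, in seconds
    for stp in (True, False):
        for trunk in (True, False):
            outage = port_outage_seconds(stp, trunk)
            risky = exceeds_detection_window(outage, detection_s)
            print(f"STP={stp} trunk-negotiation={trunk}: "
                  f"{outage}s outage, exceeds window: {risky}")
```

Under these assumptions, a port with STP left on would be down longer than the detection window by itself, while trunk negotiation alone would not.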

What other things could I do?


Accepted Solutions
BUGCHK
Commander

You could increase the timeout with the advanced option das.failuredetectiontime (the value is in milliseconds). In most environments 15 seconds is a bit too eager, I think.
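
Because the option takes milliseconds, a small helper can avoid unit slips. This is only a sketch in Python: the helper name is hypothetical, and it just builds the key/value pair you would enter in the cluster's HA advanced options; it does not talk to VirtualCenter. The 60 seconds used below is an example value, not a recommendation.

```python
from typing import Tuple

OPTION_KEY = "das.failuredetectiontime"  # option name from the post above


def failure_detection_option(seconds: int) -> Tuple[str, str]:
    """Return the (key, value) pair for a timeout given in seconds.

    The value is converted to milliseconds, which is the unit the
    option expects according to the post above.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return OPTION_KEY, str(seconds * 1000)


if __name__ == "__main__":
    key, value = failure_detection_option(60)
    print(f"{key} = {value}")  # prints "das.failuredetectiontime = 60000"
```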

4 Replies
weinstein5
Immortal

Also add a second Service Console port on a different network segment.

vmproteau
Enthusiast

I've considered that. So, with a second Service Console, a host isn't considered isolated unless both IPs are unreachable. Is that correct?

vmproteau
Enthusiast

Agreed, 15 seconds is a relative hair trigger, and I remembered that I had already set this to 60 seconds.
