Hello All,
After a vMotion of one node of an MSCS virtual machine cluster, a failover was triggered at the guest cluster level. I would like to understand the reason behind it.
Failover cluster nodes use the network to send heartbeat packets to other nodes of the cluster. If a node does not receive a response from another node for a specified period of time, the cluster removes the node from cluster membership. By default, a guest cluster node is considered down if it does not respond within 5 seconds. Other nodes that are members of the cluster will take over any clustered roles that were running on the removed node.
An MSCS virtual machine can stall for a few seconds during vMotion. If the stall time exceeds the heartbeat time-out interval, then the guest cluster considers the node down and this can lead to unnecessary failover.
My question is how and where to find the stall time of the virtual machine during vMotion. I want to compare that value with the guest cluster heartbeat time-out, which is currently set to the default of 5 seconds.
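To make the comparison concrete, here is a minimal Python sketch. It assumes the time-out is roughly the heartbeat interval multiplied by the number of missed heartbeats tolerated (the SameSubnetDelay and SameSubnetThreshold cluster properties on the Windows side), with defaults of 1000 ms and 5 to match the 5 seconds mentioned above. The function names are made up for illustration, and the actual defaults may differ by Windows Server version.

```python
def heartbeat_timeout_ms(delay_ms: int = 1000, threshold: int = 5) -> int:
    """Approximate time a node may stay silent before the cluster removes it.

    Assumption: time-out = heartbeat interval * tolerated missed heartbeats.
    """
    return delay_ms * threshold


def stall_triggers_failover(stall_ms: int, delay_ms: int = 1000, threshold: int = 5) -> bool:
    """Return True if a stall of stall_ms would reach the heartbeat time-out."""
    return stall_ms >= heartbeat_timeout_ms(delay_ms, threshold)


# Hypothetical measurement: a 6-second stall during vMotion.
print(stall_triggers_failover(6000))                # default 5 s time-out
print(stall_triggers_failover(6000, threshold=10))  # time-out raised to 10 s
```

With these assumed values, a 6-second stall reaches the 5-second time-out but not a 10-second one.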
An early response would be highly appreciated!
Regards,
Azhar Shaikh.
Can anyone please respond to this? Any suggestions?
Hello
I suggest running a test as follows:
1. Start a continuous ping to one of the MSCS VMs.
2. Run a vMotion of that VM.
3. Check the latency and any dropped replies during the test.
4. Set the cluster heartbeat time-out to 10 seconds.
See the attached link:
https://docs.vmware.com/en/VMware-vSphere/6.7/vsphere-esxi-vcenter-server-67-setup-mscs.pdf
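If you would rather get a number than read the ping output by eye, the test above could be scripted roughly like this in Python. This is only a sketch: it probes a TCP port instead of sending ICMP pings, the host name and port 3389 in the usage note are placeholders, and the helper names are made up. Also keep in mind that the longest gap between successful replies only approximates the stall time, since network switchover adds to it.

```python
import socket
import time
from typing import Iterable, List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, probe succeeded)


def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect(host: str, port: int, duration_s: float, interval_s: float = 0.2) -> List[Sample]:
    """Probe the VM repeatedly for duration_s seconds and record each result."""
    samples: List[Sample] = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval_s)
    return samples


def longest_gap_between_replies(samples: Iterable[Sample]) -> float:
    """Longest time in seconds between two consecutive successful probes.

    Returns 0.0 if there are fewer than two successful probes.
    """
    longest = 0.0
    last_ok = None
    for ts, ok in samples:
        if ok:
            if last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
    return longest
```

To use it, start something like `samples = collect("mscs-node1.example.local", 3389, duration_s=120)` just before the vMotion, then compare `longest_gap_between_replies(samples)` with the heartbeat time-out.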