Good Morning,
We recently received alerts from Microsoft SCOM that the nodes of a cluster were not able to communicate. As it turns out, the alerts correspond to the times when the VMs were being migrated from one host to another. Communication was lost and then re-established. The communication being lost is our cluster heartbeat communication. This is Windows Failover Clustering on a Windows 2012 R2 guest. The hosts are ESXi 6.0 U1. VMware Tools is running and current on the guest as well. No cluster alerts are generated. I am wondering whether anyone has a similar configuration and, if so, has seen this type of behavior.
I will also be reaching out to Microsoft about this. I wanted to approach it from all angles.
Thank You
Brian Dougherty
Please check the following guide for vMotion support. Some clustering types do not support vMotion, and migrating them will cause a failover.
We had similar issues. The default heartbeat settings are too low for the time the VM is stunned while it transfers to the new host. As such, we adjusted our clusters to the same settings that are recommended when the Hyper-V role is installed.
Default Settings
Windows Server 2012 and later (MSDN Blog):
Parameter | Fast Failover (Default) | Relaxed | Maximum |
SameSubnetDelay | 1 second | 1 second | 2 seconds |
SameSubnetThreshold | 5 heartbeats | 10 heartbeats | 120 heartbeats |
CrossSubnetDelay | 1 second | 1 second | 4 seconds |
CrossSubnetThreshold | 5 heartbeats | 20 heartbeats | 120 heartbeats |
The Fast Failover column defines the default values for the WSFC heartbeat. Whether the servers are on the same subnet or on different subnets, failover will occur after 5 failed heartbeats that are 1 second apart, for a total of 5 seconds.
The Relaxed values are the recommended settings when the Hyper-V role is installed. If the servers are on the same subnet, failover will occur after 10 failed heartbeats that are 1 second apart, for a total of 10 seconds. If the servers are on different subnets, failover will occur after 20 failed heartbeats that are 1 second apart, for a total of 20 seconds.
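To make the arithmetic above concrete, here is a minimal sketch of how the tolerance window works out (delay multiplied by threshold). The function name Get-HeartbeatToleranceSeconds is made up for illustration and is not part of the FailoverClusters module. The inputs are the values from the table above.

```powershell
# Hypothetical helper (not a FailoverClusters cmdlet): returns how many
# seconds of missed heartbeats are tolerated before a node is treated as down.
function Get-HeartbeatToleranceSeconds {
    param(
        [int]$DelaySeconds,   # interval between heartbeats, in seconds
        [int]$Threshold       # number of consecutive missed heartbeats tolerated
    )
    return $DelaySeconds * $Threshold
}

Get-HeartbeatToleranceSeconds -DelaySeconds 1 -Threshold 5    # Fast Failover: 5
Get-HeartbeatToleranceSeconds -DelaySeconds 1 -Threshold 10   # Relaxed, same subnet: 10
Get-HeartbeatToleranceSeconds -DelaySeconds 1 -Threshold 20   # Relaxed, cross subnet: 20
```

If the VM stun during a migration lasts longer than this window, the other nodes will see the heartbeats as lost.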
These settings can be configured via PowerShell.
Import the cmdlets
Import-Module FailoverClusters
View current settings
Get-cluster | fl *subnet*
Adjust the values
(get-cluster).SameSubnetThreshold = 10
(get-cluster).CrossSubnetThreshold = 20
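Putting the steps above together, a rough sketch of the whole change might look like the following. It assumes it is run on a cluster node with the FailoverClusters module available. It also assumes, as far as I know, that the delay properties are stored in milliseconds (1000 = 1 second), which is why the result is divided by 1000. Treat it as an outline and check it against your own environment before applying it.

```powershell
Import-Module FailoverClusters

# Get-Cluster with no arguments targets the local cluster
$cluster = Get-Cluster

# Record the current values so they can be restored later if needed
$before = $cluster | Select-Object SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold
$before | Format-List

# Apply the Relaxed thresholds from the table (the delays are left unchanged)
$cluster.SameSubnetThreshold  = 10
$cluster.CrossSubnetThreshold = 20

# Re-read the cluster and show the resulting tolerance in seconds
# (assumption: the delay properties are in milliseconds)
$after = Get-Cluster
"Same subnet:  {0} s" -f ($after.SameSubnetDelay  * $after.SameSubnetThreshold  / 1000)
"Cross subnet: {0} s" -f ($after.CrossSubnetDelay * $after.CrossSubnetThreshold / 1000)
```

Keeping the output of the first Format-List makes it easy to go back to the old values if the relaxed settings cause other problems.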
Hi mate,
I need your input: did changing the values below fix your issue?
Do we need to change to the given values?
Adjust the values
(get-cluster).SameSubnetThreshold = 10
(get-cluster).CrossSubnetThreshold = 20
In our environment, we have a multi-subnet failover cluster (an Always On availability cluster). After vMotion, the cluster panics and starts a failover.
Regards,
Santosh