BrianDougherty
Contributor

Issue with Windows cluster communications dropping during vMotion

Good Morning,

We recently received alerts from Microsoft SCOM that the nodes of a cluster were unable to communicate. As it turns out, the alerts correspond to the times when the VMs were being migrated from one host to another. Communication was lost and then re-established. The communication being lost is our cluster heartbeat. This is Windows Failover Clustering on Windows 2012 R2 guests. The hosts are ESXi 6.0 U1. VMware Tools is running and current on the guests as well. No cluster alerts are generated. I am wondering whether anyone has a similar configuration and, if so, has seen this type of behavior.

I will also be reaching out to Microsoft about this. I wanted to approach it from all angles.

Thank You

Brian Dougherty

3 Replies
SureshKumarMuth
Commander

Please check the following guide for vMotion support. Some clustering configurations do not support vMotion, and migrating them will cause a failover.

Microsoft Windows Server Failover Clustering on VMware vSphere 5.x: Guidelines for supported configu...

Regards, Suresh https://vconnectit.wordpress.com/
techguy129
Expert

We had similar issues. The default heartbeat settings are too aggressive to ride out the period when the VM is stunned for transfer to the new host. As such, we adjusted our clusters to the same settings that are recommended when the Hyper-V role is installed.

Default Settings

Windows Server 2012 and later (source: MSDN blog):

Parameter             Fast Failover (Default)  Relaxed        Maximum
SameSubnetDelay       1 second                 1 second       2 seconds
SameSubnetThreshold   5 heartbeats             10 heartbeats  120 heartbeats
CrossSubnetDelay      1 second                 1 second       4 seconds
CrossSubnetThreshold  5 heartbeats             20 heartbeats  120 heartbeats

The Fast Failover column defines the default values for the WSFC heartbeat. Whether the servers are on the same subnet or on different subnets, failover occurs after 5 missed heartbeats sent 1 second apart, for a total of 5 seconds.

The Relaxed values are the recommended settings when the Hyper-V role is installed. If the servers are on the same subnet, failover occurs after 10 missed heartbeats sent 1 second apart, for a total of 10 seconds. If the servers are on different subnets, failover occurs after 20 missed heartbeats sent 1 second apart, for a total of 20 seconds.
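The totals quoted above are simply the delay multiplied by the threshold. Here is a minimal Python sketch of that arithmetic, using only the values from the table. The dictionary layout and the function names are illustrative assumptions, not part of any Windows or VMware API.

```python
# Heartbeat profiles copied from the table in this post.
# Each entry is (delay in seconds, threshold in missed heartbeats).
# The structure and names here are hypothetical, for illustration only.
PROFILES = {
    "Fast Failover (Default)": {"same": (1, 5), "cross": (1, 5)},
    "Relaxed": {"same": (1, 10), "cross": (1, 20)},
    "Maximum": {"same": (2, 120), "cross": (4, 120)},
}


def tolerance_seconds(delay_s, threshold):
    """Seconds of missed heartbeats tolerated before failover starts."""
    return delay_s * threshold


def tolerance_table(profiles=PROFILES):
    """Return the tolerated window per profile and per subnet scope."""
    return {
        name: {scope: tolerance_seconds(*values) for scope, values in scopes.items()}
        for name, scopes in profiles.items()
    }


if __name__ == "__main__":
    for name, windows in tolerance_table().items():
        print(f"{name}: same subnet {windows['same']} s, "
              f"cross subnet {windows['cross']} s")
```

When comparing against a live cluster, keep in mind that, as far as I know, Get-Cluster reports the two delay properties in milliseconds (1000 for 1 second), so convert before multiplying.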

These settings can be configured via PowerShell.

Import the cmdlets

Import-Module FailoverClusters

View current settings

Get-Cluster | fl *subnet*

Adjust the values

(Get-Cluster).SameSubnetThreshold = 10

(Get-Cluster).CrossSubnetThreshold = 20
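If you want to size the threshold to a pause you have actually observed rather than just adopting the Relaxed values, the same arithmetic can be run in reverse. The Python sketch below is only an illustration: the function name is hypothetical, the 120-heartbeat cap is taken from the Maximum column of the table, and the pause length has to come from your own measurements, since nothing in this thread states how long the stun lasts.

```python
import math

# Upper bound taken from the "Maximum" column of the table in this post.
MAX_THRESHOLD = 120


def min_threshold(pause_s, delay_s=1.0):
    """Smallest threshold whose window (delay * threshold) is strictly
    longer than a pause of pause_s seconds.

    pause_s is an assumed, measured value; it is not given in this thread.
    """
    if pause_s < 0 or delay_s <= 0:
        raise ValueError("pause_s must be >= 0 and delay_s must be > 0")
    threshold = math.floor(pause_s / delay_s) + 1
    if threshold > MAX_THRESHOLD:
        raise ValueError("pause is longer than the maximum threshold allows")
    return threshold


if __name__ == "__main__":
    # Hypothetical 8-second pause with the default 1-second delay.
    print(min_threshold(8))
```

A threshold sized this way leaves no margin, so in practice you would likely round up further. A larger window also means a genuinely failed node is detected more slowly.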

santoshannie
Contributor

Hi mate,

Could you confirm whether changing the values below fixed your issue?

Do we need to change to these values?

Adjust the values

(get-cluster).SameSubnetThreshold = 10

(get-cluster).CrossSubnetThreshold = 20

In our environment, we have a multi-subnet failover cluster hosting an Always On availability group. After vMotion, the cluster panics and starts a failover.

Regards,
Santosh
