VMware Cloud Community
Ved369
Contributor

Network partition scenario with 3 hosts

Hello All,

I have 3 hosts with VSAN 6.2.

When host1 is network partitioned, the VM running on it does not restart on the other hosts. Isn't the VM supposed to be killed because it has no quorum, and then restarted on host2 or host3?

17 Replies
TheBobkin
Champion

Hello,

Are ALL of the test VM Objects (Disks + Namespace + any present snapshots) using a Storage Policy with FTT=1?

If so, are they compliant with this policy?

How are you isolating just one host?

Does the VM you are testing have any individual host isolation response overrides configured?

Do you have HA configured and enabled on this cluster?

Did you reconfigure HA after making any changes to the Networking in the cluster?

More useful information regarding HA in vSAN from depping:

http://www.yellow-bricks.com/2013/09/19/isolation-partition-scenario-with-vsan-cluster-handled/

(Relatively old article but I can't see anything obvious that has changed since)

Bob

-o- If you found this comment useful please click the 'Helpful' button and/or select as 'Answer' if you consider it so, please ask follow-up questions if you have any -o-

Ved369
Contributor

Hello,

thanks for replying.

Please see my response below.

Are ALL of the test VM Objects (Disks + Namespace + any present snapshots) using a Storage Policy with FTT=1? Yes

If so, are they compliant with this policy? Yes

How are you isolating just one host? I am creating a network partition rather than a host isolation, by disconnecting the NICs used for the vSAN connection.

Does the VM you are testing have any individual host isolation response overrides configured? The VM uses the cluster default, i.e. 'Leave powered on'. I get the same result with 'Shut down' as the isolation response, since the scenario is a network partition, not an isolation.

Do you have HA configured and enabled on this cluster? Yes

Did you reconfigure HA after making any changes to the Networking in the cluster? Yes

In the scenario with 4 hosts, as per http://www.yellow-bricks.com/2013/09/19/isolation-partition-scenario-with-vsan-cluster-handled (third scenario), the VM would restart on host 3 or 4. I was expecting the same behaviour with 3 hosts as well. BUT the VM does not power on on host 2 or 3; it just goes unresponsive and keeps pinging on the network.

TheBobkin
Champion

Okay, thanks for clarifying all of that.

I think the point here might be the difference between the definitions of 'isolation' and 'partition'.

depping clarifies this better here:

https://communities.vmware.com/thread/514497

Thus I don't think in this scenario it will power off the VM.

Have you tested if the VM restarts after killing it?

By the way, a better test for pulling vSAN traffic than removing the NICs is to simply untick the 'vSAN Traffic' box on the configured vmkernel interface.

Bob

Ved369
Contributor

Hi,

I understand the difference between isolation and partition.

I just wanted to know whether this is expected behaviour.

As per Yellow Bricks, it looked like the VM was supposed to be restarted on host 2 or 3, as that partition had more of the VM's components.

TheBobkin
Champion

Hello,

Yes, looks like it is expected behaviour:

"Note that the VM in Partition-1 will not be powered off, even if you have configured the isolation response to do so"

Isolation / Partition scenario with VSAN cluster, how is this handled?

Bob

Ved369
Contributor

Hello,

But the VM was also supposed to restart on host 2 or 3, as they had quorum. Yet the VM did not start anywhere.
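The quorum argument above can be sketched as a simple vote count. This is only an illustration: it assumes an FTT=1 object made of two replicas plus a witness, each carrying one vote, and the component placement (`component_host`) is hypothetical, not taken from this cluster. It also ignores the additional requirement that a full data replica be available.

```python
# Hypothetical placement of one FTT=1 object: two replicas and a witness,
# one vote each (assumed; the real layout is not shown in this thread).
component_host = {"replica-a": "host1", "replica-b": "host2", "witness": "host3"}

# The two vSAN partitions reported in the thread.
partitions = {"group 1": {"host1"}, "group 2": {"host2", "host3"}}


def votes_in(hosts):
    """Count the components (votes) that live on the given set of hosts."""
    return sum(1 for host in component_host.values() if host in hosts)


def has_quorum(hosts):
    """A partition has quorum when it holds more than half of all votes."""
    return votes_in(hosts) * 2 > len(component_host)


for name, hosts in partitions.items():
    print(name, "votes:", votes_in(hosts), "quorum:", has_quorum(hosts))
```

Under these assumptions only group 2 can access the object. That explains why a restart there would be possible from a storage point of view, but not whether HA will actually attempt one.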

martinriley
Hot Shot

Hi there,

What are your HA isolation addresses set to?

Ved369
Contributor

Hello,

It was the default management gateway, so the host could still ping it, which made this a network partition. I can work with host isolation; it is the network partition case that confuses me.

admin
Immortal

You need to configure the HA response for when a host is isolated. The default is 'Leave powered on'.

VMware Virtual SAN & vSphere HA Recommendations - VMware vSphere Blog

this may help you.

martinriley
Hot Shot

When vSAN and HA are enabled in the same cluster, the inter-agent HA heartbeat traffic uses the storage network, not the management network. Therefore, in your scenario I believe this would be an isolation condition, not a partition condition, as your host would be unable to contact the isolation address, and your VM then behaved accordingly.

You can test this by changing the isolation response to 'Power off and restart' and performing the same test; your VM should be restarted on one of your other hosts.

vM

-----------------------

VCAP-DCD / VCAP-DCA / VCP-CLOUD / VCP-DT / VCP-NV / VCP6 / VCP5 / VCP4

-----------------------

vMustard.com

depping
Leadership

What does fdm.log say? Does it call it out as an isolation?

Ved369
Contributor

Hello,

Thanks all for replying to my doubt.

@martinriley: No separate isolation address has been set to produce a host isolation scenario. Although HA uses the vSAN network in a vSAN cluster, the isolation address is not changed by default. This is a network partition, because the default gateway of the management network is still reachable from host1.

@Byounghee: The network isolation setting does not help here, as this is a network partition.

@depping: In vCenter it is reported as a network partition; host 1 shows in group 1, and host 2 and host 3 in group 2.

pastedImage_0.png

pastedImage_1.png

Below are lines from fdm.log. These lines seem to be about the other 2 hosts in the cluster, as they have 13 and 14 powered-on VMs.

2017-04-19T16:43:40.805Z verbose fdm[FFB13B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterManagerImpl::ProcessSlavePowerOnListChanges] host host-13789 listVersion=7084356088372 isolated=false poweredOnVms=14

2017-04-19T16:43:40.805Z verbose fdm[FFB13B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterManagerImpl::ProcessSlavePowerOnListChanges] host host-13785 listVersion=7082655673765 isolated=false poweredOnVms=13
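To check what HA reports for each host without reading the log by eye, the fields in lines like these can be pulled out with a regular expression. This is a minimal sketch that assumes the exact field layout of the two lines quoted above; other fdm.log lines or other HA versions may be formatted differently. The function name `parse_power_on_line` is made up for the example.

```python
import re

# Matches the "host ... listVersion=... isolated=... poweredOnVms=..." tail
# of the two fdm.log lines quoted in this post (layout assumed from them).
LINE_RE = re.compile(
    r"host (?P<host>\S+) listVersion=(?P<list_version>\d+) "
    r"isolated=(?P<isolated>true|false) poweredOnVms=(?P<powered_on>\d+)"
)


def parse_power_on_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "isolated": match.group("isolated") == "true",
        "powered_on_vms": int(match.group("powered_on")),
    }


SAMPLE_LINES = [
    "2017-04-19T16:43:40.805Z verbose fdm[FFB13B70] [Originator@6876 sub=Cluster "
    "opID=SWI-3ab50c2a] [ClusterManagerImpl::ProcessSlavePowerOnListChanges] "
    "host host-13789 listVersion=7084356088372 isolated=false poweredOnVms=14",
    "2017-04-19T16:43:40.805Z verbose fdm[FFB13B70] [Originator@6876 sub=Cluster "
    "opID=SWI-3ab50c2a] [ClusterManagerImpl::ProcessSlavePowerOnListChanges] "
    "host host-13785 listVersion=7082655673765 isolated=false poweredOnVms=13",
]

for line in SAMPLE_LINES:
    print(parse_power_on_line(line))
```

Both sample lines parse with `isolated` set to `False`, which matches the reading that HA does not consider these hosts isolated.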

depping
Leadership

For vSAN it shows the partition; you have 3 partitions by the looks of it, in other words each host is isolated from a vSAN standpoint. HA, however, says (per the FDM log) that there is no isolation, hence the isolation response is not triggered, and this is probably because the gateway is still reachable.

Ved369
Contributor

Hello Depping,

From the screenshot, host1 is in group 1 and host 2 and host 3 are in group 2. So there are only two partitions, aren't there?

depping
Leadership

sorry, correct, two partitions from a vSAN perspective, no isolation from an HA perspective, hence nothing happened
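The distinction between the two perspectives can be written down as a small decision function. This is a simplified model of the reasoning in this thread, not the actual FDM logic: it assumes a host counts as isolated only when it can reach neither its HA peers nor its isolation address, and all names here are made up for the illustration.

```python
from enum import Enum


class HostState(Enum):
    CONNECTED = "connected"
    PARTITIONED = "partitioned"
    ISOLATED = "isolated"


def classify(ha_peers_reachable, isolation_address_reachable):
    """Simplified model: isolation needs BOTH peers and isolation address lost."""
    if ha_peers_reachable:
        return HostState.CONNECTED
    if isolation_address_reachable:
        return HostState.PARTITIONED
    return HostState.ISOLATED


def isolation_response_fires(state):
    """The configured isolation response only applies to an isolated host."""
    return state is HostState.ISOLATED


# host1 in this thread: vSAN NICs pulled (no HA peers reachable over the vSAN
# network), but the management default gateway still answers ping.
host1 = classify(ha_peers_reachable=False, isolation_address_reachable=True)
print(host1.value, isolation_response_fires(host1))
```

With the default gateway still reachable, host1 comes out as partitioned rather than isolated in this model, so the isolation response does not fire regardless of how it is configured.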

Ved369
Contributor

Hello depping

Thanks for the update.

HA uses the same network as vSAN in a vSAN cluster, so an isolation or partition should look the same from both the HA and the vSAN perspective.

I also wanted to understand: if this is the expected behaviour for a partition in a 3-node cluster, how is the behaviour different in http://www.yellow-bricks.com/2013/09/19/isolation-partition-scenario-with-vsan-cluster-handled/ ? In that case the VM should not restart anywhere either, as the VM is partitioned there too.

Regards
Ved

depping
Leadership

Difficult to say from the outside why this is. If you aren't getting the expected behaviour, please contact VMware Support and let them analyze the environment and the situation.
