VMware Cloud Community
rleon_vm
Enthusiast

VSAN, HA, Isolation Response, Network Partition outcomes

Hi all,

I guess this post is 50% question and 50% sharing of information.

I was just casually writing down some failure scenarios for VSAN, and the next thing I know, it became this monster table.

What I'm still unclear about is the usefulness of Datastore Heartbeating (DH for short) in a VSAN environment (through FC, iSCSI or NFS datastores in addition to the VSAN datastore).

Sure, I get that it is not required and is not usually implemented in a VSAN cluster, but in all the outcomes in which DH is present, it seems to do more harm than good (outcomes A1, B1, C1 and D1 below).

If anyone has more information on the benefits of DH in a VSAN cluster, please let us know.

I'm specifically referring to this line from the VMware® vSAN™ Design and Sizing Guide (last checked on Mar/2018):

"Heartbeat datastores are not necessary for a vSAN cluster, but like in a non-vSAN cluster, if available, they can provide additional benefits. VMware recommends provisioning Heartbeat datastores when the benefits they provide are sufficient to warrant any additional provisioning costs."

... which is pretty vague. I hope they clarify the "additional benefits" in a future update.

Before you read any further, please be aware that:

  • The following information is valid for both local and Stretched Clusters.
  • If you want to look at it from a Stretched Cluster perspective, then just manually/mentally change all "FTT"s to SFTTs in the following text.
  • Datastore Heartbeating is not supported in a Stretched Cluster configuration. Therefore, whenever I talk about the effects of a Datastore Heartbeat, you can assume it is just a local VSAN cluster.
  • Although unlikely in most VSAN deployments, you can give a VSAN cluster Datastore Heartbeating by giving every host shared access to 2x FC/iSCSI/NFS luns.
  • Please be reminded that in a VSAN cluster, HA traffic between the HA Master (aka FDM Master) and the HA host agents uses the VSAN network, not the Management network.
  • But for simplicity, whenever I talk about a network partition or host isolation, you can assume that both the VSAN and the Management networks go bad together.
  • In all scenarios, we can assume that the VM network(s) (production traffic) are not affected. This will help emphasize the importance of a running VM in some situations.

Let's assume that we have a 6 host VSAN cluster. (Again, for this topic it doesn't matter if it is a local 6 host VSAN cluster or a 3+3+1witness Stretched Cluster. The site failure scenarios for a Stretched Cluster are a whole other topic.)

Visually, we have: H-H-H-H-H-H

Outcome

Scenario: Network partition, e.g.: H-H-H---x----H-H-H, where the hosts are now split into two groups.

Note: Everything below is from the perspective of a specific VM and the host it is currently running on. For simplicity, let's say the VM only has a single VMDK.
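Before the table, here is the quorum rule that drives all of the outcomes below, as a rough sketch (Python, purely illustrative; the function name and the one-vote-per-component model are my own simplification, not VMware code). The rule: an object stays accessible in a partition only if that partition holds a strict majority of the object's component votes (data replicas plus witness).

```python
# Hedged sketch of vSAN object quorum -- illustrative only, not the actual
# implementation. Each component (data replica or witness) carries one vote;
# a partition keeps the object accessible only with a strict vote majority.

def accessible_partitions(component_hosts, partitions):
    """component_hosts: hosts holding one component vote each for the object.
    partitions: list of sets of hosts that can still talk to each other.
    Returns the partitions in which the object remains accessible."""
    total_votes = len(component_hosts)
    result = []
    for part in partitions:
        votes = sum(1 for h in component_hosts if h in part)
        if votes * 2 > total_votes:  # strict majority required
            result.append(part)
    return result

# FTT=1: two data components (h1, h4) plus a witness (h6) = 3 votes.
ftt1_components = ["h1", "h4", "h6"]
# The partition scenario below: h1-h2-h3 | h4-h5-h6
left, right = {"h1", "h2", "h3"}, {"h4", "h5", "h6"}
# Only the right partition holds 2 of 3 votes, so only it keeps access
# (outcomes A1/A2 depending on where the VM runs).
print(accessible_partitions(ftt1_components, [left, right]))

# FTT=0: a single data component on h1 -- one vote, so the object is
# accessible wherever that one component lives (outcomes B1/B2).
print(accessible_partitions(["h1"], [left, right]))
```

Note that in the total-outage scenario further down (E1), every host is its own partition of one, so no partition ever reaches a majority for an FTT=1 object.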

A.

FTT=1(or above), you can end up with either A1 or A2 below.

A1.

The network partition where the VM is running does not have quorum for the VM's VMDK data-components.

  • VSAN makes what's left of the VMDK data-component copies (even if any) inaccessible to the VM in this partition (IF it runs in this partition), but VSAN does not power-off the VM, HA Isolation Response makes this decision. (Reminder: The VSAN.AutoTerminateGhostVm=1 mechanism is only applicable in a Stretched Cluster, that's why VSAN does not power-off the VM in this situation.)
  • However, HA Isolation Response is not triggered, therefore VM is not powered-off. (Note: A host does not consider itself isolated for as long as it can communicate with at least one other host)
  • HA Master (aka FDM Master) in the other partition powers-on the VM where the VMDK's data-component has VSAN quorum, creating a VM split-brain. This may or may not cause IP/MAC conflicts depending on whether the VM network is also partitioned.
  • When the network recovers, the stale data-component gets resynced from the other partition where the new VM instance is running, and HA kills the "ghost" VM still running in this partition.
  • If Datastore Heartbeat is available in the cluster, the HA Master in the other partition would learn that the host where the "ghost" VM is still running (though without access to its VMDK) is actually still alive, and therefore would not power-on a new instance of the VM, making the situation worse.
A2.

The network partition where the VM is running has quorum for the VM's VMDK data-components.
  • VM remains running and can still access its VMDK data-components in this partition.
  • HA Master in the other partition cannot power-on a new instance of the VM where VSAN has no quorum, since the VMDK data-components (even if any) are made inaccessible.
B.

FTT=0, you can end up with either B1 or B2 below.
B1.

The network partition where the VM is running does not have the VM's VMDK data-component.
  • VM loses access to its VMDK data-component, but VSAN does not power-off the VM, HA Isolation Response makes this decision.
  • However, HA Isolation Response is not triggered because host is not isolated, therefore VM is not powered-off. (Note: A host does not consider itself isolated for as long as it can communicate with at least one other host)
  • HA Master in the other partition powers-on the VM where the VMDK's data-component is still available and accessible, creating a VM split-brain. This may or may not cause IP/MAC conflicts depending on whether the VM network is also partitioned.
  • When the network recovers, HA kills the "ghost" VM still running in this partition.
  • If Datastore Heartbeat is available in the cluster, the HA Master in the other partition would learn that the host where the "ghost" VM is still running (though without access to its VMDK) is actually still alive, and therefore would not power-on a new instance of the VM, making the situation worse.
B2.

The network partition where the VM is running has the VM's VMDK data-component.
  • VM remains running and can still access its data-component in this partition.
  • HA Master in the other partition cannot power-on a new instance of the VM since the VM's VMDK data-component is not available in that partition.

Outcome

Scenario: Host Isolation, E.g.: H---x----H-H-H-H-H, where a single host is network isolated from the other hosts.

C.

FTT=1 (or above), you can only end up with C1 below.

C1.

An isolated host would not have component-quorum for any VMDK (reminder: FTT is 1 or above, so each VMDK would have more than one component):
  • VSAN makes all VMDK data-components inaccessible to all VMs running on this host, but VSAN does not power-off the VMs, HA Isolation Response makes this decision.
  • HA Isolation Response is triggered.
  • HA Master in the other partition powers-on the VMs where their data-components have quorum.
  • Depending on the Isolation Response, possible VM split-brain situation. A VM split-brain may or may not cause IP/MAC conflicts depending on whether the VM network is also partitioned.
  • When the network recovers, the stale data-component gets resync from the other partition where the new VM instances are running, and HA kills the "ghost" VMs still running on the isolated host. (If the Isolation Response was to leave power-on.)
  • The VM split-brain situation could be prevented if the Isolation Response is set to power-off.
  • If the Isolation Response is leave powered-on (not recommended for VSAN anyway), and Datastore Heartbeat is available in the cluster, the HA Master in the other partition would learn that the host where the "ghost" VM is still running (though without access to its VMDK) is actually still alive, and therefore would not power-on a new instance of the VM, making the situation worse.
D.

FTT=0, you can end up with either D1 or D2 below.

D1.

If a VM's data-components are not located on the isolated host but the VM is running on the host:
  • VSAN makes all VMDK data-components inaccessible to all VMs running on the isolated host, but VSAN does not power-off the VMs, HA Isolation Response makes this decision.
  • HA Isolation Response is triggered.
  • HA Master in the other partition powers-on the VMs where their data-components are still available and accessible.
  • Depending on the Isolation Response, possible VM split-brain situation. A VM split-brain may or may not cause IP/MAC conflicts depending on whether the VM network is also partitioned.
  • When the network recovers, HA kills the "ghost" VMs still running on the isolated host. (If the Isolation Response was to leave power-on.)
  • The VM split-brain situation could be prevented if the Isolation Response is set to power-off.
  • If the Isolation Response is to leave powered-on (not recommended for VSAN anyway), and Datastore Heartbeat is available in the cluster, the HA Master in the other partition would learn that the host where the "ghost" VM is still running (though without access to its VMDK) is actually still alive, and therefore would not power-on a new instance of the VM, making the situation worse.
D2.

If a VM's data-components are located on the isolated host and the VM is running on the same host:
  • VM remains running and can still access its data-components.
  • However, HA Isolation Response is triggered.
  • HA Master in the other partition cannot power-on the VM since the VM's VMDK data-components are not available in that partition. (Because FTT=0)
  • Depending on the Isolation Response, the VM may remain running or get powered-off. (This is one outcome where IR=power-off is the worst option)
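The isolation-detection logic underlying outcomes A through D can be condensed into a small sketch (Python, purely illustrative; the function and its return strings are my own simplification, not FDM code). The key asymmetry between the partition and isolation scenarios: a host only declares itself isolated when it can reach no other HA agent AND cannot ping its isolation address; only then does the configured Isolation Response apply.

```python
# Hedged sketch of the isolation decision described above -- illustrative
# only, not FDM's actual algorithm.

def isolation_response(reachable_peers, can_ping_isolation_addr,
                       configured_response):
    """Returns the action a host takes for its locally running VMs."""
    if reachable_peers:
        # Partition scenarios (A/B): a host that can still see at least
        # one other host never considers itself isolated, so the
        # Isolation Response does not fire.
        return "keep-running (not isolated)"
    if can_ping_isolation_addr:
        # No peers visible, but the isolation address still answers:
        # the host does not declare isolation.
        return "keep-running (isolation address reachable)"
    # Isolation scenarios (C/D): truly isolated, so the configured
    # response applies, e.g. "power-off", "shutdown", "leave-powered-on".
    return configured_response

print(isolation_response(["h2", "h3"], False, "power-off"))  # partition case
print(isolation_response([], False, "power-off"))            # isolation case
```

This is why the "ghost" VM in A1/B1 keeps running: its host is partitioned, not isolated, and no mechanism other than the Isolation Response (or, in a Stretched Cluster, VSAN.AutoTerminateGhostVm) would power it off.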

Outcome

Scenario: The entire network is down, where all hosts are network isolated from each other.

E.

FTT=1 (or above), you can only end up with E1 below.

E1.

An isolated host would not have component-quorum for any VMDK (reminder: FTT is 1 or above, so each VMDK would have more than one component):
  • No component has VSAN quorum, and therefore all VMDKs will become inaccessible on all hosts.
  • HA Isolation Response is triggered on all hosts. (But even if VMs are left powered-on, they would not have access to their VMDKs.) Note: I'm actually not sure about this one, because in a "traditional" FC/iSCSI/NFS-based datastore cluster, if an isolated host cannot see the "protectedlist" being locked by an HA Master, it would not actually trigger its Isolation Response. Update: HA Isolation Response would not be triggered on any host. See Depping's answer below.
  • If your VSAN cluster does not have Datastore Heartbeat, then all hosts would trigger the Isolation Response. This is one outcome where IR=power-off is the worst option.
  • If your VSAN cluster has Datastore Heartbeat, then through it, each host would see that no HA Master is elected (because every host is isolated), and would therefore not trigger the Isolation Response, which leaves VMs powered on. This is one scenario where having Datastore Heartbeat can make the situation "less bad".
  • However, please note that even if VMs are left powered-on, they would not have access to their VMDKs, limiting their usefulness.
  • There will be no HA Master to power-on VMs elsewhere, but regardless, all VMDKs will become inaccessible for powering-on.
F.

FTT=0.

F1.

Same as D1, except there will be no HA Master to restart any VMs on any host.

F2.

Same as D2.
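Since the whole point of this exercise is a cheat sheet, the table above can be condensed into a small lookup (Python; the keys, labels and summary strings are my own shorthand for the outcomes A1-F2 under the same assumptions as the post: single-VMDK VM, VM network unaffected).

```python
# The outcome table condensed into a lookup -- my own shorthand, not an
# official matrix. Key: (failure type, FTT >= 1?, VM's side has quorum/data?).

OUTCOMES = {
    ("partition", True,  False): "A1: ghost VM; HA restarts it in the quorum partition (split-brain risk)",
    ("partition", True,  True):  "A2: VM keeps running; no restart elsewhere",
    ("partition", False, False): "B1: ghost VM; HA restarts it where the data lives (split-brain risk)",
    ("partition", False, True):  "B2: VM keeps running; no restart elsewhere",
    ("isolation", True,  False): "C1: Isolation Response fires; HA restarts VMs in the majority",
    ("isolation", False, False): "D1: Isolation Response fires; HA restarts VMs where the data lives",
    ("isolation", False, True):  "D2: data is local to the isolated host; no restart possible (FTT=0)",
    ("total-outage", True, False): "E1: no HA Master, no quorum anywhere; all VMDKs inaccessible",
}

def outcome(failure, ftt_at_least_one, vm_side_has_data):
    return OUTCOMES.get((failure, ftt_at_least_one, vm_side_has_data),
                        "not covered by the table above")

print(outcome("partition", True, False))   # the A1 headline
print(outcome("isolation", False, True))   # the D2 headline
```

F1/F2 are omitted since they reduce to D1/D2 minus the HA restart, and with FTT=1 an isolated or total-outage host never holds quorum, so the (True, True) isolation key intentionally has no entry.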
6 Replies
depping
Leadership

I wrote an article on this topic a while back; actually multiple, if you do a search on my blog.

vSphere HA heartbeat datastores, the isolation address and vSAN - Yellow Bricks

Anyway, the heartbeat datastore can be used during an isolation or partition to inform the other side what has happened. Datastore heartbeating is supported in stretched as well, you just need to make sure you have a "shared datastore" local to the location, which is also accessible remotely.

Also good to know, in a stretched cluster when there's a partition or an isolation then VSAN can kill the VMs in that particular segment of the partition where ALL components have become inaccessible. It can do the same for an isolation situation. You do not need the Isolation Response configured for that.

rleon_vm
Enthusiast

Hi Duncan,

Thanks for your reply.

I do follow your articles and read your HA Deep Dive. Where else can one get such insightful information on these topics? :)

This is just me trying to compile everything in a single place, primarily for my own referencing convenience.

I do understand how the Datastore Heartbeat could let the other partition find out more, but if you would be so kind as to read through events A1, B1, C1 and D1 in my post, you could see why I think having Datastore Heartbeat in a VSAN cluster would make things worse. Note: your point about VSAN killing VMs (VSAN.AutoTerminateGhostVm=1) did get a mention in A1.

In fact, you even said so in your own article you linked:

However, the VMs which are running on the isolated host are more or less useless as they cannot write to disk anymore.

... which is the exact point I'm trying to make. Having Datastore Heartbeat in this situation makes the outcome worse, because why wouldn't you want the HA Master to restart the VM elsewhere if the original VM has already lost access to its VMDK? A Windows guest OS would likely have BSOD'd within a minute, rendering the original VM useless anyway. (To be clear, imagine a situation where the VM's production network on the ESXi host is also isolated/partitioned, meaning that there will be no IP conflict even if the HA Master restarts the VM elsewhere.)


The lack of further elaboration on this point in the VMware vSAN Design and Sizing Guide doesn't help either. It just glosses over Datastore Heartbeat and VSAN in a short, uninformative paragraph.

On a side note, I was hoping you would also shed some light on event E1, on whether all hosts would trigger the Isolation Response or none will.

Thank you for your time!

depping
Leadership

First of all, you have a perception about what is good and what is bad. What may sound like a bad situation to you may not be bad for me. Let's look at the scenarios:

You need to ask yourself first: how is the network connected? And what are the chances that a user can still connect to the VMs running in the partition where the components have become inaccessible? Why is this important? Well, as the VMs may be restarted in the other location when you do not have a heartbeat datastore, this could easily lead to a situation where a client is connected to a server, writing data, but that server may never be able to write the data to disk. A very undesirable situation. I don't know if you tested these scenarios, but I ran through many of these in the past, and Windows could easily sit 5-10 minutes without being able to write to disk before blue screening. And depending on how it fails, the network will also experience some very strange issues with duplicate MAC addresses and duplicate IPs. It will not be pretty, hence to prevent this the heartbeat datastore is very valuable!

A1 >> Correct, but see above where this could be a problem

A2 >> Not a problem indeed, correct

B1 >> Correct, but "worse" depends on what your requirements are. I would prefer to avoid duplicate IPs and MAC addresses personally

B2 >> Correct

C1 >> Again, I am not sure the situation is worse with a heartbeat datastore. See above

E1 >> This should not trigger the isolation response, there's no healthy host, but I have not tested this with vSAN to be honest.

So for my understanding, what is the purpose of this exercise? What are you trying to design for, or are you trying to prevent from happening?

rleon_vm
Enthusiast

I have just been trying to consolidate every VSAN HA outcome into one small block of condensed information. Kind of like a cheat sheet, if you will.

This could potentially save me and my clients a lot of time during POC exercises, and/or during real VSAN HA scenarios that require me and my team to investigate and write post-mortem reports. And now, I hope it can help the others who are reading this thread too.

I do understand and agree with you that the "bad" situations I described as being caused by the Datastore Heartbeat could just as easily demonstrate how valuable it is.

But then again, as you also said, it does depend on how it fails.

That is why I added the line assuming a scenario where the isolated host's VM production network is also isolated along with the VSAN network, which effectively rules out the possibility of an IP/MAC conflict.

Thanks for clarifying E1, I'll update my notes. I haven't gotten around to testing this in VSAN either, so I had my suspicions on whether it would turn out the same as when using an FC shared LUN.

Also, thanks for telling us that a Heartbeat Datastore can be used in a Stretched Cluster.

The reason I thought it wasn't supported in a Stretched Cluster is that this is what the VSAN Admin Guide says:

Configure HA settings for the stretched cluster.

  • HA must be enabled on the cluster.
  • HA rule settings should respect VM-Host affinity rules during failover.
  • Disable HA datastore heartbeats.

Thank you so much for your time.

I have everything I need on Datastore Heartbeat and VSAN now.

depping
Leadership

No problem, and it is always good to see people who aim to get to the bottom of things!

rleon_vm
Enthusiast

Hi Duncan,

For event E1 in my opening post, what actually happens is:

  • If your VSAN cluster does not have Datastore Heartbeat, then all hosts would trigger the Isolation Response. This is one outcome where IR=power-off is the worst option.
  • If your VSAN cluster has Datastore Heartbeat, then through it, each host would see that no HA Master is elected (because every host is isolated), and would therefore not trigger the Isolation Response, which leaves VMs powered on. However, note that even if VMs are left powered-on, they would not have access to their VMDKs (FTT=1 or above), limiting their usefulness.

I'll again update the opening post.
