VMware Cloud Community
suhag79
Hot Shot

vSAN failover

Hi All,

We have a requirement to host around 12-15 VMs with 5 TB of storage. I was reading a PDF that suggests using a minimum of 4 vSAN hosts:

The minimum configuration required for vSAN is 3 ESXi hosts, or two hosts in conjunction with an external witness node. However, this smallest environment has important restrictions. In vSAN, if there is a failure, an attempt is made to rebuild any virtual machine components from the failed device or host on the remaining cluster. In a 3-node cluster, if one node fails, there is nowhere to rebuild the failed components. The same principle holds for a host that is placed in maintenance mode. One of the maintenance mode options is to evacuate all the data from the host. However, this will only be possible if there are 4 or more nodes in the cluster, and the cluster has enough spare capacity. One additional consideration is the size of the capacity layer. Since virtual machines deployed on vSAN are policy driven, and one of those policy settings (NumberOfFailuresToTolerate) will make a mirror copy of the virtual machine data, one needs to consider how much capacity is required to tolerate one or more failures. This design consideration will be discussed in much greater detail shortly. Design decision: 4 nodes or more provide more availability options than 3 node configurations. Ensure there is enough storage capacity to meet the availability requirements and to allow for a rebuild of the components after a failure.

As per the above, in a 3-node vSAN cluster, if one node goes down, the VMs running on that host will not fail over. Is this correct, and if so, why?

regards,


6 Replies
TheBobkin
Champion

Hello suhag79,

If HA is configured on the cluster with a 'Restart VMs' response to 'Host Failure', then the VMs will be restarted on the remaining 2 nodes in the cluster.

If a quorum of the components is accessible (e.g. 2/3 available for a standard FTT=1 vSAN Object), then the Objects (vmdk and namespace Objects) will be available to the two remaining nodes and the VMs can be restarted.

This can easily be tested using the vSAN Hands-on Lab (HOL) and the vsish panic command to PSOD a host:

http://labs.hol.vmware.com/HOL/catalogs/
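
The two conditions above (HA set to restart VMs, and a quorum of components still reachable) can be sketched in a few lines of Python. This is only an illustrative model, not vSAN or vSphere code: the function names are hypothetical, and it assumes each component carries exactly one vote, which is a simplification.

```python
def has_quorum(available: int, total: int) -> bool:
    """True when strictly more than half of an Object's components are reachable.

    Assumption: every component counts as one vote.
    """
    return 2 * available > total


def vm_can_restart(ha_restart_enabled: bool, available: int, total: int) -> bool:
    """A VM can be restarted on a surviving host only if HA is configured
    to restart VMs and its Objects still have quorum."""
    return ha_restart_enabled and has_quorum(available, total)


if __name__ == "__main__":
    # One of three hosts lost, HA restart enabled.
    print(vm_can_restart(True, 2, 3))
    # Two of three components lost: no quorum, so no restart.
    print(vm_can_restart(True, 1, 3))
    # Storage is still accessible, but HA is not set to restart VMs.
    print(vm_can_restart(False, 2, 3))
```

The third case is the one that is easy to overlook: the data stays accessible with 2/3 components, but nothing restarts the VM unless HA is configured to do so.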

Bob

suhag79
Hot Shot

Thanks Bob,

Then why does the VMware PDF have the note below?

In a 3-node cluster, if one node fails, there is nowhere to rebuild the failed components. The same principle holds for a host that is placed in maintenance mode. One of the maintenance mode options is to evacuate all the data from the host.

above is correct of 3/3 component got failed ?

regards,

TheBobkin
Champion

Hello suhag79,

"In a 3-node cluster, if one node fails, there is nowhere to rebuild the failed components."

Yes, this is true, as only two Fault Domains would be available. However, as I said above, not all 3 components need to be available for an Object to be accessible; only a majority (e.g. 2 of 3 components) is required for the VM Objects to be accessible.

I do advise going with a 4-node cluster instead of a 3-node cluster for this reason: if a host fails in a 3-node cluster, the cluster can't tolerate another failure after that (or recover from multiple permanent failures), whereas a 4-node cluster can resync the data and then tolerate another failure once the resync has completed.
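
The difference between 3 and 4 nodes can be put as a quick check. The sketch below is a rough model with hypothetical function names. It assumes RAID-1 mirroring, where FTT=n needs 2n+1 hosts (n+1 data replicas plus n witnesses), and it ignores capacity entirely, so a `True` result still requires enough free space on the surviving hosts.

```python
def hosts_required(ftt: int) -> int:
    """Hosts (Fault Domains) needed for a mirrored Object.

    Assumption: RAID-1 mirroring, i.e. ftt + 1 replicas and ftt witnesses.
    """
    return 2 * ftt + 1


def can_rebuild_after_failure(total_hosts: int, failed_hosts: int, ftt: int = 1) -> bool:
    """True if the surviving hosts can still hold a fully compliant copy
    of the Object, so the missing component has somewhere to be rebuilt."""
    remaining = total_hosts - failed_hosts
    return remaining >= hosts_required(ftt)


if __name__ == "__main__":
    for total in (3, 4):
        print(total, "hosts, 1 failed, rebuild possible:",
              can_rebuild_after_failure(total, 1))
```

With 3 hosts and one down, only 2 Fault Domains remain for an Object that needs 3, so the Object stays accessible but cannot return to full compliance until the host is back. With 4 hosts, 3 remain, which is enough to resync and regain FTT=1.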

"The same principle holds for a host that is placed in maintenance mode."

A host in Maintenance Mode is not an available Fault Domain for rebuilding components, in the same way that a crashed host is not.

"above is correct of 3/3 component got failed ?"

Not sure what you mean here, let me know if the above doesn't clarify things.

Bob

suhag79
Hot Shot

Thanks Bob,

It's getting clearer now.

So the statement below is applicable only if we are using fault domains in the vSAN cluster, correct?

"In a 3-node cluster, if one node fails, there is nowhere to rebuild the failed components."

Anyway, in my case, all the hosts are going to be in a single rack, or at most two racks.

Regards,

TheBobkin
Champion

Hello suhag79,

Okay, let's back up a bit here:

As far as component placement goes, each ESXi host participating in the vSAN cluster is essentially considered a Fault Domain. A 3-node vSAN cluster with standard FTT=1 Objects will place one of the three components that make up an Object (data component + data component + witness component) on each of the three hosts. Thus, if a host goes down, you only have access to 2 of these 3 components, but regardless of which one is inaccessible, you still have access to a data component and to the MAJORITY of components, so the Object is still accessible and thus usable.
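
To make the "regardless of which is not accessible" point concrete, here is a small Python sketch that fails each host in turn and checks both conditions (at least one data component, and a majority of components). The host names and the placement are made up for illustration, and one vote per component is assumed.

```python
# Hypothetical placement of one FTT=1 Object across a 3-node cluster:
# one component per host (each host acts as its own Fault Domain).
PLACEMENT = {"host-1": "data", "host-2": "data", "host-3": "witness"}


def object_accessible(failed_host: str, placement: dict) -> bool:
    """Check accessibility of the Object after a single host failure."""
    surviving = [kind for host, kind in placement.items() if host != failed_host]
    has_data = "data" in surviving                       # a full data copy remains
    has_majority = 2 * len(surviving) > len(placement)   # more than half the components
    return has_data and has_majority


def survey(placement: dict) -> dict:
    """Fail each host in turn and record whether the Object stays accessible."""
    return {host: object_accessible(host, placement) for host in placement}


if __name__ == "__main__":
    for host, ok in survey(PLACEMENT).items():
        print(f"{host} down -> Object accessible: {ok}")
```

Whichever single host is lost, a data component and 2 of the 3 components survive, so the Object remains usable.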

Bob

suhag79
Hot Shot

Thanks Bob,

That was really helpful.
