VMware Cloud Community
cvinod
Contributor

VSAN Erasure coding & failure domains

Hello,

< I am new to this forum. Please point me to the right place to ask this query if this is not the right forum. >

Consider a scenario of a single VSAN-AF cluster of more than 6 nodes (say 16 or 32 nodes). The intent is to use a RAID6 erasure coding storage policy for the hosted VMs. [Assume 2 or 3 disk groups per node.]

Is it possible to design this cluster (i.e. nodes spread across multiple fault domains) in such a way that one can tolerate 4 node failures in the cluster and still allow VMs in the cluster to continue to have access to their data?

If that is really feasible, then is there a minimum node count needed in the cluster to support such a scenario, and how would those nodes be spread across the recommended number of fault domains?

Thanks

Vinod

GreatWhiteTec
VMware Employee

Hi cvinod,

You can configure fault domains to "group" hosts per rack and be able to sustain a full rack outage. For a 4-node failure scenario, using the default FTT=1, you will need a bare minimum of 12 hosts "grouped" into 4 fault domains of 3 hosts each (4x3=12). However, it is highly recommended to have an additional fault domain so that vSAN can rebuild that data in case of longer outages. In this case, I would recommend a minimum of 16 hosts "grouped" into 4 fault domains of 4 hosts each. In this configuration, you would be able to sustain 1 rack failure (4 hosts), and vSAN would still be able to rebuild that data elsewhere in the cluster, giving you full redundancy (FTT=1) even when 1 rack is down.
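The host-count arithmetic above can be sketched in a few lines. This is just illustrative bookkeeping (the helper name is my own, not a vSAN API); the fault-domain and host counts come from the layouts described in the reply:

```python
# Hypothetical helper: total hosts for a layout of equally sized
# vSAN fault domains. Not a vSAN API - just the arithmetic above.

def hosts_required(fault_domains: int, hosts_per_fd: int) -> int:
    """Total hosts needed when every fault domain holds the same host count."""
    return fault_domains * hosts_per_fd

# 4 fault domains of 3 hosts each -> the 12-host bare minimum
print(hosts_required(4, 3))
# 4 fault domains of 4 hosts each -> the recommended 16-host layout
print(hosts_required(4, 4))
```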

Hope this helps.

TheBobkin
Champion

Hello Vinod,

Welcome to Communities, and yes this is the correct sub-forum for vSAN topics.

"Is it possible to design this cluster (i.e. nodes spread across multiple fault domains) in such away that one can

tolerate 4 node failures in the cluster and still allow VMs in the cluster to continue to have access to their data ?"

This really depends on what you mean - if, for instance, you configured 6 Fault Domains (FDs) of 4 nodes each, then this *technically* could tolerate the failure of 2 of these FDs (8 hosts) and all FTT=2 data should remain accessible. However (in the same configuration), if you lost 1 node from each of 3 separate FDs, then some VMs would likely become inaccessible.
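The distinction Bob draws can be illustrated with a toy model. This is not vSAN's actual placement logic - it just assumes a RAID6 (4 data + 2 parity) object with one component in each of 6 fault domains, so the object survives as long as at most 2 of the FDs holding its components are affected:

```python
# Toy model (assumption: RAID6 = 4 data + 2 parity, one component per FD;
# this is NOT vSAN's real placement algorithm, just the failure arithmetic).

def object_accessible(affected_fds: int, parity: int = 2) -> bool:
    """A RAID6 stripe stays readable while at most `parity` components are lost.

    affected_fds: number of distinct fault domains holding a component of
    this object that have suffered a failure (a whole-FD outage and a
    single-node failure inside an FD both count once here).
    """
    return affected_fds <= parity

# Losing 2 entire FDs out of 6 removes 2 components -> still accessible.
print(object_accessible(2))
# Losing 1 node in each of 3 different FDs can remove 3 components of an
# object striped across those FDs -> inaccessible.
print(object_accessible(3))
```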

Bob

cvinod
Contributor

Thanks TheBobkin and GreatWhiteTec! A couple of follow-ups/clarifications, if I may, based on what both of you have said...

a) If there are 4 fault domains (with 4 nodes in each), then one can tolerate the failure of 1 fault domain (4 nodes);

b) if there are 6 fault domains (with 4 nodes in each), then we can tolerate up to 2 fault domain (8 node) failures?

In case "b" we would still have FTT=1 even after 2 fault domains failed... but in "a" that won't be the case after 1 fault domain failure?

[Understood about the 1-node failure in 3 different fault domains.]

Will the following work:

A customer starts off with a VSAN-AF cluster of 4 nodes, and then at a later time chooses to increase the size of the cluster in 4-node increments, i.e. 8, 12, 16... etc. [Granted that with the first 4 nodes the customer can only have RAID5 (FTT=1), and then after the second 4 nodes are added they will have RAID6 (FTT=2), etc.]

They can define the first fault domain for the first 4 nodes, and when they add the second set of 4 nodes they can define the second fault domain with those 4 nodes, and so on.

Once their cluster size reaches >=4 fault domains (4 nodes per domain), they can survive the failure of a single fault domain (4 nodes)?

Is my understanding correct?

Thanks

Vinod

GreatWhiteTec
VMware Employee

OK. So vSAN requires a minimum of 3 fault domains. If you do not manually configure Fault Domains (FDs), then each host is implicitly a Fault Domain. So if you have 4 hosts in a vSAN cluster, then each host is its own FD. With FTT=1, vSAN can sustain the failure of one fault domain, speaking in terms of hosts (within a host, a drive failure counts as 1 failure, etc.).

If you have 4, 6, 12, or 16 Fault Domains, you can still only sustain 1 failure IF you have the default policy of FTT=1, regardless of FTM (RAID1 or RAID5).

If you want to be able to sustain 2 failures, and have 6 hosts or 6 FDs, then you can use FTT=2 and still have full redundancy. Otherwise, if you have 6 FDs with FTT=1 and suffer 2 failures, some of the data will be unavailable.

Fault domains are used so that vSAN knows where to spread the data while maintaining compliance with the policy's failures to tolerate. Also remember that policies are applied at the object level, so you can have one object with FTT=2 and another object with FTT=1.
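The policy-to-fault-domain minimums discussed in this thread can be summarised in a small lookup. The function name is my own shorthand, not a vSAN API; the counts reflect the combinations the replies describe (RAID1 mirroring needs FTT+1 replicas plus FTT witnesses, RAID5 is 3+1, RAID6 is 4+2):

```python
# Sketch (not a vSAN API): minimum fault domains per FTM/FTT combination,
# as described in the replies above.

def min_fault_domains(ftm: str, ftt: int) -> int:
    if ftm == "RAID1":                 # mirroring: 2*FTT + 1 fault domains
        return 2 * ftt + 1
    if ftm == "RAID5" and ftt == 1:    # 3+1 erasure coding
        return 4
    if ftm == "RAID6" and ftt == 2:    # 4+2 erasure coding
        return 6
    raise ValueError("unsupported FTM/FTT combination")

print(min_fault_domains("RAID1", 1))
print(min_fault_domains("RAID5", 1))
print(min_fault_domains("RAID6", 2))
```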

"They can define the first fault domain for the first 4 nodes and  when they add the second set of 4 nodes they can define the second fault domain with those 4 nodes and so on"

-- No. You cannot have only one FD. If you have 4 hosts, you will have 4 implicit FDs, or you will create 4 FDs.

Makes sense?

TheBobkin
Champion

Hello Vinod,

"In case "b" we  still have FTT=1 even after 2 fault domains failed... but in "a" that won't be the case"

No - with RAID6, 6 FDs are required: lose 1 and the Objects are at FTT=1, lose 2 and they are at FTT=0, lose another FD and they become inaccessible.

RAID5 requires 4 FDs, so lose one and the data is at FTT=0; lose another and it becomes inaccessible.
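The degradation Bob describes - each lost fault domain consuming one failure-to-tolerate until the object goes inaccessible - can be sketched as follows (illustrative helper, my naming):

```python
# Sketch of tolerance degradation as whole fault domains are lost.
# remaining_ftt returns the failures still tolerable, or None once
# the object has become inaccessible.

def remaining_ftt(ftt: int, failed_fds: int):
    left = ftt - failed_fds
    return left if left >= 0 else None

# RAID6 (FTT=2): lose 1 FD -> FTT=1; lose 2 -> FTT=0; lose 3 -> inaccessible.
print(remaining_ftt(2, 1), remaining_ftt(2, 2), remaining_ftt(2, 3))
# RAID5 (FTT=1): lose 1 FD -> FTT=0; lose 2 -> inaccessible.
print(remaining_ftt(1, 1), remaining_ftt(1, 2))
```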

"Will the following work :"

You wouldn't add the second set of 4 nodes as a new FD - this would be imbalanced, since when no FDs are configured, each node acts as its own FD for component placement.

Starting with 4 nodes on RAID5, adding 2-4 more and moving to RAID6, then adding 4 more and running RAID6 across 6 two-node FDs would work (and could tolerate 2 two-node FDs being lost).

Beyond that, how it works depends on what you are trying to achieve - e.g. how many racks this is actually being spread over.

You could also consider a stretched cluster with per-site RAID5/6 when scaling out.

Bob

cvinod
Contributor

Thanks for the clarifications !

Vinod

cvinod
Contributor

Hello,

I have a follow up question...

Say a user had 24 hosts in a VSAN cluster with the default failure domains (i.e. by default, each host is its own failure domain).

This cluster is up and running and hosting VMs etc., with some vmdk objects using RAID1 and some others using RAID6.

Can the user now decide to change the failure domains from the default to the following:

- 6 custom failure domains, with each failure domain containing 4 hosts.

Can this be done live, without impacting the running VMs, as long as there is enough spare space in the cluster for rebalancing to align with the new failure domains?

Thanks

Vinod

TheBobkin
Champion

Hello Vinod,

"Can the user now decide to change the  failure domains from the default to the following :

- 6 custom failure domains with each failure domain containing 4 hosts."

Yes.

Likely all that would occur at that point is all VMs becoming noncompliant with their Storage Policies (SPs) - other than any that already, by chance, have their components distributed across enough Fault Domains.

"Can this be done live without impacting the running VMs  as long as there is enough spare space in the cluster for rebalancing to align with the new failure domains ?"

I don't think this alone would kick off a resync to spread all the data across enough Fault Domains. However, the user re-applying SP(s) en masse would cause a potentially massive resync, which would add a lot of extra workload to the cluster and cause contention - ensure that this is done batch by batch, e.g. re-apply to a few TB of VM data at a time, and/or during off-peak hours.

Do note that it doesn't just move the data - it recreates the data in the new location(s) and then discards the original - so this can temporarily use a potentially huge amount of space if done in bulk all at once.
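A back-of-envelope sketch of why batching matters: while one batch resyncs, the new component layout exists alongside the old one, so the transient space need scales with the batch size. The overhead factor here is an assumption for illustration (e.g. ~1.5x raw for RAID6 4+2), not a vSAN formula:

```python
# Rough sketch (illustrative numbers, not a vSAN sizing formula) of the
# transient capacity consumed while a policy re-apply batch resyncs:
# the new layout is fully written before the old one is discarded.

def peak_extra_space_tb(batch_tb: float, overhead_factor: float) -> float:
    """Peak transient capacity (TB) while one batch resyncs.

    overhead_factor: raw-space multiplier of the new layout, e.g. an
    assumed 1.5 for RAID6 (4+2) erasure coding.
    """
    return batch_tb * overhead_factor

# Re-applying RAID6 to 100 TB at once could transiently need ~150 TB free;
# 10 TB batches cap the transient need at ~15 TB.
print(peak_extra_space_tb(100, 1.5))
print(peak_extra_space_tb(10, 1.5))
```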

Bob
