VMware Cloud Community
vSohill
Expert

fault domain vs RAID

Hi,

What is the difference between RAID 5 or 6 and fault domains?

Accepted Solutions
TheBobkin
Champion

Hello vSohill

In vSAN, RAID5/6 refers to Storage Policies applied to Objects/VMs: RAID5 (FTT=1) requires a minimum of 4 Fault Domains, and RAID6 (FTT=2) requires a minimum of 6 Fault Domains.

Using RAID 5 or RAID 6 Erasure Coding

Fault Domains are by default the nodes that are contributing storage to the vSAN cluster (unless manually set for some form of pseudo-rack-awareness or in Stretched clusters).

Managing Fault Domains in vSAN Clusters

Bob


vSohill
Expert

Many thanks for your good clarification TheBobkin

If a flash device fails, will it be handled under FTT in the same way as a capacity device failure?

TheBobkin
Champion

Hello vSohill,

If by "flash device failed" you mean the Cache-tier SSD/NVMe has failed, then that entire Disk-Group will be marked as failed and all data residing on its Capacity-tier devices will be marked as degraded. Provided you have an FTT=1 (or greater) Storage Policy applied to the Objects, they will remain accessible and be repaired back to compliance with their Storage Policy, e.g. if they were FTT=1, they will essentially be FTT=0 after the failure and then be repaired back to FTT=1.

Note, however, that repairing these Objects back to compliance requires adequate space and eligible nodes/Fault Domains for the replacement components to be placed. For example, if you have a 4-node cluster with only a single Disk-Group per node and are using a RAID5 Storage Policy, a failed Disk-Group would leave nowhere eligible to repair the components (as this requires a minimum of 4 nodes/Fault Domains and you are left with only 3), and the data would remain FTT=0 until the failed Disk-Group was fixed.

Bob

vSohill
Expert

Thank you TheBobkin

Regarding Fault Domains, what is the difference between these 3 examples in the screenshots below:

1- Six nodes under one fault domain

pastedImage_0.png

2- Two fault domains, with 3 nodes under each of the two fault domains

pastedImage_1.png

3- Six standalone hosts

pastedImage_2.png

TheBobkin
Champion

Hello vSohill,

You are most welcome - happy to help.

1. This would not be able to place anything except FTT=0 Objects, as you only have one Fault Domain (FD).

2. Again, if this was a standard cluster it would only be able to place FTT=0 Objects, as a minimum of 3 FDs are required for FTT=1, FTM=RAID1 Objects. This is the typical layout for a Stretched cluster, but with the Witness being the 3rd FD; this basically places components as a RAID1 across sites (e.g. a data-replica on each site and witness components stored on the Witness).

3. This is a normal standard cluster layout and could place RAID1 or RAID5/6 with FTT=1 or even FTT=2 Objects. FDs don't need to be defined in standard clusters for the reason I noted above, i.e. each node is its own FD unless defined otherwise.
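
As a rough illustration of the three layouts, the minimums mentioned in this thread can be put into a small Python sketch. The function name, the policy labels and the table are made up for this example and are not part of any vSAN tool or API; the check only compares the Fault Domain count against each policy's minimum and ignores capacity and everything else vSAN considers.

```python
# Minimum Fault Domain (FD) counts as quoted in this thread.
# Keys are (FTT, FTM); this table is illustrative, not a vSAN API.
MIN_FAULT_DOMAINS = {
    (0, "RAID0"): 1,
    (1, "RAID1"): 3,
    (1, "RAID5"): 4,
    (2, "RAID1"): 5,
    (2, "RAID6"): 6,
}


def placeable_policies(fault_domains):
    """Return the policy labels whose FD minimum fits the given FD count."""
    return [
        f"FTT={ftt},FTM={ftm}"
        for (ftt, ftm), minimum in MIN_FAULT_DOMAINS.items()
        if fault_domains >= minimum
    ]


layouts = [
    ("1. one FD holding six nodes", 1),
    ("2. two FDs of three nodes each (standard cluster)", 2),
    ("3. six standalone hosts, each its own FD", 6),
]
for label, fd_count in layouts:
    print(label, "->", placeable_policies(fd_count))
```

Under these assumptions, layouts 1 and 2 only admit the FTT=0 entry, while layout 3 admits every entry in the table.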

Bob

nicholas1982
Hot Shot

If I may add to this discussion: if you have deduplication and compression enabled, losing a capacity disk will also fail the entire disk group.

Nicholas
vSohill
Expert

Thank you again, Bob (TheBobkin), for your help and your clear and simple explanation. In my first example I will have a single FD containing 6 nodes. This cluster will only be able to place FTT=0 Objects (as you mentioned). Is there a use case for this kind of FD, or is it a non-optimal design?

TheBobkin
Champion

Hello vSohill,

No, I can't really think of any use case where configuring everything in one FD would be in any way beneficial.

An example of where configuring multiple FDs (in a non-Stretched cluster) might be beneficial: if you had a 32-node cluster with 4 sets of 8 servers in 4 different racks, and wanted data placed in such a way that a rack or ToR-switch failure would only leave data at reduced availability, instead of the mix of fine, reduced-availability and inaccessible data you would get if data wasn't placed like this.
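
To make the rack example a bit more concrete, here is a simplified Python sketch of what a rack failure does to a single FTT=1, FTM=RAID1 Object (3 components). It assumes one vote per component and that an Object stays accessible while more than half of its components survive; vSAN's real voting and placement logic is more involved, and the function name and rack labels are invented for the example.

```python
def object_state(component_racks, failed_rack):
    """Classify one object after a whole rack fails.

    component_racks lists the rack that holds each component.
    Simplifying assumption: one vote per component, and the object
    is accessible only while a strict majority of components survive.
    """
    total = len(component_racks)
    surviving = sum(1 for rack in component_racks if rack != failed_rack)
    if surviving == total:
        return "fine"
    if surviving * 2 > total:
        return "reduced availability"
    return "inaccessible"


# One FD per rack: each component is forced into a different rack.
print(object_state(["rack1", "rack2", "rack3"], "rack1"))

# No rack FDs (each host is its own FD): components only need to be on
# different hosts, so several may happen to share a rack.
print(object_state(["rack2", "rack3", "rack4"], "rack1"))
print(object_state(["rack1", "rack2", "rack2"], "rack1"))
print(object_state(["rack1", "rack1", "rack2"], "rack1"))
```

With rack-level FDs the worst case in this model is reduced availability; without them the outcome depends on where the components happened to land.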

Bob

vSohill
Expert

Thank you,

vSohill
Expert

TheBobkin, FTT uses the formula 2n+1. What is the formula for erasure coding, for example RAID 5 with FTT=1?

TheBobkin
Champion

Hello vSohill,

Not sure exactly what you mean, but guessing you are asking about FTT=X with FTM=RAID1 vs RAID5/6, here is a breakdown of how this is composed:

FTT=0, FTM=RAID0 = 2(0)+1 = 1 component  (1 data-replica)

FTT=1, FTM=RAID1 = 2(1)+1 = 3 components (2 data-replicas + 1 witness component)

FTT=2, FTM=RAID1 = 2(2)+1 = 5 components (3 data-replicas + 2 witness components)

FTT=3, FTM=RAID1 = 2(3)+1 = 7 components (4 data-replicas + 3 witness components)

FTT=0, FTM=RAID5 = doesn't exist as RAID5/6 doesn't work like this on vSAN nor on any storage platform

FTT=1, FTM=RAID5 = 4 components (all components contain a combination of data and a single set of distributed parity data)

FTT=2, FTM=RAID6 = 6 components (all components contain a combination of data and two sets of distributed parity data)

So no, there is no formula for RAID5/6 minimums other than minimum 4/6 node/Fault Domains needed for FTT=1, FTM=RAID5 and FTT=2, FTM=RAID6 respectively.
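
The breakdown above can be written out as a short Python sketch. It only restates the numbers in this post: 2n+1 for RAID0/RAID1, and fixed counts for RAID5 and RAID6. The function name is hypothetical, and the counts are the minimum layout only (no extra striping or large-object splitting).

```python
def component_breakdown(ftt, ftm):
    """Return (data_components, witness_components) for the minimum layout.

    RAID0/RAID1 follow 2n+1 with n = FTT: n+1 data-replicas, n witnesses.
    RAID5/RAID6 are fixed at 4 and 6 components, each holding a mix of
    data and distributed parity, so no separate witness is counted here.
    """
    if (ftm == "RAID0" and ftt == 0) or (ftm == "RAID1" and 1 <= ftt <= 3):
        total = 2 * ftt + 1
        data = ftt + 1
        return data, total - data
    if ftm == "RAID5" and ftt == 1:
        return 4, 0
    if ftm == "RAID6" and ftt == 2:
        return 6, 0
    raise ValueError(f"no such policy in this sketch: FTT={ftt}, FTM={ftm}")


policies = [(0, "RAID0"), (1, "RAID1"), (2, "RAID1"), (3, "RAID1"),
            (1, "RAID5"), (2, "RAID6")]
for ftt, ftm in policies:
    data, witness = component_breakdown(ftt, ftm)
    print(f"FTT={ftt}, FTM={ftm}: {data + witness} components "
          f"({data} data, {witness} witness)")
```

Asking for FTT=0 with FTM=RAID5 raises an error in this sketch, matching the point that no such combination exists.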

Bob

vSohill
Expert

Thanks Bob,

vSohill
Expert

TheBobkin, I have a question regarding physical disk placement. In my case I have a VM with RAID 5 protection. The VM runs on ESXI01. The VM Home components are distributed over ESXi2, ESXi3, ESXi4 and ESXi5. In this example the VM Home components, such as the NVRAM and .vmx files, are not mapped to ESXI01, which is hosting the VM. I couldn't understand how this functions. If ESXI01 goes down, does the VM need to be restarted on another host?

TheBobkin
Champion

Hello vSohill,

So, ALL of the VM's data is stored as Objects: the .vswp, the .vmdk and the namespace (which contains the descriptors for all Objects and some data stored as files, e.g. the .vmx, .vmsd, .vmsn and vmx-vswp etc.).

None of these are "mapped" to any particular host; they are simply accessed by a host, as all hosts can see the vsanDatastore.

If a host goes down then yes of course any VMs need to be restarted on other hosts - this will occur provided the namespace and other Objects are still available (e.g. only 1 host failed).

Bob

vSohill
Expert

Many thanks Bob TheBobkin

I think I need more clarification from you. Kindly see the screenshot below, which shows the components across 4 hosts.

The VM runs on host ESXI01, which is not one of those hosts. As you mentioned, the namespace is not mapped to a particular host. In my understanding, those objects (vmx, nvram, etc.) are located and saved on disks attached to those nodes, and ESXI01 and ESXi07 are not members of that RAID protection (if there is no failure), but the VM folder is still on host ESXi. Does this mean that ESXi 2-6 function only as storage protection for the VM folder?

pastedImage_0.pngvsan01.png
