14 Replies Latest reply on Aug 7, 2020 4:56 AM by TheBobkin

    fault domain vs RAID

    vSohill Hot Shot

      Hi,

      What is the difference between RAID 5 or 6 and fault domains?

       

        • 1. Re: fault domain vs RAID
          TheBobkin Virtuoso
          VMware Employees, vExpert

          Hello vSohill

           

          In vSAN, RAID5/6 refers to Storage Policies applied to Objects/VMs: RAID5 (FTT=1) requires a minimum of 4 Fault Domains, and RAID6 (FTT=2) requires a minimum of 6 Fault Domains.

          Using RAID 5 or RAID 6 Erasure Coding

           

          Fault Domains are by default the nodes that are contributing storage to the vSAN cluster (unless manually set for some form of pseudo-rack-awareness or in Stretched clusters).

          Managing Fault Domains in vSAN Clusters

           

          Bob

          • 2. Re: fault domain vs RAID
            vSohill Hot Shot

            Many thanks for your good clarification TheBobkin

            If a flash device fails, will it be handled under FTT the same way as a capacity device failure?

            • 3. Re: fault domain vs RAID
              TheBobkin Virtuoso
              vExpert, VMware Employees

              Hello vSohill,

               

              If by "flash device failed" you mean the Cache-tier SSD/NVMe has failed, then that entire Disk-Group will be marked as failed and all data residing on its Capacity-tier devices will be marked as degraded. Provided you have an FTT=1 (or greater) Storage Policy applied to the Objects, they will remain accessible and be repaired back to compliance with their Storage Policy, e.g. if they were FTT=1, they will essentially be FTT=0 after the failure and then repaired back to FTT=1.

              Note however that repairing these Objects back to compliance requires adequate space and eligible node/Fault Domains for the replacement components to be placed - e.g. if you have a 4-node cluster with only a single Disk-Group per node and are using a RAID5 Storage Policy, if you had a failed Disk-Group there would be nowhere eligible to repair the components (as this requires a minimum of 4 nodes/Fault Domains and you are left with only 3) and the data would remain FTT=0 until the failed Disk-Group was fixed.
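The two repair conditions above (enough eligible Fault Domains and enough free space) can be sketched as a simple check. This is only an illustration: `can_repair`, its parameters, and the capacity figures are hypothetical and are not part of any vSAN API. The minimum of 4 Fault Domains for RAID5 is taken from this thread.

```python
def can_repair(required_fds, healthy_fds, free_gb, needed_gb):
    """Return True only if both conditions hold:
    enough healthy Fault Domains AND enough free space for the rebuilt components."""
    return healthy_fds >= required_fds and free_gb >= needed_gb


# RAID5 (FTT=1) needs a minimum of 4 Fault Domains (figure from the thread).
RAID5_MIN_FDS = 4

# 4-node cluster, one Disk-Group per node, one Disk-Group failed: 3 healthy FDs.
# The free_gb / needed_gb values are made-up numbers for illustration.
print(can_repair(RAID5_MIN_FDS, healthy_fds=3, free_gb=2000, needed_gb=100))  # False

# Same failure in a 5-node cluster: 4 healthy FDs remain, so a repair can proceed.
print(can_repair(RAID5_MIN_FDS, healthy_fds=4, free_gb=2000, needed_gb=100))  # True
```

In the 4-node case the data stays at FTT=0 until the failed Disk-Group is fixed, regardless of how much free space is left.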

               

              Bob

              • 4. Re: fault domain vs RAID
                vSohill Hot Shot

                Thank you TheBobkin

                Regarding Fault Domains, what is the difference between these 3 examples in the screenshots below:

                1- Six nodes under one fault domain

                2- Two fault domains, with 3 nodes under each

                3- Six standalone hosts

                 

                • 5. Re: fault domain vs RAID
                  TheBobkin Virtuoso
                  VMware Employees, vExpert

                  Hello vSohill,

                   

                  You are most welcome - happy to help.

                   

                  1. Would not be able to place anything except FTT=0 Objects as you only have one Fault Domain (FD).

                   

                  2. Again, if this was a standard cluster it would only be able to place FTT=0 Objects, as a minimum of 3 FDs are required for FTT=1, FTM=RAID1 Objects. This is the typical layout for a Stretched cluster, but with the Witness as the 3rd FD, which basically places components as a RAID1 across sites (e.g. a data-replica on each site and witness components stored on the Witness).

                   

                  3. This is a normal standard cluster layout and could place RAID1 or RAID5/6 Objects with FTT=1 or even FTT=2 - FDs don't need to be defined in standard clusters for the reason I noted above, i.e. each node is its own FD unless defined otherwise.
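As a rough sketch of the three layouts above, the following counts the effective Fault Domains and lists which policies could be placed in a standard (non-stretched) cluster. The function names, host names, and the `POLICIES` table are illustrative assumptions, not a vSAN API. The minimums follow the figures in this thread (2n+1 for RAID1, 4 for RAID5, 6 for RAID6).

```python
# Minimum Fault Domains per policy in a standard cluster.
POLICIES = [
    ("FTT=0", 1),
    ("FTT=1 RAID1", 3),
    ("FTT=1 RAID5", 4),
    ("FTT=2 RAID1", 5),
    ("FTT=2 RAID6", 6),
    ("FTT=3 RAID1", 7),
]


def effective_fault_domains(hosts, fd_of=None):
    """Count Fault Domains; a host with no explicit FD counts as its own FD."""
    fd_of = fd_of or {}
    return len({("fd", fd_of[h]) if h in fd_of else ("host", h) for h in hosts})


def placeable(hosts, fd_of=None):
    """List the policies whose minimum FD count the layout satisfies."""
    n = effective_fault_domains(hosts, fd_of)
    return [name for name, need in POLICIES if n >= need]


hosts = [f"esxi{i}" for i in range(1, 7)]  # six hypothetical hosts
one_fd = {h: "FD1" for h in hosts}
two_fds = {h: ("FD1" if i < 3 else "FD2") for i, h in enumerate(hosts)}

print(placeable(hosts, one_fd))   # ['FTT=0']
print(placeable(hosts, two_fds))  # ['FTT=0']
print(placeable(hosts))           # everything except 'FTT=3 RAID1'
```

The sketch deliberately ignores the Stretched cluster case, where a Witness acts as the 3rd FD.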

                   

                  Bob

                  • 6. Re: fault domain vs RAID
                    nicholas1982 Hot Shot

                    If I may add to this discussion, if you have dedupe and compress enabled, losing a capacity disk will also destroy the disk group.

                    • 7. Re: fault domain vs RAID
                      vSohill Hot Shot

                      Thank you again TheBobkin Bob for your help and your clear and simple explanation. In my first example I will have a single FD containing 6 nodes. This cluster will only be able to place FTT=0 Objects (as you mentioned). Is there a use case for this kind of FD, or is it a non-optimal design?

                      • 8. Re: fault domain vs RAID
                        TheBobkin Virtuoso
                        VMware Employees, vExpert

                        Hello vSohill,

                         

                        No, I can't really think of any use case where configuring everything in one FD would be in any way beneficial.

                        An example of where configuring multiple FDs (in a non-Stretched cluster) might be beneficial would be if you had a 32-node cluster with 4 sets of 8 servers in 4 different racks and wanted to have data placed in such a way that a rack or ToR-switch failure would only cause data to be reduced-availability instead of a mix of fine, reduced and inaccessible as they would be if data wasn't placed like this.
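To illustrate the rack example, here is a small sketch of what a rack failure does to one FTT=1, FTM=RAID1 Object (3 components) with and without rack-aligned placement. The host names, the rack mapping, and `object_state` are hypothetical. The rule "losing more than FTT components means inaccessible" is a simplification of how vSAN actually decides availability.

```python
def object_state(component_hosts, host_rack, failed_rack, ftt=1):
    """Classify an Object after a rack failure by counting lost components.
    Simplified rule: more lost components than FTT means inaccessible."""
    lost = sum(1 for h in component_hosts if host_rack[h] == failed_rack)
    if lost == 0:
        return "fine"
    if lost <= ftt:
        return "reduced availability"
    return "inaccessible"


# 32 hypothetical hosts, 8 per rack: esxi00-07 in rack1, esxi08-15 in rack2, and so on.
host_rack = {f"esxi{i:02d}": f"rack{i // 8 + 1}" for i in range(32)}

# Racks defined as FDs: each component lands in a different rack.
print(object_state(["esxi00", "esxi08", "esxi16"], host_rack, "rack1"))  # reduced availability

# No FDs defined: components only need different hosts, so two can share a rack.
print(object_state(["esxi00", "esxi01", "esxi08"], host_rack, "rack1"))  # inaccessible
```

With racks as FDs, a single rack failure can take out at most one component of any Object, which is why no Object ends up inaccessible.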

                         

                        Bob

                        • 9. Re: fault domain vs RAID
                          vSohill Hot Shot

                          Thank you,

                          • 10. Re: fault domain vs RAID
                            vSohill Hot Shot

                            TheBobkin, FTT uses the formula 2n+1. What is the formula for erasure coding, for example RAID 5 with FTT=1?

                            • 11. Re: fault domain vs RAID
                              TheBobkin Virtuoso
                              vExpert, VMware Employees

                              Hello vSohill,

                               

                              Not sure exactly what you mean, but guessing you are asking about FTT=X with FTM=RAID1 vs RAID5/6, here is a breakdown of how this is composed:

                              FTT=0, FTM=RAID0 = 2(0)+1 = 1 component  (1 data-replica)

                              FTT=1, FTM=RAID1 = 2(1)+1 = 3 components (2 data-replicas + 1 witness component)

                              FTT=2, FTM=RAID1 = 2(2)+1 = 5 components (3 data-replicas + 2 witness components)

                              FTT=3, FTM=RAID1 = 2(3)+1 = 7 components (4 data-replicas + 3 witness components)

                               

                              FTT=0, FTM=RAID5 = doesn't exist as RAID5/6 doesn't work like this on vSAN nor on any storage platform

                              FTT=1, FTM=RAID5 = 4 components (all components contain a combination of data and a single set of distributed parity data)

                              FTT=2, FTM=RAID6 = 6 components (all components contain a combination of data and two sets of distributed parity data)

                               

                              So no, there is no formula for RAID5/6 minimums other than minimum 4/6 node/Fault Domains needed for FTT=1, FTM=RAID5 and FTT=2, FTM=RAID6 respectively.
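The breakdown above can be written out as a small lookup, just to make the contrast explicit: RAID0/RAID1 follow 2n+1, while RAID5/6 are fixed layouts. `min_components` is a hypothetical helper for illustration, and the counts are the minimums listed in this post (real Objects can have more components).

```python
def min_components(ftt, ftm):
    """Return (total, data, witness) minimum component counts for a policy."""
    if ftm == "RAID0" and ftt == 0:
        return (1, 1, 0)
    if ftm == "RAID1" and ftt in (1, 2, 3):
        # 2n+1 components: n+1 data-replicas plus n witness components.
        return (2 * ftt + 1, ftt + 1, ftt)
    if (ftm, ftt) == ("RAID5", 1):
        return (4, 4, 0)  # data plus one set of distributed parity
    if (ftm, ftt) == ("RAID6", 2):
        return (6, 6, 0)  # data plus two sets of distributed parity
    raise ValueError(f"unsupported combination: FTT={ftt}, FTM={ftm}")


for ftt, ftm in [(0, "RAID0"), (1, "RAID1"), (2, "RAID1"), (3, "RAID1"),
                 (1, "RAID5"), (2, "RAID6")]:
    total, data, witness = min_components(ftt, ftm)
    print(f"FTT={ftt}, FTM={ftm}: {total} components ({data} data, {witness} witness)")
```

Combinations that don't exist, such as FTT=0 with RAID5, raise a `ValueError` rather than returning a number.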

                               

                              Bob

                              • 12. Re: fault domain vs RAID
                                vSohill Hot Shot

                                Thanks Bob,

                                • 13. Re: fault domain vs RAID
                                  vSohill Hot Shot

                                  TheBobkin I have a question regarding physical disk placement. In my case I have a VM with RAID 5 protection. The VM runs on ESXI01. The VM Home components are distributed over ESXi2, ESXi3, ESXi4 and ESXi5. In this example, the VM Home components such as the NVRAM and vmx files are not mapped to ESXI01, which is hosting the VM. I don't understand how this works. If esxi01 goes down, does the VM need to be restarted on another host?

                                  • 14. Re: fault domain vs RAID
                                    TheBobkin Virtuoso
                                    vExpert, VMware Employees

                                    Hello vSohill,

                                     

                                    So, ALL of the VM's data is stored as Objects: the .vswp, .vmdk and the namespace (which contains the descriptors for all Objects and some data stored as files, e.g. the .vmx, .vmsd, .vmsn and vmx-vswp etc.).

                                    None of these are "mapped" to any particular host; they are simply accessed by whichever host is running the VM, as all hosts can see the vsanDatastore.

                                    If a host goes down then yes of course any VMs need to be restarted on other hosts - this will occur provided the namespace and other Objects are still available (e.g. only 1 host failed).

                                     

                                    Bob