baber
Expert

What are the benefits of placing 2 hosts in a fault domain?

Could you please tell me which option is better, and why?

I have 8 hosts.

1 - Create 8 fault domains, each containing 1 host

2 - Create 4 fault domains, each containing 2 hosts

What are the benefits of each design?

Please mark helpful or correct if my answer resolved your issue.
dimyke
Enthusiast

Hi

First I would like to give you a link to the VMware documentation where they explain fault domains in detail: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-8491C4B0-6F94-4...

There is no single answer to your question because you have not provided enough information.

Let's start with a few questions:
1) Why do you want to use FDs?
2) What is your rack layout?
3) What RAID level would you like to use?

If you read the VMware documentation linked above, you will see that the concept of FDs was introduced to group hosts that are in the same rack or the same chassis, so that vSAN can take that grouping into account when placing data.

Microsoft has a similar definition, see https://docs.microsoft.com/en-us/windows-server/failover-clustering/fault-domains

Hope this already helps a bit.
Kr
Dimitri

TheBobkin
VMware Employee

@baber, the only conceivable benefit of configuring 4 FDs would be if you are aiming for a form of pseudo-rack-awareness, e.g. so that the cluster could (in theory) withstand the failure of 2 hosts (in the same rack/FD) and still have data availability even while using an FTT=1 Storage Policy.


However, I have seen this cause problems. For example, if you were using a RAID5 Storage Policy and 1 node failed, vSAN could only rebuild the missing data components on the remaining node in that FD, and that node might not have enough free space to repair everything. This wouldn't happen if there were no FDs, as there would be 4 applicable nodes to repair the data on.


If you are not going with 4 FDs, you don't need to explicitly define FDs at all: when none are configured, each node is treated as its own FD with regard to component placement in compliance with the Storage Policies.
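
To make the rebuild-target difference concrete, here is a minimal Python sketch. It is only an illustration of the placement rule described above (each component of an object must sit in a different fault domain), not how vSAN is implemented. The host names `esx1`..`esx8`, the function name, and the 4-component RAID5 placement are assumptions made for the example.

```python
def eligible_rebuild_hosts(fd_of_host, component_hosts, failed_host):
    """Return the hosts that could take the rebuilt component of one object.

    fd_of_host:      dict mapping host name -> fault domain name
    component_hosts: hosts currently holding the object's components
    failed_host:     the host that failed (one of component_hosts)
    """
    # Fault domains still occupied by the object's surviving components.
    occupied = {fd_of_host[h] for h in component_hosts if h != failed_host}
    # A rebuild target must be healthy and in a fault domain not yet used.
    return sorted(
        h for h, fd in fd_of_host.items()
        if h != failed_host and fd not in occupied
    )


hosts = [f"esx{i}" for i in range(1, 9)]  # hypothetical host names

# No explicit FDs: every host acts as its own fault domain.
implicit = {h: h for h in hosts}
# 4 FDs with 2 hosts each: esx1+esx2 -> FD1, esx3+esx4 -> FD2, ...
paired = {h: f"FD{i // 2 + 1}" for i, h in enumerate(hosts)}

# Assumed RAID5 object with 4 components, one per fault domain.
placement = ["esx1", "esx3", "esx5", "esx7"]

no_fd_targets = eligible_rebuild_hosts(implicit, placement, "esx1")
paired_targets = eligible_rebuild_hosts(paired, placement, "esx1")
print(no_fd_targets)   # ['esx2', 'esx4', 'esx6', 'esx8']
print(paired_targets)  # ['esx2']
```

With no FDs configured there are 4 candidate hosts for the repair; with 4 FDs of 2 hosts only the partner host in the same FD qualifies.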

baber
Expert

1 - Do you mean that with 20 hosts it is better, and VMware's recommendation, to have 20 FDs (the default) instead of 10 FDs with 2 hosts each?

2 - I have read this doc:

https://blogs.vmware.com/virtualblocks/2019/12/20/design-operation-considerations-using-vsan-fault-d...

3 - According to it, I understand that if we put 2 hosts in an FD and one of these hosts fails, that FD is no longer reliable, so SPBM cannot use that FD when placing newly created VMs. Is that correct?

4 - If this is correct, then is the best recommendation to put just 1 host in each FD?

dimyke
Enthusiast

1 - VMware's recommendation is to use one fault domain per rack, as described in the blog you linked in item 2.
If you have 20 hosts in one rack, then there is 1 fault domain and you should not create fault domains at all.
If you have 2 racks with 10 hosts each, then you have 2 fault domains with 10 hosts each.
Note that you need at least 3 fault domains for vSAN to work, unless you have a stretched cluster with a witness node somewhere.
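
As a rough aid for the "at least 3 fault domains" remark, the sketch below computes the minimum number of fault domains for a storage policy. It assumes the commonly documented rules for the original vSAN storage architecture (mirroring needs 2 x FTT + 1, RAID5 needs 4, RAID6 needs 6) and does not cover stretched clusters; the function name and the `method` labels are made up for this example.

```python
def min_fault_domains(ftt, method="mirror"):
    """Minimum fault domains (or hosts, if no FDs are defined) for a policy.

    ftt:    failures to tolerate
    method: "mirror" (RAID1) or "erasure" (RAID5 for FTT=1, RAID6 for FTT=2)
    """
    if ftt < 1:
        raise ValueError("ftt must be at least 1 for this sketch")
    if method == "mirror":
        # FTT+1 replicas plus FTT witnesses, each in its own fault domain.
        return 2 * ftt + 1
    if method == "erasure":
        # Assumed layouts: RAID5 = 3 data + 1 parity, RAID6 = 4 data + 2 parity.
        required = {1: 4, 2: 6}
        if ftt not in required:
            raise ValueError("erasure coding is only defined for FTT 1 or 2")
        return required[ftt]
    raise ValueError(f"unknown method: {method}")


print(min_fault_domains(1))             # 3
print(min_fault_domains(1, "erasure"))  # 4
print(min_fault_domains(2, "erasure"))  # 6
```

This is why 2 racks modelled as 2 fault domains are not enough on their own for an FTT=1 policy.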

3 - If you have 2 hosts in an FD and one host fails, nothing happens for 60 minutes (the default timer; this also depends on the kind of failure).
With an unrecoverable failure, the rebuild starts immediately.
After these 60 minutes, if the host is still offline, vSAN will rebuild the lost blocks on the other host in the same fault domain.
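
The timer behaviour described above can be summarised in a few lines of Python. This is a simplified sketch, not vSAN code: the state names "absent" and "degraded" follow the terms used elsewhere in this thread, the 60-minute value is the default mentioned above, and the function name is hypothetical.

```python
from datetime import timedelta

# Default repair delay mentioned in this thread; assumed to be configurable.
DEFAULT_REPAIR_DELAY = timedelta(minutes=60)


def rebuild_starts_after(state, repair_delay=DEFAULT_REPAIR_DELAY):
    """Return how long vSAN waits before rebuilding, per the rules above.

    state: "degraded" for an unrecoverable failure,
           "absent" for a failure that might come back (e.g. host offline).
    """
    if state == "degraded":
        # Unrecoverable failure: rebuild right away.
        return timedelta(0)
    if state == "absent":
        # Possibly transient failure: wait for the repair delay first.
        return repair_delay
    raise ValueError(f"unknown component state: {state}")


print(rebuild_starts_after("degraded"))  # 0:00:00
print(rebuild_starts_after("absent"))    # 1:00:00
```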

If one host is down from a FD and you create a new VM, the remaining host will be used to store the data.

The image in the blog shows what happens in case of one host failure (which I explained above) and what happens after a rack failure.

The response when a complete fault domain is down is the same as with one host. If it is due to planned maintenance or the components are absent, a 60-minute timer starts. After these 60 minutes, vSAN will rebuild.
I advise you to also look into how vSAN rebuilds objects on (permanent) failures, for example in this blog post: https://blogs.vmware.com/virtualblocks/2017/11/09/understanding-vsan-rebuilds/

If you would only put 1 host in each FD, just forget about it and do not use FDs. It serves no purpose.

baber
Expert

As I understand it, suppose we have 4 FDs (FD1...FD4), each containing 3 hosts, and some VMs created with FTT=1. If a disk failure or host failure happens on host1 in FD1, is FD1 still reliable?

I want to know whether a newly created VM will place any components in FD1, or whether FD1 will not be used for new VMs until it is repaired.

dimyke
Enthusiast

FD1 will remain available for old and new VMs; you don't even have to fix the drive, as vSAN will rebuild if needed.
So 1 disk failure will not impact the rest of the FD, and it will remain operational.

I think that is what you want to know, right?

EDIT: a host failure will not impact your FD either.

baber
Expert

I ask because I had heard that if we have 4 FDs, each containing 10 hosts, and one host in FD1 fails and cannot be repaired within 60 minutes, then FD1 is no longer reliable and new VMs cannot use FD1 until that host is repaired. But you are saying that if one, or even three, hosts in an FD fail, nothing happens and that FD can still be used for new and existing VMs.

Is that correct?

TheBobkin
VMware Employee

@baber, no, it will use the other 9 available nodes in that FD with 1 node failed - marking 10 nodes' worth of data as Absent/Degraded because one node failed would be insane and make no sense.

By the by, 4 FDs each with 2 nodes vs. 4 FDs each with 10 nodes is a completely different story. For a start, what I mentioned above regarding the ability to repair with the available capacity in that FD wouldn't really be a problem.
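
A back-of-the-envelope capacity check illustrates why FD size matters for repairs. This is a deliberately crude sketch: it assumes all data from the failed node must be rebuilt inside the same FD and ignores per-object placement details. The capacity figures (4000 GB free per node, 6000 GB to rebuild) and all names are invented for the example.

```python
def can_repair_within_fd(free_gb_by_host, failed_host, data_to_rebuild_gb):
    """Check whether the surviving hosts of one FD can absorb a rebuild.

    free_gb_by_host:    dict host -> free capacity in GB, for one fault domain
    failed_host:        the host in that fault domain that failed
    data_to_rebuild_gb: amount of data that lived on the failed host
    """
    surviving_free = sum(
        free for host, free in free_gb_by_host.items() if host != failed_host
    )
    return surviving_free >= data_to_rebuild_gb


# Hypothetical numbers: every node has 4000 GB free and holds 6000 GB of data.
fd_with_2_nodes = {f"esx{i}": 4000 for i in range(1, 3)}
fd_with_10_nodes = {f"esx{i}": 4000 for i in range(1, 11)}

small_fd_ok = can_repair_within_fd(fd_with_2_nodes, "esx1", 6000)
large_fd_ok = can_repair_within_fd(fd_with_10_nodes, "esx1", 6000)
print(small_fd_ok)  # False: one remaining node with 4000 GB free
print(large_fd_ok)  # True: nine remaining nodes with 36000 GB free in total
```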

baber
Expert

Actually, I have read that document, but I really could not understand it.

Consider that we have 16 hosts and want to use only FTT=1.

What is the difference between the following options (all hosts reside in 2 racks, with 8 hosts per rack)?

"16 FDs with 1 host each" vs. "8 FDs with 2 hosts each" vs. "4 FDs with 4 hosts each"

 

 

dimyke
Enthusiast

@TheBobkin is right. If one host fails, the rest keep working. It would be stupid to stop using all nodes in a rack when only 1 node fails.

You do not need fault domains for vSAN to work.
Your questions do not make sense... As mentioned multiple times, a fault domain is a rack or chassis. If you do not have multiple racks, do not use fault domains.

If you have 2 racks, you could create a stretched cluster with a witness node somewhere else.

Or you could create a RAID per rack (and a cluster per rack).

There is NO use in creating multiple fault domains in the same rack unless these servers have different switches and power supplies (or multiple chassis in the same rack).
There is no use in creating fault domains to just create fault domains, you need a different solution for that like I proposed above.
So I ask you again: what do you want to do? What do you have? Your questions do not make sense, and we cannot explain it with the information you are giving us.

"16 FD with one host" just no. Forget this. Then just do not use fault domains whatsoever.
I am not going to comment on the others because you did not provide enough information to answer this question.
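
For what it is worth, the 2-rack case can be checked with a small Python sketch. It assumes an FTT=1 mirrored object with 3 components (2 replicas plus a witness) of equal weight, each in a different fault domain, and that the object stays available only while more than half of them survive. It tests the worst case over all possible placements; the rack and FD names and both function names are hypothetical.

```python
from itertools import combinations


def survives_any_rack_failure(rack_of_fd, components=3):
    """True only if every possible placement survives every single-rack loss.

    rack_of_fd: dict fault domain name -> rack that physically hosts it
    components: components per object, each in a different fault domain
    """
    fds = list(rack_of_fd)
    if len(fds) < components:
        return False  # too few fault domains to place the object at all
    racks = set(rack_of_fd.values())
    for placement in combinations(fds, components):
        for failed_rack in racks:
            alive = sum(1 for fd in placement if rack_of_fd[fd] != failed_rack)
            if alive * 2 <= components:  # no strict majority left
                return False
    return True


def layout(hosts_per_fd, hosts_per_rack=8, racks=("rackA", "rackB")):
    """Build a fault domain -> rack map with FDs that never span racks."""
    fds_per_rack = hosts_per_rack // hosts_per_fd
    return {
        f"{rack}-FD{i + 1}": rack for rack in racks for i in range(fds_per_rack)
    }


# 16 hosts in 2 racks: 16 FDs x 1 host, 8 FDs x 2 hosts, 4 FDs x 4 hosts.
results = {n: survives_any_rack_failure(layout(n)) for n in (1, 2, 4)}
print(results)  # {1: False, 2: False, 4: False}

# For comparison: 3 racks with exactly one FD per rack.
three_racks = layout(8, racks=("rackA", "rackB", "rackC"))
print(survives_any_rack_failure(three_racks))  # True
```

Under these assumptions none of the three layouts guarantees that an object survives the loss of a rack, because at least two of its three components can end up in the same rack. Only one fault domain per rack, across at least three racks, gives that guarantee.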

 
