VMware Cloud Community
lucasitteam
Enthusiast

Consider 4 or more nodes for the vSAN cluster design for maximum availability

I'm unable to understand why 4 nodes are recommended for maximum availability.

I understand that for FTT=1 you need 3 nodes (2n+1).

But the design guide then states that if an ESXi host fails, you cannot rebuild components.

Quote:

"The implications of this are that if a node fails, vSAN cannot rebuild components, nor can it provision new VMs that tolerate failures"

For example, there are Node1, Node2 and Node3. For simplicity, only one VM is running on the cluster. By design there are 2 replicas of the data and a witness, and these must all reside on different hosts. Assume Node3 is holding the witness: if Node3 goes down, what will the impact be? I believe the VM will continue to run, but the witness will not be rebuilt, right?

Suppose we go with a four-node cluster and put Node3 into maintenance mode: will it move the witness to Node4?

There is a statement in chapter 9 of the book "Essential Virtual SAN", second edition. I am unable to understand why 8 nodes and not 7 nodes. Apologies: unless you have read the book, perhaps only its author can explain.

  1. Minimum of six hosts to support RAID-6.
  2. Additional host to allow for full recovery and re-protection (self-healing) after a failure, which means seven hosts minimum.
  3. Additional host to allow for recovery and re-protection during maintenance, which means eight hosts minimum.


Accepted Solutions
TheBobkin
Champion

Hello lucasitteam,

"Node 3 is holding Witness, if Node 3 is down what will be impact? I believe VM will continue to run but then witness will not be rebuild right?"

Yes, the VM will remain accessible provided the majority of each Object's components are accessible. However, the VM may be restarted if the latest-updated data component of an Object becomes inaccessible. If there are not enough Fault Domains (FDs) available to satisfy the rules in the Storage Policy (SP), then the lost component cannot be rebuilt.
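
To illustrate the majority rule, here is a minimal Python sketch. It assumes every component carries exactly one vote, which is a simplification, and the function and component names are hypothetical, not part of any vSAN API.

```python
def object_accessible(component_hosts, failed_hosts):
    """Return True if more than half of the components sit on healthy hosts.

    Assumption: each component (replica or witness) carries exactly one vote.
    """
    available = sum(
        1 for host in component_hosts.values() if host not in failed_hosts
    )
    return available * 2 > len(component_hosts)


# FTT=1 mirroring on three nodes: two replicas plus one witness.
layout = {"replica-1": "Node1", "replica-2": "Node2", "witness": "Node3"}

print(object_accessible(layout, {"Node3"}))           # True: 2 of 3 remain
print(object_accessible(layout, {"Node2", "Node3"}))  # False: only 1 of 3 remains
```

Losing only the witness host leaves two of three components, so the object stays accessible; losing a second host breaks the majority.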

"Suppose we go with Four node cluster and put Node-3 into maintenance will it move the witness to Node-4?"

If a host is put in Maintenance Mode (MM) with 'Ensure Accessibility', the components residing on this node's storage will be marked as 'absent' and only rebuilt after 60 minutes (with the default Clom Repair Delay setting).

In the event of an abrupt failure, regardless of whether it is a witness component or a data component, a component marked as 'degraded' due to the abrupt loss of a node/device will be rebuilt on an available node, provided that node can satisfy the FDs of the applied SP and has adequate storage space to do so.
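
The placement constraint can be sketched roughly as follows. This is only an illustration under stated assumptions: each host is its own fault domain, every component of an object must sit on a different host, and storage capacity is ignored. The helper `can_rebuild` is hypothetical.

```python
def can_rebuild(cluster_hosts, component_hosts, failed_hosts):
    """Return True if every lost component has a spare healthy host to land on.

    Assumptions: one fault domain per host, one component per host per object,
    and free capacity is not modelled.
    """
    healthy = set(cluster_hosts) - set(failed_hosts)
    # Hosts that still hold a surviving component cannot take another one.
    occupied = {h for h in component_hosts.values() if h in healthy}
    lost = [c for c, h in component_hosts.items() if h not in healthy]
    spare = healthy - occupied
    return len(spare) >= len(lost)


layout = {"replica-1": "Node1", "replica-2": "Node2", "witness": "Node3"}

# Three nodes: no spare host is left after Node3 fails.
print(can_rebuild(["Node1", "Node2", "Node3"], layout, {"Node3"}))           # False
# Four nodes: Node4 can receive the rebuilt witness.
print(can_rebuild(["Node1", "Node2", "Node3", "Node4"], layout, {"Node3"}))  # True
```

Under these assumptions the fourth node is what makes a rebuild possible at all for an FTT=1 object.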

"I unable to understand why 8 nodes and not 7 nodes."

6 is the required minimum number of FDs, as RAID-6 Objects have 6 components.

7 is so that 6 remain available in the event of one failing.

8 is so that, with one node failed, there is still the ability to put a host in MM with 'Full Data Evacuation' as opposed to 'Ensure Accessibility'.
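
The host-count arithmetic in this thread can be written out as a small Python sketch. The function names are hypothetical; the numbers come from the 2n+1 rule for mirroring and the six-host RAID-6 minimum quoted above, with one extra host per kind of headroom.

```python
RAID6_MIN_HOSTS = 6  # RAID-6 minimum quoted in the thread


def min_hosts_mirroring(ftt):
    """Minimum hosts for mirroring with the 2n+1 rule (n = failures to tolerate)."""
    return 2 * ftt + 1


def hosts_needed(policy_minimum, spare_for_rebuild=False, spare_for_maintenance=False):
    """Add one host for rebuild headroom and one for maintenance headroom."""
    return policy_minimum + int(spare_for_rebuild) + int(spare_for_maintenance)


print(min_hosts_mirroring(1))                                        # 3
print(hosts_needed(min_hosts_mirroring(1), spare_for_rebuild=True))  # 4
print(hosts_needed(RAID6_MIN_HOSTS))                                 # 6
print(hosts_needed(RAID6_MIN_HOSTS, spare_for_rebuild=True))         # 7
print(hosts_needed(RAID6_MIN_HOSTS, True, True))                     # 8
```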

Hope this helps.

Bob

lucasitteam
Enthusiast

Dear Bob,

Thanks a lot for taking the time to explain the challenges in detail.

For my understanding: if I wish to have N+1 capability even during a maintenance window, should I have a minimum of 5 nodes in the vSAN cluster?

Do we also need to account for slack space when considering rebuild capacity?
