MJMVCIX
Contributor

vSAN Stretched Cluster Failure Scenarios

Hi All, 

I want to double-check the impact of the scenarios below.

  • The cluster is vSAN 7.0 Update 3, all-flash, and is a stretched cluster.
  • The storage policy for all VMs is "Dual site mirroring", FTM = "RAID-5 (Erasure Coding)", FTT = "1".

So the above means there is RAID 1 between the data sites, then RAID 5 of that data within each data site, with metadata on the vSAN Witness appliance in a third site.

Regarding the improvements to votes in vSAN 7.0 U3 (Enhanced Stretched Cluster durability), I believe this has improved failure handling in the scenarios below.

  1. Based on the above storage policy, if the Witness appliance failed or was offline, I believe there would be no impact to the VMs in either site? The storage policy would simply show as non-compliant?
  2. Following on from the above scenario, if one of the data sites then also failed and all hosts in that site were offline, is it correct that the VMs and data/storage in the remaining data site would still be online, with those VMs unaffected?

(This would mean the Witness and one data site were offline. Of course, in this scenario vSphere HA would fail over the VMs from the failed site, so there would be downtime for them during that time. However, this question focuses on the vSAN data and whether the VMs would still be online if the Witness and one data site were offline with the above storage policy.)

Thanks,

Accepted Solutions
TheBobkin
VMware Employee

@MJMVCIX 1. Yes, correct.

2. Unfortunately no, not if the failures occur in that order (Witness first, then a subsequent data-site failure). It works in the opposite failure order because changing the vote structure when a data site and the Witness remain is straightforward and predictable, whereas doing anything similar with two data sites remaining (and no Witness) really is not. This may be a feature in the future, but it is currently not implemented.

depping
Leadership
  1. If the Witness fails, all VMs would still be alive in both locations. This has been the case for all versions of vSAN stretched clusters.
  2. No, if one of the sites subsequently fails, the VMs will become inaccessible.

What you are referring to is the situation where the Witness goes down AFTER one of the sites has gone down. In that case, a few minutes after the site goes down, the votes are recalculated, because we want to ensure the VMs remain available even if the Witness goes down next. But the order is specific: site first, then Witness, not the other way around.
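
To make the ordering point concrete, here is a minimal Python sketch of a quorum model. It is not vSAN code: the class `StretchedObject`, the function `run`, and the vote counts (3 per fault domain) are hypothetical and chosen only for illustration. It assumes an object stays accessible while more than half of its votes are reachable, and that votes are only re-weighted after a data-site failure while the Witness is still up. The real recalculation takes a few minutes; the model applies it immediately.

```python
from dataclasses import dataclass, field

DATA_SITES = {"site_a", "site_b"}


@dataclass
class StretchedObject:
    # Votes per fault domain. Illustrative numbers, not real vSAN values.
    votes: dict = field(
        default_factory=lambda: {"site_a": 3, "site_b": 3, "witness": 3}
    )
    failed: set = field(default_factory=set)

    def is_accessible(self) -> bool:
        """Accessible only while strictly more than half of all votes are alive."""
        total = sum(self.votes.values())
        alive = sum(v for fd, v in self.votes.items() if fd not in self.failed)
        return 2 * alive > total

    def recalculate_votes(self) -> bool:
        """Re-weight votes after a data-site failure (assumed behaviour).

        Only modelled for "one data site down, Witness still up"; in every
        other state the votes are left unchanged.
        """
        failed_sites = self.failed & DATA_SITES
        if len(failed_sites) != 1 or "witness" in self.failed:
            return False
        survivor = (DATA_SITES - self.failed).pop()
        others = sum(v for fd, v in self.votes.items() if fd != survivor)
        # Give the surviving data site a majority on its own.
        self.votes[survivor] = others + 1
        return True


def run(order):
    """Fail fault domains in the given order; report accessibility after each."""
    obj = StretchedObject()
    states = []
    for fault_domain in order:
        obj.failed.add(fault_domain)
        obj.recalculate_votes()
        states.append(obj.is_accessible())
    return states


print(run(["site_b", "witness"]))  # [True, True]
print(run(["witness", "site_b"]))  # [True, False]
```

In this model, the site-then-Witness order survives because the surviving data site is given a majority before the Witness is lost. In the Witness-then-site order no re-weighting happens, so the remaining site holds only 3 of 9 votes and the object becomes inaccessible.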