VMware Cloud Community
Cipo800
Enthusiast

vSAN 7.0U3 stretched cluster site/witness failure resiliency

Hi guys, I just read the 7.0U3 release notes, and this new feature looks very interesting for my environment: 6 VxRail nodes split across two buildings, with the witness temporarily hosted in one of the two buildings.

  • Stretched cluster site/witness failure resiliency. This release enables stretched clusters to tolerate planned or unplanned downtime of a site and the witness. You can perform site-wide maintenance (such as power or networking) without concerns about witness availability. 

Does this mean I no longer need to move the witness to a third site or to an external cloud? How does it work?


Thanks

3 Replies
nkaneda
Enthusiast

Does this mean I no longer need to move the witness to a third site or to an external cloud? How does it work?

No. As I understand it, the requirement for the witness site is unchanged.

How it works is described in an external blog post:

vSAN 7.0 U3 enhanced stretched cluster resiliency, what is it? | Yellow Bricks (yellow-bricks.com)

 

Say DC A and the witness are both in site A', and DC B is in site B'. In that case, if site A' goes down completely, all objects in DC B also become inaccessible because quorum is lost.

If the witness is in a third site, and there is enough time between the loss of DC A and the loss of the witness site for all objects to change their vote layout (setting the vote to 0 in the witness site), all objects in DC B remain accessible.
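
The quorum argument above can be sketched in a few lines of Python. This is only an illustration: the location names and the one-vote-per-location layout are assumptions made for the example, not the actual per-component vote layout vSAN uses.

```python
def has_quorum(votes, alive):
    """Return True if the reachable locations hold more than half of all votes."""
    total = sum(votes.values())
    reachable = sum(v for loc, v in votes.items() if loc in alive)
    return 2 * reachable > total


# Assumed layout: one vote per data site and one for the witness.
votes = {"dc_a": 1, "dc_b": 1, "witness": 1}

# Only DC A is lost: DC B and the witness still hold 2 of 3 votes.
site_only = has_quorum(votes, alive={"dc_b", "witness"})

# DC A and a co-located witness are lost together: DC B holds 1 of 3 votes.
site_and_witness = has_quorum(votes, alive={"dc_b"})

print(site_only, site_and_witness)
```

With these assumed votes, losing one data site keeps quorum, while losing a data site and the witness at the same moment does not, which is the co-located case described above.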

kastlr
Expert

Hi,

AFAIK, the difference between pre- and post-7.0U3 vSAN witness handling is as follows.

When a site failure occurs, the remaining site and the witness are still needed for vSAN to handle the failure properly.
While VMware HA tries to restart all affected VMs on the remaining nodes, it interacts with vSAN to check whether the VMDKs can be accessed.
Once vSAN allows access to the VMDKs (more precisely, to all objects stored in vSAN that belong to those VMs), HA can continue and finally start the VMs.

My understanding of the new functionality is that the witness has to be fully operational during the "recovery phase".
If the vSAN witness also goes offline later on (planned or unplanned), access to the VMDKs is still granted.

So your design would still end in cluster-wide downtime if an incident hits the site where the vSAN witness runs.


Hope this helps a bit.
Greetings from Germany. (CEST)
depping
Leadership

Please read the article. HA can restart VMs when the witness is down. The new mechanism recalculates the votes when one of the two data sites has an outage. The recalculation takes a potential subsequent failure of the witness host into account: if that failure then happens, all components are still available!
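
As a rough illustration of the recalculation idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names, the one-vote-per-location starting layout, and the rule of moving the witness's vote to the surviving data site. It is not the actual vSAN algorithm or its real vote numbers.

```python
def has_quorum(votes, alive):
    """Return True if the reachable locations hold more than half of all votes."""
    total = sum(votes.values())
    reachable = sum(v for loc, v in votes.items() if loc in alive)
    return 2 * reachable > total


def recalculate_votes(votes, failed_site, witness="witness"):
    """Hypothetical rebalancing after a data site outage.

    Moves the witness's vote to the surviving data site so that the
    survivor holds a majority on its own.
    """
    new_votes = dict(votes)
    survivors = [loc for loc in votes if loc not in (failed_site, witness)]
    if len(survivors) != 1:
        raise ValueError("this sketch assumes exactly two data sites")
    new_votes[survivors[0]] += new_votes[witness]
    new_votes[witness] = 0
    return new_votes


# Assumed starting layout: one vote per data site and one for the witness.
votes = {"dc_a": 1, "dc_b": 1, "witness": 1}

# DC A fails first; the witness is still up, so the votes can be recalculated.
adjusted = recalculate_votes(votes, failed_site="dc_a")

# The witness fails afterwards: DC B alone is checked against both layouts.
without_recalc = has_quorum(votes, alive={"dc_b"})
with_recalc = has_quorum(adjusted, alive={"dc_b"})

print(adjusted, without_recalc, with_recalc)
```

In this sketch the surviving site keeps quorum after the later witness failure only because the votes were adjusted while the witness was still reachable. If the data site and the witness fail at the same time, there is no window for the adjustment.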
