VMware Cloud Community
SrVMwarer
Hot Shot

vSAN Stretched Cluster

Hello vSAN Gurus,

Considering the topology below:

- We've lost connectivity between the two sites, but each site still has connectivity to the Witness. VMs keep running on the preferred site, fine. But how does the non-preferred site know that it shouldn't power the VMs on, given that it still has connectivity to the Witness site as well?

- What if the preferred site then loses connectivity to the Witness, while the non-preferred site is still saying "Hmm, I am not the preferred site, so I will not power VMs on, as the preferred site will do so"? :smileylaugh:

Capture.PNG

Regards, İlyas
9 Replies
Techstarts
Expert

I might be wrong, but this scenario is described in the Deep Dive book.

Let me check that I understood you correctly:

Connectivity between SiteA and the Witness: OK

Connectivity between SiteB and the Witness: OK

But connectivity between SiteA and SiteB, i.e. the ISL, is lost.

In this scenario, the cluster is formed between the preferred site and the Witness site. All VMs on the secondary site are powered off.

And if the preferred site then loses connectivity to the Witness as well, there is no quorum and the entire vSAN infrastructure goes down.

The Witness plays a critical role when the ISL is broken.
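
To make the scenarios in this thread easier to follow, here is a rough Python sketch of the decision logic as described in the replies. It is a toy model only, not how vSAN is implemented: the function name `active_partition` and its three boolean parameters are made up for illustration, and the last branch simply encodes the "no quorum" statement above rather than anything verified against the product.

```python
def active_partition(isl_up, preferred_to_witness, secondary_to_witness):
    """Toy model: return the fault domains that keep serving VMs.

    Encodes only what this thread states; names are hypothetical.
    """
    if isl_up:
        # Both data sites still see each other (2 of 3 fault domains),
        # so VMs stay up even if the Witness is unreachable.
        members = {"preferred", "secondary"}
        if preferred_to_witness and secondary_to_witness:
            members.add("witness")
        return members
    if preferred_to_witness:
        # ISL down: the Witness forms the cluster with the preferred site,
        # even though the secondary site can still reach the Witness.
        return {"preferred", "witness"}
    # ISL down and the preferred site has also lost the Witness:
    # described in this thread as loss of quorum.
    return set()


# ISL lost, both sites still reach the Witness
print(active_partition(False, True, True))
# Only the Witness is cut off from both sites
print(active_partition(True, False, False))
# ISL lost, then the preferred site loses the Witness too
print(active_partition(False, False, True))
```

The point of the sketch is that the secondary site never has to "decide" anything by itself: once the ISL is gone, it is simply not part of the partition that holds the majority.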

With Great Regards,
depping
Leadership

The above is indeed correct and is described in the vSAN Deep Dive book, as well as on StorageHub!

Failure Scenarios | vSAN Stretched Cluster Guide | VMware

SrVMwarer
Hot Shot

Thank you all.

Now, what if the Witness loses connectivity to both sites? VMs will stay up and running with no issues, fine. But how is the metadata maintained in that case?

Thanks again!

Regards, İlyas
kmcd03
Contributor

If only the Witness host loses connectivity to both sites, VMs will stay online.

Last week our 14+1 stretched cluster (between two data centers) lost connectivity to the third data center where the witness was located. (We have redundant networking to all sites, but the outage was caused by a firewall misconfiguration.) There was no effect on the VMs at either data center (preferred and secondary fault domains).

vSAN health checks alerted on multiple errors, such as the connection to the Witness host and a network partition having occurred. But there was no effect on guest VMs.

However, when connectivity was restored several hours later, the witness would not rejoin the cluster. We also have two 2-node clusters with witnesses at the third site that wouldn't reconnect to their Witnesses either. I confirmed that I could ping between the vSAN hosts and the witness across the appropriate interface (vmk) for all clusters. I opened a ticket with GSS, and the only solution was to disable the stretched configuration and then create it again, putting the hosts in the correct fault domains and choosing the Witness host.

Once the Witness was back online and participating in the cluster, I could see in the health check that objects were rebuilding on the Witness.

Techstarts
Expert

Hi kmcd03

This is super. Glad to know it works as expected. You might be surprised: @Jasemccarty, in his VMworld 2019 US presentation, repeatedly named firewalls as the main reason for such issues.

Would you be kind enough to share how you are managing isolation addresses per site?

With Great Regards,
kmcd03
Contributor

For the isolation address, I referenced Duncan Epping's blog (vSphere HA heartbeat datastores, the isolation address and vSAN | Yellow Bricks). We created a Switch Virtual Interface (SVI) on the physical switches, one for each site, with an IP in the same subnet as vSAN. Then we configured the advanced option das.isolationaddress0.
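
As a rough illustration of that setup, the small Python sketch below just builds and prints the vSphere HA advanced options for one SVI per site. The IP addresses are placeholders, the helper name `ha_isolation_options` is made up, and the use of `das.isolationaddress1` for the second site and `das.usedefaultisolationaddress` is an assumption on top of what the post mentions (it only names `das.isolationaddress0`), so check the referenced blog post before applying anything.

```python
def ha_isolation_options(site_a_svi, site_b_svi):
    """Build HA advanced options for one isolation address per site.

    Hypothetical helper; it only assembles key/value pairs to be entered
    as advanced options on the cluster's vSphere HA settings.
    """
    return {
        # Assumption: stop HA from pinging the default gateway
        "das.usedefaultisolationaddress": "false",
        # SVI on the vSAN network in the first site
        "das.isolationaddress0": site_a_svi,
        # Assumption: second site's SVI goes in the next numbered option
        "das.isolationaddress1": site_b_svi,
    }


# Placeholder addresses from the documentation range, not real ones
options = ha_isolation_options("192.0.2.1", "192.0.2.2")
for key, value in options.items():
    print(f"{key} = {value}")
```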

Techstarts
Expert

Quote: vSAN Stretched Cluster

When the existing vSAN Witness Host comes back online or a new vSAN Witness Host is deployed, metadata changes are resynchronized between the main Stretched Cluster sites and the vSAN Witness Host. The amount of data that needs to be transmitted depends on a few items such as the number of objects and the number of changes that occurred while the vSAN Witness Host was offline. However, this amount of data is relatively small considering it is metadata, not large objects such as virtual disks.

With Great Regards,
Techstarts
Expert

Thanks again, kmcd03. Though this is not the right forum, I would definitely like to know more about it.

With Great Regards,
ankithb
Contributor

Thanks for the replies.
