Hi wise ones,
currently running into a strange issue with an eight-node vSAN cluster (6.7, build 17167734) when creating a stretched vSAN cluster.
We use WTS (Witness Traffic Separation) and are able to vmkping between the ESXi vSAN vmks and the Witness vmk1 (and vice versa).
But we still receive a cluster partition message when running the Skyline health checks.
And as you can see, the Witness node is listed twice in this report!
Has anybody ever seen similar behavior with a vSAN stretched cluster?
And if so, how can it be fixed?
Would really appreciate any feedback on this.
Regards,
Ralf
@kastlr, this could be caused by numerous things, and I would advise checking these first:
- Is the witness vmk (on the Witness node) tagged for both 'vsan' and 'witness' traffic? It should be tagged only for 'vsan' (yes, we are aware that is a tad confusing). If so, remove the 'witness' traffic tag from vmk1 if you are using vmk0.
- Does the Witness node have multiple vmks tagged for 'vsan' traffic? It should have only one (vmk1 comes tagged when deployed). If so, remove the tag from the extra vmk.
- Is there any disparity in the static routes between nodes, if you are using these to communicate with the Witness node? (esxcfg-route -l lists them.)
- Are any other network health checks showing what is failing to communicate, and from which vmk to which vmk?
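For reference, a sketch of how the checks above could be run from the ESXi shell. The vmk names here are examples; verify which interfaces apply in your environment before removing any tags.

```shell
# List which vmks are tagged for vSAN / witness traffic on this host
esxcli vsan network list

# If a vmk carries an unwanted traffic tag, remove it from vSAN networking
# (example: drop vmk1 if vmk0 should be the only tagged interface)
esxcli vsan network ip remove -i vmk1

# List the static routes, to compare them between nodes
esxcfg-route -l
```

Run "esxcli vsan network list" on every data node and on the Witness node, and compare the output side by side; each host should show exactly one vmk with VsanTrafficType "vsan" (plus, on the data nodes only, whichever vmk you tagged for "witness" traffic if using WTS).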
Hi,
first let me say thanks for joining the ride.
Not sure if I got all of your points correctly, but here're some more details.
Thanks again for providing feedback on this weird issue.
Regards,
Ralf
On your vSAN Witness Appliance, are vmk0 & vmk1 on the same network segment?
If so, it would be necessary to untag "vSAN Traffic" on vmk1 and tag it on vmk0.
vSAN uses the same TCP/IP stack as the Management network, and this is a situation where multi-homing comes into play (https://kb.vmware.com/kb/2010877).
Although vmk1 is tagged for vSAN traffic, the traffic actually goes out via vmk0.
That discrepancy can cause a partition.
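If that turns out to be the case, moving the tag on the Witness appliance might look something like this (a sketch, assuming vmk1 currently carries the tag and vmk0 should take it over; double-check with "esxcli vsan network list" first):

```shell
# On the vSAN Witness Appliance:

# Confirm current tagging (expect vmk1 tagged for vsan traffic)
esxcli vsan network list

# Untag vSAN traffic on vmk1
esxcli vsan network ip remove -i vmk1

# Tag vSAN traffic on vmk0 instead (default traffic type is vsan)
esxcli vsan network ip add -i vmk0

# Verify the change took effect
esxcli vsan network list
```

Afterwards, re-run the Skyline network health checks to confirm the partition warning clears.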