Suppose, in a vSAN Stretched Cluster:
Just want to see if it is possible for a vSAN Stretched Cluster to automatically vMotion a VM to the other site when, for example, most hosts in the Preferred-Site fail and the VM's vSAN components remain accessible only from the Secondary-Site.
Normally, with the affinity rule in effect, HA would restart the VM on the surviving ESXi host in the Preferred-Site, even if that means its storage traffic has to traverse the Inter-Site Link to the Secondary-Site (where SFTT is still compliant).
Is there a way to make DRS ignore the affinity "should" rule for the VM in such scenarios and simply vMotion the VM across to the other site, where the VM's storage I/O could happen locally?
Sorry for the bump. I will rephrase my question to make it clearer.
Preferred-Site: 2 of 3 hosts failed
Secondary-Site: 3 of 3 hosts still healthy
What happens at the VMDK level:
VMs can only access their VMDKs through the Secondary-Site's component copies.
What happens at the VM runtime level:
VMs can still run on the Preferred-Site's remaining host, but access to their VMDKs will have to go through the ISL to the Secondary-Site.
What I'm asking:
In the above scenario, it would be better for HA (or DRS) to just restart (or vMotion) all VMs to the Secondary-Site, ignoring the affinity rules. Any way to do this?
Sorry for the necro thread bump.
I created this thread based on vSAN 6.7U3.
I reckon the problem probably didn't have a solution at the time, given there were no replies.
Just bumping this topic up again to see if there is a solution / fix / workaround for this problem in 7.0U1.
"Should" rules are only ignored when HA cannot restart the VM in the "site" you defined, or when DRS sees a significant imbalance. There's nothing else right now, unfortunately.
I have filed a few feature requests around this in the past, and hopefully those will make it in over time.
Thanks for confirming. I might have to come up with some PowerCLI workaround for now, at least until this feature is built into HA + DRS + vSAN.
I'm thinking of some script that periodically checks whether - due to an HA restart and the DRS affinity "should" rules - a VM is running in a site that holds none of its accessible vSAN disk components, meaning its disk I/O is actually going across the inter-site link *shudders*.
If one is found, it will just vMotion the VM to the other site so the VM regains some kind of "disk read/write locality" again.
Note to self: make the PowerCLI script remove affected VMs from the affinity rule after the cross-site vMotion, then send a log entry or email telling the world what it did, so it can be manually undone later once things are fixed.
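For anyone curious, the workaround described above could be sketched roughly like this in PowerCLI. This is only a sketch, not tested code: it assumes the VMware.PowerCLI module is loaded and a session already exists via Connect-VIServer; the cluster name and the DRS group names ('StretchedCluster01', 'PreferredSite-Hosts', 'SecondarySite-Hosts', 'PreferredSite-VMs') are placeholders for whatever your environment uses; and Get-VmComponentSite is a hypothetical helper, not a real cmdlet - in practice you would have to query the vSAN management API (e.g. via Get-VsanView) to work out which site holds the VM's accessible component copies.

```powershell
# Sketch only - assumes an existing Connect-VIServer session and
# placeholder cluster/group names. Run from a scheduled task.
$cluster        = Get-Cluster -Name 'StretchedCluster01'
$preferredHosts = (Get-DrsClusterGroup -Cluster $cluster -Name 'PreferredSite-Hosts').Member
$secondaryHosts = (Get-DrsClusterGroup -Cluster $cluster -Name 'SecondarySite-Hosts').Member

foreach ($vm in Get-VM -Location $cluster) {
    # Hypothetical helper: would query the vSAN management API to find
    # which site holds the VM's accessible component copies, returning
    # 'Preferred' or 'Secondary'. Not a real cmdlet.
    $dataSite = Get-VmComponentSite -VM $vm

    $runsOnPreferred = $preferredHosts -contains $vm.VMHost

    if ($runsOnPreferred -and $dataSite -eq 'Secondary') {
        # Compute and storage are split across sites, so disk I/O is
        # crossing the ISL - move the VM next to its data.
        $target = $secondaryHosts | Get-Random
        Move-VM -VM $vm -Destination $target -Confirm:$false

        # Pull the VM out of the should-rule VM group so DRS does not
        # drag it straight back, and log it so it can be undone later.
        Get-DrsClusterGroup -Cluster $cluster -Name 'PreferredSite-VMs' |
            Set-DrsClusterGroup -Remove -VM $vm
        Write-Output "$(Get-Date -Format s) moved $($vm.Name) to Secondary-Site"
    }
}
```

The VM-group removal at the end is what stops DRS from immediately migrating the VM back under the "should" rule; the logged line is the breadcrumb for re-adding the VM to the group once the Preferred-Site hosts are recovered.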