VMware Cloud Community
mike-p
Enthusiast

FTT0 Object Placement and Maintenance Mode

Hi,

I have several customers with small stretched clusters. Some of the VMs are MS Exchange or SQL DAGs or other systems with their own HA options, which run with a policy of FTT=0. vSAN spreads the objects of these VMs across the nodes of each site. If I take one of these nodes into maintenance mode without shutting down the VMs, a long process of object evacuation starts. Is it mandatory to power off such machines, or is there another option to handle this?

Regards Mike

6 Replies
TheBobkin
Champion

@mike-p Are you doing Maintenance Mode with the 'Ensure Accessibility' option? If yes, then this will move any FTT=0 data off that node regardless of whether the VMs are powered on or not.
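(For reference, the same evacuation choice can be made programmatically; below is a minimal pyVmomi sketch, assuming a `host` object already retrieved from an existing vCenter/ESXi session. 'ensureObjectAccessibility' can be swapped for 'noAction' or 'evacuateAllData'.)

```python
# Minimal sketch (pyVmomi): enter maintenance mode with an explicit vSAN decommission mode.
# Assumes `host` (a vim.HostSystem) was obtained elsewhere, e.g. via pyVim.connect.SmartConnect
# and a container view.
from pyVmomi import vim

def enter_mm(host, object_action="ensureObjectAccessibility"):
    spec = vim.host.MaintenanceSpec(
        vsanMode=vim.vsan.host.DecommissionMode(objectAction=object_action)
    )
    # timeout=0 means no timeout; powered-off/suspended VMs are left in place.
    return host.EnterMaintenanceMode_Task(
        timeout=0, evacuatePoweredOffVms=False, maintenanceSpec=spec
    )
```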

 

If you were, for instance, running VMs as FTT=0 and had 2 VMs clustered at the application layer running on the cluster, you would be better off pinning each of those VMs' data to opposing nodes and using Maintenance Mode 'No Action'.

mike-p
Enthusiast

Yes, I chose this option because the other machines need to stay available.

TheBobkin
Champion

@mike-p Then it is expected behaviour for that option to move all FTT=0 data off the node before it enters MM; it does exactly what it says, i.e. it ensures object accessibility (whether the objects are FTT=0, FTT=1 or FTT=2), and for any FTT=0 data stored on that node it obviously needs to migrate that data off the node to ensure accessibility.

 

Putting a node into MM with the NA (No Action) option should only result in FTT=0 data on that node becoming inaccessible - this can be confirmed by doing the MM precheck for the node with the NA option and checking which VMs/objects would become inaccessible. This is also very simple to script from the command line (e.g. list the paths of the objects that would become inaccessible), as sketched below.
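A minimal sketch of that kind of check (not an official tool), assuming you have saved the output of `esxcli vsan debug object list --all` from one of the cluster hosts to a text file; the exact field names and layout differ between vSAN versions, so the `hostFailuresToTolerate` and `Path:` markers here are assumptions to adjust:

```python
# Minimal sketch: flag FTT=0 objects that have components on the host about to enter MM.
# Assumes the output of `esxcli vsan debug object list --all` was saved to a text file;
# field names/layout differ between vSAN versions, so the markers below are assumptions.
import re
import sys

def objects_at_risk(dump_path, host_name):
    with open(dump_path) as f:
        text = f.read()
    at_risk = []
    # Each object section in the dump starts with "Object UUID:".
    for section in re.split(r"(?=Object UUID:)", text):
        if "Object UUID:" not in section:
            continue
        # FTT=0 objects are assumed to report hostFailuresToTolerate = 0 in their policy dump.
        is_ftt0 = re.search(r"hostFailuresToTolerate\D*0\b", section) is not None
        if is_ftt0 and host_name in section:
            uuid = re.search(r"Object UUID:\s*(\S+)", section).group(1)
            path = re.search(r"Path:\s*(\S+)", section)
            at_risk.append((uuid, path.group(1) if path else "<no path reported>"))
    return at_risk

if __name__ == "__main__":
    # Usage: python3 ftt0_check.py object_list.txt esx-a1.example.com
    for uuid, path in objects_at_risk(sys.argv[1], sys.argv[2]):
        print(uuid, path)
```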

 

The risk/caveat with doing this, however, is that you would want to be sure nothing FTT=1 is non-compliant/at reduced redundancy when this is done (but then again the precheck should tell you whether anything would become inaccessible if MM with NA were used).

mike-p
Enthusiast

Hi, I tested it.

The objects of both DAG VMs are distributed over the nodes. If I put one node into maintenance mode with the NA option, I lose objects of the active VM too.

This is a bit weird because these 2 nodes are in different fault domains. I expected that the data would be placed inside one FD.

I think I found the reason for this. Originally this was a 2-node stretched cluster and the RAID-0 policy was set to 'None - stretched cluster'. I changed this now to 'Site mirroring - stretched cluster':

"Defines whether to use standard, stretched or 2 node cluster. In case of stretched clusters whether data is mirrored at both sites (Site mirroring) or whether it's constrained within only one of the sites in the cluster. In case of 2 node cluster the data is mirrored at both hosts."

I will wait until the object resync has completed and then check MM with NA again.
TheBobkin
Champion (Accepted Solution)

@mike-p 

"This is a bit weird because this 2 nodes are in different fault domains. I expected that the data will be placed inside one FD."
No, actually this is expected as this policy doesn't explicitly pin the data to Preferred/non-Preferred site so it will place components purely based on capacity-disk usage (e.g. place them on lowest used disk regardless of node/site).

 

"Originally this was a 2 node stretched cluster and the Raid0 policy was Set to None-stretched Cluster."
Yes, that was the assumption here as you didn't specify whether using site affinity or not.

 

"I changed this now to Site mirroring - stretched cluster:"
This is changing your data to FTT=1 across the sites, not FTT=0 and will double your capacity usage for these data - this is all well and fine if this is what you want but if that is the case then you may as well just set it for all data and use MM EA option (which then should state 0B to move to satisfy that MM type).
If it is not what you want (e.g. want data to be stored as FTT=0, have lower storage footprint and rely on HA in the application level) then the above isn't how that would be achieved - that would be achieved by making 2 FTT=0 policies, one pinning data to Preferred site, other pinning data to non-Preferred site, then applying the policies to each half of the VMs (e.g. assuming these are redundant pairs of VMs ,DAG-VM1 gets Preferred policy and DAG-VM2 gets non-Preferred policy), and also configure the appropriate DRS affinity rules (e.g. should/must run rules and anti-affinity).
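The storage-policy half of this (the two FTT=0 site-affinity policies) is most easily done in the vSphere Client or via SPBM; the DRS half can be scripted. Below is a minimal pyVmomi sketch for one side of the pairing, assuming placeholder names (vCenter address, cluster, VM and host names) that you would replace with your own; the matching rules for DAG-VM2 and the non-Preferred hosts would be created the same way:

```python
# Minimal sketch (pyVmomi): create a VM/Host "should run" rule pinning DAG-VM1 to the
# Preferred-site hosts, plus a VM anti-affinity rule keeping the two DAG VMs apart.
# All names (vCenter, cluster, VMs, hosts) are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_obj(content, vimtype, name):
    """Return the first inventory object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.Destroy()

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    cluster = find_obj(content, vim.ClusterComputeResource, "StretchedCluster")
    vm1 = find_obj(content, vim.VirtualMachine, "DAG-VM1")
    vm2 = find_obj(content, vim.VirtualMachine, "DAG-VM2")
    preferred_hosts = [h for h in cluster.host
                       if h.name in ("esx-a1.example.com", "esx-a2.example.com")]

    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[
            vim.cluster.GroupSpec(operation="add",
                info=vim.cluster.VmGroup(name="DAG-Preferred-VMs", vm=[vm1])),
            vim.cluster.GroupSpec(operation="add",
                info=vim.cluster.HostGroup(name="Preferred-Site-Hosts", host=preferred_hosts)),
        ],
        rulesSpec=[
            # mandatory=False makes this a "should run" rule; True would make it "must run".
            vim.cluster.RuleSpec(operation="add", info=vim.cluster.VmHostRuleInfo(
                name="DAG-VM1-should-run-Preferred", enabled=True, mandatory=False,
                vmGroupName="DAG-Preferred-VMs", affineHostGroupName="Preferred-Site-Hosts")),
            vim.cluster.RuleSpec(operation="add", info=vim.cluster.AntiAffinityRuleSpec(
                name="DAG-keep-apart", enabled=True, vm=[vm1, vm2])),
        ],
    )
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
finally:
    Disconnect(si)
```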
