VMware Cloud Community
lux209
Contributor

stretched cluster - full site maintenance

Hello,

I'm running a vSAN 6.5 stretched cluster and need to put an entire site into maintenance mode for one day for power maintenance.

Our cluster has 5 nodes on each site with FTT=1, and a third site running the witness. I'm going to put the preferred site into maintenance. I did not find any official documentation on the best way to do this, but I found a couple of posts with some answers:

Doing maintenance on a Two-Node (Direct Connect) vSAN configuration - Yellow Bricks

Re: How to safely shut down one side of a 2+2 node stretched cluster

I'm planning to apply this:

- check vsan health

- change the preferred site (fault domain) to the one that will remain UP

- temporarily disable DRS and manually move all VMs to the remaining site/hosts (doing it manually so that DRS does not move VMs to other hosts in the site being shut down)

- place all hosts on the site that will be shut down into maintenance mode using "ensure accessibility"

- Re-enable DRS

- Shut down the hosts

Does this make sense?

Do you know if vSAN will also re-apply the FTT=1 policy and resync the VMs within the same site after 1 hour? If so, I guess it will kill performance, as we have over 60TB of used space (as shown in the cluster's vSAN capacity overview, so 30TB to resync, I guess?). I currently have 108TB of free space in the whole cluster, so 54TB with half the nodes down, which should be enough to resync locally if needed.
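
To make the capacity guess above explicit, here is the arithmetic as a small Python sketch. It assumes used and free space are split evenly between the two sites, which is an assumption and not something measured; the function name is made up for illustration, and it ignores any slack space vSAN should keep free.

```python
def local_resync_headroom(used_tb, free_tb, surviving_fraction=0.5):
    """Rough check: could the surviving site hold a second local copy?

    Assumes (not verified) that used and free space are spread evenly
    across both sites, so the surviving site holds `surviving_fraction`
    of each.
    """
    data_on_site = used_tb * surviving_fraction   # one mirror copy per site
    free_on_site = free_tb * surviving_fraction   # free space left after the shutdown
    return data_on_site, free_on_site, free_on_site >= data_on_site


needed, available, fits = local_resync_headroom(used_tb=60, free_tb=108)
print(f"to resync: {needed} TB, free on surviving site: {available} TB, fits: {fits}")
```

With the figures from the capacity overview this gives 30 TB to resync against 54 TB free, so on paper it would fit.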

Thank you!

2 Replies
TheBobkin
Champion

Hello lux209,

Your plan seems fine in general. As always, though, take and verify backups before proceeding, because a single disk failure while running with reduced availability may result in data loss.

"Do you know if vsan will also re-apply the FTT=1 policy and resync the vm within the same site after 1 hour ?"

If you just have PFTT=1 as your Storage Policy, then no, it will not, because there will not be enough Fault Domains available to do this (site + site + Witness are essentially the FDs here).
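
As a rough illustration of that Fault Domain counting, here is a minimal Python sketch. It relies on the general rule that a mirrored object tolerating n failures needs 2n+1 fault domains; the site names and the function are hypothetical and only model the counting, not how vSAN actually decides.

```python
def can_rebuild_mirror(healthy_fault_domains, pftt=1):
    # A mirrored (RAID-1) object tolerating `pftt` failures needs
    # 2 * pftt + 1 fault domains: pftt + 1 replicas plus pftt witnesses.
    required = 2 * pftt + 1
    return healthy_fault_domains >= required


all_fds = {"site-A", "site-B", "witness"}   # hypothetical names for the three FDs
healthy = all_fds - {"site-A"}              # one data site is down for maintenance

print(can_rebuild_mirror(len(all_fds)))   # True: all three FDs are up
print(can_rebuild_mirror(len(healthy)))   # False: only two FDs remain
```

With one data site down only two FDs are left, so there is nowhere to place a rebuilt replica.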

You can set the CLOM repair delay to a higher value if you need peace of mind that it won't try to do this.

https://kb.vmware.com/s/article/2075456
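
If it helps, here is a small Python sketch for picking a value that covers the maintenance window. It assumes the delay is expressed in minutes with a 60-minute default (the "1 hour" from the question); the 4-hour margin is an arbitrary example, and the printed command is the per-host form as I recall it from the KB, so please check it against the KB for your version before running anything.

```python
DEFAULT_REPAIR_DELAY_MIN = 60  # the default "1 hour" repair delay


def repair_delay_minutes(window_hours, margin_hours=0):
    """Delay (in minutes) covering the outage window plus a safety margin."""
    wanted = int((window_hours + margin_hours) * 60)
    # Never suggest a value below the default.
    return max(DEFAULT_REPAIR_DELAY_MIN, wanted)


minutes = repair_delay_minutes(window_hours=24, margin_hours=4)
print(minutes)  # 1680

# Per-host command as I recall it from the KB; verify before use.
print(f"esxcfg-advcfg -s {minutes} /VSAN/ClomRepairDelay")
```

Remember to set it back to the default once the site is back up and resync has completed.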

Bob

lux209
Contributor

Thanks Bob!
