VMware Cloud Community
Gauteal
Contributor

Rebalance storage load due to unused space <30% - Describe performance and/or availability impact

We have a vSAN 6.5 cluster with 9 nodes, currently at 69% storage utilization and 204TB, with 320+ VMs.

Future growth is a bit difficult to predict.

According to VMware sizing guidelines:

Keep at least 30 percent unused space to prevent Virtual SAN from rebalancing the storage load.

Virtual SAN rebalances the components across the cluster whenever the consumption on a single capacity device reaches 80 percent or more.

The rebalance operation might impact the performance of applications. To avoid these issues, keep storage consumption to less than 70 percent.

Could anyone shed any light on how these rebalancing storage load operations would affect the VMs from an availability and performance standpoint?

Would it look like the VM is unavailable or just not responding? Would it affect all VMs or just the subset of VMs on the host >80% load?

At what additional risk is the VSAN when storage load is at 70% vs 80% used space?

Looking for advice and experience.

regards

Gaute

1 Reply
TheBobkin
Champion

Hello Gaute,

Welcome to the vSAN Community. I assume you intended to post this as a 'question' rather than a 'discussion', but nonetheless:

"Could anyone shed any light on how these rebalancing storage load operations would affect the VMs from an availability and performance standpoint?"

Reactive Rebalance would not in any way affect the availability of the underlying data components. vSAN starts moving data off individual capacity-tier devices once they reach 80% utilised, provided there are disks with utilisation under 80% that it can move the data to. (80% is the default threshold; it can be changed, but make sure you understand the implications before modifying it.)
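To make the trigger condition concrete, here is a minimal sketch of the decision described above. It is an illustration only, not vSAN's actual implementation: the function name, the disk names and the per-disk utilisation dictionary are all hypothetical, and the only thing taken from the discussion is the default 80% threshold.

```python
# Illustrative only: models the trigger logic described in the reply,
# not vSAN's real rebalancing code. All names here are hypothetical.
REBALANCE_THRESHOLD = 0.80  # default threshold per the reply


def reactive_rebalance_plan(disk_utilisation, threshold=REBALANCE_THRESHOLD):
    """Return (sources, targets) for a reactive rebalance.

    sources: disks at or above the threshold (data is moved off these)
    targets: disks below the threshold (data can be moved onto these)
    If either list would be empty, nothing is moved and both are empty.
    """
    sources = sorted(d for d, u in disk_utilisation.items() if u >= threshold)
    targets = sorted(d for d, u in disk_utilisation.items() if u < threshold)
    if not sources or not targets:
        return [], []
    return sources, targets


# Uneven usage: one disk over the threshold, one under, so a move is planned.
print(reactive_rebalance_plan({"disk-a": 0.75, "disk-b": 0.81}))
# Every disk over the threshold: there is nowhere to move data, so no plan.
print(reactive_rebalance_plan({"disk-a": 0.85, "disk-b": 0.86}))
```

The second call shows the case where all disks are evenly above the threshold and no rebalance is attempted.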

As with any non-compliance-related reconfig or 'data-move', vSAN doesn't discard the original data until it has recreated the data in its new location (here, on a drive <80% used). This temporarily uses what could be considered excess space during the process, e.g. 3 copies of the data instead of 2 (for FTT=1 data).
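As a rough worked example of that temporary overhead, the sketch below assumes RAID-1 mirroring with FTT=1 (2 full copies) and that exactly one copy is being moved at a time. The function name and the 100 GB figure are made up for illustration.

```python
# Illustrative arithmetic only; assumes RAID-1 mirroring and that a single
# replica is being relocated. The function name is hypothetical.
def transient_footprint_gb(component_gb, replicas=2):
    """Return (steady_state_gb, during_move_gb) for one mirrored component.

    During the move the old copy is kept until the new copy is complete,
    so one extra copy exists for the duration of the move.
    """
    steady_state = component_gb * replicas
    during_move = steady_state + component_gb
    return steady_state, during_move


steady, moving = transient_footprint_gb(100)  # 100 GB component, FTT=1
print(steady, moving)  # 200 300
```

So a 100 GB component that normally consumes 200 GB would briefly consume 300 GB while one of its copies is relocated.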

Incidentally, this is one of the primary reasons VMware advises 25-30% slack space: to leave room for reconfiguring Objects (e.g. changing the FTM from RAID-1 to RAID-5, or changing striping). The other main reason is recovery space in the event of a node/controller/disk-group failure.

"Would it look like the VM is unavailable or just not responding? Would it affect all VMs or just the subset of VMs on the host >80% load?"

No, as per the above this shouldn't have any negative impact, aside from the fact that if there is a lot of disparity in usage across disks, vSAN will try to move data once individual disks hit 80%, and this can add a lot of extra read and write IO to the cluster. If usage is even and all disks are at >80%, it won't try to move anything (as there would be no point).

"At what additional risk is the VSAN when storage load is at 70% vs 80% used space?"

If you have some disks at ~75% and others at 81%, it will try to reactively rebalance, and this can add a lot of additional reads and writes on the cluster (without warning, during business hours, etc.). Provided you are not going crazy changing Storage Policies and you understand how the slack space is used when doing this (e.g. not trying to change all data in the cluster at once...), you could potentially be fine with less than the recommended slack space. You only need ~11% to rebuild after a single node failure in a 9-node cluster. However, going to 89% (in a 9-node cluster) is not recommended for various other reasons: you are likely running everything Thin-provisioned, you would have no capability to recover from a second failure after the first has rebuilt, etc.
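The ~11% and 89% figures come from simple arithmetic: one node out of nine holds roughly 1/9 of the cluster's capacity. The sketch below reproduces that calculation under two simplifying assumptions that are mine, not vSAN's: all nodes contribute equal capacity, and data is spread evenly across them. The function names are hypothetical.

```python
# Back-of-the-envelope arithmetic only. Assumes equally sized nodes and
# evenly distributed data; function names are hypothetical.
def min_slack_for_node_rebuild(nodes, failures=1):
    """Fraction of total capacity lost when `failures` nodes fail."""
    if nodes <= failures:
        raise ValueError("need more nodes than tolerated failures")
    return failures / nodes


def max_utilisation_for_rebuild(nodes, failures=1):
    """Highest cluster utilisation at which the surviving nodes can
    still hold all of the data after `failures` nodes are lost."""
    return 1 - min_slack_for_node_rebuild(nodes, failures)


print(round(min_slack_for_node_rebuild(9) * 100))   # 11 (percent slack)
print(round(max_utilisation_for_rebuild(9) * 100))  # 89 (percent used)
```

At the 69% utilisation mentioned in the question, a 9-node cluster is well under this ceiling, though the calculation ignores the other reasons given above for not running that close to full.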

I will say that, for a 9-node or larger cluster, my personal opinion is that 30% is too conservative, but there is a lot of 'it depends' factored into this opinion.

VMware references for more info:

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-2EC7054E-FBCC-4...

https://storagehub.vmware.com/export_to_pdf/intelligent-rebuilds-in-vsan-6-6

Bob