After manually running a proactive rebalance from the vCenter CLI (rvc -c "vsan.proactive_rebalance -s -v 0.10 -t 259200 -r 200000 ......), we've noticed that the webclient still displays a warning even though the maximum variance is below the default of 30%. Rather than basing the threshold on the default 30% (and thus turning green), the healthcheck seems to be using the variance threshold that was last used from the CLI (10%). Is this expected behavior? I would have expected the webclient healthcheck to always use the default of 30%. I suspect that if I now ran the rebalance from the webclient, it would quickly update and turn green. Just trying to get a better understanding of this. We'd like to be able to automate the rebalances from the CLI with a 10% threshold and also have the healthcheck report OK (green). Otherwise, even though we're below the 30% threshold, the healthcheck will continue to display a warning. Thank you.
Update: I did try running a "retest" from the webclient but it still displays a warning, even with a maximum variance currently at 19%.
I vaguely recall seeing this at some point, i.e. it looked like it was using the last-used variance threshold, as you noted. IMO it *should* always use the default variance threshold, and if that is the case I will see about getting it changed (or at least documented) once I'm back from travels. In the current state, if you manually start a rebalance using -v .30 and then immediately stop it, does retest show green, or does it just not allow it and say there is nothing to rebalance?
Thank you for the reply. What's interesting is that vsan.proactive_rebalance_info from the CLI actually reports correctly:
Proactive rebalance is not running!
Max usage difference triggering rebalancing: 30.00%
Average disk usage: 62.00%
Maximum disk usage: 73.00% (19.00% above minimum disk usage)
Imbalance index: 8.00%
No disk detected to be rebalanced
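The output above can be read as a simple threshold check: the least-full disk sits at 54% (73% minus 19 points), and no disk exceeds it by more than the 30% trigger. The sketch below models that check, assuming a disk is flagged when its usage exceeds the minimum by more than the threshold. The per-disk figures are hypothetical (six equal-size disks chosen to match the reported 62% average, 73% maximum and 19-point spread), and this is not vSAN's actual healthcheck code. Evaluated against 10% instead of 30%, the same disks produce a warning, which matches what the healthcheck kept showing.

```python
# Toy model of the disk-balance check (assumption: a disk is flagged when its
# usage exceeds the least-full disk by more than the variance threshold).


def disks_to_rebalance(usages_pct, threshold_pct):
    """Return indices of disks whose usage exceeds the minimum by more than threshold_pct."""
    floor = min(usages_pct)
    return [i for i, u in enumerate(usages_pct) if u - floor > threshold_pct]


def health_status(usages_pct, threshold_pct):
    """'green' if no disk crosses the threshold, otherwise 'warning'."""
    return "warning" if disks_to_rebalance(usages_pct, threshold_pct) else "green"


# Hypothetical usages for six equal-size disks: average 62%,
# minimum 54%, maximum 73% (19 points above minimum).
usages = [54.0, 58.0, 60.0, 63.0, 64.0, 73.0]

print(health_status(usages, 30.0))  # default threshold -> green
print(health_status(usages, 10.0))  # last-used CLI threshold -> warning
```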
I suspect that if I run it via the webclient, it will quickly update and go green (I'm pretty sure I did this in the past). I'll give your suggestion a try this evening from the CLI and see if that clears it up as well. Thanks!
The quick start/stop of the rebalance from the CLI using a 30% variance does in fact reset the webclient back to the default variance, so it goes green in our case. (I've added the start/stop sequence to the end of our script.)
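A minimal sketch of that script sequence, written as the RVC command strings a script could emit: a targeted run at 10%, then a start at the default 0.30 followed by an immediate stop. The cluster path is a hypothetical placeholder (the original command's remaining arguments were elided), the -t and -r values are copied from the command above, and -o as the stop flag is an assumption to confirm against the command's built-in help. Deciding when the targeted run has finished before issuing the reset is left to the surrounding script.

```python
# Hypothetical inventory path; substitute your own cluster path.
CLUSTER = "/localhost/Datacenter/computers/Cluster"


def targeted_rebalance(cluster, variance=0.10, time_span=259200, rate=200000):
    """Start a proactive rebalance with a custom variance threshold
    (-v takes a fraction, so 0.10 means 10%)."""
    return (f"vsan.proactive_rebalance -s -v {variance:.2f} "
            f"-t {time_span} -r {rate} {cluster}")


def reset_to_default(cluster):
    """Start at the default 0.30 threshold and stop right away; in this thread
    that reset the healthcheck back to the default variance."""
    return [
        f"vsan.proactive_rebalance -s -v 0.30 {cluster}",
        f"vsan.proactive_rebalance -o {cluster}",  # assumption: -o stops the rebalance
    ]


for cmd in [targeted_rebalance(CLUSTER)] + reset_to_default(CLUSTER):
    print(cmd)
```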
Another observation: our maximum variance is presently at 20%. Last evening I ran a rebalance with a variance target of 10%. Based on the graphs, very little resync activity occurred last evening, and we're still at 20% 13 hours later. There seems to be a point of diminishing returns as far as reducing the variance. My understanding was that vSAN should be able to chunk up the components as needed to achieve the rebalance, but that doesn't appear to be occurring. Any thoughts on this? I've seen this in the past as well (no matter how low you set the variance, you won't make any further progress). Thanks again!
Actually I'm not positive whether it will chunk things up (e.g. into further RAID0 LSOM-Objects) for proactive rebalance; this should be relatively easy to confirm by looking at the structure of some displaced Objects (e.g. striping despite a relatively small size). Assuming it doesn't chunk, the difficulty of reaching a relatively lower rebalance % comes down to the size of the Capacity-tier devices, the size of the data-components residing on them, and the available Fault Domains (and their fullness); e.g. moving a 200GB component from one disk to another may just displace where the imbalance is.
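The "just displace the imbalance" point can be seen with a toy two-disk model. Assumptions, all hypothetical: two 1 TB capacity devices at 73% and 54%, only whole components can move (no further splitting), and a move is accepted only if it lowers the spread. Real vSAN placement also weighs policies and Fault Domains, which this ignores. With 200 GB or larger components, the best single move widens the spread from 19 to 21 points, so nothing moves no matter how low the target is set; the same data split into 100 GB pieces could be balanced in one move.

```python
CAPACITY_GB = 1000  # hypothetical: every capacity device is 1 TB


def spread_pct(used_gb):
    """Max minus min disk usage, in percentage points of capacity."""
    pct = [u * 100 / CAPACITY_GB for u in used_gb]
    return max(pct) - min(pct)


def rebalance_step(used_gb, components_on_fullest, target_pct):
    """One greedy step: move the whole component from the fullest disk to the
    emptiest disk that gives the lowest spread, but only if that spread is
    lower than the current one. Returns the new usage list or None."""
    current = spread_pct(used_gb)
    if current <= target_pct:
        return None  # already within target, nothing to do
    src = used_gb.index(max(used_gb))
    dst = used_gb.index(min(used_gb))
    best = None
    for size in components_on_fullest:
        moved = list(used_gb)
        moved[src] -= size
        moved[dst] += size
        if best is None or spread_pct(moved) < spread_pct(best):
            best = moved
    if best is not None and spread_pct(best) < current:
        return best
    return None  # every candidate move only shifts or worsens the imbalance


disks = [730, 540]  # 73% vs 54%: a 19-point spread

# Large whole components (sum 730 GB): no move helps, even with a 10% target.
print(rebalance_step(disks, [200, 250, 280], 10))   # None
# Same 730 GB split into smaller pieces: one 100 GB move nearly evens it out.
print(rebalance_step(disks, [100] * 7 + [30], 10))  # [630, 640]
```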