I vaguely recall seeing this at some point; it looked like it was using the last-used variance threshold, as you noted. IMO it *should* always use the default variance threshold, and I will see about getting that changed (or at least documented) if that is the case (once I'm back from travels). In its current state, if you manually start it with -v .30 and then immediately stop it, does the retest show green, or does it refuse and report nothing to rebalance?
Thank you for the reply. What's interesting is that vsan.proactive_rebalance_info from the CLI actually reports correctly:
Proactive rebalance is not running!
Max usage difference triggering rebalancing: 30.00%
Average disk usage: 62.00%
Maximum disk usage: 73.00% (19.00% above minimum disk usage)
Imbalance index: 8.00%
No disk detected to be rebalanced
I suspect that if I run it via the webclient, it will quickly update and go green. (I'm pretty sure I did this in the past) I'll give your suggestion a try this evening from the CLI and see if that clears it up as well. Thanks!
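For what it's worth, the relationship between the numbers in that output can be sketched as follows. This is my own illustration, not vSAN code; the assumption (based on the output above) is that a disk is flagged for rebalancing only when the max/min usage spread exceeds the variance threshold:

```python
# Values taken from the vsan.proactive_rebalance_info output above.
threshold = 30.0   # "Max usage difference triggering rebalancing: 30.00%"
max_usage = 73.0   # "Maximum disk usage: 73.00%"
spread    = 19.0   # "(19.00% above minimum disk usage)"

min_usage = max_usage - spread  # implied minimum disk usage: 54.0%

# Assumed trigger semantics: only rebalance when the spread exceeds the threshold.
needs_rebalance = (max_usage - min_usage) > threshold

print(min_usage)        # 54.0
print(needs_rebalance)  # False -> "No disk detected to be rebalanced"
```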
The quick start/stop of the rebalance from the CLI using a 30% variance does in fact reset the webclient back to the default variance and thus goes green in our case. (I've added the start/stop sequence to our script at the end)
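For anyone following along, the start/stop sequence we appended to our script looks roughly like this from RVC on the vCenter shell (option spellings are from memory, and the 0.3 value and cluster path are placeholders for your environment):

```
> vsan.proactive_rebalance --start --variance-threshold 0.3 ~cluster
> vsan.proactive_rebalance --stop ~cluster
> vsan.proactive_rebalance_info ~cluster
```

After the stop, the info command (and shortly after, the web client health check) reported against the default variance again.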
Another observation: our maximum variance is presently at 20%. Last evening I ran a rebalance with a variance target of 10%. Based on the graphs, very little resync activity occurred last evening, and we're still at 20% 13 hours later. There seems to be a point of diminishing returns when reducing the variance. My understanding was that vSAN should be able to chunk up the components as needed in order to achieve the rebalance, but that doesn't appear to be occurring. Any thoughts on this? I've seen this in the past as well. (No matter how low you set the variance, you won't make any further progress.) Thanks again!
Actually, I'm not positive it will chunk things up (e.g. further RAID0 the LSOM objects) for proactive rebalance - that should be relatively easy to confirm by looking at the structure of some displaced objects (e.g. striping despite a relatively small size). Assuming it doesn't chunk, the problem with a relatively low rebalance % then comes down to the size of the capacity-tier devices, the size of the data components residing on them, and the available fault domains (and their fullness) - e.g. moving a 200GB component from one disk to another may just displace where the imbalance is.
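To make that last point concrete, here is a toy arithmetic sketch (my own illustration, not vSAN code): when components are large and not split further, moving one can simply relocate the imbalance rather than reduce it.

```python
# Used GB per capacity-tier disk (hypothetical numbers for illustration).
disks_gb = {"disk1": 900, "disk2": 700, "disk3": 700}
component_gb = 200

def spread(usage):
    """Difference between the fullest and emptiest disk."""
    return max(usage.values()) - min(usage.values())

before = spread(disks_gb)  # 200 GB imbalance

# Move a 200 GB component from the fullest disk to a less-full one.
disks_gb["disk1"] -= component_gb
disks_gb["disk2"] += component_gb

after = spread(disks_gb)   # still 200 GB -- the imbalance just moved
print(before, after)
```

Only splitting the component into smaller pieces (or having more fault domains with headroom to spread them across) would actually shrink the spread here.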