BillyK66
Contributor

Proactive rebalance via RVC (vsan.proactive_rebalance)

Hello,

I just started dabbling with vsan.proactive_rebalance, as it gives me much better control over a proactive rebalance than performing the rebalance via the vSphere Web Client.

Question: Does anyone know what the default --rate-threshold is when performing the rebalance via the web client? Is it configurable? I'm just trying to ensure that when I run it via RVC it runs at a faster rate than from the web client. (Right now I'm just guessing and monitoring: 200 GB/hour, 1 TB/hour, etc.)
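For reference, here is roughly how I have been invoking it (the cluster path is a placeholder for my environment, and the flag names are from vsan.proactive_rebalance --help on my RVC build). If I understand correctly, --rate-threshold takes MB per hour per host, so my 200 GB/hour guess would translate to 204800, and 1 TB/hour to 1048576:

  vsan.proactive_rebalance --start --rate-threshold 204800 /vcenter/datacenter/computers/cluster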

Thanks,

- Bill

TheBobkin
Champion

Hello Bill,

I think it is actually variable depending on contention (e.g. it will move at a slower rate when there is more VM traffic, resync, etc.). A lot of IO queues in vSAN work on a priority basis that is neither visible nor configurable via the GUI.

If you want it to move X amount of data in Y amount of time, use vsan.proactive_rebalance_info to get a baseline of roughly how much needs to be moved per host (at the desired % disparity threshold), then set the rate-threshold and run-time accordingly.
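As a rough sketch (the numbers are purely illustrative and the cluster path is a placeholder - check --help on your RVC version for the exact switches): if _info shows about 1 TB needing to move per host and you want it done in roughly 4 hours, that is ~256 GB/hour/host, and since --rate-threshold takes MB per hour per host:

  vsan.proactive_rebalance_info /vcenter/datacenter/computers/cluster

  vsan.proactive_rebalance --start --time-span 14400 --rate-threshold 262144 /vcenter/datacenter/computers/cluster

(14400 seconds = 4 hours; 262144 MB = 256 GB.)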

By the way, I'm curious how much space you saved and what dedupe ratio you got following https://communities.vmware.com/thread/590440, but you never updated that thread.

Bob

BillyK66
Contributor

Hello Bob,

Thank you for the prompt response.  One behavior I noticed is that when you run vsan.proactive_rebalance, it seems to continue to run until you stop it, even though you've already achieved the desired variance and there is no resync activity visible.  Is that expected behavior?  I'm running it again right now, but this time I'm using the "-t" switch to limit it to 2 hours (invocation below).  I'm just curious whether you need to tell it when to stop.
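For reference, the 2-hour run looks like this ("-t" being the short form of --time-span, in seconds; the cluster path is a placeholder):

  vsan.proactive_rebalance --start --time-span 7200 /vcenter/datacenter/computers/cluster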

Regarding dedupe, I communicated the thick-object concern to my management, but at this point there is no plan to rebuild those objects as thin so that we can benefit from dedupe.  We will start using thin objects and OSR=0% going forward, however.  Presently I'm seeing a 1.19x ratio (377.59 / 318.51 ≈ 1.19):

USED BEFORE: 377.59 TB

USED AFTER: 318.51 TB

SAVINGS: 59.08 TB

Regards,

- Bill

TheBobkin
Champion

Hello Bill,

"I noticed is that when you run vsan.proactive_rebalance, it seems to continue to run until you stop it, even though you've already achieved the desired variance and there is no resync activity visible."

It's probably running for the default length (24 hours). You can see when it will stop from the second line returned by vsan.proactive_rebalance_info, e.g. 'Proactive rebalance stop: 2018-06-22 19:55:01 UTC'.
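So, for example (cluster path is a placeholder):

  vsan.proactive_rebalance_info /vcenter/datacenter/computers/cluster

And if you want to end it before that scheduled stop time, there is a --stop switch:

  vsan.proactive_rebalance --stop /vcenter/datacenter/computers/cluster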

"Regarding dedupe, I communicated the thick object concern to my management but at this point there is no plan to rebuild those objects as thin so that we can befit from dedup"

I don't think this kind of job is something that should be aimed to be completed over the course of a few days or a week, as it can be resource-intensive (both for the cluster and for the person doing it!). Maybe it's worth considering each time something is re-provisioned or restored, or as part of other changes or updates - then again, this generally comes down to how change management is done in your shop.

Whatever you *did* change appears to be a step in the right direction though:

Last week:

USED BEFORE: 329 TB

USED AFTER: ~317 TB

SAVINGS: ~12 TB

1.04x ratio

Now:

USED BEFORE: 377.59 TB

USED AFTER: 318.51 TB

SAVINGS: 59.08 TB

1.19x ratio

Bob
