In the same section where you triggered the rebalance (Cluster > Monitor tab > Virtual SAN > Health > Cluster > Virtual SAN Disk Balance) there will be a button to stop the process.
FYI I've had to rebalance a couple of times recently. As far as I remember the Task entry never moves past 5%. It stays there until rebalance is finished and then the task gets marked as Complete and eventually disappears from Recent Tasks.
I monitored rebalance progress by exporting data from the "Disk Balance" tab to Excel and calculating the total amount of data left to move. A bit clunky but it worked for me.
A "Total data to move" field in this view would be helpful.
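Since the UI does not provide that field, the calculation the poster describes can be sketched in a few lines. This is a rough approximation, not the exact metric vSAN uses: it assumes you have per-disk used-capacity figures (e.g. from the Disk Balance export) and treats everything above the cluster-wide mean as data that still has to move.

```python
# Hypothetical sketch: estimate the data a rebalance still has to move
# from per-disk used-capacity figures exported from the Disk Balance view.

def data_left_to_move(used_gb):
    """Sum of usage above the cluster-wide mean, in GB.

    A proactive rebalance roughly moves data from disks above the
    average fill level to disks below it, so the total excess above
    the mean approximates the data still to be shuffled.
    """
    mean = sum(used_gb) / len(used_gb)
    return sum(u - mean for u in used_gb if u > mean)

disks = [820, 790, 610, 580]          # used GB per capacity disk (example data)
print(f"~{data_left_to_move(disks):.0f} GB left to move")  # -> ~210 GB left to move
```

Watching this number shrink between exports gives the progress indication the UI lacks.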
The task stops automatically after 24 hours. Only then will it go to 100% / Complete. Alternatively, stop it manually with the "stop rebalance" button. But the 5% thingy stays on 5% until the job is stopped either by you or after 24 hours. It's a bug.
By the way, the slightest imbalance (even 1%) triggers the "imbalanced health alert". Duncan Epping already reported it to dev. Hopefully, these sorts of "bugs" get squashed in an upcoming release.
Yea it must be a bug, since I stopped the rebalance task a few days ago and it is still showing 5%.
Good day !
It seems this issue has only been reported so far on vCenter 6.0 U2.
- The task is set to 1 percent complete when the task is created.
- The task is set to 5 percent complete when the command to rebalance the cluster is issued.
- It then waits for the rebalance to complete before setting the percent done to 100.
- During the waiting period, it checks whether the rebalance is done (via a clom-tool command). If not, it sleeps for 100 seconds and checks again.
- The logic to update the percentage completed is not implemented yet. Therefore, the task is stuck at 5% until it completes, at which point it is set to 100%.
By default when triggered from the VC UI, the task will run for 24 hours or whenever the rebalance effort is done, whichever comes first.
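The lifecycle described above can be sketched as the following loop. This is a minimal illustration of the described behavior, not vCenter's actual code; `issue_rebalance` and `rebalance_done` are stand-ins for the real clom-tool calls.

```python
import time

POLL_INTERVAL = 100          # seconds between clom-tool checks, per the description
MAX_RUNTIME = 24 * 60 * 60   # the VC-triggered task gives rebalance a 24-hour window

def run_rebalance_task(issue_rebalance, rebalance_done,
                       sleep=time.sleep, clock=time.monotonic):
    """Drive the task the way the thread describes it.

    Progress only ever goes 1 -> 5 -> 100, which is why the UI
    appears stuck at 5% for the whole run.
    """
    percent = 1                      # set when the task is created
    issue_rebalance()
    percent = 5                      # set once the rebalance command is issued
    deadline = clock() + MAX_RUNTIME
    while not rebalance_done() and clock() < deadline:
        sleep(POLL_INTERVAL)         # no intermediate progress updates exist yet
    percent = 100                    # only updated on completion (or timeout)
    return percent
```

The missing piece, as the post says, is any progress update between the 5% and 100% assignments.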
- In order to kill this task when it is stuck, you need to restart vpxd and the health service on all the hosts ( /etc/init.d/vmware-vsan-health restart ).
- Restarting the vpxd service will clear the rebalance task that is stuck in the UI, and restarting the vsan-health service after the vpxd restart will prevent future rebalance tasks from being stuck for days (UI side).
- Use the RVC command vsan.disks_stats to view current disk usage.
There is no resolution for this issue as of now.
The fix will be on the vCenter Server side, mostly involving the health service plugin. Hopefully in 6.0 U3.
In order to kill this task when it is stuck, you need to restart the vpxd and health services on the vCenter Server, or else reboot the VC.
The task will clear on its own. Not ideal and hopefully this is streamlined in future releases.
I would much rather let the task entry run its course than restart services on servers running production workloads. (This of course assumes there are production workloads running.)
I happened to try putting one host in maintenance mode and then taking it back out. It looks like that cleared it and the task completed.
I'm running all 6.0 and 6.2
Thanks for the reference. Yes... when you kick off the proactive rebalance you are opening a 24-hour window for the rebalance to take place. Use RVC to track progress; the UI does not refresh this status. Automatic rebalance kicks in when you hit 80% of drive capacity. This can be changed under advanced settings for each host.
You can view the state with the RVC command vsan.proactive_rebalance_info 0.
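Putting the monitoring pieces from the last two posts together, a session might look like the following. The cluster path is a placeholder for your own inventory path, and the advanced-option name for the 80% threshold is my assumption of the relevant setting, so verify it on your build.

```shell
# From an RVC session on the vCenter Server (cluster path is an example):
# per-disk usage, to watch the imbalance shrink over time
vsan.disks_stats /localhost/DC/computers/vSAN-Cluster

# proactive rebalance status for the cluster ("0" works as a relative
# path marker once you have cd'd to the cluster, as in the post above)
vsan.proactive_rebalance_info /localhost/DC/computers/vSAN-Cluster

# On each ESXi host: inspect the automatic-rebalance threshold
# (option name assumed; default is 80)
esxcli system settings advanced list -o /VSAN/ClomRebalanceThreshold
```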
Please upgrade your vCenter to the latest available patch for 6.0 Update 3 and your hosts to the latest 6.0 Update 3 patch 06, which addresses this problem along with a few other critical fixes:
KB2146345 - ESXi host experiences a PSOD due to a vSAN race condition
KB2145347 - Component metadata health check fails with invalid state error
KB2150189 - vSAN de-staging may cause a brief PCPU lockup during heavy client I/O
KB2150395 - Bytes to sync values for RAID5/6 objects appear incorrectly in vCenter and RVC
KB2150396 - Using objtool on a vSAN witness node may result in a PSOD
KB2150390 - Health check for vSAN vmknic configuration may display a false positive
KB2150389 - SSD congestion may cause multiple virtual machines to become unresponsive
KB2150387 - vSAN Datastores may become inaccessible during log or memory congestion
KB2151127 - vSAN and VMware boot bank critical fix
KB2151132 - vSAN and VMware boot bank critical fix