I'm currently running vSAN 6.2 in our environment, which consists of 6 hosts. All vSAN health checks have passed except for the Cluster check. It was giving me a Virtual SAN Disk Balance warning with the option to rebalance the disks. I ran the rebalance option and now the task is stuck at 5%. Any ideas on how to kill this task? It's been running for over 24 hours.
In the same section where you triggered the rebalance (Cluster > Monitor tab > Virtual SAN > Health > Cluster > Virtual SAN Disk Balance) there will be a button to stop the process.
FYI I've had to rebalance a couple of times recently. As far as I remember the Task entry never moves past 5%. It stays there until rebalance is finished and then the task gets marked as Complete and eventually disappears from Recent Tasks.
I monitored rebalance progress by exporting data from the "Disk Balance" tab to Excel and calculating the total amount of data left to move. A bit clunky but it worked for me.
A "Total data to move" field in this view would be helpful.
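For anyone who would rather script the spreadsheet step, here is a minimal sketch of the same calculation in Python. It assumes the "Disk Balance" data has been saved as CSV with a size column named "Data To Move" holding values like "12.5 GB". The column names, the unit format, and the sample rows are all assumptions made up for illustration, so adjust them to whatever your export actually contains.

```python
import csv
import io

# Assumed unit suffixes in the exported size column (binary multiples).
UNIT_TO_GB = {"B": 1 / 1024**3, "KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_size_gb(text):
    """Convert a string such as '12.5 GB' to a float number of GB."""
    value, unit = text.strip().split()
    return float(value) * UNIT_TO_GB[unit.upper()]


def total_data_to_move_gb(csv_text, column="Data To Move"):
    """Sum the size column over every row that has a value in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(parse_size_gb(row[column]) for row in reader if row[column].strip())


# Made-up sample rows; the header names are hypothetical, not a documented export format.
sample = """Disk Name,Rebalance State,Data To Move
disk-a,Proactive rebalance is needed,12.5 GB
disk-b,Proactive rebalance is needed,512 MB
disk-c,,0 B
"""

print(round(total_data_to_move_gb(sample), 2))
```

Running it against a fresh export every so often gives a rough idea of whether the total is shrinking, which the 5% task entry does not.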
Good morning, that's good info. I believe you should be able to monitor the process through RVC as well; see "VSAN 6.0 Part 9 - Proactive Re-balance" on CormacHogan.com. Thank you, Zach.
The task stops automatically after 24 hours. Only then will it go to 100% / Complete. Alternatively, stop it manually with the "stop rebalance" button. But the 5% thingy stays on 5% until the job is stopped either by you or after 24 hours. It's a bug.
By the way, the slightest imbalance (even 1%) triggers the "imbalanced" health alert. Duncan Epping already reported it to dev. Hopefully these sorts of "bugs" get squashed in an upcoming release.
Yea it must be a bug since I stopped the rebalance task a few days ago and the rebalance task is still showing at 5%.
Hello,
Good day!
It seems this issue has only been reported so far on vCenter 6.0 U2.
Cause:
By default when triggered from the VC UI, the task will run for 24 hours or whenever the rebalance effort is done, whichever comes first.
Workaround:
Resolution:
There is no resolution for this issue as of now.
The fix will be made on the vCenter Server side, mostly in the health service plugin. Hopefully in 6.0 U3.
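To make the "24 hours or done, whichever comes first" rule under Cause concrete, here is a small sketch of when a UI-triggered rebalance window closes. The function name and the timestamps are hypothetical; only the 24-hour figure comes from this thread.

```python
from datetime import datetime, timedelta

# From the thread: a rebalance started from the VC UI runs for at most 24 hours.
WINDOW = timedelta(hours=24)


def rebalance_end(started, work_done_at=None):
    """Return when the rebalance stops: the 24-hour deadline, or the time
    the rebalance work finished if that came first."""
    deadline = started + WINDOW
    if work_done_at is None:
        return deadline
    return min(deadline, work_done_at)


# Hypothetical example: started at 09:00, data movement finished 6 hours later.
start = datetime(2016, 6, 1, 9, 0)
print(rebalance_end(start))
print(rebalance_end(start, start + timedelta(hours=6)))
```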
Same problem here... synchronization has apparently ended (no more Warning in the Monitor tab), but the task is stuck at 5%.
I'll try the vpradeep01 workaround
Thanks!
Sure.
Correction:
In order to kill this task when it is stuck, you need to restart the vpxd and health services on the vCenter Server, or else reboot vCenter.
The task will clear on its own. Not ideal and hopefully this is streamlined in future releases.
I would much rather let the task entry run its course than restart services on servers running production workload. (This is of course assuming there is production workload running.)
Thanks,
Matt
This should help clarify what's going on. https://greatwhitetec.com/2016/10/12/vsan-proactive-rebalance/
I happened to try putting one host in maintenance mode and then taking it back out. Looks like that cleared it and the task completed.
I'm running all 6.0 and 6.2
Thanks for the reference. Yes... when you kick off the proactive rebalance you are opening a 24-hour window for the rebalance to take place. Use RVC to track progress. The UI does not refresh this status. Automatic rebalance kicks in when you hit 80% of drive capacity. This can be changed under advanced settings for each host.
You can view the state with the RVC command vsan.proactive_rebalance_info 0
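As a rough illustration of the 80% rule mentioned above, this sketch flags which disks would be at or past that point. The disk names and usage figures are invented, and the helper is purely hypothetical; it does not query vSAN, so feed it whatever per-disk usage numbers you collect yourself.

```python
# From the thread: automatic rebalance kicks in at 80% of drive capacity (default).
AUTO_REBALANCE_THRESHOLD = 80.0


def disks_over_threshold(usage_percent, threshold=AUTO_REBALANCE_THRESHOLD):
    """Return the names of disks whose usage is at or above the threshold."""
    return sorted(name for name, pct in usage_percent.items() if pct >= threshold)


# Invented per-disk usage percentages, for illustration only.
usage = {"disk-a": 62.0, "disk-b": 81.5, "disk-c": 80.0}
print(disks_over_threshold(usage))
```

If you have changed the threshold in the host advanced settings, pass that value as the second argument instead of relying on the default.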
Please upgrade your vCenter to the latest available patch for 6.0 Update 3, and your hosts to the latest 6.0 Update 3 patch 06, which addresses this problem along with a few other critical fixes.
KB2146345 - ESXi host experiences a PSOD due to a vSAN race condition
KB2145347 - Component metadata health check fails with invalid state error
KB2150189 - vSAN de-staging may cause a brief PCPU lockup during heavy client I/O
KB2150395 - Bytes to sync values for RAID5/6 objects appear incorrectly in vCenter and RVC
KB2150396 - Using objtool on a vSAN witness node may result in a PSOD
KB2150390 - Health check for vSAN vmknic configuration may display a false positive
KB2150389 - SSD congestion may cause multiple virtual machines to become unresponsive
KB2150387 - vSAN Datastores may become inaccessible during log or memory congestion
KB2151127 - vSAN and VMware boot bank critical fix
KB2151132 - vSAN and VMware boot bank critical fix