VSAN resync components stopped automatically

ManivelR · ‎08-24-2020

Hi Team,

One of the VSAN hosts' disk groups (DG) failed. We fixed the issue, and after that the resync of components stopped automatically at 191.45 GB.

We enabled resync throttling as well but no luck.

It seems to be stuck at 191.45 GB. Any ideas on how to fix this issue?

Thanks,

Manivel R

SureshKumarMuth · ‎08-24-2020

Do you see any hostname in the resync status page? Log in to that host and check /var/log/clomd.log.

If you see any resync stuck messages there, try restarting the clomd service.

Regards,
Suresh
https://vconnectit.wordpress.com/

ManivelR · ‎08-24-2020

Yes. I see the hostname as hv04 (the same problematic host, where we fixed the DG failure).

I even restarted the clomd service and rebooted the ESXi host, but no luck.

TheBobkin · ‎08-24-2020

Hello Manivel,

"We enabled resync throttling as well but no luck."

Even if this was implicated in the issue, this would do the opposite of helping - throttling the resync means assigning less bandwidth to resync traffic so as to avoid contention (i.e. it slows down the resync).

Stuck resyncs are generally caused by either disks/Disk-Groups on the target host being out of space or unrecoverable checksum errors on the remaining data replica it is trying to resync from - the former can be checked from the Health UI and the latter via /var/log/vmkernel.log and/or /var/log/vobd.log .

If it is the former then free up some space, if it is the latter then I would advise opening a Support Request with GSS vSAN for further assistance.
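The log check described above could be sketched roughly as follows. This is a minimal sketch, not a vSAN tool: it only searches the two log files named above for lines containing the word "checksum". The search term is an assumption (the actual wording of such messages is not given in this thread), and `find_matching_lines` and `scan_logs` are hypothetical helper names. It is intended to be run against the logs on the host or against copies of them.

```python
import re
from pathlib import Path

# Assumed search term: the thread only says "checksum errors",
# it does not quote the exact log message.
PATTERN = re.compile(r"checksum", re.IGNORECASE)


def find_matching_lines(lines, pattern=PATTERN):
    """Return (line_number, text) pairs for every line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_logs(paths, pattern=PATTERN):
    """Scan each existing log file and collect its matching lines."""
    results = {}
    for p in paths:
        path = Path(p)
        if not path.is_file():
            continue  # skip logs that are not present on this machine
        with path.open(errors="replace") as fh:
            results[str(path)] = find_matching_lines(fh, pattern)
    return results


if __name__ == "__main__":
    logs = ["/var/log/vmkernel.log", "/var/log/vobd.log"]
    for name, hits in scan_logs(logs).items():
        print(f"{name}: {len(hits)} matching line(s)")
        for number, text in hits[:20]:  # show only the first few hits
            print(f"  {number}: {text}")
```

Any hits would still need to be interpreted by someone familiar with vSAN logs; a match on the word alone does not confirm an unrecoverable error.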

Bob

ManivelR · ‎08-25-2020

Hi Bob,

I raised a case with VMware and their response is mentioned below.

VMware response:-

Could find no VSAN reasons for the stuck resync on the VM, restarted clomd on all 6 hosts, did owner abdicate on all hosts and through the vcenter check_state -r -e but still no luck

rest of vsan health looking good, but Host4 was still showing 0 components in their DG that they rebuilt.

VSAN NW NOT Partitioned,

VSAN Objects all State 7 healthy

VSAN Disks all showed Green clusterwide

VSAN shows no inaccessible VMS

Action taken from my side:-

On HV04 there is no physical disk failure (we checked physically). I removed both disk groups from HV04 (each ESXi host has 2 DGs) and recreated them. After that, data components started copying to HV04 as well.

The resync started again from 3.72 TB and went smoothly down to 191.45 GB, and then it got stuck again at the same 191.45 GB.

The culprit is one VM: it has 3 VMDKs, the third VMDK (Hard disk 3) is 10 TB in size, and the resync is stuck on it. From the Windows guest OS, the used space on the third VMDK is 8.5 TB.

Any other ideas to fix the issue?

My suggestion is to create a new 10 TB (4th VMDK) from that VM. Copy the data from 3rd vmdk to 4th vmdk. It should resolve the issue?

Thanks.

ManivelR

TheBobkin · ‎08-25-2020

Hello Manivel,

Can you PM me the Support Request number? Though please bear in mind that I can make no guarantees that I will be able to assist with this directly in a GSS capacity - I would like to take a look at the logs and data state to see if we can get more insight (provided they are available, usable and I have time to do so and that there is more insight to be gained).

"My suggestion is to create a new 10 TB (4th VMDK) from that VM. Copy the data from 3rd vmdk to 4th vmdk. It should resolve the issue?"

The new Object won't have any form of pending/stuck resync as that will be a blank new Object you are copying the data to, so yes that will 'solve' the resync issue. However there may be a valid reason the resync is not progressing e.g. the current data (or metadata of the data) it is trying to resync from has issues, thus I would advise validating that any data you are copying from the original is intact and usable before removing the original copy.
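One way to approach the validation mentioned above is to compare file hashes between the original volume and the copy from inside the guest OS. The following is a minimal sketch under stated assumptions: `sha256_of` and `compare_trees` are hypothetical helper names, and the drive letters in the usage note are placeholders. Matching hashes only show that the copy equals what was read from the original; they do not prove the original data itself is good, so opening and checking the files in their applications is still worthwhile.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(src_root, dst_root):
    """Return (relative_path, reason) for files missing or different in dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    problems = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue  # only regular files are compared
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(src) != sha256_of(dst):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

For example, `compare_trees("E:/", "F:/")` (placeholder drive letters for the 3rd and 4th VMDK) returns an empty list when every file on the original has an identical counterpart on the copy. A read error raised while hashing a file on the original would itself be a sign that the source data needs a closer look before the original is removed.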

Bob
