VMware Cloud Community
Bruticusmaximus
Enthusiast

VSAN resync keeps running as files are restored to VM

We're running VSAN on ESXi 5.5. I know, it's out of support. A while back we had a power outage and lost our 4 TB file server VM. We're now at the point where we're restoring files from a USB drive. We had an issue where the VM powered off and then reported "Not enough disk space...." when powering on. Support said this was because a resync was running in the background and using all the free space. Once the resync finished, I could power on the VM. All was good until we started the restore again. Then another resync started, we ran out of space, the VM powered off, and so on.

Why would a resync keep starting up?

Could it be because we're restoring a lot of data all at once?

Could it be because the drives on this file server are thin provisioned?

larstr
Champion

Bruticusmaximus,

There are several factors that may trigger a resync. At a minimum, make sure all your disk groups are well under 80% utilization before starting the restore.

"Resynchronization is triggered when capacity device utilization in the Virtual SAN cluster that approaches or exceeds the threshold level of 80 percent."
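As a rough way to check this before restarting the restore, here is a minimal Python sketch. The disk group names and GB figures are made up for illustration, and the helper names are hypothetical; substitute the used and total capacity your own cluster reports for each disk group.

```python
# Hypothetical per-disk-group figures in GB: name -> (used, capacity).
# Replace these with the values reported for your own cluster.
THRESHOLD = 0.80  # the 80 percent level from the quote above


def over_threshold(disk_groups, threshold=THRESHOLD):
    """Return the names of disk groups at or above the utilization threshold."""
    return [
        name
        for name, (used_gb, capacity_gb) in disk_groups.items()
        if used_gb / capacity_gb >= threshold
    ]


def headroom_gb(used_gb, capacity_gb, threshold=THRESHOLD):
    """GB that can still be written before this disk group reaches the threshold."""
    return max(0.0, capacity_gb * threshold - used_gb)


example = {
    "host1-dg1": (3100, 4000),
    "host2-dg1": (3300, 4000),
    "host3-dg1": (2500, 4000),
}

print("At or over threshold:", over_threshold(example))
for name, (used, cap) in example.items():
    print(name, "headroom before 80%:", round(headroom_gb(used, cap)), "GB")
```

If the data still to be restored is larger than the combined headroom, the restore is likely to push disk groups past the threshold partway through.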

Also see this article:

https://occasional-it.com/2018/02/04/increasing-vsan-resynch-rebuild-performance/

Lars

TheBobkin
Champion

Hello Bruticusmaximus,

"We're running VSAN with ESXi v5.5.  I know, it's out of support."

Please don't be shy: call us if you have not done so already. Despite it being out of support (OOS), we generally help where we can; if you have found otherwise, please PM me the details.

"Support said that it was because a resync was running and using all the free space in the background."

If the hardware supports later ESXi and vSAN versions, do consider upgrading sooner rather than later. vSAN 5.5 and earlier 6.0 builds had some bad resync issues. For example, under flaky network conditions vSAN could be resyncing to a component, temporarily lose access to the node holding that component, and start rebuilding a whole new component, without cleaning up the transient components until the resync completed for the Object. With an unstable enough network, this could potentially result in a looping resync. Potentially here the USB drive cannot provide consistent and constant reads, becomes unavailable, and the process starts over each time.

If this is the case and the issue is not organic (e.g. you simply don't have enough space in usable Fault-Domains to be compliant with the Storage Policy), then it should be fairly apparent: the Object will have a number of non-active components that it is not actively resyncing to (I can't recall if they had the 'Transient' flag in 5.5). You can check this using vsan.vm_object_info <pathToVm>. If it is just a case of not having enough space, then consider options to free up space, or consider (temporarily) restoring the VM as FTT=0.
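To put rough numbers on the FTT=0 option, here is a minimal Python sketch. It assumes mirrored replicas, so an Object tolerating FTT failures keeps FTT + 1 full copies of the written data, and it ignores witness components and any transient components left over from a resync. The function name is hypothetical and the 4 TB figure is the VM size from the original post.

```python
def raw_footprint_tb(written_tb, ftt):
    """Approximate raw capacity used by mirrored data: FTT + 1 full copies.

    Assumption: mirroring only; witness and transient components are ignored.
    """
    if ftt < 0:
        raise ValueError("ftt must be 0 or greater")
    return written_tb * (ftt + 1)


vm_size_tb = 4  # file server VM size from the original post

for ftt in (1, 0):
    print(f"FTT={ftt}: about {raw_footprint_tb(vm_size_tb, ftt)} TB raw "
          f"once {vm_size_tb} TB has been written")
```

With thin provisioning the footprint grows as data is written, so a restore that starts comfortably can still run the datastore out of space before it finishes.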

How much space do you currently have free on the vsanDatastore, how many nodes and Disk-Groups and how much free space per Disk-Group? (RVC vsan.disks_stats <pathToCluster> is your friend here).

Bob
