Solved: Failing to consolidate disks after sudden powerdown

HCBV · ‎01-15-2021

Hello all,

I had an unexpected powerdown of a VM. The cause is still unknown, but after powering up the VM it shows a notice that consolidation is needed.

Now one of the disks attached to the machine won't consolidate, either online or offline. The task finishes and shows no warning, but the disk still isn't consolidated.

What I tried already:

- checked disk chain cid/parentcid - they match

- checked all vmdk files - they look good

- moved ctk files to a temp directory and tried to consolidate again - fails

- cloned disk from latest delta file to a new disk on another datastore - fails at 100% with a bad descriptor error
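For anyone who wants to repeat the cid/parentcid check from the list above without eyeballing every descriptor, here is a minimal Python sketch. It assumes the usual text descriptor layout with `CID=`, `parentCID=` and `parentFileNameHint=` lines, and that a base disk carries `parentCID=ffffffff`. The function names are my own and not part of any VMware tool. Note that a matching chain only proves the descriptors line up; it says nothing about the data in the delta extents, which is exactly the situation described in this post.

```python
import re

def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from VMDK descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        # Anchored at line start, so "CID" does not match the "parentCID" line.
        m = re.search(rf'^{key}\s*=\s*"?([^"\r\n]*)"?\s*$', text, re.MULTILINE)
        if m:
            fields[key] = m.group(1).strip()
    return fields

def check_chain(descriptors):
    """Check that each disk's parentCID equals the CID of the disk before it.

    descriptors: list of (name, descriptor_text), ordered base disk first,
    newest delta last. Returns a list of problem strings (empty if none found).
    """
    problems = []
    prev_name, prev = None, None
    for name, text in descriptors:
        cur = parse_descriptor(text)
        if "CID" not in cur or "parentCID" not in cur:
            problems.append(f"{name}: CID or parentCID line not found")
        elif prev is None:
            # Assumption: a base disk has no parent and uses ffffffff.
            if cur["parentCID"].lower() != "ffffffff":
                problems.append(f"{name}: base disk has parentCID {cur['parentCID']}")
        elif "CID" in prev and cur["parentCID"].lower() != prev["CID"].lower():
            problems.append(
                f"{name}: parentCID {cur['parentCID']} does not match "
                f"CID {prev['CID']} of {prev_name}"
            )
        prev_name, prev = name, cur
    return problems

if __name__ == "__main__":
    import sys
    # Pass the small text descriptor files (not the extents), base disk first.
    chain = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            chain.append((path, fh.read()))
    for problem in check_chain(chain):
        print(problem)
```

Run it against copies of the descriptors, in chain order, and it prints one line per broken link (nothing if every link matches).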

During startup of the machine I see this error in the vmware.log:

2021-01-15T06:13:47.043Z| vmx| I005: DISKLIB-LIB_BLOCKTRACK : Resuming from change tracking info file /vmfs/volumes/5f4e5b2f-1bc05730-c0c9-f86eee1a35ca/DC02/DC02_1-000005-ctk.vmdk.
2021-01-15T06:13:47.061Z| vmx| I005: DISKLIB-CBT : Initializing ESX kernel change tracking for fid 2294665.
2021-01-15T06:13:47.061Z| vmx| I005: DISKLIB-CBT : Successfuly created cbt node 230389-cbt.
2021-01-15T06:13:47.061Z| vmx| I005: DISKLIB-CBT : Opening cbt node /vmfs/devices/cbt/230389-cbt
2021-01-15T06:14:38.623Z| vmx| I005: OBJLIB-FILEBE : FileBEIoctl: ioctl operation IOCTLCMD_VMFS_DELTADISKS(3033) failed on '/vmfs/devices/deltadisks/66037d-DC02_1-000005-sesparse.vmdk' : Bad file descriptor (589826)
2021-01-15T06:14:38.623Z| vmx| I005: DISKLIB-VMFS :VmfsSparseExtentCommonGetAllocatedSectorChunks: ObjLib_Ioctl failed 0x90009
2021-01-15T06:14:38.623Z| vmx| I005: DISKLIB-LIB_MISC : DiskLibGetAllocatedSectorChunksInRangeInt: failed to get allocated sector bitmap with 'Bad file descriptor' (589833).
2021-01-15T06:14:38.623Z| vmx| W003: DISKLIB-CBT : ChangeTrackerESX_MarkAllUsedAreas: Failed to get allocated sectors: Bad file descriptor.
2021-01-15T06:14:38.623Z| vmx| I005: DISKLIB-LIB : Opened "/vmfs/volumes/5f4e5b2f-1bc05730-c0c9-f86eee1a35ca/DC02/DC02_1-000005.vmdk" (flags 0xa, type vmfs).
2021-01-15T06:14:38.623Z| vmx| I005: DISK: Disk '/vmfs/volumes/5f4e5b2f-1bc05730-c0c9-f86eee1a35ca/DC02/DC02_1-000005.vmdk' has UUID '60 00 c2 9f b8 31 4c 5f-ac 9a 9a b0 79 2e 71 67'
2021-01-15T06:14:38.623Z| vmx| I005: DISK: OPEN '/vmfs/volumes/5f4e5b2f-1bc05730-c0c9-f86eee1a35ca/DC02/DC02_1-000005.vmdk' Geo (104433/255/63) BIOS Geo (0/0/0)
2021-01-15T06:14:38.625Z| vmx| I005: UTIL: Change file descriptor limit from soft 16614,hard 16614 to soft 16729,hard 16729.
2021-01-15T06:14:38.626Z| vmx| I005: DISK: Opening disks took 51882 ms.
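To pick the failing lines out of a longer vmware.log, a small filter like the sketch below can help. It assumes only the line layout visible in the excerpt above (`timestamp| thread| level: message`, with levels such as I005 and W003) and flags every line that is not informational or whose message contains "failed". The function name is made up for this example.

```python
import re

# Layout assumed from the excerpt: "<timestamp>| <thread>| <L><nnn>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\|\s*(?P<thread>[^|]+)\|\s*"
    r"(?P<level>[A-Z])(?P<num>\d+):\s*(?P<msg>.*)$"
)

def find_disk_problems(lines):
    """Return (timestamp, level, message) for non-info lines or lines mentioning 'failed'."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        if m.group("level") != "I" or "failed" in msg.lower():
            hits.append((m.group("ts"), m.group("level") + m.group("num"), msg))
    return hits

if __name__ == "__main__":
    import sys
    # Usage: python scan_log.py vmware.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for ts, level, msg in find_disk_problems(fh):
                print(ts, level, msg)
```

On the excerpt above this would report the four lines from 06:14:38.623Z that mention the failed ioctl and the missing allocated-sector bitmap.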

However, I do not know what this file /vmfs/devices/deltadisks/66037d-DC02_1-000005-sesparse.vmdk is. The latest delta file is DC02_1-000005.vmdk but what is this file in the location /vmfs/devices/deltadisks? This file appears to be the exact same size as the original disk.

Should I run "vmkfstools -x check /vmfs/devices/deltadisks/66037d-DC02_1-000005-sesparse.vmdk" to see if this file is damaged? I haven't checked this file yet; so far I have only checked the files inside the VM folder.

Perhaps someone here understands this problem better than I do. Any help is very much appreciated.

I am also currently trying to clone delta DC02_1-000004.vmdk to see what happens. I know that when the unexpected powerdown occurred there was only DC02_1-000001.vmdk, so I am assuming the error is in that file. To be sure, I am now trying to clone all of the deltas to see if any of them work.

HCBV · ‎01-23-2021

The issue was apparently caused by a bug in TrueNAS 12.0 U1. Something with async writes caused corruption.

This is the bug report/fix from TrueNAS: https://jira.ixsystems.com/browse/NAS-108627

I never managed to consolidate the disks. But I managed to solve the issue with back-ups.

I made a new back-up of the content of the disk that failed to consolidate, so I knew for sure that I had all the latest content. Then I deleted the disk and used an older back-up to restore it to its state one day prior to the corruption. Finally, I merged all the latest content into the restored disk.

Perhaps my strategy of making a new "content" back-up and then merging that content into an older "disk" back-up will help someone else, which is why I am posting it here.
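As an illustration of the merge step only: the sketch below copies files from a fresh "content" back-up into a restored disk whenever they are missing there or have a newer modification time. This is an assumption about how such a merge could be done for plain file data, not a description of the tooling used in this case, and it is not suitable for databases or other application state that needs its own restore procedure. The function name and directory arguments are hypothetical.

```python
import shutil
from pathlib import Path

def merge_newer(content_dir, restored_dir):
    """Copy files that are missing from, or newer than, their copy in restored_dir.

    content_dir:  root of the fresh "content" back-up (hypothetical path).
    restored_dir: root of the disk restored from the older back-up.
    Returns the relative paths that were copied.
    """
    content_dir, restored_dir = Path(content_dir), Path(restored_dir)
    copied = []
    for src in sorted(content_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(content_dir)
        dst = restored_dir / rel
        # Copy when the restored disk lacks the file or holds an older version.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel.as_posix())
    return copied
```

Reviewing the returned list before trusting the result is a good idea, since modification times on a disk that saw corruption may not be reliable.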
