VMware Cloud Community
mdymes
Contributor

VM Stuck writing to snapshot delta - Unable to consolidate Disks

Hi,

I'm hoping for a little help. I will post as much information as I think will be useful.

We have a vm here that is stuck on a snapshot disk. Disk File = [VM-ESX-OS] 37-ZCM-SQL-CL02/37-ZCM-SQL-CL02-000007.vmdk

At one point the vSphere UI showed snapshots in the chain, but now it shows none, so a consolidate from the UI is not possible (it failed when I tried).

All disk actions were taken with the VM powered off.

I started following VMware KB 10004545 to consolidate the snapshots (VMware KB: Committing snapshots in vSphere when more than 32 levels of snapshots are present fails w...).

Here is the command that I ran:

vmkfstools -i 37-ZCM-SQL-CL02-000007.vmdk /vmfs/volumes/VM-ESX-OS2/sql-02-restore/37-ZCM-SQL-CL02-restore.vmdk -v 10

It resulted in the following output (last few lines):

Clone: 34% done.OBJLIB-FILEBE : FileBEIoctl: ioctl operation failed on '/vmfs/devices/deltadisks/41e62a3b-37-ZCM-SQL-CL02-000007-delta.vmdk' : Bad file descriptor (589826)

DISKLIB-VMFS :Vmfs_MoveData : failed to move data (Bad file descriptor:0x90009).

DISKLIB-LIB   : DiskLibCopyDataInt: Failed to copy using vmkernel data mover; falling back to non-accelerated copy. Bad file descriptor

Clone: 40% done.DISKLIB-LIB   : numIOs = 100000 numMergedIOs = 0 numSplitIOs = 0

DISKLIB-LIB   : numIOs = 100000 numMergedIOs = 0 numSplitIOs = 0

Clone: 45% done.FileIOErrno2Result: Unexpected errno=9, Bad file descriptor

DISKLIB-LIB   : RWv failed ioId: #260760 (589833) (9) .

DISKLIB-LIB   : DiskLibCopyDataInt failed with Bad file descriptor.

DISKLIB-VMFS  : "/vmfs/volumes/VM-ESX-OS2/sql-02-restore/37-ZCM-SQL-CL02-restore-flat.vmdk" : closed.

DISKLIB-VMFS  : "/vmfs/volumes/VM-ESX-OS2/sql-02-restore/37-ZCM-SQL-CL02-restore-flat.vmdk" : open successful (1041) size = 85899345920, hd = 0. Type 3

DISKLIB-VMFS  : "/vmfs/volumes/VM-ESX-OS2/sql-02-restore/37-ZCM-SQL-CL02-restore-flat.vmdk" : closed.

DISKLIB-LIB   : Failed to clone : Bad file descriptor (589833).

Failed to clone disk: Bad file descriptor (589833).

My vmdk files are attached.

5 Replies
a_p_
Leadership

Welcome to the Community,

I'm not sure whether this is an issue with the metadata (grain tables) in one of the delta files. What you can do, with the VM powered off, is run a check on the .vmdk file(s) to see whether it reports any errors or details, e.g.:

vmkfstools --fix check 37-ZCM-SQL-CL02-000007.vmdk

This command is supposed to check the disk for errors, but not fix them. To fix errors you would have to replace the check option with repair. Please make sure you familiarize yourself with the command, and back up the .vmdk file (as well as its associated delta file) before running the command with the repair option.
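To repeat the check across every descriptor in the VM folder, a loop along these lines might help. This is only a sketch: `check_descriptors` and the `CHECK_CMD` override are names made up for illustration, and the list of suffixes used to skip extent files is an assumption about what is in the folder. It only runs the check mode of the command shown above, never repair.

```shell
# Hypothetical override so the loop can be dry-run with a stub command.
CHECK_CMD=${CHECK_CMD:-vmkfstools}

# Run "--fix check" on every descriptor .vmdk in a directory.
check_descriptors() {
    # $1 = VM directory
    for f in "$1"/*.vmdk; do
        # Skip extent/data files; only descriptors are checked.
        # (Suffix list is an assumption, adjust it to your folder.)
        case $f in
            *-delta.vmdk|*-flat.vmdk|*-ctk.vmdk|*-sesparse.vmdk) continue ;;
        esac
        [ -e "$f" ] || continue
        echo "== $f"
        "$CHECK_CMD" --fix check "$f"
    done
}
```

Run it with the VM powered off, e.g. `check_descriptors .` from inside the VM folder.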

André

mdymes
Contributor

Andre,

Thanks for the post. I should have mentioned that I did run that against all the .vmdk files; all report "Disk is error free".

FritzBrause
Enthusiast

I checked the snapshot chain (CID and parentCID) and they look ok.

Note: Some have CID=parentCID, but this is fine since that snapshot was taken when the VM was powered off.

So there is nothing that must be corrected because of an invalid snapshot chain.
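For anyone who wants to repeat the CID check themselves, here is a rough sketch of how it could be scripted. It assumes the descriptors contain plain `CID=`, `parentCID=` and `parentFileNameHint=` lines and that the hint is relative to the descriptor's folder; `get_field` and `check_chain` are made-up helper names, not part of any VMware tool.

```shell
# Print the value of a key=value line from a descriptor, quotes stripped.
get_field() {
    # $1 = key, $2 = descriptor file
    awk -F= -v key="$1" '$1 == key { gsub(/["\r]/, "", $2); print $2; exit }' "$2"
}

# Walk from a descriptor down to the base disk and confirm that each
# parentCID equals the CID stored in the parent descriptor.
check_chain() {
    desc=$1
    depth=0
    while [ "$depth" -lt 64 ]; do
        [ -f "$desc" ] || { echo "MISSING $desc"; return 1; }
        parent_cid=$(get_field parentCID "$desc")
        parent=$(get_field parentFileNameHint "$desc")
        # No parent hint: treat this descriptor as the base disk.
        if [ -z "$parent" ]; then
            echo "BASE $desc"
            return 0
        fi
        # Assumption: relative hints live next to the child descriptor.
        case $parent in
            /*) ;;
            *) parent="$(dirname "$desc")/$parent" ;;
        esac
        [ -f "$parent" ] || { echo "MISSING $parent"; return 1; }
        cid=$(get_field CID "$parent")
        if [ "$parent_cid" = "$cid" ]; then
            echo "OK $desc -> $parent"
        else
            echo "MISMATCH $desc parentCID=$parent_cid, $parent CID=$cid"
            return 1
        fi
        desc=$parent
        depth=$((depth + 1))
    done
    echo "GAVE UP after $depth levels"
    return 1
}
```

Called as `check_chain 37-ZCM-SQL-CL02-000007.vmdk`, it only reads the descriptors and changes nothing. A descriptor whose own CID equals its parentCID still passes, since only the link to the parent is compared.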

What I would suggest:

You did the cloning with the latest snapshot file 37-ZCM-SQL-CL02-000007.vmdk.

Try cloning going down the chain, so next would be 37-ZCM-SQL-CL02-000006.vmdk

The chain is:

37-ZCM-SQL-CL02-000007.vmdk

37-ZCM-SQL-CL02-000006.vmdk

37-ZCM-SQL-CL02-000005.vmdk

37-ZCM-SQL-CL02-000004.vmdk

37-ZCM-SQL-CL02-000003.vmdk

37-ZCM-SQL-CL02-000001.vmdk

37-ZCM-SQL-CL02-000002.vmdk

37-ZCM-SQL-CL02.vmdk

If cloning starting from one of those works, you know that the next one above is corrupt.

So, for instance, if cloning from 37-ZCM-SQL-CL02-000005.vmdk works, then 37-ZCM-SQL-CL02-000006.vmdk is probably corrupt.
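The "work down the chain" test can be scripted roughly like this. It is a sketch only: `first_clonable` and the `CLONE_CMD` override are invented names, the clone targets are prefixed with `test-` purely for illustration, and it relies on nothing more than the `vmkfstools -i <source> <target>` form already used in this thread. A failed attempt may leave a partial target behind; the sketch does not clean that up.

```shell
# Hypothetical override so the loop can be dry-run with a stub command.
CLONE_CMD=${CLONE_CMD:-vmkfstools}

# Try to clone from each descriptor in turn (newest first) and print the
# first one that clones cleanly. Returns 1 if none of them works.
first_clonable() {
    # $1 = destination directory; remaining args = descriptors, newest first
    dest_dir=$1
    shift
    for desc in "$@"; do
        target="$dest_dir/test-$(basename "$desc")"
        if "$CLONE_CMD" -i "$desc" "$target" >/dev/null 2>&1; then
            echo "$desc"
            return 0
        fi
    done
    return 1
}
```

Pass the destination folder first and then the descriptors in chain order, for example `first_clonable /vmfs/volumes/VM-ESX-OS2/sql-02-restore 37-ZCM-SQL-CL02-000007.vmdk 37-ZCM-SQL-CL02-000006.vmdk` and so on down to the base disk. The descriptor it prints is the newest point that still clones, so the one directly above it in the chain is the first suspect.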

First question is then: Is the data from those newer snapshots needed? Maybe they do not contain new content.

If they are definitely needed, you could rename the descriptor file 37-ZCM-SQL-CL02-000006.vmdk (just as an example here) and recreate it.

Use KB 1026353 for this. But since you have the content of the file already (which seems to be ok), just create a new file and write the same content into it.

Maybe the file is simply somehow corrupt.

If none of this works, you could do a V2V conversion with VMware Converter.

mdymes
Contributor

Fritz,

Thanks for the help. All cloning failed until I got to snapshot 2 (the first one); I was able to clone from that snapshot. I am going to power it up with the network cards disconnected to inspect it. Honestly, I'm not sure how old it is, though. As this VM is a cluster member, I may just kick it out of the cluster and build a new member.

Good suggestion on the V2V. I tried twice using the standalone Converter, and it failed at around 45%. Thanks for the help.

FritzBrause
Enthusiast

Good.

You can check the timestamps and sizes of the snapshot data files (the -delta.vmdk files).

If they are small and from the same day, there is probably not much in them.

Some were even taken when VM was powered off.

If there is not much in the other snapshots, you can go with what you have consolidated now.
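A quick way to eyeball that is something like the following sketch. It assumes the snapshot data files are named `*-delta.vmdk`, as in the clone log earlier in the thread, and that field 5 of `ls -l` is the size in bytes; `delta_sizes` is just a made-up helper name.

```shell
# List every snapshot delta file in a directory with its size and
# modification time, smallest first.
delta_sizes() {
    # $1 = VM directory
    for f in "$1"/*-delta.vmdk; do
        [ -e "$f" ] || continue
        ls -l "$f"
    done | sort -n -k 5
}
```

Running `delta_sizes .` in the VM folder puts the largest delta last, which is usually the one worth worrying about.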
