VMware Cloud Community
gecman47
Contributor

Recovering from failed snapshot delete

Good morning

I recently tried to delete the first snapshot of a server VM, and the delete failed. The server had additional virtual disks attached to it, and the failed snapshot delete has left them in a state where I can open the base disk, which dates from when I built the server a year ago, but cannot get to any of the newer data. When I try to open the VM itself I get this error:

error.PNG

I don't care about the server itself, but I'm trying to recover the data on the disks, which I didn't set as independent. If I can repair the disks, I can just attach them to a new server with no big deal. Here is a list of the files I have for one of the three 4 TB virtual disks.

Domain Controller_1-000001-sesparse.vmdk  Domain Controller_1-000002-sesparse.vmdk  Domain Controller_1-000004-sesparse.vmdk  Domain Controller_1-flat.vmdk

Domain Controller_1-000001.vmdk           Domain Controller_1-000002.vmdk           Domain Controller_1-000004.vmdk           Domain Controller_1.vmdk

Is there a way to reassemble these files in such a way as to recover the data? Thanks for the help.

Grant

12 Replies
LucianoPatrão

Hi,

Yes, it could be possible. You need to repair the broken CID chain. Every .vmdk descriptor has a CID (Content ID) and the CID of its parent (parentCID). The parentCID of the initial (base) disk is 'ffffffff'.

Here is very detailed information about snapshots and the CID chain: http://vmutils.t15.org/TVMsp/TVMsp.html


And also: http://www.sanbarrow.com/vmdk/vmdk-basic-CID-chain-repair.html

So you need to check whether the CID in one of the snapshot descriptor files (Domain Controller_1-00000xx.vmdk) is what breaks the chain.
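To make the check concrete, here is a minimal Python sketch that pulls the chain-relevant fields out of a descriptor's text. The helper names `read_chain_fields` and `is_base_disk` and the sample values are invented for illustration; only the field names (CID, parentCID, parentFileNameHint) and the 'ffffffff' marker come from the descriptor format described above.

```python
import re

# Matches only the top-level chain fields, not e.g. ddb.longContentID.
FIELD_RE = re.compile(
    r'^(CID|parentCID|parentFileNameHint)[ \t]*=[ \t]*"?([^"\r\n]*)"?',
    re.M,
)

def read_chain_fields(descriptor_text):
    """Return a dict with CID, parentCID and (if present) parentFileNameHint."""
    return {key: value.strip() for key, value in FIELD_RE.findall(descriptor_text)}

def is_base_disk(fields):
    """A base disk has no parent, which the descriptor marks as parentCID=ffffffff."""
    return fields.get("parentCID", "").lower() == "ffffffff"

# Illustrative descriptor snippets (values invented for this example).
base = 'CID=11111111\nparentCID=ffffffff\ncreateType="vmfs"\n'
snap = 'CID=22222222\nparentCID=11111111\nparentFileNameHint="disk.vmdk"\n'

if __name__ == "__main__":
    print(read_chain_fields(base), is_base_disk(read_chain_fields(base)))
    print(read_chain_fields(snap), is_base_disk(read_chain_fields(snap)))
```

A snapshot descriptor whose parentCID does not equal the CID of the file named in its parentFileNameHint is where the chain is broken.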

You have sparse disks, which is not good in this situation.

Check also: https://kb.vmware.com/kb/1007969

and: https://kb.vmware.com/kb/2045616

Cloning the existing disks could be the solution, but for that you need to fix the CID chain first.

Hope this can help

Luciano Patrão

VCP-DCV, VCAP-DCV Design 2023, VCP-Cloud 2023
vExpert vSAN, NSX, Cloud Provider, Veeam Vanguard
Solutions Architect - Tech Lead for VMware / Virtual Backups

________________________________
If helpful Please award points
Thank You
Blog: https://www.provirtualzone.com | Twitter: @Luciano_PT
gecman47
Contributor

Thanks so much for your response! In this case, I'm really glad I had everything super important backed up elsewhere hahaha. So I spent a chunk of my morning reading up on delta files/CIDs etc. The concepts make sense, but I'm still putting together all the file types and confirming my understanding.

I checked the CIDs this morning and the chain is indeed broken. Are the numbers in the file names associated with the snapshot numbers? I'm a little confused about that, because some of the disks I have don't have those numbers even though they were created at the same time and aren't independent.

I read a bit about the sparse files and ran out of time. In general what is their purpose? I was kinda confused by the short blurbs I read. They mentioned they were an alternate file system, but I didn't really understand their purpose.

a_p_
Leadership

Please compress/zip all the .vmdk descriptor files as well as one of the vmware*.log files from the time when the VM still worked, and attach them to a reply post. It will also help if you could provide a complete list of the files in the VM's folder (e.g. the output of ls -lisa).

André

gecman47
Contributor

a.p.

Files are attached. All of the log files were from today which was rather confusing. The server died yesterday (19 April 2016) around 1400. I'm not sure if they will be of any use so I zipped all of the logs and attached them. The rest of the information requested is in the two text files. Thanks for the assistance.

Grant

LucianoPatrão

Hi

You can read more about seSparse (Space Efficient Sparse Virtual Disks) here: http://cormachogan.com/2012/09/05/vsphere-5-1-storage-enhancements-part-2-se-sparse-disks/

Mainly, it is a feature that provides the ability to reclaim unused blocks from within the guest OS. It was introduced in vSphere 5.1.

Question: why do you have so many snapshots on a DC? Rolling back a DC (particularly a Global Catalog) is not recommended. A DC should be backed up with a proper backup tool and also restored in a proper way.

Luciano Patrão

a_p_
Leadership

According to the latest vmware.log file the following virtual disks are connected to this VM:

scsi0:0.fileName = "Domain Controller 2012.vmdk"

scsi0:2.fileName = "/vmfs/volumes/55fcf510-9456b469-8e0a-d050995057a9/Domain Controller/Domain Controller_2-000002.vmdk"

scsi0:3.fileName = "/vmfs/volumes/55fcf53e-b9fd7666-05b7-d050995057a9/Domain Controller/Domain Controller_3-000002.vmdk"

scsi0:4.fileName = "/vmfs/volumes/56013fa9-8262cc6a-0ab1-d050995057a9/Domain Controller 2012/Domain Controller 2012_1-000001.vmdk"

scsi0:5.fileName = "/vmfs/volumes/56013fbd-99c7d9c4-c0c2-d050995057a9/Domain Controller 2012/Domain Controller 2012_2-000001.vmdk"

scsi0:6.fileName = "/vmfs/volumes/56013de1-1340f847-755b-d050995057a9/Domain Controller 2012/Domain Controller 2012_3-000001.vmdk"

In the list above I'm missing "Domain Controller_1.vmdk", which shows up in the file list you provided and is probably the one missing for "scsi0:1". I further assume that the reason for "Domain Controller 2012.vmdk" being the first virtual disk is that you reconnected this virtual disk using the GUI. What's missing in your previous post are the file lists for the last 3 virtual disks. Please provide this information in case these disks are relevant.

How are the three virtual disks "Domain Controller_1.vmdk" through "Domain Controller_3.vmdk" configured in the guest OS? I'm asking because of the similar file sizes.

André

a_p_
Leadership

One more question. What's the free disk space on each of the datastores? Do you have another datastore which can temporarily be used, if necessary, for cloning the virtual disks?

André

gecman47
Contributor

Gotta say, getting this info by remoting in from a phone on an airplane is a much more painful experience. Yes, I have space to back up. One datastore has the main VM hard drive, and the other hard drives holding the virtual disks in question are each 4 TB with plenty of space, though I did thin provision each virtual disk at 4 TB (I am doing poor man's RAID with Windows).

[root@esxi:/vmfs/volumes/55fcf4f4-b3de24cd-94da-d050995057a9/Domain Controller] cat Domain\ Controller_1.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=6f9c239d
parentCID=ffffffff
isNativeSnapshot="no"
createType="vmfs"

# Extent description
RW 7730941132 VMFS "Domain Controller_1-flat.vmdk"

# The Disk Data Base
#DDB

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "481228"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.longContentID = "129d98696a09a7321f5e78956f9c239d"
ddb.thinProvisioned = "1"
ddb.uuid = "60 00 C2 97 53 bf f7 3c-c7 10 fc 02 97 97 34 8c"
ddb.virtualHWVersion = "11"


-----------

[root@esxi:/vmfs/volumes/55fcf510-9456b469-8e0a-d050995057a9/Domain Controller] cat Domain\ Controller_2-000001.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=bd3b2f15
parentCID=cc556927
isNativeSnapshot="no"
createType="seSparse"
parentFileNameHint="Domain Controller_2.vmdk"
# Extent description
RW 7730941132 SESPARSE "Domain Controller_2-000001-sesparse.vmdk"

# The Disk Data Base
#DDB

ddb.grain = "8"
ddb.longContentID = "7c8cead466fc7cec097d67aabd3b2f15"


---------------


[root@esxi:/vmfs/volumes/55fcf53e-b9fd7666-05b7-d050995057a9/Domain Controller] cat *000001.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=73105565
parentCID=566133b4
isNativeSnapshot="no"
createType="seSparse"
parentFileNameHint="Domain Controller_3.vmdk"
# Extent description
RW 7730941132 SESPARSE "Domain Controller_3-000001-sesparse.vmdk"

# The Disk Data Base
#DDB

ddb.grain = "8"
ddb.longContentID = "4c3368abe5ca68f468256e0c73105565"
[root@esxi:/vmfs/volumes/55fcf53e-b9fd7666-05b7-d050995057a9/

------------------


[root@esxi:/vmfs/volumes/55fb898a-f505dad9-88ac-d050995057a9/Domain Controller 2012] cat *1.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=251de4c0
parentCID=5e4a8ec7
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="Domain Controller 2012-000003.vmdk"
# Extent description
RW 167772160 VMFSSPARSE "Domain Controller 2012-000001-delta.vmdk"

# The Disk Data Base
#DDB

ddb.longContentID = "a941068876812580bed3533f251de4c0"

a_p_
Leadership

Gotta say getting this info by remoting from a phone on an airplane is a much more painful experience.

That might explain why you didn't post all the information I was asking for 😉

Anyway, what basically needs to be done to fix a broken snapshot chain is to ensure that each snapshot's "parentCID" matches the value of its parent's "CID" (do NOT modify a "CID"!). The parent is determined by the "parentFileNameHint". Note that the numbers in the file names are not necessarily in ascending order!
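As a rough illustration of that rule, the Python sketch below rewrites only the child's parentCID line so that it matches the parent's CID, and leaves every CID line alone. The function names `get_field` and `relink_child` and the sample values are hypothetical, and the sketch works on descriptor text in memory rather than on the files themselves.

```python
import re

def get_field(text, name):
    """Return the value of a top-level descriptor field such as CID or parentCID."""
    pattern = r'^' + re.escape(name) + r'[ \t]*=[ \t]*"?([^"\r\n]*)"?'
    match = re.search(pattern, text, re.M)
    return match.group(1).strip() if match else None

def relink_child(child_text, parent_text):
    """Return child_text with parentCID set to the parent's CID.

    Only the child's parentCID line is rewritten; no CID line is touched.
    """
    parent_cid = get_field(parent_text, "CID")
    if parent_cid is None:
        raise ValueError("parent descriptor has no CID line")
    new_text, count = re.subn(
        r'^parentCID[ \t]*=[^\r\n]*',
        lambda m: "parentCID=" + parent_cid,
        child_text,
        count=1,
        flags=re.M,
    )
    if count != 1:
        raise ValueError("child descriptor has no parentCID line")
    return new_text

# Invented example values: the child points at a CID its parent no longer has.
parent = 'CID=aaaa0001\nparentCID=ffffffff\n'
child = 'CID=bbbb0002\nparentCID=deadbeef\nparentFileNameHint="disk.vmdk"\n'

if __name__ == "__main__":
    print(relink_child(child, parent))
```

If you script the edit like this, it is safer to run it against copies of the descriptor files and to keep the originals untouched until the repaired chain has been verified.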

So start with the .vmdk file names which you see in the VM's configuration (.vmx) file, and then go through each of its parents up to the base virtual disk.
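The walk described here can be sketched in Python as follows. This is only an illustration under a few assumptions: the function names are made up, relative "parentFileNameHint" values are assumed to resolve against the folder of the descriptor that contains them, and the script only reports mismatches without changing any file.

```python
import os
import re

def get_field(text, name):
    """Return the value of a top-level descriptor field, or None if absent."""
    pattern = r'^' + re.escape(name) + r'[ \t]*=[ \t]*"?([^"\r\n]*)"?'
    match = re.search(pattern, text, re.M)
    return match.group(1).strip() if match else None

def walk_chain(top_descriptor_path):
    """Follow parentFileNameHint from the given descriptor to the base disk.

    Returns a list of (path, CID, parentCID, problem) tuples, top-most first.
    problem is None when the file's CID is what its child expects.
    """
    results = []
    seen = set()
    path = top_descriptor_path
    expected_cid = None  # parentCID recorded in the child we came from
    while path:
        if path in seen:
            raise ValueError("parentFileNameHint loop at " + path)
        seen.add(path)
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        cid = get_field(text, "CID")
        parent_cid = get_field(text, "parentCID")
        hint = get_field(text, "parentFileNameHint")
        problem = None
        if expected_cid is not None and (cid or "").lower() != expected_cid.lower():
            problem = "child expects CID " + expected_cid
        results.append((path, cid, parent_cid, problem))
        if hint is None:
            break  # no parent hint: this is the base disk
        # os.path.join keeps an absolute hint as-is and resolves a relative one.
        path = os.path.join(os.path.dirname(path), hint)
        expected_cid = parent_cid or ""
    return results

if __name__ == "__main__":
    import sys
    for entry in walk_chain(sys.argv[1]):
        print(entry)
```

Run it once per virtual disk, passing the descriptor name that the .vmx file references for that disk. A missing parent file surfaces as a FileNotFoundError, which is itself a useful hint about where the chain ends.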

Once edited you need to reload the VM in order for the changes to take effect (see VMware KB: Reloading a vmx file without removing the virtual machine from inventory).

I can't tell you how the soft RAID will behave with one of its disks being damaged (at least partly due to the issue). Since this is a domain controller (according to the file name), you may/will experience AD issues with the broken CID chain of the first (OS) virtual disk. So in case you are going to power on the VM, ensure that this domain controller doesn't corrupt your AD, e.g. by disconnecting the virtual network!

If you have questions, please feel free to ask.

André

gecman47
Contributor

Ahh, sorry about that. The other two files are below. I took a look through the files on each of the 4 TB drives. For all three hard drives, -000004 is the parent of -000002. However, after that the chain is broken. Should -000004 link to -000001, which should then link to the base vmdk file?

[root@esxi:/vmfs/volumes/55fcf4f4-b3de24cd-94da-d050995057a9/Domain Controller] cat *2.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=e5a74aae
parentCID=4567a55a
isNativeSnapshot="no"
createType="seSparse"
parentFileNameHint="Domain Controller_1-000004.vmdk"
# Extent description
RW 7730941132 SESPARSE "Domain Controller_1-000002-sesparse.vmdk"

# The Disk Data Base
#DDB

ddb.grain = "8"
ddb.longContentID = "49354585027d4b33172a6d84e5a74aae"

----------------------------------------------------------------

cat /vmfs/volumes/55fcf53e-b9fd7666-05b7-d050995057a9/Domain\ Controller/Domain\ Controller_3-000002.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=b8c3b831
parentCID=5b7fa9ba
isNativeSnapshot="no"
createType="seSparse"
parentFileNameHint="Domain Controller_3-000004.vmdk"
# Extent description
RW 7730941132 SESPARSE "Domain Controller_3-000002-sesparse.vmdk"

# The Disk Data Base
#DDB

ddb.grain = "8"
ddb.longContentID = "78a7d50808b8f3eaf72a5249b8c3b831"

a_p_
Leadership

As mentioned before, the file names (e.g. 000001.vmdk, 000002.vmdk, ...) do not necessarily represent the correct chain. It is always the "parentFileNameHint" which points to the next file in the chain.

Anyway, I can't tell you for sure what happened with (some of) the 000001.vmdk files, as there are no more logs available. Since two descriptor files (000002 as well as 000001) point to the base disk, I would assume that the deletion of the first snapshot (as you mentioned in your initial post) was interrupted right after the data from the 000001.vmdk files were merged into the base .vmdk file, and therefore these 000001.vmdk files are now obsolete!? However that's just an assumption.

In this situation you need to be very careful to avoid further corruption. What you should do is to fix the snapshot chain for all virtual disks based on the "parentFileNameHint", and then reload the VM. Once done, create another snapshot prior to using the virtual disks. This will ensure that the current virtual disks will not be modified and you can always revert to the current state, and change things if required. Another option is to clone the virtual disks using the vmkfstools command line utility.
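For the cloning option, a small Python wrapper around vmkfstools might look like the sketch below. The function names and the paths in the usage comment are hypothetical, the target datastore is assumed to have enough free space, and the snapshot chain is assumed to have been repaired first. By default the function only returns the command so that it can be reviewed before anything runs.

```python
import subprocess

def clone_command(source_descriptor, target_descriptor, disk_format="thin"):
    """Build the vmkfstools argument list for cloning a virtual disk.

    -i clones the disk named by the source descriptor, and -d selects the
    format of the target disk.
    """
    return ["vmkfstools", "-i", source_descriptor, target_descriptor, "-d", disk_format]

def clone_disk(source_descriptor, target_descriptor, dry_run=True):
    """Return the clone command; execute it only when dry_run is False."""
    cmd = clone_command(source_descriptor, target_descriptor)
    if not dry_run:
        # Raises CalledProcessError if vmkfstools exits with an error.
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    print(clone_disk(
        "/vmfs/volumes/source-datastore/VM/disk-000002.vmdk",
        "/vmfs/volumes/target-datastore/recovered/disk.vmdk",
    ))
```

The source should be the descriptor (.vmdk) file at the top of the repaired chain, not a -flat, -delta, or -sesparse file.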

If you want/need to attach the snapshot virtual disks to another (helper) VM, you will need to manually edit the .vmx file, because snapshot files cannot be attached to a VM from the GUI.


André
