VMware Cloud Community
rbsskalssauka
Contributor

VDP snapshot leftovers after VDP datastore power failure

Hi.

I had a power failure, and the datastore where the VDP 5.1 appliance resided was powered off while a scheduled backup was being created. After that I could not create backups anymore, and after some blind tweaks the VDP appliance would not start at all. There was a problem with snapshot disks it could no longer open, so I removed those disks, but as I'm not a VMware professional I'm not sure whether I should delete the corresponding files. VDP started up, but when I initiated a backup it complained that "operation failed due to existing snapshot". How can I know whether I can delete the vmdk files left over from snapshots? Can I just assume that if I don't see any snapshots in Snapshot Manager and a snapshot vmdk file has a modification date some hours or days in the past, then I can safely delete it?

Sorry for my English.

16 Replies
continuum
Immortal

The Snapshot Manager is as reliable as the oracle of Delphi - in other words - don't trust it !!!
To find out which vmdks are not in use at the moment, read the latest vmware.log - vmdks mentioned in lines that also contain "disklib" are in use.
If unsure - move all the vmdks that you suspect are unused to a new directory. If the VM can still boot afterwards, you can delete them.
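
The log check described above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes that the log has been downloaded as plain text and that disk file names appear on the "disklib" lines without spaces in them. The function name and the sample log lines are made up for illustration; real vmware.log lines may look different.

```python
import re

# Matches anything that looks like a path or file name ending in .vmdk.
# Assumption: the names contain no spaces or quotes.
VMDK_RE = re.compile(r"[^\s'\"]+\.vmdk", re.IGNORECASE)


def vmdks_in_use(log_text):
    """Return the vmdk file names found on log lines that mention 'disklib'."""
    in_use = set()
    for line in log_text.splitlines():
        if "disklib" not in line.lower():
            continue
        for match in VMDK_RE.findall(line):
            # Keep only the file name so it can be compared with a directory listing.
            in_use.add(match.rsplit("/", 1)[-1])
    return in_use


# Made-up excerpt for illustration only, not copied from a real host.
SAMPLE_LOG = """\
2013-05-01T10:00:00.000Z| vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/ds1/VM/VM-000001.vmdk' (0xa)
2013-05-01T10:00:00.100Z| vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/ds1/VM/VM.vmdk' (0xe)
2013-05-01T10:00:01.000Z| vmx| Some unrelated line about VM-000002.vmdk
"""
```

Calling vmdks_in_use(SAMPLE_LOG) collects the two names from the "disklib" lines and ignores the third line. Any vmdk in the VM directory that is missing from the result is only a candidate for the move test, not yet proven unused.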


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

rbsskalssauka
Contributor

But if I have only one snapshot vmdk on each virtual machine and its modification date is in the past, isn't that enough to assume that it's not used (I also have only one VDP appliance)? I just think it could be quite time-consuming for me to read those log files.

continuum
Immortal

OK - if you do not have the 2 minutes to inspect the logs, then move the candidates for deletion into a subfolder. If you can do that while the VM is running, they are not used by the current state of the VM.


john23
Commander

Going by the modification date alone, the chance of ending up with a VM that will not power on is higher, since the snapshot tree would be broken. The safer approach is what was suggested above.

Thanks -A Read my blogs: www.openwriteup.com
rbsskalssauka
Contributor

Hi!

Thanks for the answers. I've downloaded the log file for one of the affected VMs, and the last time the suspicious file was mentioned as opened was 10 days ago. In the VM settings only a single hard disk is assigned, and it points to the "basic" disk image and not to the ghost snapshot. As you suggested, I created a subfolder and tried to move this snapshot into it, but the message "Unable to access file <unspecified filename> since it is locked" came up. Should I try to delete it anyway, or will VMware not let me delete it because it's locked? It's a production VM and I would need to wait at least a few days for a maintenance window, but currently I have no working backup running...

Thanks!

john23
Commander

If it does not allow you to delete the file, that means something is still using the snapshot vmdk. Even if you want to try the option of creating a subfolder and moving the file, wait for the maintenance window.

rbsskalssauka
Contributor

So the only option to remove the snapshot correctly is to shut down the guest, move the ghost snapshot, and then power on and see if it starts?

john23
Commander

It's a good way. Otherwise the VM will not power on if a dependency is there.

rbsskalssauka
Contributor

I powered off the VM and still could not move the vmdk files to a subfolder using the VMware vSphere Client. I thought that it could be VMware Data Protection locking the files, but shutting it down did not help. Then I used the host's SSH shell and succeeded in moving the vmdk files. Is it OK to do it this way, and if it's done on a running VM, does that indicate that those files are not used and can be safely removed? I'm asking because Linux handles files differently: some applications can read a file and leave it unlocked, so the user can delete it from the filesystem, but when the application later tries to commit its changes the user can get into trouble because of the deleted file.

rbsskalssauka
Contributor

UPDATE

Used "lsof | grep /vmfs/volumes/.................." to look for open files. For testing purposes I tried to move an open file to another folder and it succeeded, so being able to move vmdk files from the SSH console is not an indicator of whether a file is locked. Is lsof a reliable tool for finding ghost snapshot files? In other words, does a snapshot not being listed in lsof indicate that it can be safely deleted? I've already been a week without a backup and that makes me nervous :)
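
The move test above matches ordinary POSIX behaviour, which a small Python sketch can reproduce on a Linux machine. It is an illustration only: the file name is a hypothetical stand-in, nothing here talks to ESXi, and it says nothing about VMFS locks themselves. It just shows why a successful move from a shell does not prove that nobody has the file open.

```python
import os
import tempfile


def move_while_open():
    """Move a file that is still open and return what the open handle can read."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "disk.vmdk")  # hypothetical stand-in file
        dst_dir = os.path.join(workdir, "moved")
        os.mkdir(dst_dir)
        with open(src, "w") as f:
            f.write("still here")
        handle = open(src)  # simulates a process that keeps the file open
        try:
            # On a POSIX-like system the rename succeeds despite the open handle.
            os.rename(src, os.path.join(dst_dir, "disk.vmdk"))
            return handle.read()
        finally:
            handle.close()
```

On Linux the rename goes through and the open handle can still read the data afterwards, so the process holding the file never notices the move.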

john23
Commander

ESXi does not utilize a separate Service Console Operating System. This reduces the amount of lock troubleshooting to just the VMkernel. For example, Console OS troubleshooting methods such as using the lsof utility are not applicable to ESXi hosts.

VMware KB: Investigating virtual machine file locks on ESXi/ESX

rbsskalssauka
Contributor

I had looked at this KB before, but running "vmkfstools -D /vmfs/volumes/7546a2bb-9cc818d9/Utubunga/Utubunga.vmdk" from the SSH console threw an error:
"Could not get the dump information for '/vmfs/volumes/7546a2bb-9cc818d9/Utubunga/Utubunga.vmdk' (rv -1)
Could not dump metadata for '/vmfs/volumes/7546a2bb-9cc818d9/Utubunga/Utubunga.vmdk': Function not implemented
Error: vmkfstools failed: vmkernel is not loaded or call not implemented."

lsof is also mentioned there.

continuum
Immortal

Are you still busy with this problem ???

1. Inspect the current VM settings: write down the vmdks that are listed for scsi0:0.fileName =
If you have a VM with more than one vmdk, check the others as well.

2. Connect to the host that is currently running the VM with WinSCP.
Open every vmdk that you have on your list with the built-in editor of WinSCP.
Find the line parentFileNameHint=
If this line exists, add all referenced vmdk filenames to your list and check them as well.

3. Now you have a list of vmdks.
If an entry is a snapshot and has a number such as -000001 in the name, add the associated delta.vmdk to your list.
If the vmdk is not a snapshot, add the associated flat.vmdk to your list.


4. All the vmdk files that you have on your list now are part of the current VM and you want to keep them.
All vmdks inside the VM-directory that are NOT on the list MAY be orphaned.

5. Try to move the orphans into a subdirectory using the move function of the Datastore Browser.
If they can't be moved, make a new list and write them down.

The files that you have on your new list are either:
- in use by a third VM
- locked by a misbehaving backup tool
- locked by voodoo

Those locks will probably go away next time you reboot the host.

Whenever you inspect strange effects with locked files, first of all check whether any backup tools are still trying to finish a scheduled job ...
Good check: delete *ctk.vmdks. If they reappear next time you start the VM, check the backup tools.
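
Steps 1 to 4 above can be sketched in Python for a VM directory that has been copied or mounted somewhere readable. This is a minimal sketch under several assumptions: all disks live in the VM directory, the .vmx uses entries of the form scsi0:0.fileName = "...", the descriptors use parentFileNameHint="...", and the delta/flat files follow the naming rule from step 3. The function names keep_list and possible_orphans are made up for illustration.

```python
import os
import re

# Assumed key names: "scsi0:0.fileName" in the .vmx, "parentFileNameHint" in descriptors.
VMX_DISK_RE = re.compile(r'^\s*\w+\d+:\d+\.fileName\s*=\s*"([^"]+\.vmdk)"', re.I | re.M)
PARENT_RE = re.compile(r'^\s*parentFileNameHint\s*=\s*"([^"]+)"', re.I | re.M)
# Snapshot descriptors are assumed to be named like VM-000001.vmdk.
SNAPSHOT_RE = re.compile(r"-\d{6}\.vmdk$")


def keep_list(vm_dir, vmx_name):
    """Return the vmdk file names the VM currently depends on (steps 1 to 4)."""
    with open(os.path.join(vm_dir, vmx_name)) as f:
        pending = VMX_DISK_RE.findall(f.read())  # step 1: disks named in the .vmx
    keep = set()
    while pending:
        # Assumption: every disk sits in vm_dir, so only the file name matters.
        name = os.path.basename(pending.pop())
        if name in keep:
            continue
        keep.add(name)
        # Step 3: snapshots get a -delta file, base disks a -flat file.
        stem = name[: -len(".vmdk")]
        suffix = "-delta.vmdk" if SNAPSHOT_RE.search(name) else "-flat.vmdk"
        keep.add(stem + suffix)
        # Step 2: follow the parent reference in the descriptor, if there is one.
        with open(os.path.join(vm_dir, name)) as f:
            pending.extend(PARENT_RE.findall(f.read()))
    return keep


def possible_orphans(vm_dir, vmx_name):
    """Return vmdk files in vm_dir that are not on the keep list (they MAY be orphaned)."""
    keep = keep_list(vm_dir, vmx_name)
    return sorted(n for n in os.listdir(vm_dir) if n.endswith(".vmdk") and n not in keep)
```

The result of possible_orphans is only the starting point for step 5. Note that *ctk.vmdk files are not on the keep list in this sketch, so they show up as candidates too.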



rbsskalssauka
Contributor

Hi.

I started to sort it out yesterday similarly to your instruction.

1) checked the VM's vmx for disk entries, to see whether they point to the base vmdks (cat VM.vmx | grep vmdk); if that's true,

2) checked whether the VM.vmdk disks have the entry parentCID=ffffffff and point to the corresponding VM-flat.vmdk disk; if that's true,

3) checked all snapshot files VM-00000x.vmdk for parentFileNameHint="VM.vmdk", and if that's true, checked whether the snapshot's parentCID is different from the base VM's CID

4) finally checked the deletion candidates for modification dates and also checked with Ruben Garcia's SnapVMX script.

Hopefully this will be enough not to break all of my virtual machines :)
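
The descriptor checks from points 2) and 3) above can be sketched like this in Python. It is a minimal sketch that assumes the descriptor text contains lines of the form CID=..., parentCID=... and parentFileNameHint="..."; the function names and the result keys are made up for illustration.

```python
import re

# Picks the three fields used in the checks out of a descriptor's text.
FIELD_RE = re.compile(
    r'^\s*(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"\r\n]*)"?\s*$', re.M
)


def parse_descriptor(text):
    """Return the CID, parentCID and parentFileNameHint values found in the text."""
    return {key: value.strip() for key, value in FIELD_RE.findall(text)}


def check_chain(base_text, snapshot_text, base_name):
    """Compare a base descriptor with a snapshot descriptor."""
    base = parse_descriptor(base_text)
    snap = parse_descriptor(snapshot_text)
    return {
        # Point 2): a base disk has no parent.
        "base_has_no_parent": base.get("parentCID", "").lower() == "ffffffff",
        # Point 3): the snapshot names the base disk as its parent ...
        "snapshot_points_to_base": snap.get("parentFileNameHint", "").endswith(base_name),
        # ... and its parentCID is compared with the base disk's current CID.
        "cid_matches": snap.get("parentCID", "").lower() == base.get("CID", "").lower(),
    }
```

A snapshot whose parentFileNameHint names the base disk but whose parentCID no longer equals the base CID is the case treated as a deletion candidate in point 3), because the base disk has changed since that snapshot was taken.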

ealaqqad
Enthusiast

I have faced this same error at one of my customers, and documented the way I have resolved it on my blog at: vSphere Data Protection error – operation failed due to existing snapshot

Hope this helps,

Eiad Al-Aqqad

Blog: http://www.VirtualizationTeam.com

Regards, Eiad Al-Aqqad Technology Consultant @ VMware b: http://www.VirtualizationTeam.com b: http://www.TSMGuru.com
rbsskalssauka
Contributor

I've "resolved" the issue as described in my previous post (I've put it in quotes because VDP has been causing me headaches since the upgrade to 5.1 and then to 5.5.5). Moving files from the shell does not take any ESXi locks into account, whereas doing it from the vSphere Client did not work for me in some scenarios, even when I shut down VDP and restarted vCenter. Anyway, thanks for your reply.
