DrWhy
Enthusiast

Deleting Snapshots taking 36+hrs

I'm deleting a snapshot from a virtual machine. The job started 36hrs ago and is only at 41%. The disk usage is sitting around 75MB/s write and 40MB/s read. The storage is capable of much more. Why is this process going so slowly? Is this a normal speed for removing a large snapshot? Is there any way to speed this up?

disk.png

13 Replies
Nithy07cs055
Hot Shot

When was the snapshot taken? Was it a long time ago?

Check whether the VM needs consolidation. You can check this VMware KB: Consolidating snapshots in vSphere 5.x/6.0

If you are still facing the issue, you can try from the command line:

# vim-cmd vmsvc/getallvms   (note the VMID, and make sure it belongs to the correct VM)

# vim-cmd vmsvc/snapshot.get [VMID]

# vim-cmd vmsvc/snapshot.create [VMID] [snapshotName] [snapshotDescription] [includeMemory] [quiesced]

# vim-cmd vmsvc/snapshot.removeall [VMID]

This will create a test snapshot and then remove all snapshots from the virtual machine.

Also, if you want to monitor the snapshot deletion from the CLI, check this: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100756...

Thanks and Regards, Nithyanathan R Please follow my page and Blog for more updates. Blog : https://communities.vmware.com/blogs/Nithyanathan Twitter @Nithy55 Facebook Vmware page : https://www.facebook.com/Virtualizationworld
Omega42
Contributor

You should find out whether the impressive number of disk requests comes from the snapshot removal or from the active VM.

I guess the disk is just saturated (and since this is non-sequential access, the throughput figure does not matter).

markdjones82
Expert

Depending on how large the snapshot was and how much I/O is currently going on, yes, it can take an extremely long time to remove a snap while the VM is powered on.

http://www.twitter.com/markdjones82 | http://nutzandbolts.wordpress.com
DrWhy
Enthusiast

The snapshot is large. It's ~8TB. The circumstances that produced this large snapshot are unique (the first-time replication of a 20TB VM was interrupted partway through and had to be resumed. The way the backup software handled it apparently created a snapshot when it replicated the remaining data and then chose to merge that data when the replication was complete). That aside, I did not realize that this operation was random I/O, which would explain the slow throughput. I don't understand why it would be random I/O or even why the deleting/merging process requires all data to be read and re-written, but that's neither here nor there. The VM is powered off; I'm not sure how much difference this makes. Also, this is the only VM producing I/O on this host, so I can confirm that all these disk requests do indeed come from this snapshot removal.
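
As a rough sanity check on the figures posted in this thread, here is a minimal sketch in Python. The function names are made up for illustration, decimal units are assumed (1 TB = 1,000,000 MB), and it assumes the merge writes about as much data as the snapshot holds, which may not be true.

```python
def hours_at_throughput(size_tb: float, mb_per_s: float) -> float:
    """Hours needed to write size_tb terabytes at a steady mb_per_s."""
    size_mb = size_tb * 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB
    return size_mb / mb_per_s / 3600


def total_hours_from_progress(elapsed_h: float, percent_done: float) -> float:
    """Linear extrapolation of total run time from the task's progress percentage."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_h * 100 / percent_done


if __name__ == "__main__":
    # ~8TB snapshot at the observed 75MB/s write rate
    print(f"{hours_at_throughput(8, 75):.1f} h")
    # 36hrs elapsed at 41% complete
    print(f"{total_hours_from_progress(36, 41):.1f} h")
```

With these inputs the first figure comes out near 30 hours and the second near 88 hours. The two disagree, so the progress percentage is probably not a linear measure of bytes merged, and neither number should be read as a prediction.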

Nithy07cs055
Hot Shot

It's always recommended to use an agent-based backup on a VM that has disks larger than 2 TB. Do not use VDAP (snapshot-based); this will cause many issues such as consolidation problems, server hangs, snapshot issues, etc.

DrWhy
Enthusiast

I would have thought that the time to merge a snapshot would depend on how much data was written to that snapshot, and would not be limited by the size of the VMDK. Is this not the case? Can you please explain?

Nithy07cs055
Hot Shot

Yes, it is. But think about what happens during a full backup and a differential.

The VMDK is ~8TB. With a full backup snapshot, imagine the amount of data that would be written to the snapshot disk. I guess for an incremental it won't cause any issues.

But an agent-based backup is still recommended.

DrWhy
Enthusiast

So get this: I've been waiting over 48hrs for this task to complete. I finally decided to cancel the task and restart the replication. Several minutes after I cancelled, the task showed as completed and the Veeam replication jobs completed successfully. What just happened? Since when does cancelling a job actually make the job complete?

task.png

Nithy07cs055
Hot Shot

Not sure about that. We need to check how the Veeam product works, and check the logs on the Veeam side to verify whether the jobs actually completed.

I haven't had a chance to work with this product.

markdjones82
Expert

What % was it at when you cancelled? I've seen it jump from 75-90 all the way to complete in a matter of seconds. I'm wondering whether, when you selected cancel, it was already in the finishing stages and completed before the cancel command hit. It's the only explanation I can think of. Check whether there are any snaps still there, and that would be your answer.

DrWhy
Enthusiast

When I say I cancelled the task, I cancelled it within the vSphere Web Client and not on Veeam's side. I would have expected that to cause the task to fail and then the Veeam backup job to fail. However, as I stated, the task completed and then the Veeam job completed. I don't believe this is a question for Veeam, but a question of what went on on the VMware side.

The task was at 45%. It's extremely improbable that I just happened to cancel the job seconds before it was going to complete, so I do not believe that is what happened.

markdjones82
Expert

Well, I'm bamboozled :) I've seen many strange things over the years, and sometimes I choose to move on and let it go before I go insane. I would chalk this up to that and just be glad it's all good! If that isn't sufficient, you could try opening a ticket with VMware and have them parse through the logs.

You could also check /var/log/vmkernel.log on the host yourself for that time period and see what it was telling you.

DrWhy
Enthusiast

I had this occur again. Similar situation, except Veeam wasn't involved at all. I attempted to delete the snapshot from within the web console. The deletion was taking a very long time, so I cancelled it, and then the task showed as completed. However, this time I discovered that after doing this I cannot power on the VM. It tells me that the VM needs consolidation, so I tried this but got the following error:

"An error occurred while consolidating disks: The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child."

I believe this is a bug in VMware that is triggered when you attempt to cancel a snapshot deletion task. Can anyone from VMware comment?
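
For anyone hitting the same error, here is a minimal sketch in Python of the comparison the message describes. A VMDK descriptor file carries a CID field, and a snapshot (child) descriptor carries a parentCID field that is expected to match the parent's CID. The descriptor contents and CID values below are made up for illustration, and the function names are hypothetical. The sketch only reads and compares the fields; it does not repair anything.

```python
import re

# Hypothetical descriptor contents; real descriptors contain more fields.
PARENT_DESCRIPTOR = """\
# Disk DescriptorFile
version=1
CID=a1b2c3d4
parentCID=ffffffff
"""

CHILD_DESCRIPTOR = """\
# Disk DescriptorFile
version=1
CID=0f0e0d0c
parentCID=a1b2c3d4
"""

_CID_LINE = re.compile(r"^\s*(CID|parentCID)\s*=\s*([0-9a-fA-F]+)\s*$")


def read_cids(descriptor_text: str) -> dict:
    """Return the CID and parentCID values found in descriptor text."""
    fields = {}
    for line in descriptor_text.splitlines():
        match = _CID_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).lower()
    return fields


def chain_is_consistent(parent_text: str, child_text: str) -> bool:
    """True when the child's parentCID equals the parent's CID."""
    parent = read_cids(parent_text)
    child = read_cids(child_text)
    return "CID" in parent and parent["CID"] == child.get("parentCID")


if __name__ == "__main__":
    print(chain_is_consistent(PARENT_DESCRIPTOR, CHILD_DESCRIPTOR))
```

If the check returns False for a real parent/child pair, that matches the condition in the error text. Whether and how to correct it is a separate question, best taken up with VMware support while holding a copy of the files.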