murreyaw
Enthusiast

Remove Snapshot Hangs in VC


I have 4 hosts, all connected to a central iSCSI storage system. When I try to remove a snapshot from a VM (its only snapshot), the task goes to 95% and then sits there. This prevents me from being able to VMotion the host. Any ideas?

VI 3.01 VC 2.01

Accepted Solutions
admin
Immortal

That's normal behavior: it does the actual commit at 95%. If you've had the snapshot for a while, it can take a long time to commit, like 40 minutes! How long have you been waiting?

Leave it and see if it times out or completes successfully.

esiebert7625
Immortal

Yeah, that's something they should fix in VC: it goes to 95% right away and sits there, leaving you thinking it's hung. I called support the first time it happened to me, and by the time they called me back it had completed. I kept checking the file system to see if the redo files were still there; when they disappeared, I figured it was done.

If the redo files are very big and there is a lot to commit, it can take a really long time. VC has a 15-minute timeout built into it, so the task fails in VC after 15 minutes if it has not finished by then. Support checked with engineering, who said the 15-minute timeout is hard-coded and cannot be changed.

Generally you should not keep snapshots for too long, or they get very large and take a long time to commit when deleted. In case you didn't know: when you create a snapshot, it freezes the current state of the vmdk file and puts all subsequent changes in redo files. It doesn't just take a copy of the disk and store it somewhere else. So when you delete a snapshot, it has to write all the changes back into the original vmdk file. If you revert instead, it simply goes back to the original vmdk file and deletes the redo files.
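Watching for the redo files to disappear can also be scripted from the service console. This is a minimal sketch, not a VMware tool: it assumes the ESX 3.x file naming, where snapshot data lives in files like "vm-000001-delta.vmdk", and both function names are hypothetical. It just re-checks the VM's directory until no delta file is left.

```shell
# Sketch: wait until a VM directory no longer holds snapshot delta files.
# Assumption: ESX 3.x naming, where redo data lives in <vm>-NNNNNN-delta.vmdk
# next to the <vm>-NNNNNN.vmdk descriptor. Function names are hypothetical.

# Print each delta file in the directory, one per line (nothing if none).
list_delta_files() {
    local vm_dir="$1" f
    for f in "$vm_dir"/*-delta.vmdk; do
        # An unmatched glob stays literal, so skip paths that don't exist.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Re-check every INTERVAL seconds (default 60) until no delta file is left.
wait_for_commit() {
    local vm_dir="$1" interval="${2:-60}"
    while [ -n "$(list_delta_files "$vm_dir")" ]; do
        sleep "$interval"
    done
    echo "no delta files left in $vm_dir"
}
```

For example, wait_for_commit /vmfs/volumes/storage1/mailsrv 300 would re-check every five minutes (the datastore path is made up). If a VM has several snapshots and you remove only one, some delta files will legitimately remain, so this check fits the "remove the only snapshot" case discussed here.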

jmattox
Enthusiast

As a workaround, you could connect your VI Client directly to the ESX host and then try to remove the snapshots. My guess is it will work fine.

murreyaw
Enthusiast

It was taking forever.

Davidste
Contributor

Does anyone know approximate times for how long snaps take to commit when they get large?

I've got a 200GB snapshot to commit (yes I know it's too large, but we didn't know it had been created) and need to know how long it'll take to commit.

CWedge
Enthusiast

One VMware rep told me 300GB took 4 hours. Obviously your mileage may vary, but there you go.

esiebert7625
Immortal

Sounds about right. It's too bad they don't have a snapshot monitor built into VC that would track the size and status of all snapshots on the ESX hosts. That way you could see all the snapshots in one place instead of on a per-VM basis.

Davidste
Contributor

Well hopefully this info will be useful to someone.

Original VMDK file = 200GB

Snapshot delta = 201GB

Disks = 6x 10k rpm SAS in a Dell 2950 RAID 5

Snapshot took 6.5 hours to commit
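The two data points in this thread (300 GB in about 4 hours, 201 GB in 6.5 hours) allow a rough scaled estimate, under the big assumption that commit time grows roughly linearly with delta size on comparable storage and I/O load. The sketch below only does that arithmetic; the function names are made up.

```shell
# Sketch: rough commit-time estimate scaled from one observed commit.
# Assumption: commit time is roughly proportional to delta size on similar
# hardware and load. Function names are hypothetical.

# Observed throughput in GB per hour: GB committed, hours taken.
commit_rate_gb_per_hr() {
    awk -v gb="$1" -v hrs="$2" 'BEGIN { printf "%.1f\n", gb / hrs }'
}

# Estimated hours for NEW_GB, given an observed OBS_GB committed in OBS_HRS.
estimate_commit_hours() {
    awk -v new="$1" -v gb="$2" -v hrs="$3" \
        'BEGIN { printf "%.1f\n", new * hrs / gb }'
}
```

commit_rate_gb_per_hr 201 6.5 prints 30.9 and commit_rate_gb_per_hr 300 4 prints 75.0, so the two reports differ by more than a factor of two. For a 100 GB delta, estimate_commit_hours 100 201 6.5 prints 3.2 and estimate_commit_hours 100 300 4 prints 1.3; treat the result as a range, not a promise.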

samugi
Enthusiast

When this behavior is seen and VC times out, is the removal/commit of the snapshot still running after the VC timeout? If not, how do you perform a commit that will take hours? I just patched a mail server last night and took a snapshot prior to the patch. A co-worker was going to remove the snapshot after a few hours, once we knew everything was OK, but he didn't, so a very busy mail server has been running on the snapshot for 12+ hours and my attempt to remove the snapshot timed out. I just need to understand whether I should wait because it's still committing, or whether I need to do something else.

Thanks

skywalkr
Contributor

To determine whether it is done, use the datastore browser to monitor the activity of the files in the VM's directory. Once the snapshot removal is really finished, which could take HOURS depending on how long the snapshot existed and how much I/O has accumulated, the snapshot files will have been merged into the base disk and the date/time stamps will stop changing, showing the process is finished.

Warning: do NOT interrupt it, or you are risking your data!

You can also sort of watch it via VC, as it will not allow you to do anything to the VM (such as edit its settings) until it completes.
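The timestamp check can also be run from the service console instead of the datastore browser. A minimal sketch, assuming GNU find (which provides -maxdepth, -mmin and -printf) and a hypothetical function name: it prints the .vmdk files in one VM directory that changed in the last N minutes. A powered-on VM may keep updating its base disk after the commit has finished, so a file that is still changing is only a hint; checking that the -delta.vmdk files are gone is the firmer test.

```shell
# Sketch: list the .vmdk files in one VM directory modified in the last
# MINUTES minutes. Assumes GNU find (-maxdepth, -mmin, -printf), as on the
# ESX service console. Function name is hypothetical.
recently_modified_disks() {
    local vm_dir="$1" minutes="$2"
    # -maxdepth 1: only this VM's directory; %f prints the bare file name.
    find "$vm_dir" -maxdepth 1 -name '*.vmdk' -mmin -"$minutes" -printf '%f\n' | sort
}
```

Running recently_modified_disks /vmfs/volumes/storage1/mailsrv 10 every few minutes (the path is made up) and comparing the output between runs gives roughly the same picture as refreshing the datastore browser.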

G. Mobley

Later, GC Mobley
Threonine
Contributor

Generally, if your delta files are large, waiting long enough will take care of it. However, I have this happening right now on a VM that I was backing up with esxRanger: the virtual machine completely locked up during the process, and the task is still hanging at 95% about 10 hours later. Ugh.

jshirbini
Contributor

It has happened to me many times; just give it enough time to complete. The best approach is to take the VM offline if possible, as I found it completes faster than when the VM is online. If you get a timeout, go and check the Snapshot Manager to see whether the snapshot is still there; I have had it time out and then found the snapshot was gone. If you want to be sure, go to the VM's Edit Settings, select the virtual disk, and look at the disk file: if the file name still has a numbered suffix after the VM name (such as vmname-000001.vmdk), the snapshot is still there; if it is just vmname.vmdk without the numbered suffix, the snapshot has been removed.
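The same check can be made against the VM's .vmx file from the service console. A sketch under two assumptions: disk entries look like scsi0:0.fileName = "vm-000001.vmdk", and snapshot descriptors carry a six-digit numeric suffix before .vmdk. The function name is hypothetical. It prints any disk line that still points at a snapshot descriptor; no output means every disk is back on its base file.

```shell
# Sketch: print the disk lines in a .vmx file that still point at a snapshot
# descriptor. Assumptions: entries look like scsi0:0.fileName = "..." and
# snapshot descriptors end in a six-digit suffix such as -000001.vmdk.
# Function name is hypothetical.
snapshot_disks_in_vmx() {
    local vmx="$1"
    # -i: the key is usually written fileName; match it case-insensitively.
    # "|| true": grep exits 1 when nothing matches, which is the good case here.
    grep -i '\.filename *= *"[^"]*-[0-9]\{6\}\.vmdk"' "$vmx" || true
}
```

CD-ROM entries such as ide0:0.fileName pointing at an .iso file, and base disks like vmname_1.vmdk, do not match the pattern, so they are not reported.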

stevespike
Contributor

Ok, so I also now know that although the task times out the process will eventually complete if given enough time.

However, is anyone aware whether it is at all possible to monitor the snapshot delete process via the service console, so at least I know it is still working?

A very useful tool is available: Snap Hunter. It can be installed on an ESX host, and once configured it will scan the entire ESX cluster for any snapshots present on either local or SAN disks and fire off an email with the results. I have mine configured to scan once every morning.

It can be downloaded from http://www.xtravirt.com
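Snap Hunter's own code is not part of this thread; the sketch below is only a rough stand-in for its scanning step, without the email. It assumes GNU find (for -printf), the ESX 3.x "vmname-NNNNNN-delta.vmdk" naming, and /vmfs/volumes as the datastore mount point on the host; the function name is hypothetical. It lists every snapshot delta file under that root, largest first, with its size in MB.

```shell
# Sketch (not Snap Hunter itself): list snapshot delta files under a root
# directory, largest first, with sizes in MB. Assumptions: GNU find for
# -printf, ESX 3.x "<vm>-NNNNNN-delta.vmdk" naming, and /vmfs/volumes as
# the default root. The function name is hypothetical.
report_snapshot_deltas() {
    local root="${1:-/vmfs/volumes}"
    # find does not follow symlinks by default, so the datastore-name links
    # under /vmfs/volumes do not cause the same file to be listed twice.
    find "$root" -name '*-delta.vmdk' -printf '%s %p\n' 2>/dev/null |
        sort -rn |
        awk '{
            size = $1; path = $0
            sub(/^[0-9]+ /, "", path)   # drop the size, keep the full path
            printf "%.1f MB  %s\n", size / 1048576, path
        }'
}
```

Running report_snapshot_deltas with no argument scans /vmfs/volumes; pass another directory to scan a single datastore. Scheduling it with cron and mailing the output would approximate a daily report like the one described above, but that part is left out here.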

Steve

jeverettk
Contributor

I don't know how large mine was. I didn't look, and it doesn't appear anymore, but it's been 2 days now. I just read that apparently I shouldn't have reattempted the delete after the error message (delete failed due to timeout).

I still show an active task on that machine.

Unfortunately, I powered down the VM after I received the second error message. Now that machine is completely inaccessible. Fortunately, this was a test VM and not production.

Unfortunately, the frozen task has basically frozen the rest of my infrastructure functions (at least in the Infrastructure Client).

Does anyone know of a way to kill a frozen task if 'Cancel' is greyed-out?

pateliqb
Contributor

Another way to see whether the VMware ESX host is still removing snapshots is to use 'vimsh'. Here is how:

Log on to the service console.
Type: vimsh

Press Enter

Type:

vimsvc/task_list

Press Enter

A list of running tasks should be displayed, for example:

(ManagedObjectReference)
   'vim.Task:haTask--vim.FileManager.copy-4776',
   'vim.Task:haTask-...

Type (substituting the key of the task you are interested in from that list):

vimsvc/task_info haTask-208-vim.VirtualMachine.removeAllSnapshots-168

Press Enter

Information about the process should be displayed, for example:

(vim.TaskInfo) {
   dynamicType = <unset>,
   key = "haTask-208-vim.VirtualMachine.removeAllSnapshots-168",
   task = 'vim.Task:haTask-208-vim.VirtualMachine.removeAllSnapshots-168',
   name = "vim.VirtualMachine.removeAllSnapshots",
   descriptionId = "VirtualMachine.removeAllSnapshots",
   entity = 'vim.VirtualMachine:208',
   entityName = "Buggered-VM001",
   state = "running",
   cancelled = false,
   cancelable = false,
   error = (vmodl.MethodFault) null,
   result = <unset>,
   progress = <unset>,
   reason = (vim.TaskReasonUser) {
      dynamicType = <unset>,
      userName = "vpxuser",
   },
   queueTime = "2008-12-10T17:57:14.244144Z",
   startTime = "2008-12-10T17:57:14.244144Z",
   completeTime = <unset>,
   eventChainId = 168,
}
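If you save that task_info output to a file, a short filter can pull out the state field, for example inside a script that re-checks the task periodically. A minimal sketch: the function name is hypothetical, and it only assumes the "key = value" layout of the output above. In the vSphere API, a task's state is one of queued, running, success or error.

```shell
# Sketch: print the value of the state field from saved vimsh task_info
# output, read from the named files or from standard input. Stray leading
# "+" characters (as in copies pasted with forum markup) are tolerated.
# Function name is hypothetical.
task_state() {
    sed -n 's/^[[:space:]+]*state = "\([A-Za-z]*\)".*/\1/p' "$@"
}
```

For the example above, task_state on the saved output prints running; once the snapshot removal has finished, the same task should report a different state.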

I hope this helps.

Thanks to Jina Jang from VMware support for teaching this to me.

For more information on 'vimsh' you could check:

http://knowledge.xtravirt.com/white-papers/index.php?option=com_remository&func=download&id=10&chk=5...

Skark166
Contributor

You can also use the VI Client to connect directly to the ESX host. VirtualCenter Server times out the task after 15 minutes, but the task itself is initiated on the ESX host, and the host will not time the job out... ever. You can view the progress in the "Recent Tasks" panel at the bottom of the VI Client.

Hung tasks (snapshot or otherwise) can be manually killed by opening an SSH session to the service console on the affected VM's ESX host and running the command:

service mgmt-vmware restart

It's completely safe to run: this command only resets the management services for the ESX host you're logged on to. The virtual environment will remain unaffected. The ESX host will become unresponsive to the VirtualCenter Server for around 2 to 5 minutes, then everything should come back to life.

admin
Immortal

The links below give an in-depth explanation of many of the issues raised here, and show how you can avoid getting surprised by snapshots. Prevention is better than cure ;o)

http://vmutils.blogspot.com/

http://www.youtube.com/watch?v=gl0VmmKNhYk

SvenJacobs
Contributor

Problem is fixed.

Creating a manual snapshot and then using Remove All did the trick for me, after moving some guests around to free up space.

We learned something today 🙂

Thank you all.

SamMu
Contributor

How did you create a manual snapshot? I have the same issue on a production VM and don't want to lose any data.

Remove snapshot is stuck and I cannot power on the VM.
