VMware Communities
Oben
Contributor

Why does discarding a snapshot take so long?

Hi all,

I made a snapshot a while ago but decided I didn't need it anymore.

So I decided to discard it from the menu.

After asking for confirmation it brought up a progress box saying "Cleaning up deleted files..."

The progress bar was agonisingly slow! Why would it take thirty minutes of disk crunching activity to just delete the snapshot?

What is this all about - cleaning up etc? I'll take manual backups from now on. This snapshot feature is unworthy of use if this is standard behaviour.

24 Replies
maddymac
Enthusiast

When you create a snapshot, a new .vmdk file is created in your virtual machine bundle.

If you keep a snapshot for a long time, make sure you have sufficient disk space available for your virtual machine.

When you discard the snapshot, all the data in the newly created .vmdk is committed to the original .vmdk.

In your case it took a long time because the snapshot .vmdk held a large amount of data.

If you want to see how fast discarding a snapshot can be, try creating a new snapshot and discarding it right away.

Oben
Contributor

I checked the contents of the VM package.

It had TWO lots of .vmdk files except one had -00001- or something like that in the file name.

I don't understand why the snapshot would work like what you say.

Why would it not just make a duplicate of the original (as I think it does) and then work off the copy? When the time came, you could simply delete the original and keep the current one. Why would it need to consolidate anything?

admin
Immortal

"Discard Snapshot" is somewhat misleadingly named - you are discarding the ability to go back to that snapshot by merging your changes back to the original disk. "Revert to Snapshot" is what deletes the changes you've made, and should be faster.

admin
Immortal

Yeah, the key point here is that deleting a snapshot actually commits all the changes into the parent disk. This takes time proportional to the number/size of changes in the child (everything that happened since you took the snapshot).

An attempt to explain by example:

(1) I install Windows XP.

(2) I take a snapshot before ever running Windows Update.

(3) I run Windows Update and upgrade to Service Pack 2.

(4) I discard the snapshot from step 2. This will take a while, since the installation of Service Pack 2 required hundreds of megabytes of disk writes to be committed into the parent (base) disk.

After step (4) is complete, the VM has no snapshot, but the guest still has Service Pack 2 installed. If I had reverted to the snapshot in step (2) instead of discarding it, the operation would be very fast (just throw away all the state in the child by deleting it). However, then the VM would revert to the state it was in at step (2), i.e. Service Pack 2 would no longer be installed.
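
The difference between discarding and reverting can be sketched with a toy model in Python. This is only an illustration, not VMware's actual implementation: the `ToyDisk` class, its method names, and the dictionary-based block maps are all made up for the example. It shows that discarding has to copy every changed block into the parent, while reverting only throws the child's contents away.

```python
class ToyDisk:
    """Hypothetical model of a base (parent) disk plus one delta (child) disk."""

    def __init__(self):
        self.base = {}     # block number -> data; frozen while a snapshot exists
        self.delta = None  # child disk; None means no snapshot is held

    def take_snapshot(self):
        self.delta = {}    # from now on, all writes land in the child

    def write(self, block, data):
        target = self.base if self.delta is None else self.delta
        target[block] = data

    def read(self, block):
        # The child wins if it has the block; otherwise fall through to the parent.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base.get(block)

    def revert_to_snapshot(self):
        """Drop the child's contents. No block is copied, but the changes are lost."""
        self.delta = {}
        return 0  # blocks copied

    def discard_snapshot(self):
        """Commit every changed block into the parent, then drop the child."""
        copied = 0
        for block, data in self.delta.items():
            self.base[block] = data
            copied += 1
        self.delta = None
        return copied  # grows with the amount of change since the snapshot


def demo():
    disk = ToyDisk()
    disk.write(0, "Windows XP")        # step 1
    disk.take_snapshot()               # step 2
    disk.write(1, "Service Pack 2")    # step 3
    copied = disk.discard_snapshot()   # step 4
    return copied, disk.read(1)
```

In this toy, `demo()` commits a single block; in a real VM the count would be however many blocks the Service Pack 2 installation wrote.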

Oben
Contributor

It still doesn't make sense. When you take a snapshot the storage space of your VM package doubles.

So it makes a direct duplicate of the original vmdk set. I have a 50 GB disk divided into 2 GB files. When I asked for a snapshot I noticed a doubling in size and all the files in the package were duplicated.

I presume that the original is kept to one side and not touched.

Now if I choose to go back to the original, ie the snapshot, then it discards or deletes the new vmdk set getting rid of all changes and loads the old vmdk that is still untouched together with its RAM state like it would with a resume function.

If I don't ever want to go to the original, ie. discard the snapshot then it should just delete the old vmdk set and continue on its merry way with the new vmdk set.

That's how easy it should be. I don't think any of this folding in of incremental changes etc that people are hypothesising makes sense at all.

The evidence is that an ENTIRE copy is made of the vmdk when you do a snapshot and discarding or reverting to snapshots is all about which of these vmdks is used. To think it blends changes incrementally etc seems like hogwash to me (still).

Deleting a vmdk set doesn't take 30 minutes.

admin
Immortal

"It still doesn't make sense. When you take a snapshot the storage space of your VM package doubles."

No it doesn't, this is your fallacy.

"So it makes a direct duplicate of the original vmdk set. I have a 50 GB disk divided into 2 GB files. When I asked for a snapshot I noticed a doubling in size and all the files in the package were duplicated."

You will have a new set of disk files, but they are not the same size as the originals. The second set of disks is incremental, and somewhere in size between 0 and the max size of your disk.
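
Why the second set can be anywhere between empty and full-size can be sketched as follows, assuming (for illustration only) that the incremental disk holds one copy of each block written since the snapshot. The function name and the block-level bookkeeping are assumptions, and any metadata overhead of a real disk format is ignored.

```python
def delta_blocks_needed(writes, disk_blocks):
    """Count the distinct blocks an incremental disk would hold after `writes`.

    `writes` is a sequence of block numbers written since the snapshot;
    `disk_blocks` is the size of the virtual disk in blocks.
    """
    touched = set()
    for block in writes:
        if not 0 <= block < disk_blocks:
            raise ValueError(f"block {block} is outside the disk")
        touched.add(block)  # rewriting the same block does not grow the delta
    return len(touched)


# No writes: the delta stays empty.
empty = delta_blocks_needed([], 100)
# Rewriting one block many times still costs one block.
one = delta_blocks_needed([3, 3, 3], 100)
# Writing every block (even twice) tops out at the size of the disk.
full = delta_blocks_needed(list(range(100)) * 2, 100)
```

So the incremental set only reaches the size of the original if every block on the disk has been rewritten since the snapshot.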

"I presume that the original is kept to one side and not touched."

Correct.

"Now if I choose to go back to the original, ie the snapshot, then it discards or deletes the new vmdk set getting rid of all changes and loads the old vmdk that is still untouched together with its RAM state like it would with a resume function."

Correct.

"If I don't ever want to go to the original, ie. discard the snapshot then it should just delete the old vmdk set and continue on its merry way with the new vmdk set."

Incorrect, because the new vmdk set is not complete.

"That's how easy it should be. I don't think any of this folding in of incremental changes etc that people are hypothesising makes sense at all."

We're not hypothesizing, we're telling you how it works. If for some reason you're not seeing this behavior, that's strange and we should investigate it.

HobbitFootAussi
Enthusiast

The icon of three blue squares next to these guys' names means that they are VMware engineers. So they aren't hypothesizing...

CMHexx
Contributor

I understand what you've said so far about the process and it makes total sense to me about how it works with the exception of one thing:

What happens when I go to delete a snapshot and during the "Cleaning up deleted files..." phase I click the cancel button?

I would normally presume it would stop committing the changes to the parent disk and abort the deleting of the snapshot, however, when I just tried it and I clicked cancel it still removed the icon showing that I had a snapshot. So were the changes committed? How could they be if I cancelled it? And if not, then did I just lose data and my possibility to return to that snapshot? :)

Thanks.

FrayAdjacent
Enthusiast

Taking a snapshot FREEZES your virtual hard disk - NO changes are made to it at all. (until the snapshot is 'deleted' or reverted)

A second virtual hard disk is created, and ANY AND ALL changes are written to it. Running like this for a long time can end up with a HUGE snapshot vmdk.

'Deleting' that snapshot isn't just deleting that child vmdk, it's taking all of the changes and data and integrating it INTO the original vmdk. This can take a long time. If you want to really summarily delete that snapshot vmdk, you'd need to 'revert to snapshot', which takes your VM back to the point in time where you started the snapshot.

It can't really be explained any more simply than that. ;)

admin
Immortal

I'd expect that if you canceled discarding a snapshot, we would abort the merge of the delta disk to the base disk. This would mean the base disk would no longer be consistent by itself, but you haven't lost any data because the combination of base and delta disks is still consistent. You would have lost the possibility to return to the snapshot, however.
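
A toy sketch of that expectation in Python. The `merge` and `read` helpers, the block-by-block merge order, and keeping the delta in place after a cancel are all assumptions for illustration, not a description of what Fusion does internally.

```python
def merge(base, delta, cancel_after=None):
    """Copy delta blocks into base in block order; stop early if cancelled.

    The delta is left untouched, so reading through base+delta stays correct.
    Returns the number of blocks committed.
    """
    committed = 0
    for block in sorted(delta):
        if cancel_after is not None and committed >= cancel_after:
            break
        base[block] = delta[block]
        committed += 1
    return committed


def read(base, delta, block):
    """Read a block the way the running VM would: delta first, then base."""
    return delta.get(block, base.get(block))


snapshot_state = {0: "a0", 1: "b0", 2: "c0"}   # disk contents at snapshot time
base = dict(snapshot_state)
delta = {0: "a1", 2: "c1"}                      # changes made since the snapshot
current = {b: read(base, delta, b) for b in base}

merge(base, delta, cancel_after=1)              # user clicks cancel after one block
after_cancel = {b: read(base, delta, b) for b in base}
```

After the cancelled merge, `base` on its own matches neither `snapshot_state` nor `current`, but `after_cancel` still equals `current`: nothing the guest wrote is lost, while the snapshot state can no longer be recovered from the base.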

CMHexx
Contributor

Ahh... ok that makes sense, thanks!

abusy
Contributor

I just had my first experience with this horrendous 'Snapshot' system. I tried to delete a snapshot at work and it froze up my virtual machine for almost an hour!!

Why oh why can a Snapshot not simply store all the original copies of any files you change after the time you create the Snapshot? Then deleting a snapshot would not take hours, and (correct me if I'm wrong) disk space would remain the same as the current system in place.

Anybody?

At the very least, re-label the 'remove snapshot' button as something like 'merge snapshot' and give a warning message that this will lock up the user's virtual machine and will take forever + 1 day. An estimate of how long it will take would be nice as well.

abusy
Contributor

Ok, so right now I'm waiting for VM to 'clean up deleted files' again, so I decided to google the problem again, and I came across this thread again...

No replies to my last post? Really, VMware?

I still don't understand why this snapshot system works the way it does... Why can't it just work like this?:

1. User creates snapshot.

2. Fusion records the time when the snapshot was created.

3. Whenever a file is changed from that point on, the original is first stored in a separate location as a backup.

4. When the user deletes this snapshot, the backup files are simply deleted.

5. THAT'S IT!

VM, please reply - even just to tell me why my idea sucks!

RDPetruska
Leadership

I guess the main reason is that your idea would take up TONS more disk space. In addition to the fact that all VMware products use the same snapshot mechanism/procedures.

abusy
Contributor

"I guess the main reason is that your idea would take up TONS more disk space."

Why would it take up more disk space than the current snapshot system? My understanding is that the current system basically does the same thing except that it stores new files in a separate location, as opposed to original files before they are altered - thus requiring all this time consuming merging.

"In addition to the fact that all VMware products use the same snapshot mechanism/procedures."

That's a fair point. If VM doesn't want to re-develop their whole snapshot mechanism I completely understand. I'll just start looking for another product that performs better. (Anybody know if Parallels does the same thing?)

I'd like to hear something from VM on this however.

admin
Immortal

One very major point to remember is that we are not able to "see" files in the guest because the guest can have an arbitrarily crazy filesystem. Fusion operates at the level of .vmdks. It is not practical to "back up" individual files in the guest without some assistance from something in the guest, but relying on something in the guest would open up more failure points and seems generally undesirable.

"Why would it take up more disk space than the current snapshot system? My understanding is that the current system basically does the same thing except that it stores new files in a separate location, as opposed to original files before they are altered - thus requiring all this time consuming merging."

With the current system, every piece of data is represented only once - it's either in the base disk or a snapshot. If you duplicated files when taking a snapshot, you would copy the base files every time you took a snapshot. For a concrete example:

Suppose you have a base system that takes up 5 GB (the guest OS, plus whatever applications you added). You want to try out a new program which takes 1 GB of additional space, and take a snapshot first.

With the current system, you need 5 GB for the base disk plus 1 GB for the snapshot.

With your proposed system, you need 5 GB for the base disk plus 6 GB for the snapshot.

And this just gets worse the more snapshots or the larger base disk you have. Plus, under your system you'd have to wait around while the base disk gets copied to the snapshot, which is completely wasted if you never intend to save the work you're doing (which is a major use case for snapshots).
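
The arithmetic above can be written out as a small Python sketch. The function names are made up for the example, and it assumes that changes only add data (nothing is overwritten) and that a full-copy scheme keeps one complete copy of the disk per snapshot plus the working copy.

```python
def space_with_deltas(base_gb, changes_gb):
    """Current scheme: base disk stored once, plus one incremental disk per snapshot."""
    return base_gb + sum(changes_gb)


def space_with_full_copies(base_gb, changes_gb):
    """Full-copy scheme: every snapshot freezes a complete copy of the disk."""
    total = base_gb        # the original, kept untouched
    working = base_gb      # size of the copy the VM keeps working on
    for change in changes_gb:
        working += change  # the working copy grows by what was added
        total += working   # and is stored alongside the frozen copies
    return total


# One snapshot, 5 GB base, 1 GB of new data:
one_delta = space_with_deltas(5, [1])          # 5 + 1
one_copy = space_with_full_copies(5, [1])      # 5 + 6
# Two snapshots with 1 GB of new data after each:
two_delta = space_with_deltas(5, [1, 1])       # 5 + 1 + 1
two_copy = space_with_full_copies(5, [1, 1])   # 5 + 6 + 7
```

The gap widens with each extra snapshot, since every full copy carries the whole base again.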

"In addition to the fact that all VMware products use the same snapshot mechanism/procedures."

"That's a fair point. If VM doesn't want to re-develop their whole snapshot mechanism I completely understand. I'll just start looking for another product that performs better. (Anybody know if Parallels does the same thing?)"

I'm pretty sure all existing virtualization products take the same general approach to snapshots.

abusy
Contributor

Thanks etung,

That all seems pretty sensible.

So the best solution in my mind then would be to simply be more transparent with the user about what is going to happen when they click the 'delete snapshot' button. When the average user sees this button, they expect that the snapshot will be 'sent to the trash', which is usually a near-instant operation.

This could easily be cleared up if there was a simple confirmation alert that told the user "This operation will take approximately X minutes. Do you still want to proceed?"

AsherN
Enthusiast

Actually all that really needs to be done would be to rename the option 'Delete' to 'Commit'.

admin
Immortal

"So the best solution in my mind then would be to simply be more transparent with the user about what is going to happen when they click the 'delete snapshot' button. When the average user sees this button, they expect that the snapshot will be 'sent to the trash', which is usually a near-instant operation."

"This could easily be cleared up if there was a simple confirmation alert that told the user "This operation will take approximately X minutes. Do you still want to proceed?""

Seems reasonable, though actually calculating the time probably isn't possible (as it depends on hard-to-determine things like maximum disk read/write speeds, host fragmentation, and other I/O going on). Filed PR 418169 with this feature request.
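
To see why any such number would be rough, here is a naive estimate in Python. The function names and the throughput figures are hypothetical; the point is only that the answer swings widely with the effective disk speed, which is the hard part to know in advance.

```python
def estimate_commit_seconds(delta_bytes, throughput_bytes_per_s):
    """Naive estimate: bytes to commit divided by an assumed sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return delta_bytes / throughput_bytes_per_s


def estimate_range_minutes(delta_bytes, slow_bytes_per_s, fast_bytes_per_s):
    """Return (best case, worst case) in minutes for two assumed throughputs."""
    best = estimate_commit_seconds(delta_bytes, fast_bytes_per_s) / 60
    worst = estimate_commit_seconds(delta_bytes, slow_bytes_per_s) / 60
    return best, worst


# Hypothetical numbers: a 6 GB delta, between 5 MB/s and 50 MB/s effective.
best, worst = estimate_range_minutes(6_000_000_000, 5_000_000, 50_000_000)
```

With these made-up figures the same commit lands anywhere between 2 and 20 minutes, so a dialog could at best show a range.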
