VMware Communities
jaydub222
Enthusiast

Snapshot Persistence

I read somewhere that snapshots are only good for about 24-48 hours. What does this mean? If I take a snapshot and later revert to it, aren't I good to go for the long haul, regardless of whether I revert an hour later or a month later?

Eric_Allione
Enthusiast

Hi Jay, what you are referencing is the official best practice of not retaining snapshots for longer than 24-48 hours. This is because they grow quickly and can cause performance problems, especially when many are chained.

However, you are asking this in the Workstation Pro section, which means we aren't talking about an enterprise with high-risk SLAs. Those VMs will still work just fine with older snapshots, especially if you close a VM and leave it dormant for months. Essentially, no time has passed for that VM as far as snapshot deltas are concerned. For that reason, with inactive VMs, I will have some snapshots in Workstation Pro (or Fusion Pro) that are a couple of months old. Just make sure to take new snapshots before high-risk changes such as patching. If you cycle 3-4 snapshots in Workstation for relatively inactive VMs, it should not really matter how old they are.

But in the enterprise with vSphere etc, those are production VMs and so any snapshots of them will be quickly growing in size, especially with chatty VMs such as DCs or database servers.

jaydub222
Enthusiast

Well, I'm definitely not enterprise.  My intention is to have only one snapshot in place at any one time....something to revert back to if something goes bad.  Not a series of snapshots.  Given this use case:

  • Would it make more sense to delete the existing snapshot before I take another, or take a new snapshot and delete the older one?
  • Or would it make more sense to just clone?
Eric_Allione
Enthusiast

Especially in Workstation Pro, I think it's good to have at least two backup snapshots. I like to keep at least one current one but also one that is at least a few weeks older, in case I discover a reason to roll way back. Remember, if that VM was dormant, you can come back to that laptop 5 years later, open it up, and the snapshot will be the same size, since the VM was inactive. The 24-48 hour guideline is in the performance best practices, but there's no logical time limit on the age of snapshots.
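
A minimal sketch of the "take the new snapshot first, then prune the oldest" rotation, assuming snapshots are managed with the vmrun command-line tool that ships with Workstation. The .vmx path, the snapshot names, and the plan_rotation helper are hypothetical. The function only builds the command lists and does not run them, so the plan can be reviewed first.

```python
from typing import List


def plan_rotation(vmx: str, existing: List[str], new_name: str, keep: int = 2) -> List[List[str]]:
    """Build vmrun commands: snapshot first, then delete the oldest surplus ones.

    `existing` is assumed to be ordered oldest to newest.
    """
    if keep < 1:
        raise ValueError("keep at least one snapshot")
    # Take the new snapshot before deleting anything, so a rollback point always exists.
    commands = [["vmrun", "-T", "ws", "snapshot", vmx, new_name]]
    after = existing + [new_name]
    # Everything except the newest `keep` entries is surplus.
    for name in after[:-keep]:
        commands.append(["vmrun", "-T", "ws", "deleteSnapshot", vmx, name])
    return commands


if __name__ == "__main__":
    for cmd in plan_rotation("/path/to/vm.vmx", ["clean-install", "pre-patch"], "post-patch"):
        print(" ".join(cmd))
```

With keep=2 this keeps the previous snapshot and the new one, matching the "one older, one current" habit described above.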

Eric_Allione
Enthusiast

Also, considering that you can have up to 32 snapshots in a chain, having 3-4 should never be a concern on Workstation Pro.

I have even been in a production situation where taking up to 12 chained snapshots was appropriate because there were many potential fault points in a long upgrade process that was having problems. It will still work fine for you in Workstation, but if you start to have performance issues, make sure you aren't nesting too many. You would only nest a ton in a troubleshooting burst or complex upgrade.

KB - Best practices for using snapshots in the vSphere environment (1025279) - VMware Knowledge Base

jaydub222
Enthusiast

Thanks Eric for your insights.  Two probably does make sense...one original known good image, and a subsequent one that you think is probably good. 

If you've got a 20 GB virtual disk that is essentially full, how large is the snapshot?

Eric_Allione
Enthusiast

No problem. Snapshots start very small and can grow toward the size of the entire disk, but not meaningfully beyond it. VMware Workstation will tell you how big the snapshots are getting. Every time a block of the disk changes while a snapshot exists, the changed block is written to the snapshot's delta file. You cannot change more than all the blocks, which is why a snapshot does not grow much larger than the disk itself (a small amount of metadata aside). This is also exactly why, if you leave a VM dormant and come back 5 years later, you will still have a tiny snapshot: nothing was written, regardless of the time passed.
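
The growth behaviour described above can be illustrated with a toy model. This is only a sketch: the DeltaDisk class and the 64 KiB block size are illustrative assumptions, and real sparse VMDK files add metadata, so they can end up slightly larger than the disk they shadow.

```python
class DeltaDisk:
    """Toy copy-on-write delta: stores only blocks changed since the snapshot."""

    def __init__(self, total_blocks: int, block_size: int = 64 * 1024):
        self.total_blocks = total_blocks
        self.block_size = block_size  # illustrative value, not a VMware constant
        self.changed = {}  # block index -> latest data written

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.total_blocks:
            raise IndexError("block outside the virtual disk")
        # Rewriting a block replaces the stored copy; it never appends a second one.
        self.changed[index] = data

    def size_bytes(self) -> int:
        return len(self.changed) * self.block_size


if __name__ == "__main__":
    delta = DeltaDisk(total_blocks=4)
    print(delta.size_bytes())      # idle VM: nothing written, delta stays empty
    for _ in range(3):
        delta.write(0, b"x")       # same block rewritten three times
    print(delta.size_bytes())      # still only one block
    for i in range(4):
        delta.write(i, b"y")       # every block touched once
    print(delta.size_bytes())      # capped at total_blocks * block_size
```

Elapsed time never appears in the model; only writes make the delta grow.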

wila
Immortal

Hi,

When reading this thread, the age old "Snapshots are not Backups" statement comes to mind.

If you want to keep snapshots for a long time, then perhaps it isn't a snapshot you are after, and a backup might be a better solution.

You can make a backup by simply copying the whole folder that contains the Virtual Machine to another disk and store it there.

The problem with that - especially over time - becomes that you tend to forget why you have that copy.
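
As a rough illustration of the folder-copy approach, here is a minimal sketch that copies a VM folder to a timestamped destination and drops a note file next to it, so the reason for the copy is not forgotten. The function name, the note file name, and the paths are hypothetical. The check for *.lck entries assumes that Workstation leaves lock files or folders in the VM directory while the VM is open, so treat it as a heuristic rather than a guarantee.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_vm_folder(vm_dir, backup_root, note: str) -> Path:
    """Copy a powered-off VM's folder and record why the copy was made."""
    vm_dir = Path(vm_dir)
    # Assumption: leftover *.lck entries suggest the VM is still open or running.
    if any(vm_dir.glob("*.lck")):
        raise RuntimeError("VM appears to be in use; shut it down before copying")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{vm_dir.name}-{stamp}"
    shutil.copytree(vm_dir, target)  # fails if the target already exists
    (target / "BACKUP_NOTE.txt").write_text(note + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own layout.
    print(backup_vm_folder("C:/VMs/Win10-test", "D:/VM-backups", "Clean state before patching"))
```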

Which is why people tend to keep that kind of thing in snapshots, but snapshots do make your VM more fragile and they do slow you down after a while (especially if you run a few VMs simultaneously).

As of today, there's another solution: as it happens, I wrote an application to handle this kind of scenario. It is called "Vimalin for Windows".

It basically does the above, copying a whole VM, but it also lets you keep notes with that backup, all from within a GUI, so you don't have to make a copy in Windows Explorer.

Vimalin has more features: for example, you can automate backups of your virtual machines on a schedule and have old backups roll over on a predetermined schedule.

The automation part is in the non-free version, but you can make and restore backups, add notes, and compress/decompress, all with the free version.

See:

https://www.vimalin.com/news/

--

Wil

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
Eric_Allione
Enthusiast

Wila's point is something I certainly abide by for clean installations of guest OS. If you're just using a VM for browsing and scripting and have it all linked to GitHub or Dropbox anyway, then going all the way back to a clean install is useful and even ideal for testing.

Space seemed to be a constraint here, so I did not suggest a full copy, but you are right that it's definitely important to point out!

jaydub222
Enthusiast

My only concern with snapshots, as opposed to backing up regularly, is the accuracy of the snapshot.  I think someone used the word "fragile".  Is there a reasonable possibility of a chain of snapshots being less than 100% accurate at best or unstable at worst?  I use Terabyte Drive Image to back up my HDDs and the images are unfailingly accurate.  In 15 years of backing up I have had no problems with image fidelity.  Can the same be said about snapshots?  That is to say, is there a price to pay for the convenience of snapshots, other than performance?  And why would performance be an issue with snapshots?

Eric_Allione
Enthusiast

Yeah, in an enterprise production environment, I have seen 2-3 snapshots actually fail, where an extra snapshot saved the day. At least two of these occurrences were when I was taking a state snapshot, which also captures the state of the guest OS's RAM. Either the snapshot can get corrupted or, if it is a state snapshot, you can just catch the VM at a really bad time. But more than 99% of the time they work.

As a consideration along the lines that snapshots are not backups, if there is a larger systemic problem, then snapshots will stop working along with everything else.

wila
Immortal

Hi,

In this thread the one that used the word "fragile" is me :)

Snapshots are great, I use them all the time. They have saved me many times and yes I also use them when testing software (including Vimalin, one has got to love nested virtualisation!)

With snapshots, your virtual disks become a chain of disks linked to each other.

If one of the virtual disks in that chain gets corrupted, everything above it in the chain is basically lost unless you get help from a data recovery specialist (like continuum).

That's the "bit rot" case, but if you read the forum then it also appears to happen on power loss that disk slices end up missing.

BTW, I'd wager that bit rot happens more frequently in non-enterprise environments.

Then there's the case that people take a snapshot of a running VM while there's not enough free disk space to take that actual snapshot.

Each time you take a snapshot of a running VM, you need enough free space for the VM's RAM (plus state data) to be written out to disk. If your VM has been set up to use 4GB of RAM, that means a 4GB vmem file per snapshot.
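
The arithmetic is simple enough to sketch. This estimate assumes each running-state snapshot stores one memory file the size of the configured RAM and ignores the additional state data mentioned above, so it is a lower bound. The function name is hypothetical.

```python
GIB = 1024 ** 3


def snapshot_memory_footprint(ram_gib: int, running_snapshots: int) -> int:
    """Lower-bound bytes of memory files kept for snapshots taken while the VM was running."""
    if ram_gib < 0 or running_snapshots < 0:
        raise ValueError("values must be non-negative")
    return ram_gib * GIB * running_snapshots


if __name__ == "__main__":
    # A VM configured with 4 GB of RAM and three snapshots taken while it was running.
    needed = snapshot_memory_footprint(ram_gib=4, running_snapshots=3)
    print(f"{needed / GIB:.0f} GiB of memory files")
```

Snapshots taken while the VM is powered off do not capture memory, so they would not count toward running_snapshots here.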

Finally the worst one, where AFAIK most data corruption occurs.

If your VM is configured with a single slice disk and you commit the snapshot then you need the free space for that VM disk.

E.g. if you have a snapshot on a 100GB single-file virtual disk and you commit the snapshot, you need more than 100GB of free space in order to be able to commit that snapshot.

Running out of free disk space while committing that snapshot is pretty bad.
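
A pre-flight check along these lines can be sketched as follows. It assumes the rule of thumb stated above (free space greater than the size of the single-file virtual disk). The 5% safety margin and the function name are my own assumptions, not a VMware requirement.

```python
import shutil
from typing import Optional, Tuple

GIB = 1024 ** 3


def can_commit_snapshot(vm_dir: str, virtual_disk_bytes: int,
                        free_bytes: Optional[int] = None,
                        margin_percent: int = 5) -> Tuple[bool, int]:
    """Return (ok, required_bytes) for committing a snapshot of a single-file disk."""
    if free_bytes is None:
        # Free space on the volume that holds the VM folder.
        free_bytes = shutil.disk_usage(vm_dir).free
    # Integer arithmetic keeps the required size exact.
    required = virtual_disk_bytes * (100 + margin_percent) // 100
    return free_bytes >= required, required


if __name__ == "__main__":
    ok, required = can_commit_snapshot(".", 100 * GIB)
    print("safe to commit" if ok else f"need at least {required / GIB:.0f} GiB free")
```

Passing free_bytes explicitly makes it easy to try "what if" numbers without touching the real volume.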

Snapshots are great, but use with care.

edit: forgot to answer your other question. "Why do snapshots have a performance penalty?"

This is because when you create a snapshot, the current virtual disk file is closed and becomes read-only, and a new virtual disk file (next in the chain) is created to hold your changed data.

So every time you need to read data from that disk, you now have to read it from 2 virtual disk files instead of one. Each additional snapshot increases the number of virtual disk files that may have to be read by one.
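
A toy model of that lookup, assuming each disk file in the chain can be represented as a dictionary of the blocks it holds. The names are illustrative, and real VMDK files use grain tables rather than Python dictionaries. A read starts at the newest file and falls back toward the base disk until the block is found.

```python
from typing import Dict, List, Tuple


def read_block(chain: List[Dict[int, str]], index: int) -> Tuple[str, int]:
    """Read one block from a snapshot chain (base disk first, newest delta last).

    Returns the data and the number of disk files that had to be consulted.
    """
    checked = 0
    for layer in reversed(chain):  # newest delta first
        checked += 1
        if index in layer:
            return layer[index], checked
    raise KeyError(f"block {index} not present anywhere in the chain")


if __name__ == "__main__":
    base = {0: "a", 1: "b", 2: "c"}  # read-only after the first snapshot
    snap1 = {1: "B"}                 # block 1 changed after snapshot 1
    snap2 = {}                       # nothing changed yet after snapshot 2
    chain = [base, snap1, snap2]
    print(read_block(chain, 1))      # ('B', 2): found in the second file checked
    print(read_block(chain, 0))      # ('a', 3): had to fall through to the base disk
```

Unchanged blocks are the expensive case: the longer the chain, the more files are consulted before the base disk answers.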

continuum
Immortal

Hello
you probably won't like what I am about to tell you ...
It summarizes my experience with VMDKs and various constellations in a quite drastic way.
Anyway - this table is not part of the vSphere or Workstation documentation - but IMHO it should be.
In the end the user decides which scenario to use - but how is a user supposed to pick a solid setup if nobody tells him ....
(risk-of-using-various-combinations-of-vmdk-types.png)

vmdk-choice (example uses a Windows VM with NTFS) | risk
1. VMFS-eager-zeroed-thick provisioning - no snapshot on old ESXi 3 | maybe even lower than with native NTFS
2. VMFS-lazy-zeroed-thick - no snapshots | very little risk - easy to repair
3. monolithicFlat VMDK on Workstation version up to WS 7.1 - no snaps | very little risk - easy to repair
4. split flat VMDK on Workstation version up to WS 7.1 - no snaps | very little risk - easy to repair
5. thin provisioned ESXi-vmdk - no snaps | you need to create regular backups
6. split sparse format on WS - old versions | can be repaired - but risky
7. thin provisioned ESXi-vmdk - with delta.vmdk snapshots | can be repaired - but risky
8. monolithicSparse modern WS-vmdk | unacceptable risk
9. thin provisioned ESXi-vmdks with weeks or months of data inside a long snapshot chain | unacceptable risk
10. monolithicSparse WS-vmdks with weeks or months of data inside a long snapshot chain | unacceptable risk
11. thin provisioned modern ESXi with lots of data in large SESPARSE-snapshots | unacceptable risk
12. encrypted modern WS-vmdk with snapshots | only usable if the data is regarded as disposable

In scenarios 1-3 you can leave for some months and still expect that your Windows NTFS volume has your data -
even occasional thunderstorms with power failures should add no significant extra risk.


In scenarios 7 - 12 you have to expect that a single power failure or unexpected host-OS reboot renders ALL YOUR VALUABLE data unreadable.
In other words - even for a friday night visit to the movies you need a full working backup if you do NOT classify your data as disposable.
Highest risk factors at the moment:
- sparse WS-vmdks larger than 950GB
- very large thin ESXi vmdks with huge SESPARSE snapshots
- large VMFS datastores (8TB and larger) with lots of thin provisioned VMDKs AND lots of snapshots
- encrypted WS vmdks with snapshots.
I work as a freelancer in VMDK/VMFS recovery - that means that I have a good overview of the available tools and also have the necessary skills and practice.
If you call me with scenario 10, 11 or 12 I would have to tell you that the chances are not good enough to justify spending a lot of time on recovery attempts.
For scenarios 11 and 12 I would tell you to do the math and calculate whether the data is valuable enough to call Ontrack. If you have seen what Ontrack charges - you know what I mean:
most of the times those stories end with a quite expensive experience ....
Hope that all readers still can sleep well after reading this ....
My suggestion if you are a paying WS or vSphere customer:
Complain about the fact that VMware does NOT supply a known-to-work command-line tool to fix errors in WS or ESXi delta.vmdks
Complain that there is no tool to attach SESPARSE snapshots to a basedisk.
At the moment not even UFSexplorer - the reference tool for reading VMDKs and VMFS - can handle such a case unless both the flat and the sesparse vmdks are in perfect health.
Complain about VOMA - in practical scenarios VOMA will not help you a tiny bit - it will only make you waste lots of time.
And time is in very short supply if you have to handle those large VMDKs and datastores that are used today.
Hope this will help some users ....
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...