I read somewhere that snapshots are only good for 24-48 hours. What does this mean? If I take a snapshot and then subsequently revert back to that snapshot, aren't I good to go for the long haul, irrespective of whether I revert back an hour later or a month later?
Hi Jay, what you are referencing is the official best practice of not retaining snapshots for longer than 24-48 hours. That guidance exists because snapshots grow quickly and can cause performance problems, especially when many are nested.
However, you are asking this in the Workstation Pro section, so we aren't talking about an enterprise with high-risk SLAs. Those VMs will still work just fine with older snapshots, especially if you close a VM and leave it dormant for months. Essentially, no real time has passed for that VM as far as snapshot deltas are concerned. For that reason, with inactive VMs, I will keep some snapshots in Workstation Pro (or Fusion Pro) that are a couple of months old. Just make sure to take new snapshots before high-risk changes such as patching. If you cycle 3-4 snapshots in Workstation for relatively inactive VMs, it should not really matter how old they are.
But in the enterprise with vSphere etc, those are production VMs and so any snapshots of them will be quickly growing in size, especially with chatty VMs such as DCs or database servers.
Well, I'm definitely not enterprise. My intention is to have only one snapshot in place at any one time: something to revert back to if something goes bad. Not a series of snapshots. Given this use case:
Especially in Workstation Pro, I think it's good to have at least two backup snapshots. I like to keep one current one, but also one that is at least a few weeks older in case you later discover a reason to roll way back. Remember, if that VM was asleep, you can come back to that laptop 5 years later, open it up, and the snapshot will be the same size, since the VM was inactive. The 24-48 hour figure is in the performance best practices, but there's no logical time limit on the age of snapshots.
Also, considering that you can have up to 32 snapshots in a chain, having 3-4 should never be a concern in Workstation Pro.
I have even been in a production situation where taking up to 12 chained snapshots was appropriate because there were many potential fault points in a long upgrade process that was having problems. It will still work fine for you in Workstation, but if you start to have performance issues, make sure you aren't nesting too many. You would only nest a ton in a troubleshooting burst or complex upgrade.
KB - Best practices for using snapshots in the vSphere environment (1025279) - VMware Knowledge Base
Thanks, Eric, for your insights. Two probably does make sense: one original known-good image, and a subsequent one that you think is probably good.
If you've got a 20 GB virtual disk that is essentially full, how large is the snapshot?
No problem. Snapshots start very small and then approach the size of the entire disk, but cannot exceed it. VMware Workstation will tell you how big the snapshots are getting. While a snapshot exists, every change to the virtual disk is recorded in the snapshot's delta file: the first time a given piece of the disk changes, it is written out to the delta. You cannot change more than the whole disk, which is why a snapshot does not get bigger than the base disk file. This is also exactly why, if you sleep a VM and come back 5 years later, you will still have a tiny snapshot: nothing changed on disk, regardless of how much time passed.
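That growth behaviour can be sketched with a toy copy-on-write model. This is an illustration only, not VMware's actual sparse-VMDK format (which tracks changed "grains" via grain tables); the class and block sizes here are made up for the example:

```python
# Toy copy-on-write delta: a set of block numbers written since the
# snapshot was taken. NOT VMware's real on-disk format.
class SnapshotDelta:
    def __init__(self, disk_size_blocks):
        self.disk_size = disk_size_blocks
        self.changed = set()          # blocks written since the snapshot

    def write(self, block):
        # Only the *first* write to a block grows the delta; rewriting
        # the same block reuses the space already allocated for it.
        self.changed.add(block)

    def size_blocks(self):
        # The delta can never hold more blocks than the base disk has,
        # so it approaches, but cannot exceed, the disk size.
        return len(self.changed)

delta = SnapshotDelta(disk_size_blocks=1000)
for b in [1, 2, 2, 2, 999]:           # rewriting block 2 costs nothing extra
    delta.write(b)
print(delta.size_blocks())            # 3
```

An idle (suspended or powered-off) VM issues no writes at all, so its delta stays at size zero no matter how many years pass, which is exactly the point above.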
When reading this thread, the age-old "snapshots are not backups" statement comes to mind.
If you want to keep snapshots for a long time, then perhaps it isn't a snapshot you are after, and a backup might be a better solution.
You can make a backup by simply copying the whole folder that contains the virtual machine to another disk and storing it there.
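That whole-folder copy is a one-liner in most languages. Here is a sketch using Python's standard library; the folder contents are a throwaway stand-in for a real VM directory, and in real use you would point it at your actual VM folder, with the VM powered off (not merely suspended) so the files are consistent on disk:

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway "VM folder" to demonstrate; substitute your real
# VM directory here (and power the VM off before copying).
root = Path(tempfile.mkdtemp())
vm_folder = root / "Win10-test"
vm_folder.mkdir()
(vm_folder / "Win10-test.vmx").write_text("# vm config")
(vm_folder / "Win10-test.vmdk").write_text("# virtual disk")

# One call copies the whole VM (vmx, vmdk, nvram, ...) to the backup disk.
backup = root / "backups" / vm_folder.name
shutil.copytree(vm_folder, backup)

print(sorted(p.name for p in backup.iterdir()))
```

The point of a tool like Vimalin, as described below, is to do exactly this kind of copy but with notes, scheduling, and restore handled for you.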
The problem with that, especially over time, is that you tend to forget why you have that copy.
Which is why people tend to keep that kind of thing in snapshots. But snapshots do make your VM more fragile, and they do slow you down after a while (especially if you run a few VMs simultaneously).
As of today, there's another solution: as it happens, I wrote an application to handle this kind of scenario, called "Vimalin for Windows".
It basically does the whole-VM copy described above, but it also lets you keep notes with that backup, all from within a GUI, so you don't have to make the copy in Windows Explorer.
Vimalin has more features, such as automatically backing up your virtual machines on a specific schedule and rolling over old backups on a predetermined schedule.
The automation part is in the non-free version, but you can make and restore backups, add notes, and compress/decompress, all with the free version.
Wila's point is something I certainly abide by for clean installations of a guest OS. If you're just using a VM for browsing and scripting and have it all linked to GitHub or Dropbox anyway, then going all the way back to a clean install is useful, and even ideal for testing.
Space seemed to be a constraint here, so I did not suggest a full copy, but you are right, it's definitely important to point out!
My only concern with snapshots, as opposed to backing up regularly, is the accuracy of the snapshot. I think someone used the word "fragile". Is there a reasonable possibility of a chain of snapshots being less than 100% accurate at best, or unstable at worst? I use Terabyte Drive Image to back up my HDDs and they are unfailingly accurate. In 15 years of backing up I have had no problems with image fidelity. Can the same be said about snapshots? That is to say, is there a price to pay for the convenience of snapshots, other than performance? And why would performance be an issue with snapshots?
Yeah, in an enterprise production environment, I have seen 2-3 snapshots actually fail, while an extra snapshot saved the day. At least two of those occurrences were when I was taking a state snapshot, which also captures the state of the RAM of the guest OS. Either the snapshot can get corrupted or, if it is a state snapshot, you can simply catch the VM at a really bad time. But 99%-plus of the time they work.
As a further consideration along the lines of "snapshots are not backups": if there is a larger systemic problem, then snapshots will stop working along with everything else.
In this thread, the one who used the word "fragile" is me.
Snapshots are great, I use them all the time. They have saved me many times and yes I also use them when testing software (including Vimalin, one has got to love nested virtualisation!)
With snapshots, your virtual disks become a chain of disks linked to each other.
If one of the virtual disks in that chain gets corrupted, everything above that point in the chain is basically lost unless you get help from a data recovery specialist (like continuum).
That's the "bit rot" case, but if you read the forum then it also appears to happen on power loss that disk slices end up missing.
BTW, I'd wager that bit rot happens more frequently in non-enterprise environments.
Then there's the case where people take a snapshot of a running VM while there isn't enough free disk space for that snapshot.
Each time you take a snapshot, you need enough free space for the RAM of your VM (plus state data) to be written out to disk. If your VM is set up to use 4 GB of RAM, then that means a 4 GB vmem file per snapshot.
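The arithmetic adds up faster than people expect. A quick back-of-the-envelope sketch (the 4 GB figure comes from the example above; the state-data overhead varies and is ignored here):

```python
# Rough estimate of the host disk consumed just by the saved-RAM
# (.vmem) files of memory-state snapshots. State-data overhead varies
# and is not included.
GIB = 1024 ** 3

def vmem_space_needed(vm_ram_gib, snapshots_with_memory):
    """Bytes of host disk taken by the saved-RAM files alone."""
    return vm_ram_gib * GIB * snapshots_with_memory

# A 4 GiB VM with three memory-state snapshots in its chain:
print(vmem_space_needed(4, 3) / GIB)   # 12.0
```

So three memory-state snapshots of a 4 GB VM tie up 12 GB of host disk in vmem files alone, before counting any delta growth.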
Finally, the worst one, where AFAIK most data corruption occurs.
If your VM is configured with a single-file (monolithic) disk and you commit the snapshot, then you need free space equal to that virtual disk.
E.g. if you have a snapshot on a 100 GB single-file virtual disk and you commit the snapshot, you need more than 100 GB of free space to be able to commit it.
Running out of free disk space while committing that snapshot is pretty bad.
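A pre-flight check along these lines can save a VM. This is a sketch, not an official tool: the worst-case requirement (free space greater than the full base disk) follows the single-file example above, while the real figure depends on the disk type and snapshot sizes:

```python
import shutil

def safe_to_commit(datastore_path, base_disk_bytes):
    """Worst-case check before committing a snapshot on a monolithic
    (single-file) virtual disk: assume you may temporarily need free
    space roughly equal to the full base disk."""
    free = shutil.disk_usage(datastore_path).free
    return free > base_disk_bytes

# E.g., for a 100 GB single-file disk stored under /path/to/vm:
#   if not safe_to_commit("/path/to/vm", 100 * 1000**3):
#       print("Free up space before committing this snapshot!")
```

Running the check first costs nothing; running out of space halfway through the commit can cost the whole disk.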
Snapshots are great, but use with care.
edit: forgot to answer your other question: "Why do snapshots have a performance penalty?"
This is because when you create a snapshot, the current virtual disk file becomes read-only and a new virtual disk file (the next in the chain) is created to receive your changed data.
So every time you need to read data from that disk, you may now have to consult two virtual disk files instead of one. Each additional snapshot adds one more virtual disk file that may have to be read.
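The read path can be sketched as a lookup through a stack of overlays, newest first. Again, this is a simplification for illustration; real VMDKs resolve this with grain directories rather than a linear scan, but the layering idea is the same:

```python
# Toy read path through a snapshot chain: check the newest delta
# first, fall back to older deltas, and finally the base disk.
def read_block(block, chain):
    """chain is ordered newest delta -> ... -> base disk; each element
    is a dict mapping block number -> data."""
    for disk in chain:
        if block in disk:
            return disk[block]
    raise IOError(f"block {block} unallocated")

base  = {0: "original-0", 1: "original-1"}
snap1 = {1: "changed-after-snap1"}     # delta created at first snapshot
snap2 = {0: "changed-after-snap2"}     # delta created at second snapshot

chain = [snap2, snap1, base]           # a read may touch all three files
print(read_block(0, chain))            # changed-after-snap2
print(read_block(1, chain))            # changed-after-snap1
```

A block untouched since the base disk falls all the way through the chain, which is why a long chain slows reads even when the deltas are small.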
You will probably not like what I'm about to tell you now...
It summarizes my experience with VMDKs in various configurations in a quite drastic way.
Anyway, this table is not part of the vSphere or Workstation documentation, but IMHO it should be.
In the end, the user decides which scenario to use, but how is a user supposed to pick a solid setup if nobody tells him?
| vmdk choice (example uses a Windows VM with NTFS) | risk |
| --- | --- |
| 1. VMFS eager-zeroed thick provisioning, no snapshot, on old ESXi 3 | maybe even lower than with native NTFS |
| 2. VMFS lazy-zeroed thick, no snapshots | very little risk, easy to repair |
| 3. monolithicFlat VMDK on Workstation up to WS 7.1, no snaps | very little risk, easy to repair |
| 4. split flat VMDK on Workstation up to WS 7.1, no snaps | very little risk, easy to repair |
| 5. thin-provisioned ESXi vmdk, no snaps | you need to create regular backups |
| 6. split sparse format on WS, old versions | can be repaired, but risky |
| 7. thin-provisioned ESXi vmdk with delta.vmdk snapshots | can be repaired, but risky |
| 8. monolithicSparse modern WS vmdk | unacceptable risk |
| 9. thin-provisioned ESXi vmdks with weeks or months of data inside a long snapshot chain | unacceptable risk |
| 10. monolithicSparse WS vmdks with weeks or months of data inside a long snapshot chain | unacceptable risk |
| 11. thin-provisioned modern ESXi with lots of data in large SESPARSE snapshots | unacceptable risk |
| 12. encrypted modern WS vmdk with snapshots | only usable if the data is regarded as disposable |
In scenarios 1-3 you can leave for some months and still expect that your Windows NTFS volume still has your data; even occasional thunderstorms with power failures should add no significant extra risk.
In scenarios 7-12 you have to expect that a single power failure or unexpected host-OS reboot renders ALL YOUR VALUABLE data unreadable.
In other words, even for a Friday night visit to the movies, you need a full working backup if you do NOT classify your data as disposable.
Highest risk factors at the moment:
- sparse WS vmdks larger than 950 GB
- very large thin ESXi vmdks with huge SESPARSE snapshots
- large VMFS datastores (8 TB and larger) with lots of thin-provisioned VMDKs AND lots of snapshots
- encrypted WS vmdks with snapshots
I work as a freelancer in VMDK/VMFS recovery, which means I have a good overview of the available tools and also the necessary skills and practice.
If you call me with scenario 10, 11, or 12, I would have to tell you that the chances are not good enough to justify spending a lot of time on recovery attempts.
For scenarios 11 and 12, I would tell you to do the math and calculate whether the data is valuable enough to call Ontrack. If you have seen what Ontrack charges, you know what I mean:
most of the time those stories end as a quite expensive experience...
Hope that all readers still can sleep well after reading this ....
My suggestion if you are a paying WS or vSphere customer:
Complain about the fact that VMware does NOT supply a known-to-work command-line tool to fix errors in WS or ESXi delta.vmdks.
Complain that there is no tool to attach SESPARSE snapshots to a basedisk.
At the moment, not even UFS Explorer, the reference tool for reading VMDKs and VMFS, can handle such a case unless both the flat and the SESPARSE vmdks are in perfect health.
Complain about VOMA: in practical scenarios, VOMA will not help you a tiny bit; it will only make you waste lots of time.
And time is in very short supply if you have to handle those large VMDKs and datastores that are used today.
Hope this will help some users ....