VMware Cloud Community
RegNullify
Contributor

Let’s talk about virtual machine snapshots in the enterprise

Greetings fellow VMware wayfarers,

I am trying to build policy and procedure around VMware virtual machine snapshots and would like to hear your thoughts on best practice: how long should snapshots stay in place before they are removed, and what is the maximum number of snapshots you feel should be taken for each machine?

The purpose of my research is to incorporate this policy into a service level agreement and to preserve system integrity for both the virtual machine and the infrastructure that it runs on. I have worked in a few environments where a VM had several snapshots that were months old and 60-80 GB in size. When I confirmed that the snapshots were no longer needed and was given the OK to remove them, the removal took almost 24 hours, and there was an impact on system performance for the VM and, of course, the ESX server that it was running on. I am looking forward to your responses and value your opinion. Thanks for your support.

So far here is what I have come up with.

Maximum number of snapshots per machine = 3

Duration that a snapshot will last before deletion = 5 business days

-Jason

A+, N+, CNA, CNE, MCP, MCSA, VCP310, VCP410, VCI <------ Long time dedicated IT Professional specializing in U.S. Federal Government implementations.

Accepted Solutions
petedr
Virtuoso

thanks for the helpful

www.phdvirtual.com, makers of esXpress

www.thevirtualheadline.com www.liquidwarelabs.com

7 Replies
AndreTheGiant
Immortal

IMHO, in the enterprise (using ESX snapshots) you have to limit the use of this feature.

A good choice is VUM default: 1 snapshot for max 18 hours.

If you really need long-term rollback, you must use backups.

Or, if your storage supports snapshots integrated with ESX (like EqualLogic Auto-Snapshot Manager), you can use that feature instead.

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
petedr
Virtuoso

When I ran a VMware/Oracle infrastructure, we very rarely kept snapshots on production virtual machines for more than a day, if that long, and normally no more than one at a time.

In development environments, however, things were handled differently, as snapshots provided an excellent way to do patch and code release testing. We could snapshot a VM and load our patches or new software changes; if a problem appeared, we could revert to the snapshot.

VMmatty
Virtuoso

I think it really depends on the use case and specifically what the virtual machine is used for. If the workload is a file server then limiting snapshots makes a lot of sense. On the other hand if you are doing software development then I can see situations where many snapshots are required and regularly used. I could argue that VMware Lab Manager is a better tool to use to manage those kinds of environments, but not everyone has that. So understanding what the specific server is used for will help create the SLA and overall guidance.

I agree with others that keeping snapshots around for a long period of time is a bad idea. In general, if you can limit the total number of snapshots to 1-2, and the duration to no more than 5 days, you will likely not run into issues with overly large snapshots impacting disk performance. A great way to make sure you're keeping your SLAs is to use PowerShell/PowerCLI scripts to query your environment and see which VMs have snapshots, how old they are, who created them, etc. The following is a fantastic script that gives you not only this information but a great deal more. Using this script or one like it will help make sure that you don't have rogue snapshots growing out of control and that you are keeping users in line (and in compliance with your SLA).

http://www.virtu-al.net/2009/08/18/powercli-daily-report-v2/
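To illustrate the kind of check such a report performs, here is a minimal Python sketch. It is not the linked PowerCLI script and does not talk to vCenter; it assumes the snapshot inventory has already been exported into plain records. The `Snapshot` record, the `stale_snapshots` helper, and the sample VM and user names are all hypothetical, and the 5-day threshold is simply the figure suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    # Hypothetical record; a real export would carry whatever fields your tooling provides.
    vm: str
    name: str
    created: datetime
    creator: str


def stale_snapshots(snapshots, now, max_age=timedelta(days=5)):
    """Return (snapshot, age) pairs older than max_age, oldest first."""
    stale = [(s, now - s.created) for s in snapshots if now - s.created > max_age]
    return sorted(stale, key=lambda pair: pair[1], reverse=True)


# Made-up inventory for illustration only.
now = datetime(2009, 9, 11, 12, 0)
inventory = [
    Snapshot("file01", "pre-patch", datetime(2009, 9, 10, 9, 0), "alice"),
    Snapshot("dev-app02", "before-release", datetime(2009, 8, 1, 9, 0), "bob"),
    Snapshot("dev-app02", "baseline", datetime(2009, 6, 15, 9, 0), "bob"),
]

report = stale_snapshots(inventory, now)
for snap, age in report:
    print(f"{snap.vm}: '{snap.name}' by {snap.creator} is {age.days} days old")
```

Sorting oldest first puts the snapshots most likely to be large, and slowest to remove, at the top of the report.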

Snapshots are a great feature but I think they can cause more harm than good if they are not well understood/managed. I think your idea of developing specific guidance and an SLA is a good one, and I'm curious to hear what you ultimately decide.

Matt | http://www.thelowercasew.com | @mattliebowitz
beyondvm
Hot Shot

In my opinion, snapshots should never be used except as a temporary "Undo" button lasting as little time as possible, such as in the case of VUM or when testing a patch. I advise people to keep their snapshots around for as short a time as possible, even 10 minutes or less.

---

If you found any of my comments helpful please consider awarding points for "Correct" or "Helpful". Thanks!!!

www.beyondvm.com

RegNullify
Contributor

Greetings,

Good work, and thank you for your input. Here is what I have determined. There is a strong need to separate production VMs from development VMs, because in the production world, although you might need a snapshot for a short period of time, it should really only last for about 10-12 hours. In the development arena, however, snapshots need to last much longer because they provide ultimate flexibility and a reliable way of unwinding any permanent changes in case of a failure.

In conclusion, I am going to create separate LUNs to house the development VMs and perform a Storage vMotion on any existing development VMs that are running on LUNs with production VMs. In my policy I will stipulate that a production VM will have no more than one snapshot at any given time, with a 10-hour "time to live" before deletion. The development VMs will have a maximum of three snapshots at any given time, with a "time to live" of 24 hours. If there is a need to unwind any changes afterwards, then we will revert to backups and perform a VM recovery.
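For anyone who wants to automate a check against these limits, here is a rough Python sketch of the policy as data. It assumes snapshot creation times have already been collected per VM by some other tool; the `POLICY` table and the `check_vm` helper are hypothetical names, and only the limits themselves (one snapshot and 10 hours for production, three snapshots and 24 hours for development) come from the policy above.

```python
from datetime import datetime, timedelta

# Limits taken from the policy described above; the structure itself is illustrative.
POLICY = {
    "production": {"max_snapshots": 1, "ttl_hours": 10},
    "development": {"max_snapshots": 3, "ttl_hours": 24},
}


def check_vm(vm_class, snapshot_times, now):
    """Return a list of human-readable policy violations for one VM."""
    limits = POLICY[vm_class]
    ttl = timedelta(hours=limits["ttl_hours"])
    violations = []
    if len(snapshot_times) > limits["max_snapshots"]:
        violations.append(
            f"{len(snapshot_times)} snapshots exceeds the limit of {limits['max_snapshots']}"
        )
    expired = [t for t in snapshot_times if now - t > ttl]
    if expired:
        violations.append(
            f"{len(expired)} snapshot(s) past the {limits['ttl_hours']}-hour time to live"
        )
    return violations


# Made-up example: a production VM with two snapshots, one of them 11 hours old,
# and a development VM with three snapshots that are all under 24 hours old.
now = datetime(2009, 9, 11, 12, 0)
prod_result = check_vm(
    "production", [now - timedelta(hours=11), now - timedelta(hours=2)], now
)
dev_result = check_vm(
    "development", [now - timedelta(hours=h) for h in (1, 5, 20)], now
)
print(prod_result)
print(dev_result)
```

Keeping the limits in one table means that a change to the SLA only needs to be made in a single place.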

The reason for separate LUNs is so that I can guarantee system integrity for the production VMs. Lastly, all systems will be classified as either "Production" or "Development", and each class will fall under its own resource pool for runtime compute resources. See below.

Resource Pools and functions

  • "High Priority (Production VMs)" Set attributes appropriately to guarantee VM resources and take what is needed to achieve this from systems outside of the resource pool.

  • "Low Priority (Development VMs)" Set attributes appropriately to provide sufficient resources but do not guarantee them, production has overall precedence.

Thanks for your support guys.

-Jason

lesnyh
Contributor

Dear colleagues,

I am out of the office until 12:00 on 11 September 2009. I can be reached on my mobile at 79636266147.

Regards, Marat Lesnykh.
