Greetings fellow VMware wayfarers,
I am trying to build policy and procedure around VMware virtual machine snapshots and would like your thoughts on best practice. How long should a snapshot stay in place before it is removed? And what is the maximum number of snapshots you feel should be taken for each machine?
The purpose of my research is to incorporate this policy into a service level agreement and to preserve system integrity for both the virtual machine and the infrastructure it runs on. I have worked in a few environments where a VM had several snapshots that were months old and 60-80 GB in size. When I confirmed that the snapshots were no longer needed and was given the OK to remove them, the removal took almost 24 hours and affected performance for the VM and, of course, for the ESX server it was running on. I am looking forward to your responses and value your opinion. Thanks for your support.
So far here is what I have come up with.
Maximum number of snapshots per machine = 3
Duration that a snapshot will last before deletion = 5 business days
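For anyone who wants to turn the "5 business days" rule into a concrete removal date, here is a minimal Python sketch. The function names are made up for illustration, and it assumes a Monday-to-Friday work week with no holiday calendar, which a real SLA would probably need to account for.

```python
from datetime import date, timedelta

MAX_SNAPSHOTS = 3          # draft limit from the list above
MAX_AGE_BUSINESS_DAYS = 5  # draft limit from the list above


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`, skipping Sat/Sun."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current


def deletion_deadline(created: date) -> date:
    """Date by which a snapshot taken on `created` should be removed."""
    return add_business_days(created, MAX_AGE_BUSINESS_DAYS)
```

A snapshot taken on a Thursday would therefore be due for removal the following Thursday, since the weekend does not count toward the five days.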
-Jason
thanks for the helpful
www.phdvirtual.com, makers of esXpress
IMHO, in the enterprise you have to limit the use of ESX snapshots.
A good choice is the VUM default: 1 snapshot, kept for a maximum of 18 hours.
If you really need a long-term rollback option, you must use backups.
Or, if your storage supports snapshots integrated with ESX (like EqualLogic Auto-Snapshot Manager), you can use that feature instead.
Andre
When I ran a VMware/Oracle infrastructure, we very rarely kept snapshots on production virtual machines for more than a day, if that long, and normally had no more than 1 at a time.
In development environments, however, things were handled differently, because snapshots provided an excellent way to test patches and code releases. We could snapshot a VM and then load our patches or new software changes. If a problem came up, we could revert to the snapshot.
I think it really depends on the use case and specifically what the virtual machine is used for. If the workload is a file server then limiting snapshots makes a lot of sense. On the other hand if you are doing software development then I can see situations where many snapshots are required and regularly used. I could argue that VMware Lab Manager is a better tool to use to manage those kinds of environments, but not everyone has that. So understanding what the specific server is used for will help create the SLA and overall guidance.
I agree with others that keeping snapshots around for a long period of time is a bad idea. In general, if you can limit the total number of snapshots to one or two and the duration to no more than 5 days, you will likely not run into issues with overly large snapshots impacting disk performance. A great way to make sure you're keeping your SLAs is to use PowerShell/PowerCLI scripts to query your environment and see which VMs have snapshots, how old they are, who created them, etc. The following is a fantastic script that gives you not only this information but a great deal more. Using this script or one like it will help make sure that you don't have rogue snapshots growing out of control and that you are keeping users in line (and in compliance with your SLA).
http://www.virtu-al.net/2009/08/18/powercli-daily-report-v2/
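To illustrate the kind of check such a report performs (this is not the linked script), here is a small Python sketch that flags snapshots older than a threshold. It assumes the snapshot inventory has already been exported from vCenter by some other means; the `SnapshotRecord` fields, the VM names, and the creators below are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SnapshotRecord:
    # Hypothetical export format; field names are assumptions.
    vm: str
    name: str
    created: datetime
    creator: str
    size_gb: float


def stale_snapshots(records, now, max_age):
    """Return the snapshots older than max_age, oldest first."""
    stale = [r for r in records if now - r.created > max_age]
    return sorted(stale, key=lambda r: r.created)


def format_report(records, now):
    """Build one human-readable line per snapshot."""
    lines = []
    for r in records:
        age_days = (now - r.created).days
        lines.append(
            f"{r.vm}: '{r.name}' by {r.creator}, "
            f"{age_days} days old, {r.size_gb:.0f} GB"
        )
    return lines


# Made-up sample data for illustration only.
now = datetime(2009, 9, 11, 12, 0)
inventory = [
    SnapshotRecord("dev-app01", "pre-patch", datetime(2009, 9, 10, 9, 0), "alice", 2.0),
    SnapshotRecord("prod-db01", "before-upgrade", datetime(2009, 6, 1, 8, 0), "bob", 70.0),
]
report = format_report(stale_snapshots(inventory, now, timedelta(days=5)), now)
for line in report:
    print(line)
```

With the 5-day threshold mentioned above, only the months-old snapshot on the sample production VM is reported; the one taken the day before is left alone.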
Snapshots are a great feature but I think they can cause more harm than good if they are not well understood/managed. I think your idea of developing specific guidance and an SLA is a good one, and I'm curious to hear what you ultimately decide.
In my opinion, snapshots should never be used except as a temporary "Undo" button lasting as little time as possible, such as in the case of VUM or when testing a patch. I advise people to keep their snapshots around for as short a time as possible, even 10 minutes or less.
---
If you found any of my comments helpful please consider awarding points for "Correct" or "Helpful". Thanks!!!
Greetings,
Good work, and thank you for your input. Here is what I have determined. There is a strong need to separate production VMs from development VMs. In the production world, although you might need a snapshot for a short period of time, it should really only last for about 10-12 hours. In the development arena, however, snapshots need to last much longer because they provide ultimate flexibility and a reliable way of unwinding any permanent changes in case of a failure.
In conclusion, I am going to create separate LUNs to house the development VMs and perform a Storage VMotion on any existing development VMs that are running on LUNs with production VMs. In my policy I will stipulate that a production VM will have no more than one snapshot at any given time, with a 10-hour "time to live" before deletion. The development VMs will have a maximum of three snapshots at any given time, with a 24-hour "time to live". If there is a need to unwind any changes after that, we will revert to backups and perform a VM recovery.
The reason for separate LUNs is so that I can guarantee system integrity for the production VMs. Lastly, all systems will be classified as either "Production" or "Development", and each will fall under a matching resource pool for runtime compute resources. See below.
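The two sets of limits in the paragraph above could be written down as data and checked mechanically. The following Python sketch shows one possible shape for that; the class name, the function name, and the idea of passing snapshot ages in as a list are assumptions for illustration, and nothing here talks to vCenter.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SnapshotPolicy:
    max_snapshots: int
    time_to_live: timedelta


# Limits taken from the policy described above.
POLICIES = {
    "Production": SnapshotPolicy(max_snapshots=1, time_to_live=timedelta(hours=10)),
    "Development": SnapshotPolicy(max_snapshots=3, time_to_live=timedelta(hours=24)),
}


def violations(vm_class, snapshot_ages):
    """Return policy violations for one VM.

    vm_class is "Production" or "Development"; snapshot_ages is a list of
    timedelta values, one per snapshot currently on the VM.
    """
    policy = POLICIES[vm_class]
    problems = []
    if len(snapshot_ages) > policy.max_snapshots:
        problems.append(
            f"{len(snapshot_ages)} snapshots exceeds limit of {policy.max_snapshots}"
        )
    expired = [age for age in snapshot_ages if age > policy.time_to_live]
    if expired:
        problems.append(f"{len(expired)} snapshot(s) past time to live")
    return problems
```

For example, `violations("Production", [timedelta(hours=11), timedelta(hours=1)])` reports both a count violation and an expired snapshot, while a development VM with three snapshots that are each five hours old passes cleanly.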
Resource Pools and functions
"High Priority (Production VMs)" Set attributes appropriately to guarantee VM resources and take what is needed to achieve this from systems outside of the resource pool.
"Low Priority (Development VMs)" Set attributes appropriately to provide sufficient resources but do not guarantee them; production has overall precedence.
Thanks for your support guys.
-Jason
Dear colleagues,
On 11 September 2009 I am out of the office until 12:00. I can be reached on my mobile at 79636266147.
Best regards, Марат Лесных.