VirtualNewbie1
Enthusiast

Deleting very old snapshots residing on a NetApp datastore

Hello All,

We are using VMware vSphere 5.5 and have many VMs installed on an ESXi host. Recently one of the VMs (SLES 10) has had performance issues whenever a scheduled activity (e.g. a Postgres database job) starts from cron.

The system becomes very slow or even unresponsive at times. One more important factor to note: the whole VM resides on a datastore created on NetApp storage (via NFS).

That could also be a reason for the performance problems, but we are not yet sure of the real cause, so we are considering all factors.

I came across this information while digging around: "It is recommended to delete old snapshots because the presence of redundant delta disks can adversely affect virtual machine performance."

There are a number of old snapshots (the oldest being from 2018) that were created by an earlier administrator and never deleted. Please see the attachment.

We wish to delete the older snapshots but keep the present state of the VM. This is very important.

I am new to the topic of vSphere snapshots, so I would like to know your point of view on the following points.

1. How valid is the assumption that deleting the snapshot(s) would help resolve the performance problems?

2. If we decide to delete the older snapshots one by one, how long will it take for each snapshot to be deleted? As I read, the system consolidates the data after each delete operation.

    (The system itself is currently using 112 GB of NetApp datastore space.)

3. In what order should the snapshots be deleted? Oldest ---> newest, or the other way round?

4. We do not wish to preserve any snapshot, but at the same time we do not wish to delete all the snapshots at once. As this is a production system, we would like to take minimum risk.

5. Does using Snapshot Manager have any advantage?

6. Should the system be restarted before/during/after the snapshot delete operation?

7. Is system performance degraded during a snapshot delete, thereby affecting the users working on it?

Please let me know if you require any further information.

I am sorry to ask so many questions, but any guidance from people with previous experience would be very much appreciated. In the meantime I will also be trying to gather information from elsewhere.

Thanks in advance.

7 Replies
daphnissov
Immortal

Your performance degradation issues are very likely due to the large number of snapshots on this VM, and, if their names are any indication, they are YEARS old. Allowing this to happen in the first place was a huge mistake. You must delete all of them if you want to return this VM to normal performance. Depending on the performance of the underlying storage, deleting/committing these snapshots is very likely to cause performance penalties while it runs. I would:

  1. Right now, create a full, standalone backup of this VM with whatever backup tool you use. It needs to be image based.
  2. Delete snapshots starting with the most recent first.

Now to your questions.

1. How valid is the assumption that deleting the snapshot(s) would help resolve the performance problems?

Very valid. In fact, "probably" is the operative word here.

2. If we decide to delete the older snapshots one by one, how long will it take for each snapshot to be deleted? As I read, the system consolidates the data after each delete operation.

    (The system itself is currently using 112 GB of NetApp datastore space.)

Depends on a variety of factors. We cannot give you an answer here.

3. In what order should the snapshots be deleted? Oldest ---> newest, or the other way round?

You can do it either way, but newest first has less impact.

4. We do not wish to preserve any snapshot, but at the same time we do not wish to delete all the snapshots at once. As this is a production system, we would like to take minimum risk.

This is a somewhat irrelevant point. But if you wish to minimize the underlying storage activity caused by a commit, start with the newest first.

5. Does using Snapshot Manager have any advantage?

Don't understand what you mean by this question.

6. Should the system be restarted before/during/after the snapshot delete operation?

No.

7. Is system performance degraded during a snapshot delete, thereby affecting the users working on it?

Possibly, but to what extent and for how long depends on the same factors that prevent us from giving you an answer to your question #2.
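
For planning purposes, the newest-first order suggested above can be sketched in a few lines of Python. This is only an illustration: the snapshot names and dates are made up, and `Snapshot` and `deletion_order` are hypothetical helpers, not part of any VMware tooling. It just turns a list copied from Snapshot Manager into a worklist.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    name: str       # hypothetical snapshot name
    created: date   # creation date as shown in Snapshot Manager


def deletion_order(snapshots):
    """Return the snapshots sorted newest first (the order suggested in this thread)."""
    return sorted(snapshots, key=lambda s: s.created, reverse=True)


# Made-up example entries; replace with the ones from your own Snapshot Manager.
chain = [
    Snapshot("before-upgrade", date(2016, 3, 1)),
    Snapshot("initial-install", date(2015, 12, 10)),
    Snapshot("before-patch", date(2018, 6, 20)),
]

for snap in deletion_order(chain):
    print(snap.created.isoformat(), snap.name)
```

The actual deletion would still be done one snapshot at a time in Snapshot Manager, checking the VM after each step.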

VirtualNewbie1
Enthusiast

5. Does using Snapshot Manager have any advantage?

Don't understand what you mean by this question.

I meant to ask whether the vSphere Client --> Snapshot Manager tool is the recommended option.

There is also a separate "Consolidate" option available.

Thanks a ton for all the replies/comments.

daphnissov
Immortal

Yes, that is the recommended option. Consolidate is a separate action which you seem not to need at the present time.

VirtualNewbie1
Enthusiast

OK, thanks once again. I will plan this activity and update you all.

pragg12
Hot Shot

Hi,

Damn! was the first word I said when I saw the attached image. The oldest snapshot seems to be from Dec 2015, going by the first snapshot's name.

That is a lot of snapshots, and the latter half of them also include a memory snapshot (the ones with the green play button).

You should check how much total space this VM is occupying on the datastore. Refer to this VMware doc: View Virtual Machine Storage Resources in the vSphere Web Client

Another point to consider is whether all disks of this VM are hosted on the same datastore or on more than one.

This is my rough estimate, from experience: if the gap between the total size of all disks in use by the VM and the total space used by the VM on the datastore is huge, you should be looking at a very long maintenance window. Possibly one whole weekend, with no other task running on that datastore(s) in parallel.

Plan carefully since vSphere 5.5 is already EOL. 😉

Consider marking this response as "Correct" or "Helpful" if you think my response helped you in any way.
VirtualNewbie1
Enthusiast

That is a lot of snapshots, and the latter half of them also include a memory snapshot (the ones with the green play button).

Memory snapshots: is that a problem when considering deletion?

You should check how much total space this VM is occupying on the datastore.

Provisioned storage is 302 GB, not-shared storage is 121 GB and used storage is 121 GB. But things are not that simple. This VM does not reside on a local datastore. It resides entirely on a datastore created on NetApp storage via NFS (1 Gb network). I guess this will surely slow down the whole process.

Another point to consider is whether all disks of this VM are hosted on the same datastore

Both disks (100 GB each, thin provisioned) of this VM are on this remote datastore.

If the gap between the total size of all disks in use by the VM and the total space used by the VM on the datastore is huge, you should be looking at a very long maintenance window. Possibly one whole weekend, with no other task running on that datastore(s) in parallel.

As I said, the provisioned storage is 302 GB and used is 121 GB, so I guess the difference is 181 GB. Not sure whether this is considered a huge difference with a 1 Gb network.
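
For reference, the arithmetic behind that difference, as a small Python sketch. It uses only the two figures quoted in this post; whether "provisioned minus used" is the right gap to look at is an assumption on my side, and the variable names are just illustrative.

```python
# Figures quoted in this thread, in GB.
provisioned_gb = 302
used_gb = 121

# Gap between what the VM is provisioned for and what it occupies now.
gap_gb = provisioned_gb - used_gb
used_fraction = used_gb / provisioned_gb

print(f"gap: {gap_gb} GB")                         # gap: 181 GB
print(f"used: {used_fraction:.0%} of provisioned")  # used: 40% of provisioned
```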

One more thing that comes to mind: is it sensible to delete the oldest snapshot first, or the latest one? In the comments above, one expert suggested deleting the newest snapshot first.

I will be using vSphere Client --> Snapshot Manager to delete the snapshots one by one. If deleting the first snapshot alone takes, let's say, around 3-4 hours, then I think I might spread this activity over different weekends. I mean, it is not a must to delete all the snapshots in one go. Or am I thinking about this the wrong way? Please clarify.

pragg12
Hot Shot

Memory snapshots: is that a problem when considering deletion?

When considering deletion, no.

Provisioned storage is 302 GB, not-shared storage is 121 GB and used storage is 121 GB. But things are not that simple. This VM does not reside on a local datastore. It resides entirely on a datastore created on NetApp storage via NFS (1 Gb network). I guess this will surely slow down the whole process.

You may face latency when trying to delete this many snapshots in one go over a 1 Gb network. Delete the snapshots in phases, a few at a time or one snapshot at a time.

Both disks (100 GB each, thin provisioned) of this VM are on this remote datastore.

Ok.

As I said, the provisioned storage is 302 GB and used is 121 GB, so I guess the difference is 181 GB. Not sure whether this is considered a huge difference with a 1 Gb network.

One more thing that comes to mind: is it sensible to delete the oldest snapshot first, or the latest one? In the comments above, one expert suggested deleting the newest snapshot first.

I will be using vSphere Client --> Snapshot Manager to delete the snapshots one by one. If deleting the first snapshot alone takes, let's say, around 3-4 hours, then I think I might spread this activity over different weekends. I mean, it is not a must to delete all the snapshots in one go. Or am I thinking about this the wrong way? Please clarify.

The difference doesn't look huge, so the deletion shouldn't take a long time. Plan this outside business hours and make sure no backup job is scheduled to run on this VM or on any other VM on the same datastore and ESXi host. No application, database or other task should be running on the affected VM, since a high number of I/Os on the VM can increase snapshot deletion time.
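
To put the 1 Gb link into perspective, here is a rough Python sketch of the minimum time needed just to move a given amount of data over it. It is not a prediction of snapshot deletion time: how much data a commit actually reads and writes depends on the delta sizes, the storage array and the I/O load on the VM, none of which are known here. The function name and the `efficiency` parameter are my own assumptions for illustration.

```python
def min_transfer_minutes(data_gb, link_gbit_per_s=1.0, efficiency=1.0):
    """Lower bound, in minutes, on the time to move data_gb over the link.

    efficiency is an assumed fraction of line rate actually achieved,
    with 0 < efficiency <= 1.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    gigabits = data_gb * 8  # 1 GB = 8 Gbit (decimal units)
    seconds = gigabits / (link_gbit_per_s * efficiency)
    return seconds / 60


# 121 GB is the "used storage" figure from this thread.
print(round(min_transfer_minutes(121), 1))                  # ideal line rate
print(round(min_transfer_minutes(121, efficiency=0.5), 1))  # assumed 50% of line rate
```

Even at half of line rate this floor is well under an hour, so in practice the storage back end and the VM's own I/O are likely to matter more than the raw link speed.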

I haven't faced this situation, so I can't comment on which snapshot to delete first. If you have received a suggestion, try it and let us know how it goes. Good luck to you!

Refer to the third-party article below on VMware snapshots:

Deep Dive – The Ultimate Guide to Master VMware Snapshot
