VMware Cloud Community
RM1875
Contributor

Snapshot causes Datastore to run out of space without warning

Hi,

Maybe it's because of my lack of VMware knowledge, but I'm a bit disappointed in ESXi.

It has happened to me a second time: you run a VM, create a (just in case) snapshot... and forget to remove it.

So far so good... BUT when you discover the snapshot and eventually want to remove it, the removal somehow needs as much free storage as the snapshot has grown. Because ESXi doesn't warn you, or check your free space against the size of the snapshot, removing the snapshot consumes so much free space that eventually your datastore runs out of disk space, your VM crashes and you are not able to consolidate.

I really do not understand why VMware does not build in a protective warning that the ratio of snapshot size to free disk space is in danger, or at least will be, BEFORE you start the deletion and create your own VM suicide.
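
The kind of pre-deletion check described above could look roughly like the sketch below. It takes the post's premise at face value (removal may need about as much free space as the snapshot has grown). The function name, the 20% headroom factor and the three labels are illustrative assumptions, not anything ESXi provides.

```python
def snapshot_removal_risk(free_gb: float, snapshot_gb: float, headroom: float = 1.2) -> str:
    """Classify how risky a snapshot removal looks, given datastore free space.

    Assumption (from the post, not from VMware documentation): removal may
    need up to the snapshot's current size in free space. `headroom` is a
    hypothetical safety factor on top of that.
    """
    if free_gb < 0 or snapshot_gb < 0:
        raise ValueError("sizes must be non-negative")
    if free_gb >= snapshot_gb * headroom:
        return "ok"        # enough space plus a safety margin
    if free_gb >= snapshot_gb:
        return "warning"   # would fit, but with little or no margin
    return "danger"        # snapshot is larger than the free space


print(snapshot_removal_risk(free_gb=200, snapshot_gb=100))  # prints "ok"
print(snapshot_removal_risk(free_gb=50, snapshot_gb=100))   # prints "danger"
```

Run against real numbers before starting a deletion, a "danger" result would be the point to free up or add space first.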

Now we have to rely on VMware support... who are advising me to extend the local storage... which sadly won't be easy... I hope they come up with a suitable solution!

Accepted Solutions
SupreetK
Commander

Creating the alarms below and, more importantly, monitoring them will help you address datastore usage issues proactively rather than reacting to them:

Configure 'Datastore Usage on Disk' alarm - Configuring and Analysing vSphere Datastore Alarms

Configure 'VM running on Snapshot' alarm - VMware Knowledge Base

Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.

Cheers,

Supreet

4 Replies
RM1875
Contributor

Well, having a lot of time tonight because of bad sleep... maybe a design change could prevent the trouble in the first place.

I have never tested it, but maybe it's a good idea in the future to use a 75/25% rule when creating the VM disk. Let's say I have 1TB of local storage (a datastore) and I would like to use it for a VM disk.

When creating it, I would assign only 75% of the datastore's capacity to the VM disk, so the VM disk would be 750GB. Whenever the VM disk runs out of space for whatever reason, it can be extended using the remaining 25% of datastore space, which saves you because consolidation is now possible.
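
As a back-of-the-envelope sketch of that rule (the function name and the 25% default are just illustrative, and 1TB is treated as 1000GB as in the example above):

```python
def plan_vmdisk(datastore_gb: float, reserve_fraction: float = 0.25) -> tuple[float, float]:
    """Split a datastore into a VM disk size and an unallocated reserve.

    Returns (vmdisk_gb, reserve_gb). `reserve_fraction` is the share of the
    datastore deliberately left unallocated (hypothetical parameter).
    """
    if datastore_gb <= 0:
        raise ValueError("datastore size must be positive")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    reserve_gb = datastore_gb * reserve_fraction
    return datastore_gb - reserve_gb, reserve_gb


disk_gb, reserve_gb = plan_vmdisk(1000)
print(f"{disk_gb:.0f} GB VM disk, {reserve_gb:.0f} GB held back")
# prints "750 GB VM disk, 250 GB held back"
```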

As I said, I never tested it... but it sounds wise...

SupreetK
Commander

All depends on how good your design is. Have seen environments with just 1-2 percent of overhead free space in the datastore and still going on without going down.

Cheers,

Supreet

golddiggie
Champion

In every environment I've administered or set up, we've ALWAYS had the usage alerts set and going to those who needed to see them. I typically don't set the yellow alert any higher than 75% consumed. At 95% it's a red alert (danger, danger Will Robinson).
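
For anyone who wants to reason about those thresholds outside of vCenter, here is a minimal sketch of the same yellow/red logic in Python. The function name and the "green" label are assumptions for illustration; the 75% and 95% defaults are the values mentioned above.

```python
def usage_alert(capacity_gb: float, free_gb: float,
                yellow_pct: float = 75.0, red_pct: float = 95.0) -> str:
    """Map datastore usage to an alert level: "green", "yellow" or "red"."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    if not 0 <= free_gb <= capacity_gb:
        raise ValueError("free space must be between 0 and capacity")
    used_pct = 100.0 * (capacity_gb - free_gb) / capacity_gb
    if used_pct >= red_pct:
        return "red"
    if used_pct >= yellow_pct:
        return "yellow"
    return "green"


print(usage_alert(capacity_gb=1000, free_gb=300))  # 70% used, prints "green"
print(usage_alert(capacity_gb=1000, free_gb=200))  # 80% used, prints "yellow"
print(usage_alert(capacity_gb=1000, free_gb=40))   # 96% used, prints "red"
```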

Also, IF you use snapshots properly, then you don't run into issues like the ones you described. They are supposed to be used (and were actually designed to be used) as recovery points during changes to a VM, either within the VM's OS or at the virtual hardware layer, and then deleted (if there is no need to revert) within ~72 hours of creation. I've run reports every 2-4 weeks on configurations where I knew users were NOT removing snapshots properly, and bugged them until they either did, or completely understood the dangers of having snapshots live longer. Not only do they consume space, the VMs also take a performance hit while the snapshots remain.
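
A report like that boils down to filtering snapshots by age. The sketch below shows only that filtering step in Python. It assumes the snapshot inventory has already been exported into simple records; the `SnapshotRecord` type, the VM names and the dates are made up for illustration, and how you collect the records (PowerCLI, an API client, a CSV export) is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple


class SnapshotRecord(NamedTuple):
    """One snapshot as exported from the inventory (hypothetical shape)."""
    vm_name: str
    snapshot_name: str
    created: datetime


def stale_snapshots(records: Iterable[SnapshotRecord], now: datetime,
                    max_age: timedelta = timedelta(hours=72)) -> list[SnapshotRecord]:
    """Return snapshots older than `max_age`, oldest first."""
    stale = [r for r in records if now - r.created > max_age]
    return sorted(stale, key=lambda r: r.created)


# Made-up sample data with a fixed "now" so the result is reproducible.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
records = [
    SnapshotRecord("vm-c", "pre-driver-update", datetime(2024, 1, 5, 9, 0, tzinfo=timezone.utc)),
    SnapshotRecord("vm-b", "pre-patch", datetime(2024, 1, 9, 12, 0, tzinfo=timezone.utc)),
    SnapshotRecord("vm-a", "just-in-case", datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)),
]

for rec in stale_snapshots(records, now):
    age_days = (now - rec.created).days
    print(f"{rec.vm_name}: '{rec.snapshot_name}' is {age_days} days old")
```

With the sample data, `vm-a` and `vm-c` are reported and `vm-b` (24 hours old) is not.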

SNAPSHOTS ARE NOT A BACKUP SOLUTION.

Learn that and live by that. While many B&R products USE snapshots as part of their process, they don't keep them on the VM (well, if the product is working properly).

Also, I would NEVER size a datastore so that a single VM uses 100% of it, or even so that the VMs living on it consume 100% of it. Maybe in a test lab configuration you can use above 80% of the storage, but I would NEVER do that in a production environment. I would also make damned sure all of the storage alarms are active and that you actually get the messages. That way, as storage consumption climbs, you get messages and can address it BEFORE you run into major problems.

IME, getting a solid environment design approved and built at the start eliminates a LOT of potential issues later. Hell, I even have more than enough unallocated/unconsumed space in my home lab vSphere environment (both in the host and on the NAS I'm using).