VMware Cloud Community
savagea
Enthusiast

Disk space alert will not clear

I have a server that's alerting for "One or more virtual machine guest file systems are running CRITICALLY low on disk space".  The condition was briefly true about 12 hours ago, but the disk has since been cleaned up.  However, every time I cancel the alert, it comes back after the next 5-minute collection.

Any thoughts?

21 Replies
greco827
Expert

Check the symptom definitions of the specific alert that is being triggered.  The symptom may still be in breach if it is based on a static usage amount, or on a percentage threshold that usage hasn't dropped back below.

Also check and make sure it isn't a different partition on the same VM which is causing the alert.

If you find this or any other answer useful please mark the answer as correct or helpful https://communities.vmware.com/people/greco827/blog
savagea
Enthusiast

Alert:  "One or more virtual machine guest file systems are running CRITICALLY low on disk space"

Symptom:  "Guest file system space usage at critical level".  Critical when metric is >95. 


Both vDisks (C: and D:) are below 50% utilization.


Wait Cycle = 1.  Cancel Cycle = 1
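
To make the Wait Cycle / Cancel Cycle settings concrete, here is a minimal Python sketch of how such a symptom would be expected to behave. It is an assumed model, not vROps internals: the symptom is taken to activate after the metric has been above the threshold for `wait_cycle` consecutive collections and to clear after `cancel_cycle` consecutive collections at or below it. The function name and the sample values are hypothetical.

```python
def symptom_states(samples, threshold=95.0, wait_cycle=1, cancel_cycle=1):
    """Return the symptom state (True = active) after each collection.

    Assumed model, not vROps internals: the symptom activates once the
    metric has been above the threshold for `wait_cycle` consecutive
    collections, and clears once it has been at or below the threshold
    for `cancel_cycle` consecutive collections.
    """
    active = False
    breached = 0  # consecutive collections above the threshold
    clear = 0     # consecutive collections at or below the threshold
    states = []
    for value in samples:
        if value > threshold:
            breached += 1
            clear = 0
        else:
            clear += 1
            breached = 0
        if not active and breached >= wait_cycle:
            active = True
        elif active and clear >= cancel_cycle:
            active = False
        states.append(active)
    return states


# Hypothetical usage percentages: a brief spike, then back under 50%.
history = [42.0, 97.5, 96.1, 48.0, 47.9]
print(symptom_states(history))  # [False, True, True, False, False]
```

Under that model, a symptom with Wait Cycle = 1 and Cancel Cycle = 1 should clear on the first collection below the threshold, so an alert that keeps coming back at under 50% usage points at something other than the threshold logic itself.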



greco827
Expert

Hmmmm.  OK.  And the disk space you added on the OS was rescanned, and the partitions from the OS itself show adequate space, correct?  What are the partition sizes and free space?  Is VMware Tools running?

savagea
Enthusiast

VMware Tools is running, yes. 

1.png

greco827
Expert

This one has me stumped, but I'm looking to see if I can mimic this behavior.  The VM has tools running, it appears to be collecting, and vROps seems aware of the disk space addition since it is reflected in the metrics.  It appears that the symptom and message event definitions are default, so I'm really not sure.  I'll keep digging around.

savagea
Enthusiast

Thank you, much appreciated.  Could this have to do with how far back in time this checks? 

greco827
Expert

What policy is applied to this VM?  Has that policy (even if it is Default Policy) been altered?  You can go into the policy itself under Override Alert/Symptom Definitions and make sure that there is no override setting configured on the policy applied to the VM in question.

vROps_Policy_OverrideDefinitions.jpg

savagea
Enthusiast

This is a copy of the default policy, with no override settings for any VMs.

1.png

greco827
Expert

OK, it was worth a quick check.  I'm still looking at why this might be happening.  I was able to add files to a VM, create this alert then delete the files and the alert cleared. 

savagea
Enthusiast

Really...  I'm racking my brain here.  Very odd.

greco827
Expert

Just noticed this, but in your image, why is one of your alerts marked as do not inherit?

greco827
Expert

Also, what is the definition of the one starting with (DCs)?

savagea
Enthusiast

That's marked as do not inherit because by default it's set to alert on 'Any' of:

Guest file system space usage at immediate level

Guest file system space usage at warning level

Guest file system space usage at critical level

So we cloned that definition, and set it to alert on 'Only':

Guest file system space usage at critical level

The (DCs) definition is a modification for only our Domain Controller.  It's applied to only our 'DC' custom group.
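
The difference between the default definition and the cloned one can be sketched in a few lines of Python. This is only an illustration of the 'Any' matching described above, not how vROps stores or evaluates definitions; the function and variable names are hypothetical, and only the three symptom names come from the definitions listed in this post.

```python
IMMEDIATE = "Guest file system space usage at immediate level"
WARNING = "Guest file system space usage at warning level"
CRITICAL = "Guest file system space usage at critical level"


def alert_fires(active_symptoms, definition_symptoms):
    """'Any'-style match: fire if at least one listed symptom is active."""
    return any(symptom in active_symptoms for symptom in definition_symptoms)


# Default definition: alert on any of the three symptoms.
default_definition = [IMMEDIATE, WARNING, CRITICAL]
# Cloned definition: alert on the critical symptom only.
cloned_definition = [CRITICAL]

# Hypothetical state: only the warning-level symptom is active on the VM.
active = {WARNING}
print(alert_fires(active, default_definition))  # True
print(alert_fires(active, cloned_definition))   # False
```

With the cloned definition, a VM that only crosses the warning or immediate level no longer raises the CRITICALLY-low alert.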

PaulH82
Contributor

We are seeing multiple cases of the same problem too - vROps version 6.0.1 and also 6.1 affected. Metrics are showing disk space is clear and down below 25% usage, but critical alert > 98% still flagging.

We have manually cancelled the alert, restarted the vSphere adapters, and stopped and restarted collection on the VM (via Environment Overview), with no success on this occasion. (Sometimes, however, restarting the collection has cleared the alerts OK on other VMs.)

We have also stopped and restarted VMTools.

Any suggestions would be greatly appreciated!

Paul

PaulH82
Contributor

I may have stumbled across something....

I placed the VMs into Maintenance using Environment Overview (Inventory Explorer) and the alerts cleared as expected. (Previously, stopping collection either by the solution/adapter or individual VM object did not clear the alert which confused me)

I then ended the Maintenance and the alerts have not resurfaced. Metrics all appear to be collecting OK, so I'm now hopeful that this has fixed the problem.

Let me know if this helps you as well!

Paul

savagea
Enthusiast

Paul, I stumbled on to something too.  It seems an alert for the problem server had previously been 'assigned ownership'.  Once I released that ownership and deleted the alert, it hasn't resurfaced.

PaulH82
Contributor

Ah ok - that was not the case for us as we take ownership of alerts within our team to show that they are being investigated and to avoid engineers duplicating effort. (Annoying that ownership cannot be transferred unless original owner releases ownership themselves)

To us, it looks like vROps "forgets" to clear the alert (or thinks that it already has) and no number of collections or metrics seems to force the system to recheck against old alerts. We believe that placing into maintenance forces all alerts to clear (regardless of status) and therefore "flushes" out the system.

We've fixed this alerting problem for at least 4 VMs today successfully using this method.
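
The maintenance toggle was done through the UI in this thread. If it has to be repeated for many VMs, it could probably be scripted against the vROps REST (Suite) API; the Python sketch below only builds the requests and does not send them. Everything API-related here is an assumption to check against the API documentation for your vROps version: the `/suite-api/api/resources/{id}/maintained` path, the use of PUT to enter and DELETE to leave maintenance, and the `vRealizeOpsToken` authorization header. The host name, resource ID and token are placeholders.

```python
import urllib.request


def maintenance_request(base_url, resource_id, token, enter=True):
    """Build (but do not send) a request that starts or ends maintenance.

    Assumption: the Suite API exposes /api/resources/{id}/maintained,
    with PUT to enter maintenance and DELETE to leave it.
    """
    url = f"{base_url.rstrip('/')}/suite-api/api/resources/{resource_id}/maintained"
    return urllib.request.Request(
        url,
        method="PUT" if enter else "DELETE",
        headers={
            # Assumed token scheme; acquire the token separately.
            "Authorization": f"vRealizeOpsToken {token}",
            "Accept": "application/json",
        },
    )


# Placeholder values for illustration only.
start = maintenance_request("https://vrops.example.com", "RESOURCE-UUID", "TOKEN")
end = maintenance_request("https://vrops.example.com", "RESOURCE-UUID", "TOKEN", enter=False)
print(start.get_method(), start.full_url)
print(end.get_method(), end.full_url)
```

Each request would then be sent with `urllib.request.urlopen(...)`, entering maintenance first and ending it once the stale alert has cleared, mirroring the manual steps described above.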

savagea
Enthusiast

Awesome Paul, nice job.  I'm trying that as well.

suvrobhattachar
Enthusiast

If you are running a Windows NT-based operating system, you can configure Perfmon inside the VM and see whether any genuinely high disk usage is reported.

That will help you find the odd man out!
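
Perfmon is the tool suggested here. As a quick alternative cross-check from inside the guest, a few lines of Python using the standard library's `shutil.disk_usage` can report which volumes are actually above a threshold. This is a sketch, not a replacement for Perfmon counters; the function names and the 95% default are illustrative, and on a Windows guest you would pass the drive roots (for example `"C:\\"` and `"D:\\"`) instead of the root used below.

```python
import os
import shutil


def usage_percent(path):
    """Percentage of the volume containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def volumes_over(paths, threshold=95.0, measure=usage_percent):
    """Return {path: percent} for the volumes whose usage exceeds the threshold."""
    result = {}
    for path in paths:
        percent = measure(path)
        if percent > threshold:
            result[path] = round(percent, 1)
    return result


# Root of the current drive, so the example runs on any OS.
root = os.path.abspath(os.sep)
print(f"{root}: {usage_percent(root):.1f}% used")
print("Over threshold:", volumes_over([root]))
```

If nothing shows up over the threshold inside the guest while vROps still reports the critical symptom, the stale alert is on the vROps side rather than a real disk condition.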
