Re: different disk space alarms for development ma...

rgcda · ‎05-12-2015

In vCOps 5.8 I used attribute packages to differentiate between production and development machines when alerting on and monitoring disk space. If a production machine reached 95% utilization on a drive, it would generate an alarm. I set this up in an attribute package and assigned it to production machines, so when one of them exceeded the 95% threshold I would get an email indicating I needed to take action. A different attribute package, assigned to development machines, would not create an alarm. I also had separate attribute packages assigned to some production machines that have chronically high disk space usage but never reach 100% utilization. These wouldn't alarm at the 95% threshold but maybe at the 99% threshold.

In vROps I seem to have lost this ability, which I'm not thrilled about. It doesn't appear that you can set different monitoring thresholds for different machines, so after migrating to vROps I now get alarms for development machines. Is there any way to set up different monitoring thresholds for different machines? I know I could just put a development machine into permanent maintenance mode, but that doesn't help me set different thresholds for production machines.

funksoul · ‎05-13-2015

Hi, rgcda.

The attribute packages from vC Ops 5 were merged into policies in vR Ops 6.

As a general procedure, in vR Ops 6 you would go through these steps to set different threshold values for different groups of machines:

1. From the 'Content > Symptom Definitions' menu, define symptoms using the metrics for monitoring disk usage (e.g. guestfilesystem|percentage).

You can set thresholds like 85% = WARNING, 90% = IMMEDIATE, 95% = CRITICAL as a baseline for all machines.

2. From the 'Content > Alert Definitions' menu, define an alert using the symptoms you defined above.

3. From the Environment menu, create a group that contains the development machines.

4. From the 'Administration > Policies' menu, create a new policy based on your default policy, override the symptom threshold value (like 99% = CRITICAL), and assign the policy to the group you created.

5. From the 'Content > Notifications' menu, define a notification rule if you need to receive an e-mail or other notification for the alert you defined.

* Unlike memory usage metrics (for example), a machine can have many disk drives, which results in many instances of the same metric (such as disk space usage for C:, D:, E:).

So when you define a symptom using these metrics, you should set an 'instanced' property for it. (KB 2108273: http://kb.vmware.com/kb/2108273)
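To make the steps above more concrete, here is a minimal Python sketch of how a policy based on the default one can override a single threshold. It is only a conceptual model, not the vR Ops API: the `Policy` class, the `severity_for` function, and the machine names are hypothetical, and treating "reached the threshold" as `>=` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Baseline thresholds from step 1 (percent of disk space used).
BASELINE = {"WARNING": 85.0, "IMMEDIATE": 90.0, "CRITICAL": 95.0}


@dataclass
class Policy:
    """Hypothetical stand-in for a policy; not a real vR Ops object."""
    name: str
    parent: Optional["Policy"] = None
    overrides: Dict[str, float] = field(default_factory=dict)

    def threshold(self, severity: str) -> float:
        # Use this policy's override if present, else defer to the parent,
        # else fall back to the baseline value.
        if severity in self.overrides:
            return self.overrides[severity]
        if self.parent is not None:
            return self.parent.threshold(severity)
        return BASELINE[severity]


def severity_for(policy: Policy, used_percent: float) -> Optional[str]:
    """Return the most severe level whose threshold is reached, or None."""
    for level in ("CRITICAL", "IMMEDIATE", "WARNING"):
        if used_percent >= policy.threshold(level):
            return level
    return None


default_policy = Policy("Default")
# Step 4: a policy based on the default one, overriding only CRITICAL.
dev_policy = Policy("Development", parent=default_policy,
                    overrides={"CRITICAL": 99.0})

# Steps 3 and 4: hypothetical machines and the policy their group uses.
policy_by_machine = {"prod-db01": default_policy, "dev-web01": dev_policy}
```

With this model, 96% usage is CRITICAL for `prod-db01` but only IMMEDIATE for `dev-web01`, because the development policy inherits the 85% and 90% thresholds unchanged and moves only CRITICAL to 99%.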

Regards,

Ho-Sung.

rgcda · ‎05-13-2015

This is great. Thanks for taking the time to explain things. I'm not fully understanding this yet. I went through this process and was able to disable some symptom definitions in a new policy, which I assigned to a group containing a test machine I'm playing with, and it cleared the disk utilization alarms that were occurring. So I've got a portion of what you are describing done. What I don't fully understand is how I assign a different symptom definition in this policy. Are you saying that I need to create new symptom definitions that I want to assign to development machines and then enable them in this specific policy in the Override Alert / Symptom Definitions? Also, for alert notifications, would I have to create a new alert definition and enable it in this specific policy?

If I want different symptom definitions for the same object, are you saying I have to make the change in the KB you mentioned? For instance, if I want the C drive to raise a critical alarm when it reaches 95% on this machine and a critical alarm when it reaches 99% on a different machine.

funksoul · ‎05-13-2015

Hi, rgcda.

> Are you saying that I need to create new symptom definitions that I want to assign to development machines and then enable them in this specific policy in the Override Alert / Symptom Definitions?

When you define a symptom that triggers at over 95% disk space usage, the default threshold value of the symptom will be 95.

But if you want a different threshold value of 99% for your development machines group, you don't have to define a duplicate symptom. With the same symptom definition, just override (modify) the threshold value of the original symptom to 99 in the policy edit screen.

In other words, a single symptom definition can have a different threshold value in each policy.

> Also, for alert notifications, would I have to create a new alert definition and enable it in this specific policy?

Nope, you do not have to create a new alert in this case.

If a symptom's threshold value is changed by a policy override, the alerts that use that symptom will fire at the new threshold value accordingly.
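As a rough illustration of why no second alert definition is needed, the sketch below keeps one alert tied to one symptom and lets each policy supply its own threshold. All names here (`SymptomDefinition`, `AlertDefinition`, `alert_fires`, the symptom name) are hypothetical and only model the idea; they are not vR Ops objects or API calls.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SymptomDefinition:
    name: str
    default_threshold: float  # percent of disk space used


@dataclass(frozen=True)
class AlertDefinition:
    name: str
    symptom: SymptomDefinition


def alert_fires(alert: AlertDefinition,
                policy_overrides: Dict[str, float],
                used_percent: float) -> bool:
    """Check the alert's symptom against the policy's effective threshold."""
    # The policy's override wins; otherwise the symptom's default applies.
    threshold = policy_overrides.get(alert.symptom.name,
                                     alert.symptom.default_threshold)
    return used_percent >= threshold


disk_symptom = SymptomDefinition("disk-space-critical", 95.0)
disk_alert = AlertDefinition("Disk space low", disk_symptom)

production_overrides: Dict[str, float] = {}  # uses the default of 95
development_overrides = {"disk-space-critical": 99.0}
```

At 96% usage, `alert_fires(disk_alert, production_overrides, 96.0)` is true while the same alert evaluated with `development_overrides` is false, even though only one alert and one symptom were defined.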

> If I want different symptom definitions for the same object, are you saying I have to make the change in the KB you mentioned? For instance, if I want the C drive to raise a critical alarm when it reaches 95% on this machine and a critical alarm when it reaches 99% on a different machine.

Nope. As you can see in the example above, you can apply a different threshold value to each machine regardless of whether the metric you use is instanced.

The KB is about handling a metric that has many instances on one machine, and you may need it for disk space usage.
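Conceptually, an instanced metric means the same check runs once per drive. The short sketch below illustrates that idea with made-up usage numbers; the function name and the sample data are hypothetical, and it does not reproduce the change described in the KB.

```python
from typing import Dict, List


def drives_over_threshold(usage_by_drive: Dict[str, float],
                          threshold: float) -> List[str]:
    """Return the drives whose usage has reached the threshold, by name."""
    return sorted(drive for drive, used in usage_by_drive.items()
                  if used >= threshold)


# Hypothetical disk usage (percent) for the instances of one metric.
usage = {"C:": 96.5, "D:": 40.0, "E:": 99.2}

print(drives_over_threshold(usage, 95.0))  # ['C:', 'E:']
print(drives_over_threshold(usage, 99.0))  # ['E:']
```

The threshold is just a parameter here, which matches the point above: whether a metric is instanced and which threshold a policy applies are independent questions.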

Regards,

Ho-Sung.


different disk space alarms for development machines vs. production machines