VMware Cloud Community
tjreeddoc
Enthusiast

vSOM alerts not automatically clearing / Can you manually poll an Object?

All,

I am new to VMware, vSOM, and vROps.

We have a VM in vSOM generating the following alert: “Virtual machine has continuous high CPU usage causing stress”. Using vSphere Web Client to monitor CPU performance, I do see brief periods when CPU usage spikes above 60%. However, it quickly returns to less than 10%, so I am expecting the vSOM alert to clear on the next cycle.

After I watched the “vRealize Operations Explaining Alerts: Alerts, Symptoms, and Recommendations & Actions” video found on the Alerts tab of the offending VM, I discovered that vSOM/vROps polls Objects every five minutes.

Questions:

  1. Once the CPU utilization drops below 60%, shouldn’t the alert “Virtual machine has continuous high CPU usage causing stress” clear?
  2. When the utilization increases above 60%, shouldn’t vSOM/vROps generate the “Virtual machine has continuous high CPU usage causing stress” alert again?
  3. Do I have to wait five minutes? Can I make vSOM/vROps manually poll an Object to determine if the alert has cleared?

T.J.

3 Replies
dtaliafe
Hot Shot

I had a support case open about this same alert a few weeks ago. I think the alert name should really be changed to something like "Virtual machine has a period of high CPU usage causing stress". Stress uses a sliding analysis window that you can set in the policy. By default it is a 60-minute window for a VM, and the analysis period is 30 days, so any 60-minute period in the last 30 days can cause the stress score on the VM to be very high. In my case I had a VM that had been idle for 29 days, but because there was one period of high CPU usage 30 days ago, it still had high stress and this alert was active. If you want to bring the overall stress score down, you can try modifying the sliding analysis window in your policy or changing the "demand exceeds" setting to a higher value:

[Image: stress.png]

In my case I had a period where the CPU was very high for a couple of hours, so the stress for a 60-minute window was very high. Support recommended a 4 to 8 hour window.
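To make the window effect concrete, here is a minimal Python sketch. It is not the actual vROps stress calculation, which I don't know in detail; it only assumes that the score tracks the worst average over any window-sized stretch of the analysis period. The function name `peak_window_average` and the sample data (5-minute samples, 30 days, one 2-hour burst at 95%) are made up for illustration.

```python
from typing import Sequence


def peak_window_average(samples: Sequence[float], window: int) -> float:
    """Return the highest mean over any run of `window` consecutive samples."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])  # sum of the first window
    best = running
    for i in range(window, len(samples)):
        # Slide the window one sample to the right.
        running += samples[i] - samples[i - window]
        best = max(best, running)
    return best / window


# Hypothetical data: one sample every 5 minutes for 30 days (8640 samples).
# The VM ran at 95% for 2 hours (24 samples) and idled at 5% otherwise.
SAMPLES_PER_HOUR = 12
samples = [95.0] * 24 + [5.0] * (30 * 24 * SAMPLES_PER_HOUR - 24)

for hours in (1, 4, 8):
    peak = peak_window_average(samples, hours * SAMPLES_PER_HOUR)
    print(f"{hours}-hour window: worst average = {peak}%")
# 1-hour window: worst average = 95.0%
# 4-hour window: worst average = 50.0%
# 8-hour window: worst average = 27.5%
```

Under these assumptions, a single 2-hour burst dominates a 60-minute window for the whole 30 days, while a 4 to 8 hour window dilutes it considerably.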

I'm still not sure if I completely understand stress, but in answer to your questions:

1. The out-of-the-box alert isn't going to clear unless the stress also drops. If you want a more traditional alert that only looks at the CPU usage and doesn't use the stress score, you can create a custom alert that is only triggered on CPU usage. It should clear automatically after the CPU usage drops back down.

2. If the alert has been cleared then it will generate the alert again.  You can manually clear an active alert and it should be generated again if the symptoms still exist.

3. I don't know of a way to manually poll an object.  In the Inventory Explorer you can start and stop collecting for an individual object.  That might get it to poll sooner.

tjreeddoc
Enthusiast

dtaliafe,


Thank you!


You saved me a ticket and from re-inventing the wheel!


T.J.

tjreeddoc
Enthusiast

dtaliafe,

Thanks again for your post! It really helped me out. The following are the steps I took to get the alert to clear.

After reading your post, I found the following recommendation on the VMware website:

Resource     Condition     Threshold

CPU          Warning       CPU usage is above 75% for 5 minutes

CPU          Critical      CPU usage is above 95% for 5 minutes

Memory       Warning       Memory usage is above 85% for 10 minutes

Memory       Critical      Memory usage is above 95% for 10 minutes
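For anyone who wants to see how a "usage above X% for N minutes" condition behaves, here is a small Python sketch of the CPU rows. It is only an illustration of the logic, not how vROps evaluates symptoms internally. The function names and the assumption of one usage sample per minute are mine.

```python
import math
from typing import Sequence


def sustained_above(samples: Sequence[float], threshold: float,
                    duration_s: int, interval_s: int) -> bool:
    """True if the most recent samples covering `duration_s` all exceed `threshold`."""
    needed = max(1, math.ceil(duration_s / interval_s))
    if len(samples) < needed:
        return False  # not enough history yet to decide
    return all(value > threshold for value in samples[-needed:])


def classify_cpu(samples: Sequence[float], interval_s: int = 60) -> str:
    """Apply the CPU thresholds from the table (75% and 95%, each for 5 minutes)."""
    if sustained_above(samples, 95.0, 300, interval_s):
        return "critical"
    if sustained_above(samples, 75.0, 300, interval_s):
        return "warning"
    return "ok"


# Hypothetical one-minute CPU usage samples, oldest first.
print(classify_cpu([10, 80, 82, 90, 78, 76]))  # last 5 samples all above 75%
print(classify_cpu([80, 80, 80, 80, 10]))      # latest sample dropped back down
```

The point of the sketch is that a condition like this clears as soon as the latest sample drops below the threshold, unlike the stress-based alert, which looks back over the whole analysis period.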

I got the alert to clear when I did the following:

  1. Copied my default policy and named the new policy Test_VM_Alerting_Policy
  2. Created a Custom Group and named it Test_VM_Alerting_Group
  3. Added the Object of the VM that was alerting to the Test_VM_Alerting_Group
  4. Assigned the Test_VM_Alerting_Policy to the Test_VM_Alerting_Group
  5. Edited Analysis Setting>Virtual Machine>Stress

Resource                   Demand                     Time

Memory                      85%                            10 minutes   

CPU                           75%                            5 minutes

  6. Manually polled the Object by going to Administration > Inventory Explorer, selecting the Object that was alerting, and clicking Stop Collecting, then Start Collecting

It looks like the VM's high CPU utilization takes place around 0300 and 0730, so I need to wait and see if my new policy works.

T.J.
