All,
I am new to VMware, vSOM, and vROps.
We have a VM in vSOM generating the following alert: “Virtual machine has continuous high CPU usage causing stress”. When I monitor CPU performance in the vSphere Web Client, I do see brief periods when CPU usage spikes above 60%. However, it quickly returns to less than 10%, so I expected the vSOM alert to clear on the next cycle.
After watching the “vRealize Operations Explaining Alerts: Alerts, Symptoms, and Recommendations & Actions” video on the Alerts tab of the offending VM, I learned that vSOM/vROps polls objects every five minutes.
Questions:
T.J.
I just had a support case open about this same alert a few weeks ago. I think the alert name should really be changed to something like "Virtual machine has a period of high CPU usage causing stress". Stress uses a sliding analysis window that you can set in the policy. By default it is a 60-minute window for a VM, and the analysis period is 30 days. So any 60-minute period in the last 30 days can cause the stress score on the VM to be very high. In my case I had a VM that had been idle for 29 days, but because there was one period of high CPU usage 30 days ago, it still had high stress and this alert was active. If you want to bring the overall stress score down, you can try lengthening the sliding analysis window in your policy or changing the "demand exceeds" setting to a higher value.
In my case the CPU was very high for a couple of hours, so the stress for a 60-minute window was very high. Support recommended a 4 to 8 hour window.
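To make the window effect concrete, here is a rough Python sketch. It is not the actual vROps stress formula (I don't know it); it only assumes that the score follows the busiest window of a given length, and the function name and sample data are made up for illustration. It shows why a two-hour burst saturates a 60-minute window but is diluted by an 8-hour one.

```python
def peak_window_average(samples, window):
    """Return the highest average over any run of `window` consecutive samples."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    best = running
    # Slide the window one sample at a time, updating the running sum.
    for i in range(window, len(samples)):
        running += samples[i] - samples[i - window]
        best = max(best, running)
    return best / window


# Hypothetical data: 24 hours of 5-minute samples (288 points),
# idle at 5% CPU with one two-hour burst at 90%.
samples = [5.0] * 288
for i in range(100, 124):  # 24 samples = 2 hours
    samples[i] = 90.0

one_hour = peak_window_average(samples, 12)     # 12 samples = 60 minutes
eight_hours = peak_window_average(samples, 96)  # 96 samples = 8 hours
print(one_hour, eight_hours)
```

With this data the busiest 60-minute window sits entirely inside the burst, while the busiest 8-hour window is mostly idle time, so its average is far lower.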
I'm still not sure if I completely understand stress, but in answer to your questions:
1. The out of box alert isn't going to clear unless the stress also drops. If you want a more traditional alert that only looks at the CPU usage and doesn't use the stress score you can create a custom alert that is only triggered on CPU usage. It should clear automatically after the CPU usage drops back down.
2. If the alert has been cleared and the symptoms still exist, it should be generated again. That includes an active alert that you clear manually.
3. I don't know of a way to manually poll an object. In the Inventory Explorer you can start and stop collecting for an individual object. That might get it to poll sooner.
dtaliafe,
Thanks again for your post! It really helped me out. Following are steps I took to get the alert to clear.
After reading your post, I found the following recommendation on the VMware website:
CPU | Warning Condition | CPU usage is above 75% for 5 minutes |
CPU | Critical Condition | CPU usage is above 95% for 5 minutes |
Memory | Warning Condition | Memory usage is above 85% for 10 minutes |
Memory | Critical Condition | Memory usage is above 95% for 10 minutes |
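For anyone wondering how a "usage above X% for N minutes" condition behaves, here is a small Python sketch. It is only an illustration under my own assumptions, not how vROps evaluates symptoms internally: the function name is hypothetical, samples are assumed to arrive every five minutes, and a condition holds when every sample in the most recent N minutes is above the threshold.

```python
import math


def classify(samples, warning, critical, hold_minutes, interval_minutes=5):
    """Return 'critical', 'warning', or 'ok' based on the most recent samples."""
    # Number of consecutive recent samples that must exceed the threshold.
    needed = max(1, math.ceil(hold_minutes / interval_minutes))
    if len(samples) < needed:
        return "ok"  # not enough data to say the condition has held
    recent = samples[-needed:]
    if all(value > critical for value in recent):
        return "critical"
    if all(value > warning for value in recent):
        return "warning"
    return "ok"


# CPU: warning above 75%, critical above 95%, held for 5 minutes.
cpu_after_spike = classify([8, 62, 97, 9], 75, 95, 5)
cpu_during_spike = classify([8, 62, 97], 75, 95, 5)

# Memory: warning above 85%, critical above 95%, held for 10 minutes.
mem_mixed = classify([90, 96], 85, 95, 10)
mem_high = classify([96, 97], 85, 95, 10)
print(cpu_after_spike, cpu_during_spike, mem_mixed, mem_high)
```

The point is that a condition like this only looks at the latest samples, so it stops being true as soon as usage drops back down, unlike a score based on a 30-day analysis period.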
I got the alert to clear when I did the following:
Resource | Demand | Time |
Memory | 85% | 10 minutes |
CPU | 75% | 5 minutes |
It looks like the VM's high CPU utilization takes place around 0300 and 0730, so I need to wait and see whether my new policy works.
T.J.