VMware Cloud Community
savagea
Enthusiast

How to reset a 'time based' alert

I have a couple of servers that are alerting for 'Virtual machine has chronic CRITICAL CPU workload leading to CPU stress', which isn't an event-based alert but instead uses data from the past 30 days.

Both servers are actually right-sized; they are only alarming because of one application issue a week or so ago that caused the CPUs to spike for a while.

Do I have to wait for the 30 day cycle to end before these alerts go away, or is there a way to 'reset' them on these servers?

I can't change the 30 day time setting, and cancelling these alerts only brings them right back.

30 Replies
greco827
Expert

That Alert Definition is tied to a Symptom Definition which has a cancel cycle of 1, which should mean 5 minutes.  That being said, critical by default is set at greater than 50% CPU|Stress, so if you are still exceeding 50%, it will still alert.

Both are editable.
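
The wait/cancel cycle behaviour described above can be sketched roughly as follows. This is only a toy model, not the vROps API: the `Symptom` class, its field names, and the `replay` helper are hypothetical, and the 5-minute collection interval and the 50% default threshold are taken from this reply.

```python
from dataclasses import dataclass

# Assumption: one collection cycle is 5 minutes, as stated in the reply above.
COLLECTION_INTERVAL_MIN = 5


@dataclass
class Symptom:
    """Toy model of a threshold symptom with wait and cancel cycles."""
    threshold: float        # symptom breaches when value > threshold
    wait_cycles: int = 1    # consecutive breaching cycles needed to trigger
    cancel_cycles: int = 1  # consecutive clear cycles needed to cancel
    active: bool = False
    _breach: int = 0
    _clear: int = 0

    def observe(self, value: float) -> bool:
        """Feed one collected value and return whether the symptom is active."""
        if value > self.threshold:
            self._breach += 1
            self._clear = 0
            if not self.active and self._breach >= self.wait_cycles:
                self.active = True
        else:
            self._clear += 1
            self._breach = 0
            if self.active and self._clear >= self.cancel_cycles:
                self.active = False
        return self.active


def replay(values, threshold=50.0):
    """Replay a series of CPU|Stress values against a fresh symptom."""
    symptom = Symptom(threshold=threshold)
    return [symptom.observe(v) for v in values]


# The symptom clears after one cycle below 50, then re-triggers
# as soon as the value exceeds 50 again.
print(replay([62.0, 58.0, 41.0, 55.0]))  # [True, True, False, True]
```

In this model, cancelling the alert by hand does not help while the evaluated metric still exceeds the threshold: the next collection cycle simply raises the symptom again.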

If you find this or any other answer useful please mark the answer as correct or helpful https://communities.vmware.com/people/greco827/blog
KamalakarAnumul
Enthusiast

Solution given below is correct.

savagea
Enthusiast

Thank you!  How do I mark these as 'answered'?

savagea
Enthusiast

So let me understand... If I cancel the alert and the symptom doesn't trigger again within 5 minutes, it shouldn't come back.

However, I do have my stress criteria in the policy set to >80%, 120-minute peak, 30-day sample.

greco827
Expert

That policy setting is different from the alert itself.  The policy is more capacity related, whereas the alert indicates an immediate and precise issue.

vROps_Alert01.jpg

savagea
Enthusiast

OK, got that. But here's where I'm having trouble... My alert is based on a 'critical' symptom where CPU% > 75. After I cancelled the alert, the CPU never got above or even near 50%, yet the alert came back again.

1.png
2.png

greco827
Expert

Is the second image showing CPU | Stress, or CPU | Workload or Usage?

greco827
Expert

Can you share a screenshot from this page?

vROps_CPUStress.jpg

savagea
Enthusiast

Sorry for the dumb question, but how can I tell?

savagea
Enthusiast

Here's the screen you asked for, along with another view that shows the CPU stress from back on 9/25. My thinking is that since the stress timeframe goes back 30 days, this might be why the alarm isn't clearing?

1.png
2.png

greco827
Expert

That could explain why it is recurring.  Where in the policy did you set the 30 day time frame?

savagea
Enthusiast

It's set in the policy. 30 days is the default setting for non-trend-based analytics. But there has to be a way to stop these alerts from showing up when the stress was caused by a short-term issue with the server, instead of having to wait for the entire 30 days to go by.

1.png
2.png

greco827
Expert

I really don't think that is the issue, but change it to 1 day and see.  It's easy enough to change back.

savagea
Enthusiast

Yup, you're right. I changed it to 1 day and the alert came right back.

greco827
Expert

OK, so the hunt continues!!

greco827
Expert

Can you share this screen, please?

vROps_CPUStress2.jpg

savagea
Enthusiast

1.png

greco827
Expert

OK, try this. 

1) Check which policy is being applied to this VM, to ensure you edit the right one.

2) Edit that policy and make sure to choose the vCenter Adapter - Virtual Machine.

3) In the stress field, change the CPU Sliding Analysis Window from Any to Entire Range.

4) In the time field, change the Date Range to 1 day.

vROps_CPUStress3.jpg
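
To illustrate why the steps above might make a difference, here is a rough sketch. It is a deliberately simplified toy model and not the actual vROps stress formula: it scores stress as the percentage of samples above the stress line, either in the worst sliding window ("Any") or across the whole date range ("Entire Range"). The function names, the hourly sampling, and the sample data (a 6-hour spike about ten days ago) are all assumptions made for the example; only the >80% line and the 120-minute peak come from the policy settings mentioned earlier in the thread.

```python
def fraction_above(samples, threshold):
    """Percentage of samples that exceed the stress line."""
    if not samples:
        return 0.0
    return 100.0 * sum(1 for s in samples if s > threshold) / len(samples)


def stress_any_window(samples, threshold, window):
    """'Any' mode (toy): the worst score over every sliding window."""
    if len(samples) <= window:
        return fraction_above(samples, threshold)
    return max(
        fraction_above(samples[i:i + window], threshold)
        for i in range(len(samples) - window + 1)
    )


def stress_entire_range(samples, threshold):
    """'Entire Range' mode (toy): one score over the whole date range."""
    return fraction_above(samples, threshold)


# Assumed data: hourly CPU % for 30 days, idle at 20% except for a
# 6-hour spike to 95% roughly ten days before the end of the range.
history = [20.0] * (30 * 24)
for hour in range(480, 486):
    history[hour] = 95.0

STRESS_LINE = 80.0   # ">80%" from the policy
PEAK_WINDOW = 2      # 120-minute peak = 2 hourly samples
last_day = history[-24:]

# Any window over 30 days: the old spike fills a whole window.
print(stress_any_window(history, STRESS_LINE, PEAK_WINDOW))   # 100.0
# Entire range over the last day: the spike is outside the range.
print(stress_entire_range(last_day, STRESS_LINE))             # 0.0
```

Under these assumptions, a single short spike anywhere in a long date range dominates an "Any" window score, while an "Entire Range" score over the last day ignores it.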

greco827
Expert

This works. My badge score was 60 earlier, but now that it is based on the last day for the VM, it is 4.

vROps_CPUStress2.jpg
vROps_CPUStress4.jpg
