How to reset a 'time based' alert

savagea · ‎10-13-2015

I have a couple servers that are alerting for 'Virtual machine has chronic CRITICAL CPU workload leading to CPU stress', which isn't an event-based alert but rather uses data over the past 30 days.

Both servers are actually right-sized, and are only alarming because of one application issue a week or so ago that caused the CPUs to spike for a while.

Do I have to wait for the 30 day cycle to end before these alerts go away, or is there a way to 'reset' them on these servers?

I can't change the 30 day time setting, and cancelling these alerts only brings them right back.

greco827 · ‎10-13-2015

That Alert Definition is tied to a Symptom Definition which has a cancel cycle of 1, which should mean 5 minutes. That being said, critical by default is set at greater than 50% CPU|Stress, so if you are still exceeding 50%, it will still alert.

Both are editable.
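To make the cancel-cycle idea concrete, here is a minimal sketch, not the product's actual implementation. It assumes a 5-minute collection interval, a wait cycle of 1, and that a symptom clears after `cancel_cycles` consecutive samples at or below the threshold. The function names are hypothetical.

```python
# Assumption: one collection cycle is 5 minutes.
COLLECTION_INTERVAL_MIN = 5


def minutes_until_cancel(cancel_cycles: int,
                         interval_min: int = COLLECTION_INTERVAL_MIN) -> int:
    """Time a symptom must stay below threshold before it clears."""
    return cancel_cycles * interval_min


def symptom_active(samples, threshold=50.0, cancel_cycles=1) -> bool:
    """Replay metric samples (one per cycle) and return the final symptom state.

    Simplified model: the symptom triggers on the first sample above the
    threshold and clears after `cancel_cycles` consecutive samples that
    are not above it.
    """
    active = False
    below = 0
    for value in samples:
        if value > threshold:
            active = True
            below = 0
        elif active:
            below += 1
            if below >= cancel_cycles:
                active = False
                below = 0
    return active
```

With a cancel cycle of 1, `symptom_active([60, 40])` ends up cleared, while `symptom_active([60, 40], cancel_cycles=2)` is still active after the same two samples.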

If you find this or any other answer useful please mark the answer as correct or helpful https://communities.vmware.com/people/greco827/blog

KamalakarAnumul · ‎10-13-2015

Solution given below is correct.

savagea · ‎10-13-2015

Thank you! How do I mark these as 'answered'?

savagea · ‎10-13-2015

So let me understand... If the alert is cancelled and the condition doesn't occur again within 5 minutes, it shouldn't come back.

However I do have my stress criteria in the policy set to >80%, 120min peak, 30 day sample.

greco827 · ‎10-13-2015

That policy setting is different from the alert itself. The policy is more capacity related, whereas the alert indicates an immediate and precise issue.


savagea · ‎10-13-2015

Ok, got that. But here's where I'm having trouble... My alert is based on a 'critical' symptom where CPU% > 75. But after cancelling this alert, the CPU didn't get above or near 50% and it came back again.

greco827 · ‎10-14-2015

Is the second image showing CPU | Stress, or CPU | Workload or Usage?


greco827 · ‎10-14-2015

Can you share a screenshot from this page?


savagea · ‎10-14-2015

Sorry for the dumb question, but how can I tell?

savagea · ‎10-14-2015

Here's the screen you asked for, along with another view that shows the CPU stress from back on 9/25. My thinking is that since the stress timeframe goes back 30 days, this might be why the alarm isn't clearing?

greco827 · ‎10-14-2015

That could explain why it is recurring. Where in the policy did you set the 30 day time frame?


savagea · ‎10-14-2015

It's set in the policy. 30 days is the default setting for non-trend based analytics. But if the stress was caused by some short-term issue with the server, there has to be a way to get these alerts to stop showing up without waiting for the entire 30 days to go by.

greco827 · ‎10-14-2015

I really don't think that is the issue, but change it to 1 day and see. It's easy enough to change back.


savagea · ‎10-14-2015

Yup, you're right. Changed to 1 day, came right back.

greco827 · ‎10-14-2015

OK, so the hunt continues!!


greco827 · ‎10-14-2015

Can you share this screen please?


savagea · ‎10-14-2015

greco827 · ‎10-14-2015

OK, try this.

1) Check the policy that is being applied to this VM to ensure you edit the right policy.

2) Edit that policy and make sure to choose the vCenter Adapter - Virtual Machine.

3) In the stress field, change the CPU Sliding Analysis Window from Any to Entire Range.

4) In the time field, change the Date Range to 1 day.
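A rough sketch of why steps 3 and 4 change the result. This is not the product's real stress formula. It assumes stress is simply the fraction of samples above a threshold, that "Any" takes the worst sliding sub-window in the range, and that "Entire Range" evaluates the whole range at once. All names and numbers are hypothetical.

```python
def over_fraction(values, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def stress_any_window(samples, window, threshold):
    """'Any' (assumed): worst stress found in any sliding window of `window` samples."""
    return max(
        over_fraction(samples[i:i + window], threshold)
        for i in range(len(samples) - window + 1)
    )


def stress_entire_range(samples, threshold):
    """'Entire Range' (assumed): stress measured across the whole range."""
    return over_fraction(samples, threshold)


# Hypothetical data: 30 days of hourly CPU samples at 20%,
# with one 12-hour spike to 95% that started 19 days ago.
samples = [20.0] * (30 * 24)
spike_start = len(samples) - 19 * 24
for i in range(spike_start, spike_start + 12):
    samples[i] = 95.0

any_30d = stress_any_window(samples, window=2, threshold=80.0)   # old spike dominates
entire_30d = stress_entire_range(samples, threshold=80.0)        # spike is diluted
entire_1d = stress_entire_range(samples[-24:], threshold=80.0)   # spike is out of range
```

In this toy model the "Any" window keeps finding the old spike for as long as it is inside the date range, while "Entire Range" over the last day no longer sees it at all.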


greco827 · ‎10-14-2015

This works. My badge score was 60 earlier, but now that it is based on the last day for the VM, it is 4.

