VMware Cloud Community
savagea
Enthusiast

How to reset a 'time based' alert

I have a couple of servers that are alerting for 'Virtual machine has chronic CRITICAL CPU workload leading to CPU stress', which isn't an event-based alert but rather uses data over the past 30 days.

Both servers are actually right-sized, and are only alarming because of one application issue a week or so ago that caused the CPUs to spike for a while.

Do I have to wait for the 30 day cycle to end before these alerts go away, or is there a way to 'reset' them on these servers?

I can't change the 30 day time setting, and cancelling these alerts only brings them right back.

30 Replies
savagea
Enthusiast

But when you change it back, I'm sure it'll return, correct?

greco827
Expert

Mine has not generated another alert yet.  I left it set to Entire Range, but changed the Date Range back to 30 days.  I'll give it a bit of time and check again after lunch.  Even if we found a way to clear it, I'm not sure I totally understand the trigger just yet.

If you find this or any other answer useful please mark the answer as correct or helpful https://communities.vmware.com/people/greco827/blog
savagea
Enthusiast

I agree, not totally understanding it.  Also not totally understanding the difference between "entire range" and "any".

greco827
Expert

Entire Range refers to the settings you give in the time field. Any refers to the 60 (or in your case 120) minute period you set in the Stress field. What I'm trying to find out definitively is how the Any setting is calculated: whether it means an average of >70% over that period of time, or something else.

savagea
Enthusiast

Yes, that would be very interesting to know.  It's not very clear.  I really appreciate your help on this.

savagea
Enthusiast

Here you go, from the VMware documentation.

So if you set it to 'Entire Range', it will look at the % of time the CPU is in its stress zone throughout the entire 30 days, or whatever you set it to.

Set it to 'Any' and this is what works in conjunction with the 'peak' setting. So it's looking to see if the CPU is stressed in ANY 120-minute time window... I'm just not sure how far back it goes.
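To make the difference concrete, here is a rough Python sketch of how I picture the two modes. This is only my reading of the settings, not the product's actual calculation: the function names, the hourly sampling, and the 70% threshold are all assumptions for illustration.

```python
from typing import Sequence


def stress_entire_range(cpu_pct: Sequence[float], threshold: float) -> float:
    """Fraction of ALL samples in the date range that exceed the threshold."""
    if not cpu_pct:
        return 0.0
    return sum(1 for v in cpu_pct if v > threshold) / len(cpu_pct)


def stress_any_window(cpu_pct: Sequence[float], threshold: float, window: int) -> float:
    """Highest over-threshold fraction found in any run of `window` consecutive samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(cpu_pct) < window:
        return stress_entire_range(cpu_pct, threshold)
    over = [1 if v > threshold else 0 for v in cpu_pct]
    current = sum(over[:window])
    best = current
    # Slide the window one sample at a time, updating the count incrementally.
    for i in range(window, len(over)):
        current += over[i] - over[i - window]
        best = max(best, current)
    return best / window


# Hypothetical data: 30 days of hourly samples at 20% CPU with one 3-hour spike.
samples = [20.0] * 720
samples[300:303] = [95.0] * 3

entire = stress_entire_range(samples, 70.0)        # 3 of 720 samples are over
any_120min = stress_any_window(samples, 70.0, 2)   # 2 hourly samples = 120 minutes
```

With this made-up data the 'Entire Range' figure is well under 1%, while the 'Any' figure hits 100% because one 120-minute window sits entirely inside the spike. That would explain why a single incident can trip the alert under 'Any' but not under 'Entire Range'.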

1.png

greco827
Expert

I actually saw that, but something still didn't add up.  You have it set to 120 minutes, which should mean that when you had your spike, 120 minutes later the alert should have cleared ... at least that's how I read it.

savagea
Enthusiast

The way I'm reading it, if the stress exceeds 80% in any 120-minute time window throughout the past 30 days, it'll alarm. That's an extremely tight setting... so basically it would have to go 30 days without stress exceeding 80% in any 120-minute window. It's sinking in now, but I'm struggling with what's a best practice setting to go with. It's a complicated one for sure.
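If that reading is right, the alert should only clear once the spike ages out of the lookback period. A quick Python sketch of that arithmetic, assuming the alert really does clear the moment the spike falls outside the range (unconfirmed), with made-up dates and hypothetical function names:

```python
from datetime import datetime, timedelta


def clears_at(spike_end: datetime, lookback_days: int = 30) -> datetime:
    """Earliest time the spike is no longer inside the lookback period."""
    return spike_end + timedelta(days=lookback_days)


def days_remaining(spike_end: datetime, now: datetime, lookback_days: int = 30) -> float:
    """Days left until the spike ages out; 0.0 if it already has."""
    remaining = clears_at(spike_end, lookback_days) - now
    return max(remaining.total_seconds() / 86400, 0.0)


# Hypothetical example: the spike ended one week ago.
spike_end = datetime(2020, 1, 1, 12, 0)
now = datetime(2020, 1, 8, 12, 0)
left = days_remaining(spike_end, now)  # 30 days minus the 7 already elapsed
```

So for a spike a week old, that would mean about 23 more days of alerting unless the date range or the threshold is changed.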

greco827
Expert

I reached out to some buddies at VMware to see if they could get a better explanation from the internal dev team.  Stay tuned!

savagea
Enthusiast

Great, thank you! 

savagea
Enthusiast

Also, it would be great if they could provide a best practice on HOW to configure it.