VMware Cloud Community
sxnxr
Commander

Triggered alerts not clearing

I have created an alert for host network redundancy lost. It uses a clone of the out-of-the-box (OOTB) fault symptom:

pastedImage_0.png

And this is my alert:

pastedImage_1.png

The alert triggers as expected, but it stays active and does not clear when the redundancy is restored. Am I doing something wrong?

The problem is that we use these alerts to automatically cut tickets in ServiceNow, but as long as the alert is still active in vROps it will never trigger again and cut a new ticket.
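To illustrate why a stuck alert blocks new tickets, here is a minimal sketch in plain Python. It assumes a ticket is cut only when the alert goes from inactive to active; the function name `tickets_cut` is hypothetical, and this is not vROps or ServiceNow code.

```python
def tickets_cut(active_by_cycle):
    """Return the cycle indexes at which a new ticket would be cut.

    Assumption: a ticket is created only on an inactive -> active
    transition of the alert, never while it simply stays active.
    """
    tickets = []
    previously_active = False
    for cycle, active in enumerate(active_by_cycle):
        if active and not previously_active:
            tickets.append(cycle)
        previously_active = active
    return tickets


# Alert clears between the two outages: two tickets.
healthy = tickets_cut([False, True, True, False, True])

# Alert never clears: the second outage is hidden behind the first.
stuck = tickets_cut([False, True, True, True, True])
```

In the second case only one ticket exists, even if the redundancy was restored and lost again in the meantime.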

10 Replies
sxnxr
Commander

I am also having the same problem with storage redundancy alerts.

mghall
Enthusiast

Seeing the same behavior with a new alert.

We're running vROps 6.7. The alert was created for a Windows service, to check whether it was running.

Initially it was triggered on the service, not on the server. I had to go back and modify the alert definition so that it reported on the server. I've now got both conditions reporting. I've also gone back and verified that the service is running correctly and that the EPOps agent is running. Starting to look at the logs now.

daphnissov
Immortal

Historically, these types of non-clearing alerts have been confirmed as bugs. I don't know if that's the case here, but if the condition is no longer true in the infrastructure being monitored and the alert isn't clearing in vROps despite correctly-configured wait and cancel cycles, I'd open an SR to get confirmation.
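For anyone unsure what "wait and cancel cycles" should do, here is a rough sketch in plain Python. It assumes the usual meaning: a symptom becomes active after its condition has been true for `wait_cycles` consecutive collection cycles, and is cancelled after the condition has been false for `cancel_cycles` consecutive cycles. The class and function names are hypothetical, this is not vROps code, and fault-based symptoms like the one in the original post may be cancelled differently.

```python
from dataclasses import dataclass


@dataclass
class SymptomState:
    """Hypothetical model of one symptom on one object."""
    wait_cycles: int = 1
    cancel_cycles: int = 1
    active: bool = False
    _true_streak: int = 0
    _false_streak: int = 0

    def observe(self, condition_met: bool) -> bool:
        """Feed one collection cycle; return whether the symptom is active."""
        if condition_met:
            self._true_streak += 1
            self._false_streak = 0
            if not self.active and self._true_streak >= self.wait_cycles:
                self.active = True
        else:
            self._false_streak += 1
            self._true_streak = 0
            if self.active and self._false_streak >= self.cancel_cycles:
                self.active = False
        return self.active


def simulate(samples, wait_cycles, cancel_cycles):
    """Return the active/inactive state after each collection cycle."""
    state = SymptomState(wait_cycles=wait_cycles, cancel_cycles=cancel_cycles)
    return [state.observe(sample) for sample in samples]


# Condition true for two cycles, then false for three.
history = simulate([True, True, False, False, False],
                   wait_cycles=2, cancel_cycles=3)
```

If the real alert stays active long after the cancel cycle count has passed with the condition false, that is the behavior worth raising in an SR.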

MeImNot76
Enthusiast

Hello @mghall

Could you briefly explain how you modified the alert to report on the server rather than on the service, please?

Thank you!

sxnxr
Commander

I have a support call open with VMware, and they are following the normal SOP: apply HF9 or go to 6.7. Which is great, because the upgrade to 6.7 takes 3 hours and HF9 takes 1 hour, so any alerts generated during that time will not create incident tickets for our NOC.

PLEEEEEASE give me a no-downtime upgrade!

daphnissov
Immortal

They're giving you a hotfix for vROps 6.6.1?

sxnxr
Commander

Yep, Hot Fix 9 (which I already have for a different problem, to do with policies not being pushed out to all nodes in the cluster).

daphnissov
Immortal

Can you explain the contents of this hot fix, please? Just in case others come across this thread: what symptoms were you seeing that necessitated it (if different from host disconnection alerts not clearing)?

RickVerstegen
Expert

I am experiencing the same issues mentioned in this thread with 6.6.1. Will a public hotfix/patch be released for this?

In my case the issues are with alerts related to disk space.

Blog: https://rickverstegen84.wordpress.com/
Twitter: https://twitter.com/verstegenrick
sxnxr
Commander

This is all the info I was given on HF9:

"This HF will address general issues like vSAN, API, license, alerts and alarms, policy."

We applied it because, in 6.6.1 without HF9, every time you take the cluster offline and bring it back online it takes up to 10 minutes for all your custom groups to run their membership rules and add the objects to them. This was causing us problems because we have different alerting levels set in different policies, so on startup all objects were members of the default policy because the membership rules for the custom groups had not yet been evaluated. HF9 fixes this: all objects stay in their custom groups when the cluster is rebooted or taken offline.

The second reason was that alarms were being generated on some objects even though we had them disabled in the policies. It turns out there is a bug that can cause a policy update not to be pushed out to all the nodes in the cluster. Depending on which node was evaluating the alarm trigger, it could have been looking at an old policy and triggering the alarm when it should not have.
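To picture the second problem, here is a toy model in plain Python. Everything in it is hypothetical (the `Policy` and `Node` classes, the `redundancy_lost` alert name, the node names); it only assumes that each node evaluates alerts against its own copy of the policy. It is not how vROps is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    version: int
    disabled_alerts: frozenset


@dataclass
class Node:
    name: str
    policy: Policy

    def should_fire(self, alert_name: str, condition_met: bool) -> bool:
        """Evaluate an alert against this node's local policy copy."""
        return condition_met and alert_name not in self.policy.disabled_alerts


def push_policy(nodes, policy, reachable):
    """Update only the nodes the push actually reaches (models the bug)."""
    for node in nodes:
        if node.name in reachable:
            node.policy = policy


def stale_nodes(nodes, latest_version):
    """List the nodes still holding an older policy version."""
    return [n.name for n in nodes if n.policy.version < latest_version]


v1 = Policy(version=1, disabled_alerts=frozenset())
v2 = Policy(version=2, disabled_alerts=frozenset({"redundancy_lost"}))

cluster = [Node("node-a", v1), Node("node-b", v1), Node("node-c", v1)]
push_policy(cluster, v2, reachable={"node-a", "node-b"})  # node-c is missed

fires_on_a = cluster[0].should_fire("redundancy_lost", True)  # new policy
fires_on_c = cluster[2].should_fire("redundancy_lost", True)  # old policy
stale = stale_nodes(cluster, latest_version=2)
```

The same alert on the same object is suppressed or raised depending on which node happens to evaluate it, which matches what we were seeing.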
