10 Replies Latest reply on Aug 8, 2018 3:42 AM by sxnxr

    Triggered alerts not clearing

    sxnxr Expert

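      I have created an alert for "Host network redundancy lost". It uses a clone of the OOTB fault symptom:

      [screenshot of the symptom definition not preserved]

      And this is my alert:

      [screenshot of the alert definition not preserved]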

      The alert triggers as expected but stays active and does not clear when the redundancy is restored. Am I doing something wrong?

      The problem is that we use these alerts to automatically cut tickets in ServiceNow, but as long as the alert is still active in vROps, it will never trigger again and cut a new ticket.

        • 1. Re: Triggered alerts not clearing
          sxnxr Expert

          I am also having the same problem with storage redundancy alerts.

          • 2. Re: Triggered alerts not clearing
            mghall Novice

            Seeing the same behavior with a new alert.

             

            We're running vROps 6.7. The alert was created for a Windows service, to check whether it was running or not.

             

            Initially it was triggered on the service, not on the server. I had to go back and modify the alert definition so it reported on the server. I've now got both conditions reporting. I've also gone back and verified that the service is running correctly and the EPOP agent is running. Starting to look at the logs now.

            • 3. Re: Triggered alerts not clearing
              daphnissov Champion

              Historically, these types of non-clearing alerts have been confirmed as bugs. I don't know if that's the case here, but if the condition is no longer true in the infrastructure being monitored and the alert isn't clearing in vROps despite correctly-configured wait and cancel cycles, I'd open an SR to get confirmation.
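
              To make the wait and cancel cycles mentioned above concrete, here is a minimal Python sketch of how such counters are generally understood to behave: a symptom becomes active only after its condition holds for `wait_cycle` consecutive collection cycles, and clears only after the condition is false for `cancel_cycle` consecutive cycles. This is a toy model, not vROps code; the class name `SymptomState` and the helper `simulate` are made up for illustration, and whether the cloned fault symptom from the original post follows this model is not established in this thread.

```python
from dataclasses import dataclass


@dataclass
class SymptomState:
    """Toy model of a symptom with wait/cancel cycles (hypothetical, not vROps internals)."""
    wait_cycle: int = 1     # consecutive "true" cycles required before the symptom activates
    cancel_cycle: int = 1   # consecutive "false" cycles required before the symptom clears
    active: bool = False
    _true_streak: int = 0
    _false_streak: int = 0

    def observe(self, condition_met: bool) -> bool:
        """Feed one collection cycle and return whether the symptom is active afterwards."""
        if condition_met:
            self._true_streak += 1
            self._false_streak = 0
            if not self.active and self._true_streak >= self.wait_cycle:
                self.active = True
        else:
            self._false_streak += 1
            self._true_streak = 0
            if self.active and self._false_streak >= self.cancel_cycle:
                self.active = False
        return self.active


def simulate(samples, wait_cycle=1, cancel_cycle=1):
    """Return the active/inactive state after each sample in order."""
    state = SymptomState(wait_cycle=wait_cycle, cancel_cycle=cancel_cycle)
    return [state.observe(sample) for sample in samples]


if __name__ == "__main__":
    # Condition true for two cycles, then false for three, then true once.
    print(simulate([True, True, False, False, False, True], wait_cycle=2, cancel_cycle=3))
```

              In this model an alert that never clears would mean the condition never stays false for `cancel_cycle` cycles in a row, or that the clearing logic itself is faulty, which is the case worth raising in an SR.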

              • 4. Re: Triggered alerts not clearing
                MeImNot76 Novice

                Hello @mghall

                 

                Could you explain briefly how you modified the alert to report on the server rather than on the service please?

                 

                Thank you!

                • 5. Re: Triggered alerts not clearing
                  sxnxr Expert

                  I have a support call open with VMware, and they are following the usual SOP: apply HF9 or go to 6.7. Which is great, because the upgrade to 6.7 takes 3 hours and HF9 takes 1 hour, so if any alerts are generated during that time, they will not create incident tickets for our NOC.

                  PLEEEEEASE give me a no-downtime upgrade.

                  • 6. Re: Triggered alerts not clearing
                    daphnissov Champion

                    They're giving you a hotfix for vROps 6.6.1?

                    • 7. Re: Triggered alerts not clearing
                      sxnxr Expert

                      Yep, Hot Fix 9 (which I already have, for a different problem to do with policies not being pushed out to all nodes in the cluster).

                      • 8. Re: Triggered alerts not clearing
                        daphnissov Champion

                        Can you explain the contents of this hot fix, please? Just in case others come across this thread: what symptoms were you seeing that necessitated it (if different from host disconnection alerts not clearing)?

                        • 9. Re: Triggered alerts not clearing
                          RickVerstegen Expert

                          I am experiencing the same issues mentioned in this thread with 6.6.1. Will a public hotfix/patch be released for this?

                           

                          I have the issues related to disk space.

                          • 10. Re: Triggered alerts not clearing
                            sxnxr Expert

                            This is all the info I was given on HF9:

                            "This HF will address general issues like vSAN, API, license, alerts and alarms, policy."

                             

                            We applied it because, on 6.6.1 without HF9, every time you take the cluster offline and bring it back online, it takes up to 10 minutes for all your custom groups to run their membership rules and add objects to them. This was causing us problems because we have different alerting levels set in different policies, so on startup all objects were members of the default policy, because the membership rules for the custom groups had not yet been evaluated. HF9 fixes this: all objects stay in their custom groups when the cluster is rebooted or taken offline.

                             

                            The second reason was that alarms were being generated on some objects even though we had them disabled in the policies. It turns out there is a bug that can prevent a policy update from being pushed out to all the nodes in the cluster. Depending on which node was evaluating the alarm trigger, it could have been looking at an old policy and triggering the alarm when it should not have.
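
                            As a rough illustration of that second problem, here is a small Python toy model of a cluster in which a policy update reaches only some nodes. Everything in it (`Node`, `push_policy`, the node names, the alert name) is hypothetical and only mirrors the behavior described above; it says nothing about how vROps actually distributes policies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster node holding its own copy of the policy (hypothetical model)."""
    name: str
    disabled_alerts: set = field(default_factory=set)

    def should_fire(self, alert_name: str, symptom_active: bool) -> bool:
        # A node fires the alert only if the symptom is active and its
        # local policy copy does not list the alert as disabled.
        return symptom_active and alert_name not in self.disabled_alerts


def push_policy(nodes, disabled_alerts, reached):
    """Apply the new policy only to the nodes whose names are in `reached`."""
    for node in nodes:
        if node.name in reached:
            node.disabled_alerts = set(disabled_alerts)


if __name__ == "__main__":
    nodes = [Node("node-1"), Node("node-2")]
    # The update disabling the alert reaches node-1 but not node-2.
    push_policy(nodes, {"example alert"}, reached={"node-1"})
    for node in nodes:
        print(node.name, node.should_fire("example alert", symptom_active=True))
```

                            In this model the same active symptom is suppressed on the updated node and fires on the stale one, so whether an alarm appears depends on which node happens to evaluate it.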