14 Replies Latest reply: Nov 14, 2008 3:40 PM by Linux ETC

[4.0 Beta] Alerts not being sent out

Linux ETC Hot Shot
We have noticed that a few alerts have been neither sent out nor registered by the Hyperic server itself.  This started within the last day or so; prior to this we had been receiving alerts via email (at multiple addresses too).

Using build #873.  Please advise what additional information would be needed to take a look at this further.

Thanks in advance.

Linux ETC
  • 1. Re: [4.0 Beta] Alerts not being sent out
    Expert
    LinuxETC,

    Are the same alerts or resources misbehaving? Or multiple? Did they fail within a block of time? Or throughout?

    --jeremy
  • 2. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    > Are the same alerts or resources misbehaving? Or
    > multiple? Did they fail within a block of time? Or
    > throughout?
    >
    Jeremy:

    Various resources, various times to be honest.  It seems (knock on wood) that things are working again as well, but at the same time, we are monitoring things a bit more than usual since every now and then something will go down, but not send out the email alert for the resource at issue.

    LinuxETC
  • 3. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    An additional note.

    When reviewing the Alerts from Analyze -> Alert Center I noticed there were a few recent ones noted.  However, when you go to that particular resource to "clear" or Fix the alert, it is not showing up there on the list (Alert section -> Alert Resource).  This is separate from using the Dashboard with the more recent alerts and just "clearing them out" from there (under Dashboard -> Recent Alerts).

    Yes, odd indeed.

    LinuxETC
  • 4. Re: [4.0 Beta] Alerts not being sent out
    staceyeschneider Expert VMware Employees
    Hi LinuxETC,

    This is strange. The Dashboard doesn't show all types of alerts - usually only ones for !! and higher, as well as last day or something - this is configurable.

    But not seeing it as an alert off the resource itself is a problem. Can you confirm that you are drilling into the resource, going to the alerts tab, and then clicking the alert button (vs. the configure button)?

    Thanks,
    -Stacey
  • 5. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    That is an affirmative.

    We are doing just "Availability" of the systems as a whole.  Some will alert, some will not.  The criteria are "Availability<95.0%" and "Alert One time until fixed" for the specifics.  The email contacts are correct (otherwise we would not be getting the ones that do alert ;-) ) so that is ruled out.

    Currently the "test case" we have is showing Availability of ~50.0% and no Alert with the above criteria has fired off.  Yes, we did a "Waldo check" as well seeing that the Alert is Active.  The other resources on the same system all show green and full stats (e.g. CPU usage, hard drive stats, Linux daemons like ssh, etc.).

    The Beta 4.0 versions in play here presently are as follows to help assist with that aspect:
    - Server is 4.0.0-893-x86-linux
    - Client/Agent ("test" from above) is 4.0.0-893-x86 (does not alert properly)
    - Client/Agent that does work is 4.0.0-876 (64-bit version, but system is down as of the writing of this reply so I cannot give the full specifics).
    - Client/Agent that does work is 4.0.0-876 (32-bit version)

    Thoughts are welcomed here.  TIA and HTH.

    LinuxETC
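
For readers following along, the criteria described in this post ("Availability<95.0%" plus "one time until fixed") can be modeled roughly as below. This is only a minimal sketch under stated assumptions: `AlertDefinition`, `evaluate`, and `mark_fixed` are hypothetical names made up for illustration, and nothing here is Hyperic's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class AlertDefinition:
    """Hypothetical model of a threshold alert; not Hyperic's actual code."""
    threshold: float                # fire when availability (percent) drops below this
    once_until_fixed: bool = True   # suppress repeats until the alert is marked fixed
    disabled: bool = False          # set after firing when once_until_fixed is on

    def evaluate(self, availability: float) -> bool:
        """Return True if this reading should produce an alert."""
        if self.disabled or availability >= self.threshold:
            return False
        if self.once_until_fixed:
            self.disabled = True
        return True

    def mark_fixed(self) -> None:
        """Re-enable the definition, as clicking "Fix" would in this model."""
        self.disabled = False


definition = AlertDefinition(threshold=95.0)
readings = [100.0, 50.0, 50.0, 100.0, 50.0]   # illustrative values only
fired = [definition.evaluate(r) for r in readings]
```

In this model, a reading of ~50% must fire at least once unless the definition is already disabled by an earlier unfixed alert, which is why checking that state is a reasonable first step.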
  • 6. Re: [4.0 Beta] Alerts not being sent out
    staceyeschneider Expert VMware Employees
    Hmmm. I think I am confused actually.

    What I am reading is that you are putting availability alerts out there, and when you are having alert conditions met, alerts aren't firing. Not that alerts are showing up in the Alert Center, but not in the resource. Is this correct? If it is...

    There are likely two problems here - one, availability is "flapping", as it should always be 1 or 0 (unless this is on a group). That is fixed in the next release.

    That said, with this false availability reporting, you should be getting a lot of false alerts, not missing them.

    Are you using EE? If so, is there a recovery alert attached to it? Do you have it set to suppress itself until a recovery alert fires?
    Is there a duration to the trigger? alerts <95% for more than 10 mins?
    Do you have an escalation scheme?
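
To illustrate the flapping point above: if each availability sample is either 1 or 0, a displayed value of ~50% can only come from averaging samples that alternate between up and down. The sketch below uses made-up sample values and a plain per-sample threshold check (an assumption, not a description of how the server evaluates alerts) to show why flapping would normally be expected to produce many alerts rather than none.

```python
# Made-up samples: 1 = up, 0 = down, alternating on every check (flapping).
samples = [1, 0, 1, 0, 1, 0, 1, 0]

# Averaging the 0/1 samples gives the percentage a summary view might show.
average_pct = 100.0 * sum(samples) / len(samples)

# A per-sample check against the 95% threshold trips on every "down" sample.
below = [s * 100.0 < 95.0 for s in samples]

# Number of up/down state changes; a stable resource would have few or none.
transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
```

With these values the average is 50% and half of the samples breach the threshold, which matches the expectation of "a lot of false alerts" from a flapping resource.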
  • 7. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    > What I am reading is that you are putting
    > availability alerts out there, and when you are
    > having alert conditions met, alerts aren't firing.
    > Not that alerts are showing up in the Alert Center,
    > but not in the resource. Is this correct? If it
    > is...
    >

    Correct. Alerts are not firing when conditions are met, in this case "Availability < 95.0%" of a system being monitored. The Alerts are not seen on the Dashboard, not sent via email, but it is clear the system's Availability per the Agent is < 95.0%.

    > There are likely two problems here - one, availability
    > is "flapping", as it should always be 1 or 0 (unless
    > this is on a group). That is fixed in the next
    > release.
    >
    > That said, with this false availability reporting,
    > you should be getting a lot of false alerts, not
    > missing them.
    >

    The reverse. The Alerts are failing to come through, rather than firing as "false positives".

    I concur on the possibility of Flapping as well, hence why we put things up at 95.0% and even tried for lower percentages (e.g., 70.0% to 50.0%) for the Alert to see if that was the case. If so, we would be getting Alerts firing as "false positives" as you suggested, but that apparently is not the case here.

    > Are you using EE? If so, is there a recovery alert
    > attached to it? Do you have it set to suppress itself
    > until a recovery alert fires?
    > Is there a duration to the trigger? alerts <95% for
    > more than 10 mins?
    > Do you have an escalation scheme?

    I am not sure what you mean by "EE" above, so I will have to defer on answering that for now.

    The Enabled Action is the "Each time conditions are met" radio button and "Generate one alert and then disable alert definition until fixed" check box at this time.

    No escalation scheme in place at this time to answer that part.

    HTH.

    Linux ETC
  • 8. Re: [4.0 Beta] Alerts not being sent out
    Expert
    LinuxETC,

    Let's try and narrow it down. First uncheck the "Generate one alert and then disable alert definition until fixed" filter and see if alerts fire when you see the agent reporting <95%.

    If that doesn't work, try also setting the Availability to <100% and see if we can force an alert.

    It's also possible that a time offset exists between the server and agent.

    --jeremy
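
A quick way to rule the time offset in or out is to compare the clocks directly. The sketch below assumes you have already captured an epoch timestamp from each machine at about the same moment (for example with `date -u +%s`); the function names and the 60-second tolerance are assumptions chosen for illustration, not values taken from Hyperic.

```python
def clock_offset_seconds(server_epoch: float, agent_epoch: float) -> float:
    """Positive result means the agent clock is ahead of the server clock."""
    return agent_epoch - server_epoch


def offset_is_suspicious(offset: float, tolerance: float = 60.0) -> bool:
    """Flag offsets larger than the (assumed) tolerance in either direction."""
    return abs(offset) > tolerance
```

If the offset turns out to be large, syncing both hosts to the same time source before retesting the alert removes one variable.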
  • 9. Re: [4.0 Beta] Alerts not being sent out
    staceyeschneider Expert VMware Employees
    Hi LinuxETC,

    Sorry - using internal acronyms... EE is Enterprise Edition, also sometimes now called HQE. This version opens up a couple of different possibilities for alert options, which is what I am trying to triangulate in on.

    Jeremy's suggestion is good though, the one alert/disable could be our problem. It'd be good to check the server log to see what it says when the alerts are supposed to be firing.

    Thanks,
    -Stacey
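
Since the server log can be long, one way to narrow it down is to keep only the lines that mention alerts. The sketch below is a generic text filter; it assumes nothing about the real log format, `lines_mentioning` is a hypothetical helper, and the sample lines are placeholders rather than real Hyperic output.

```python
from typing import Iterable, List


def lines_mentioning(lines: Iterable[str], keyword: str = "alert") -> List[str]:
    """Return the lines that contain keyword, ignoring case."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Placeholder lines, not real Hyperic log output.
sample = [
    "placeholder line about startup\n",
    "placeholder line mentioning an Alert\n",
    "placeholder line about metrics\n",
    "placeholder line mentioning ALERT again\n",
]
matches = lines_mentioning(sample)
```

Against a real file, the same function can be given an open file handle for whatever path your server log lives at, and the matching lines pasted into a reply.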
  • 10. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    > Hi LinuxETC,
    >
    > Sorry - using internal acronyms... EE is Enterprise
    > Edition, also sometimes now called HQE. This version
    > opens up a couple different possibilities for alert
    > options which is what I am trying to triangulate in
    > on.
    >

    Ah, then no. Just the Community or non-EE version.

    > Jeremy's suggestion is good though, the one
    > alert/disable could be our problem. It'd be good to
    > check the server log to see what it says when the
    > alerts are supposed to be firing.
    >

    Just changed it, and will post something after a few hours (thinking 4-6 hours) of having Availability<100% set for the two in question, unless the Alerts start firing away of course.

    LinuxETC
  • 11. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    > > Jeremy's suggestion is good though, the one
    > > alert/disable could be our problem. It'd be good to
    > > check the server log to see what it says when the
    > > alerts are supposed to be firing.
    > >
    >
    > Just changed it, and will post something after a few
    > hours (thinking 4-6 hours) of having the Availability
    > <100% for the two in question, unless the
    > Alerts start firing away of course.
    >
    > LinuxETC

    Ok, here are the results so far.  After a 24 hour period it seems the Alerts for (Availability<100%) are firing off of the system in question every 30 minutes.  This is with the "Enabled Action: Each time conditions are met" and for each instance (the check box for "Generate one alert and then disable alert definition until fixed" being NOT marked).

    So, with logic and deductive reasoning in mind, this Alert should go off every minute (or whatever the check interval for this Alert is) rather than every 30 minutes.  The system is remotely located, so I can see flapping being a potential issue, but not a 30 minute delay, unless that is the Availability Alert's default setting.  Perhaps I am thinking out loud and too early in the morning too. ;-)

    As for posting the server log file, if you would narrow down which part of the log associated with the Alert is needed (since it is rather long), that would be appreciated.  I can also post it as an Attached File, but I figured I would see what is best suited and appropriate here.

    TIA.

    LinuxETC
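
To confirm a cadence like the 30 minutes reported above, the gaps between consecutive alert times can be computed from the timestamps in the alert emails or the Alert Center. This is a minimal sketch: `firing_gaps_minutes` is a hypothetical helper, the timestamp format is an assumption, and the example timestamps are made up to show the calculation.

```python
from datetime import datetime
from typing import List


def firing_gaps_minutes(timestamps: List[str], fmt: str = "%Y-%m-%d %H:%M") -> List[float]:
    """Return the gap in minutes between each pair of consecutive alert times."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


# Made-up timestamps for illustration only.
example = ["2008-01-01 08:00", "2008-01-01 08:30", "2008-01-01 09:00"]
gaps = firing_gaps_minutes(example)
```

A list of identical gaps points at a fixed interval somewhere in the pipeline, while irregular gaps would fit better with a resource that is genuinely going up and down.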
  • 12. Re: [4.0 Beta] Alerts not being sent out
    staceyeschneider Expert VMware Employees
    Hi LinuxETC,

    I suspect this is the flapping issue. We just pushed a new build this morning for GA - can you upgrade and let us know if that fixes it?

    Thanks,
    -Stacey
  • 13. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    > Hi LinuxETC,
    >
    > I suspect this is the flapping issue. We just pushed
    > a new build this morning for GA - can you upgrade and
    > let us know if that fixes it?
    >
    > Thanks,
    > -Stacey

    Stacey:

    I will see what we can do and go from there.  Thanks for the update on the release as well.

    LinuxETC
  • 14. Re: [4.0 Beta] Alerts not being sent out
    Linux ETC Hot Shot
    Stacey and company.

    We upgraded everything (Server and Agent) as suggested to 4.0.1-905 (Community Edition).  After watching for 2+ days, it seems that flapping was the culprit here for the oddities with the two systems in question.

    Thanks for the assistance and pointers.

    LinuxETC
