Hi,
I have a problem with Hyperic event alerts on Availability.
I simply want to alert and email when a server becomes unavailable. I am using the standard Availability metric that Hyperic collects, and the alert I set up triggers when Availability equals 0%. I am getting many false positives where I am alerted that Availability is 0%, but when I look there are no problems on the monitored server, and even the Availability graph shows no evidence of 0% Availability.
Check the time synchronisation between the server and the agent.
Hyperic is very (IMHO overly) sensitive to badly synchronised clocks, and false alerts on Availability metrics without any evidence for an outage on the server are the most frequent symptom.
True about clocks.
You can check that here: Administration -> HQ Health -> Agents (tab) -> Time Offset (column)
Also, you can enable debug logging on the agent and check the error messages it reports.
Thanks for the responses. But my Hyperic server is, and needs to be, in a different time zone than the monitored client servers. All these servers use the ntpd time sync daemon.
If the Availability metric will not work in this scenario, is there a different way of alerting when a server is not reachable? For example, when the server has crashed, the Hyperic client software has crashed, or the network has gone down.
Thanks
Hi
That sounds strange, as this simple scenario normally works...
Please provide me with the following details:
- Which Hyperic version are you using?
- Is it EE or OS?
- I understand that your Hyperic server is in time zone X while your monitored applications are in time zone Y, correct?
- Are all of them using the same NTP server?
- Which O/S is the Hyperic server installed on, and which O/S are the monitored apps on?
Thanks
Yoav, Hyperic QE
Hi
If possible, please provide 2 screenshots:
1) the resource you are creating alert for
Image:1.jpg
2) the alert definition page
Image:2.jpg
I attached similar screen shots from my server so you can look at what to capture
Thanks
Nimrod
Hi
If possible, please provide 3 screenshots:
1) the resource you are creating alert for
2) the alert definition page
3) the triggered alert screen
I attached similar screen shots from my server so you can look at what to capture
Thanks
Nimrod
Hi Peter,
What sort of time offsets should I be looking for?
Time zones don't matter here.
I believe internally, Hyperic stores all timestamps in UTC format.
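To illustrate why the time zone itself should not matter if timestamps are kept as UTC-based epoch values, here is a small Python sketch. It is not Hyperic code; the two fixed UTC offsets and the sample date are made up for the example. The same instant rendered in two zones has different wall-clock times but the same epoch milliseconds, so only real clock drift would show up as an offset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets standing in for two different time zones.
server_tz = timezone(timedelta(hours=-5))  # e.g. where the Hyperic server runs
agent_tz = timezone(timedelta(hours=2))    # e.g. where a monitored host runs

# One single instant, defined in UTC (arbitrary sample date).
instant = datetime(2012, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

# The same instant as each host would display it locally.
server_local = instant.astimezone(server_tz)
agent_local = instant.astimezone(agent_tz)


def epoch_ms(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    return int(dt.timestamp() * 1000)


# Wall-clock strings differ, epoch values do not.
print(server_local.isoformat(), epoch_ms(server_local))
print(agent_local.isoformat(), epoch_ms(agent_local))
```
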
In that offset column, if you see values greater than 1 min (60000 ms), then you should be worried.
I try to keep them under 500 ms. Anything greater than that is an indication that something is wrong, or is about to go wrong.
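If you copy the Time Offset values out of that page by hand, a rough Python sketch like the following can flag them against the two thresholds mentioned above (500 ms and 60000 ms). The host names and offset values are invented, the function name is mine, and treating negative offsets by magnitude is an assumption; this does not call any Hyperic API.

```python
# Thresholds taken from the advice above, in milliseconds.
WARN_MS = 500        # personal target: keep offsets under this
CRITICAL_MS = 60_000  # more than 1 minute: be worried


def classify_offset(offset_ms):
    """Classify an agent time offset (ms) as 'ok', 'warning' or 'critical'.

    Assumption: the sign of the offset does not matter, only its size.
    """
    magnitude = abs(offset_ms)
    if magnitude > CRITICAL_MS:
        return "critical"
    if magnitude > WARN_MS:
        return "warning"
    return "ok"


# Hypothetical values, as if copied manually from the Time Offset column.
sample_offsets = {"web01": 120, "db01": -850, "win2003-app": 75_000}

for host, offset in sample_offsets.items():
    print(f"{host}: {offset} ms -> {classify_offset(offset)}")
```
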
More info can be found in official Hyperic documentation - http://pubs.vmware.com/vfabric5/index.jsp?topic=/com.vmware.vfabric.hyperic.4.6/Troubleshoot_Agent_a...
One more thing: in my experience, Windows servers (especially Win 2003) tend to have greater offsets.
Installing and configuring NTP/Win Time Service usually solves all time-sync problems.
Here is a sample of offset values you should see -
Thank you Amurty, my Hyperic clients/servers were not in sync. I've fixed the problem by configuring NTP on the servers/clients.
My time offsets are much more reasonable now
Thanks Everyone,
It seems I had the time sync issue. I had not installed NTP on the actual Hyperic server, I only had it installed on the clients. All good now.