VMware Cloud Community
kallon
Contributor

Hyperic Availability metric alert

Hi,
I have a problem with Hyperic event alerts on Availability. 
I simply want to send an alert and an email when a server becomes unavailable. I am using the standard Availability metric that Hyperic collects, and the alert I set up triggers when Availability equals 0%. I am getting many false positives: I am alerted that availability is 0%, but when I look there are no problems on the monitored server, and even the Availability graph shows no evidence of 0% availability.
I have tried configuring the alert to trigger only when the 0% event occurs 20 times within 1 hour, but even that still sometimes fires a false alert. I have tried this in different environments and see the same issue everywhere.
Is there an easier way to detect when a server has gone down and/or become unavailable?

Thanks


Accepted Solutions
admin
Immortal

Check the time synchronisation between the Hyperic server and the agent.

Hyperic is very (IMHO overly) sensitive to badly synchronised clocks, and false alerts on Availability metrics without any evidence of an outage on the server are the most frequent symptom.
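To illustrate how a skewed clock could produce a "down" alert with no real outage, here is a minimal Python sketch. It assumes, hypothetically, that the server treats a resource as down when its newest availability sample looks older than a couple of collection intervals. The function name `looks_down` and the `grace_intervals` parameter are made up for illustration; this is not Hyperic's actual code.

```python
from datetime import datetime, timedelta, timezone


def looks_down(last_sample_agent_time: datetime,
               server_now: datetime,
               collection_interval: timedelta,
               grace_intervals: int = 2) -> bool:
    """Hypothetical staleness check, not Hyperic's real logic.

    The sample timestamp is written using the AGENT's clock, while "now" is
    read from the SERVER's clock, so any skew between them changes the
    apparent age of the sample.
    """
    apparent_age = server_now - last_sample_agent_time
    return apparent_age > collection_interval * grace_intervals


interval = timedelta(minutes=1)
server_now = datetime(2012, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

# Agent reported just now, and its clock agrees with the server's.
in_sync = looks_down(server_now, server_now, interval)

# Agent reported just now, but its clock runs 5 minutes slow,
# so the fresh sample is stamped 11:55 and looks stale.
skewed = looks_down(server_now - timedelta(minutes=5), server_now, interval)

print(in_sync, skewed)  # False True
```

Under this assumption the resource was never actually down, which would match the "no evidence on the graph" symptom described in the question.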

10 Replies
amurty
Enthusiast

True about clocks.
You can check that under Administration -> HQ Health -> Agents tab -> Time Offset column.

Also, you can enable debug logging on the agent and check the error messages it reports.

kallon
Contributor

Thanks for the responses.  But my Hyperic server is, and needs to be, in a different time zone than the monitored client servers.  All these servers use the ntpd time sync daemon.

If the Availability metric will not work in this scenario, is there a different way of alerting when a server is not reachable? For example, the server has crashed, the Hyperic agent software has crashed, or the network has gone down.

Thanks

admin
Immortal

Hi

That sounds strange, as this simple scenario works...

Please provide me with the following details:

- Which Hyperic version are you using?

- Is it EE (Enterprise Edition) or OS (open source)?

- I understand that your Hyperic server is in time zone X while your monitored applications are in time zone Y, correct?

- Are all of them using the same NTP server?

- Which operating system is the Hyperic server installed on, and which operating system are the monitored apps running on?

Thanks

Yoav, Hyperic QE

admin
Immortal

Hi

If possible, please provide 3 screenshots:

1) the resource you are creating the alert for

[Example screenshot: 1.jpg]

2) the alert definition page

[Example screenshot: 2.jpg]

3) the triggered alert screen

[Example screenshot: 3.jpg]

I attached similar screenshots from my server so you can see what to capture.

Thanks

Nimrod

Liem87
Contributor

Hi Peter,

What sort of time offsets should I be looking for?

[Screenshot: tiem.PNG]

amurty
Enthusiast

Time zones don't matter here.
I believe Hyperic stores all timestamps internally in UTC.
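Assuming timestamps are compared as absolute instants (UTC), as suggested above, a server and an agent in different time zones still agree on when something happened; only a clock that is actually wrong causes trouble. A small Python illustration using the standard `datetime` module (the two UTC offsets are arbitrary examples):

```python
from datetime import datetime, timedelta, timezone

# Two hypothetical local zones: server at UTC-5, agent at UTC+2.
server_tz = timezone(timedelta(hours=-5))
agent_tz = timezone(timedelta(hours=2))

# The same instant, written as each host's local wall-clock time.
server_view = datetime(2012, 6, 1, 7, 0, tzinfo=server_tz)
agent_view = datetime(2012, 6, 1, 14, 0, tzinfo=agent_tz)

print(server_view == agent_view)             # True: same instant
print(server_view.astimezone(timezone.utc))  # 2012-06-01 12:00:00+00:00

# A clock that is simply wrong (here 90 seconds fast) is a different instant.
skewed_agent_view = agent_view + timedelta(seconds=90)
print(skewed_agent_view == server_view)      # False
```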

In that offset column, if you see values greater than 1 min (60000 ms), then you should be worried.

I try to keep them under 500 ms. Anything greater than that indicates that something is wrong or is about to go wrong.
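The rule of thumb above can be written as a tiny check, for example if you copy the Time Offset values out of HQ Health into a script. This is only a sketch of amurty's personal thresholds (500 ms target, worry above 1 minute), not an official Hyperic limit; the function name and the "drifting" label are hypothetical, and taking the absolute value assumes the offset may be shown with either sign.

```python
OK_LIMIT_MS = 500         # personal target: keep offsets under 500 ms
WORRY_LIMIT_MS = 60_000   # above 1 minute (60000 ms): "you should be worried"


def classify_offset(offset_ms: int) -> str:
    """Classify an agent/server time offset using the thresholds above."""
    magnitude = abs(offset_ms)  # assume a slow or fast clock is equally bad
    if magnitude <= OK_LIMIT_MS:
        return "ok"
    if magnitude <= WORRY_LIMIT_MS:
        return "drifting"
    return "worry"


for offset in (120, -3_000, 75_000):
    print(offset, classify_offset(offset))
```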

More info can be found in official Hyperic documentation - http://pubs.vmware.com/vfabric5/index.jsp?topic=/com.vmware.vfabric.hyperic.4.6/Troubleshoot_Agent_a...

One more thing: in my observation, Windows servers (especially Windows Server 2003) tend to have larger offsets.

Installing and configuring NTP or the Windows Time service usually solves all time-sync problems.
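For reference, NTP clients such as ntpd estimate a host's clock offset from its time source with the standard on-wire calculation from the NTP specification (RFC 5905), using four timestamps from one request/response exchange. The Python function below is a minimal sketch of that arithmetic only (no networking); the function name is made up for illustration.

```python
def ntp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[float, int]:
    """Standard NTP offset/delay arithmetic (all values in milliseconds).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    A positive offset means the client's clock is behind the server's.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Client clock 2 s behind, 50 ms each way on the network, 10 ms server processing.
offset, delay = ntp_offset_and_delay(100_000, 102_050, 102_060, 100_110)
print(offset, delay)  # 2000.0 100
```

Each host only corrects itself against its own time source, so the Hyperic server and every agent all need to be synchronised for their clocks to agree with each other.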

Here is a sample of the offset values you should see:

[Screenshot: Capture.JPG]

Liem87
Contributor

Thank you, Amurty. My Hyperic clients and servers were not in sync. I fixed the problem by configuring NTP on the servers and clients.

My time offsets are much more reasonable now:

[Screenshot: reasonabletime.PNG]

kallon
Contributor

Thanks, everyone.

It seems I had the time sync issue. I had not installed NTP on the Hyperic server itself; I had only installed it on the clients. All good now.
