VMware Cloud Community
shquser
Contributor

Monitoring Solaris Server Availability

Hello:
We are using Hyperic version 4.5 to monitor the availability of our Solaris servers. Some servers on the network are reported as unavailable for a brief period of time, which generates alerts.

Solaris version: 5.10

Here is the alert message:
============================================================
solaris-db has generated the following alert:
Solaris server component(s) down solaris-db Availability (0.0%)
-----------------------------------------
ALERT DETAIL
- Resource Name: solaris-db
- Alert Name: Solaris server component(s) down
- Alert Date / Time: April 22, 2011 3:11:00 AM CDT
- Triggering Condition(s):
If Availability<100.0% (actual value = 0.0%)
- Alert Severity: !!! - High

Last Indicator Metrics Collected:
[April 22, 2011 3:13:00 AM CDT] Availability = 0.0%
[April 22, 2011 3:10:00 AM CDT] Free Memory = 563.1 MB
[April 22, 2011 3:10:00 AM CDT] Load Average 5 Minutes = 0.9
[April 22, 2011 3:10:00 AM CDT] Swap Used = 7.0 GB
===========================================================

But there is no issue with the server itself. I also asked the network team whether there was any connectivity problem between the server where Hyperic is running and the servers being monitored, and they confirmed there was no such activity during that time frame.

Could someone please help me figure out:
1) What command does Hyperic run to probe Solaris server availability, so that we can run it from the shell and verify?
2) Does this mean the probe that was sent timed out?
3) Is there any other way to troubleshoot this issue?

Thanks in advance.

4 Replies
jvalkeal_hyperi

Usually this kind of false alert is caused by time drift (i.e. the HQ server's time is too far off from the HQ agent's time). Check your NTP settings.
shquser
Contributor

Hello Janne,
Thanks for the response. I have verified the NTP settings on both UNIX servers hosting the HQ server and the HQ agent, and the contents of the file "/etc/inet/ntp.conf" are identical.

I would appreciate it if you could give me more details on what to check and how to resolve the issue.

Thanks in advance.
jvalkeal_hyperi

The agent itself is a server type under a platform. It has a "Server Offset" metric that tells how much the time is drifting. The Administration > HQ Health > Agents tab shows the same metric.

An identical ntp.conf doesn't tell you whether NTP is actually working. You can also check the difference at the OS level, but how HQ sees it is more important.
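For the OS-level check, one option (assuming the `ntpq` utility is available on your hosts, which this thread has not confirmed) is to run `ntpq -p` on both machines and compare the offset of the selected peer. Below is a rough Python sketch that pulls that value out of the output. The function name `sys_peer_offset_ms` and the sample text are made up for illustration; they are not part of HQ or of your systems.

```python
from typing import Optional

def sys_peer_offset_ms(ntpq_output: str) -> Optional[float]:
    """Return the offset (ms) of the peer marked '*' in `ntpq -p` output.

    Returns None when no peer is marked '*', i.e. no peer is currently selected.
    """
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line.split()
            # Columns: remote refid st t when poll reach delay offset <last>
            # The offset is the 9th column (index 8), reported in milliseconds.
            return float(fields[8])
    return None

# Made-up sample output, for illustration only (hypothetical hosts and values).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.com 192.0.2.10      2 u   33   64  377    0.512   -1.250   0.101
+ntp2.example.com 192.0.2.11      2 u   12   64  377    0.634    0.420   0.087
"""

if __name__ == "__main__":
    print(sys_peer_offset_ms(SAMPLE))  # -1.25
```

If the function returns None, or the offsets on the HQ server and the agent host differ by a lot, that would point in the same direction as the "Server Offset" metric.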

The reason I think this might be the issue is that your alert fired at 3:11:00, while the details say the last availability value was collected at 3:13:00 (in the future). That doesn't sound right, does it?
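To put a number on that discrepancy, here is a small Python sketch that parses the two timestamps quoted in the alert and computes how far the metric timestamp is ahead of the alert time. The helper name `skew_seconds` is only for illustration, and the " CDT" suffix is dropped on the assumption that both stamps are in the same time zone.

```python
from datetime import datetime

# Timestamp format used in the alert mail, with the trailing " CDT" removed.
FMT = "%B %d, %Y %I:%M:%S %p"

def skew_seconds(alert_time: str, metric_time: str) -> float:
    """Return how many seconds metric_time lies after alert_time."""
    alert = datetime.strptime(alert_time, FMT)
    metric = datetime.strptime(metric_time, FMT)
    return (metric - alert).total_seconds()

if __name__ == "__main__":
    diff = skew_seconds("April 22, 2011 3:11:00 AM", "April 22, 2011 3:13:00 AM")
    print(diff)  # 120.0
```

A positive result means the availability sample carries a timestamp later than the alert that reported it, which is what you would expect to see if the two clocks disagree.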

If that is the issue, restarting the NTP daemon usually brings the clock back to the correct time. Alternatively, use the ntpdate command while the daemon is down.
shquser
Contributor

That is a valid point. I'll check with our admins to verify this and restart the NTP daemon.
Thanks again for the quick response.