Monitoring Hyperic Agent

gnovak · ‎04-25-2007

I'm in the process of having Hyperic monitor a group of OpenSuse machines. One of the things I am monitoring and creating an alert for is the Hyperic agent that is running on these machines. Basically, I want to know if the agent is down on any of these machines so it can be restarted.

I also have alerts configured for things such as the network card being down, a CPU load above 95% for more than 5 minutes out of 5 minutes, and disk space going beyond a specific amount.

I tested the alert I configured for the agent by shutting down the agent on one of the machines. One thing I forgot to mention was that the machine in question is actually a Xen instance or basically a virtual machine.

Once the agent was shut down, I did receive an alert that the agent was down, but shortly afterwards I also received alerts that there was high CPU load on the box and that the network card was down.

Did I receive these other alerts because Hyperic thought they were down due to the agent being down as well? Anyone have any idea why shutting down the agent would also generate these other alerts?

I checked the Xen instance and the network card wasn't down and there were only 2 processes running on the box that were barely taking up any cpu.

gnovak · ‎04-26-2007

Anyone have any ideas? Has anyone ever monitored the agent and had these results?

BradFelmey · ‎04-26-2007

The Enterprise version has something called policy-based alerting, which allows you to specify alert rules. In your case, you would create a policy that if the whole box is down, don't bother sending alerts for HTTP service, ICMP ping, disk space, or what-have-you.

gnovak · ‎04-26-2007

I don't have the enterprise version. 😞 I have the open source version. Also, it's not the box that is down, it's just the HQ agent running on the box that is down. If the agent itself is down, does the server "think" that anything it cannot get a status for (say, disk space, CPU, or a service) is down as well, and trigger an alarm in Hyperic?

Perhaps I am not configuring the alert properly? How would I monitor the Hyperic Agent on a machine and trigger an alert if it goes down? If the agent in fact goes down, does this also trigger other alarms to fire?

Currently I click on the machine name, click on HQ Agent under Deployed Servers Health, then click on the Alerts tab and configure an alert for the agent to trigger an alarm if it is less than 100% available. Is this correct?

Is anyone else monitoring the agent on the machines and if so have they had success or failure with monitoring and generating alerts?

ama_hyperic · ‎04-26-2007

I'm new here, but I have a few suggestions on things you could try.

1. You could run a script from cron that checks the process list to see if the agent is running and, if not, restarts it.
2. You could monitor from the server itself with nmap (or something else that checks specific ports) to see whether the agent is listening on port 2144. That fires an alert when the agent goes down, but since it's down you can't run a control action to restart it.

You could do both and get an alert when it goes down without having to manually restart it.
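As a rough sketch of how the two ideas could be combined, here is a small Python script that could be run from cron on the agent machine. Note that it checks whether anything is listening on port 2144 rather than looking at the process list, so it is a variation on suggestion 1, not exactly what is described there. The function names and the way the restart command is passed in are just my own illustration, and the restart command itself depends on where the agent is installed.

```python
import socket
import subprocess
import sys

AGENT_PORT = 2144  # agent listen port mentioned above


def agent_listening(host="127.0.0.1", port=AGENT_PORT, timeout=3.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


def ensure_agent(restart_cmd, host="127.0.0.1", port=AGENT_PORT,
                 run=subprocess.call):
    """Run restart_cmd if the port is closed.

    Returns True if a restart was attempted, False if the port was open.
    `run` is injectable so the logic can be tried without a real agent.
    """
    if agent_listening(host, port):
        return False
    run(restart_cmd)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # The restart command is taken from the command line, because the
    # agent's install path differs per machine (hypothetical example:
    # /opt/hyperic/agent/hq-agent.sh start).
    ensure_agent(sys.argv[1:])
```

A crontab entry would then call the script every few minutes with the restart command as arguments. Keep in mind that an open port only shows that something is listening there, not that the agent is actually healthy, and the same `agent_listening` function could be pointed at a remote host from the server side in place of nmap.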

-Alex.

gnovak · ‎04-27-2007

I think your ideas are great. I would love to have cron check whether it's running and, if not, start it. That way I don't really have too much to worry about. I'll give that a try and see what happens...
