Server not collecting stats for any new platforms or servers
I've hit a problem where any new platforms added, or indeed any servers or services added to existing platforms, do not collect any statistics and remain in a '?' state or in some cases go '!'. Everything already added continues to monitor and collect stats OK. If we delete an agent that is already being monitored and re-add it, it also stops collecting stats.
To see if we'd reached some limit on the number of agents, I removed a number of non-essential platforms (about 15) but this made no impact.
I've included an extract from the Health screen for information.
To rule out any agent or comms issues, we've installed a 2nd Hyperic server instance and tried to connect a server to that, which works fine.
Any assistance anyone can offer would be appreciated!
A few notes:
- Your HQ version is a bit outdated. Maybe you should consider an update to the latest stable 4.2 release.
- HQ is still running with the default JVM settings, I believe. Have a look at your hq-server.conf file and change it to the following values:
  #server.java.opts=-XX:MaxPermSize=192m -Xmx512m -Xms512m
  server.java.opts=-XX:MaxPermSize=192m -Xmx1024m -Xms1024m
- For further information please check http://support.hyperic.com/display/DOC/Configuring+HQ+for+Large+Environments+and+Improved+Performanc...
- I have also noticed a high number of time offsets for some of your Agents. You might consider setting up NTP on your Agents.
- Also, running HQ server on Windows is not recommended, as I understand it. If you can, consider a migration to a Unix or Linux box.
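If it helps to confirm that the edit actually took effect, here is a rough Python sketch that reads the text of hq-server.conf and reports which server.java.opts line is active (uncommented) and what heap flags it carries. The function names are made up for illustration, and the sketch assumes the file uses plain key=value lines with # comments, as in the lines quoted above. It is only a quick check, not anything shipped with HQ.

```python
def active_java_opts(conf_text):
    """Return the last uncommented server.java.opts value, or None if absent."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        key, sep, val = line.partition("=")
        if sep and key.strip() == "server.java.opts":
            value = val.strip()
    return value


def heap_flags(opts):
    """Pick the -Xms, -Xmx and MaxPermSize values out of an options string."""
    flags = {}
    for token in opts.split():
        if token.startswith("-Xmx"):
            flags["Xmx"] = token[4:]
        elif token.startswith("-Xms"):
            flags["Xms"] = token[4:]
        elif token.startswith("-XX:MaxPermSize="):
            flags["MaxPermSize"] = token.split("=", 1)[1]
    return flags


# The two lines suggested above: the old default commented out, the new one active.
SAMPLE = """\
#server.java.opts=-XX:MaxPermSize=192m -Xmx512m -Xms512m
server.java.opts=-XX:MaxPermSize=192m -Xmx1024m -Xms1024m
"""

if __name__ == "__main__":
    opts = active_java_opts(SAMPLE)
    print(opts)
    print(heap_flags(opts))
```

To run it against a real install, read the file contents into a string and pass that in place of SAMPLE. The server needs a restart before new JVM options apply.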
Thanks Mirko - I'm going to try those adjustments, and if those fail upgrade to the latest 4.2 release, and if needs be start afresh with a clean database. I'll let you know how things go.
Everything should be running NTP, although there are two separate domains being monitored, so not everything is using the same NTP source.
One thing we did notice is a large number of errors in the database log:
STATEMENT: INSERT INTO HQ_METRIC_DATA_8D_0S (measurement_id, timestamp, value) VALUES (87619, 1279753500000, 0.00021)
ERROR: duplicate key violates unique constraint "hq_metric_data_8d_0s_pkey"
Entries like that generated several hundred MB of log files yesterday (initially against the 7d_1s table rather than 8d_0s).
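With that much log volume, a quick tally can show whether the duplicates are spread across many measurements or concentrated on a few. The Python sketch below counts the failing INSERTs per table and per constraint, and converts the metric timestamp to UTC so it can be compared with the server clock. It assumes the log lines look like the excerpt above and that the timestamp column holds milliseconds since the Unix epoch. The function names are illustrative only.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Patterns modelled on the log excerpt quoted in this thread.
INSERT_RE = re.compile(
    r"INSERT INTO (\w+) \(measurement_id, timestamp, value\) "
    r"VALUES \((\d+), (\d+), [^)]*\)"
)
DUP_RE = re.compile(r'duplicate key violates unique constraint "(\w+)"')


def summarize(log_text):
    """Count logged INSERTs per table and duplicate-key errors per constraint."""
    tables = Counter()
    constraints = Counter()
    timestamps = []
    for line in log_text.splitlines():
        m = INSERT_RE.search(line)
        if m:
            tables[m.group(1)] += 1
            timestamps.append(int(m.group(3)))
        d = DUP_RE.search(line)
        if d:
            constraints[d.group(1)] += 1
    return tables, constraints, timestamps


def to_utc(ms):
    """Convert epoch milliseconds (an assumption about the column) to ISO 8601 UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


SAMPLE_LOG = (
    "STATEMENT: INSERT INTO HQ_METRIC_DATA_8D_0S (measurement_id, timestamp, value) "
    "VALUES (87619, 1279753500000, 0.00021)\n"
    'ERROR: duplicate key violates unique constraint "hq_metric_data_8d_0s_pkey"\n'
)

if __name__ == "__main__":
    tables, constraints, timestamps = summarize(SAMPLE_LOG)
    print(tables.most_common(5))
    print(constraints.most_common(5))
    print([to_utc(ts) for ts in timestamps])
```

If the decoded timestamps sit well away from the current time on the HQ server, that would point back at the agent time offsets mentioned earlier, although that is only one possible explanation.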
We tried truncating all the hq_metric_data tables and the eam_measurement_data tables to remove the old metric data and clear out the database, but that didn't have any impact: the duplicate key errors start reappearing almost immediately when we restart the Hyperic server. We have also done some tuning of the database server following the suggestions in the Hyperic documentation and elsewhere.