I reported earlier that I managed to register the 2000+ services from the single WLS domain (5 WLS servers) I'm using for this evaluation by considerably increasing the transaction timeout. All of them are hosted on the same platform, where a single HQ agent is running.
I'm experiencing a performance problem that I cannot explain. First, the HQ server process consumes between 50 and 100% of the CPU. Second, the agent's log file displays error messages like these:
2007-10-11 13:10:36,986 INFO [SenderThread] Agent measurements no longer backlogged
2007-10-11 13:15:59,785 WARN [ConfigPopulateThread] Unable to get entities for agent: IO error: java.net.SocketTimeoutException: Read timed out
2007-10-11 13:15:59,785 WARN [ConfigPopulateThread] Sleeping for 160 seconds to fetch entities
2007-10-11 13:20:41,508 WARN [SenderThread] The Agent is having a hard time keeping up with the frequency of metrics taken. Consider increasing your collection interval.
2007-10-11 13:20:54,559 INFO [SenderThread] Agent measurements no longer backlogged
2007-10-11 13:26:19,805 WARN [ConfigPopulateThread] Unable to get entities for agent: IO error: java.net.SocketTimeoutException: Read timed out
2007-10-11 13:26:19,805 WARN [ConfigPopulateThread] Sleeping for 320 seconds to fetch entities
2007-10-11 13:30:59,467 WARN [SenderThread] The Agent is having a hard time keeping up with the frequency of metrics taken. Consider increasing your collection interval.
2007-10-11 13:31:12,396 INFO [SenderThread] Agent measurements no longer backlogged
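In the excerpt, each "Unable to get entities" timeout is followed by a longer sleep (160 seconds, then 320 seconds), so the agent appears to back off while the server stays unresponsive. A quick way to see whether that pattern continues over a longer period is to scan the whole agent log for those retry delays. The following is a minimal sketch in Java, assuming only the message format visible above and one entry per line in the actual log file. The class and method names are hypothetical, and nothing here reflects HQ's own internals.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgentLogScan {
    // Matches the ConfigPopulateThread retry message as it appears in the agent log.
    private static final Pattern SLEEP =
        Pattern.compile("\\[ConfigPopulateThread\\] Sleeping for (\\d+) seconds");

    // Collects every retry delay (in seconds) found in the log lines, in file order.
    static List<Integer> retryDelays(List<String> lines) {
        List<Integer> delays = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SLEEP.matcher(line);
            if (m.find()) {
                delays.add(Integer.parseInt(m.group(1)));
            }
        }
        return delays;
    }

    // True if each delay is exactly twice the previous one (exponential back-off).
    static boolean isDoubling(List<Integer> delays) {
        for (int i = 1; i < delays.size(); i++) {
            if (delays.get(i) != 2 * delays.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, scan that log file; otherwise use two lines from the excerpt.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : List.of(
                "2007-10-11 13:15:59,785 WARN [ConfigPopulateThread] Sleeping for 160 seconds to fetch entities",
                "2007-10-11 13:26:19,805 WARN [ConfigPopulateThread] Sleeping for 320 seconds to fetch entities");
        List<Integer> delays = retryDelays(lines);
        System.out.println("retry delays: " + delays + ", doubling: " + isDoubling(delays));
    }
}
```

Run without arguments, it prints `retry delays: [160, 320], doubling: true`. Pass the path of the agent log to scan the real file.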
I guess that the HQ server is not able to keep up with the measurements sent by the agent. Since this number of WebLogic JMX components is typical of our environments, I'm worried about how the HQ server will behave when we register one of our production environments. We are mainly interested in JMX monitoring, WebLogic JMX in particular.
I suspect a tuning problem, but I cannot put my finger on it. A few random thread dumps show mainly Hibernate activity, yet the Oracle DB instance does not show any abnormal activity.
Any ideas?