roadrunner99
Contributor

HQ Agent 4 doesn't send data anymore

After I installed HQ Agent 4 (4.0.1-905) on a Solaris machine and set it up, my server received data from that agent.

But after a certain amount of time the agent was no longer able to send data. I get the following error message:

2008-12-17 15:02:22,274 ERROR [SenderThread] [SenderThread]
java.lang.NoClassDefFoundError
at org.hyperic.lather.client.LatherHTTPClient.invoke(LatherHTTPClient.java:99)
at org.hyperic.hq.bizapp.client.AgentCallbackClient.invokeLatherCall(AgentCallbackClient.java:157)
at org.hyperic.hq.bizapp.client.MeasurementCallbackClient.measurementSendReport(MeasurementCallbackClient.java:62)
at org.hyperic.hq.measurement.agent.server.SenderThread.sendBatch(SenderThread.java:418)
at org.hyperic.hq.measurement.agent.server.SenderThread.run(SenderThread.java:576)
at java.lang.Thread.run(Unknown Source)

My platform is Solaris 10, and I use version 4.0.1-905 of the agent on SPARC.
What I have checked so far:

The port and IP address of the server are reachable. I restarted the service and tried several different Java versions. No success.

Can someone give hints how to resolve the problem?

Kind Regards
Roadrunner
0 Kudos
15 Replies
roadrunner99
Contributor

OK, I tested the same on another machine and with another version, 3.2.6.
I also tested different IP addresses (the machine has 2 network cards).

I always get the same error. What does this error mean? The connection to the server is OK (telnet <servername> 7080).
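
For readers who want to repeat the reachability test without telnet, the following is a minimal Java sketch of the same TCP check. The class name PortCheck, the placeholder host, and the 3-second timeout are illustrative assumptions; 7080 is the port from the post. A successful connect only shows that the network path is open. A java.lang.NoClassDefFoundError is raised by the JVM when a class cannot be loaded or initialized, so it is not a network error in itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host
            return false;
        }
    }

    public static void main(String[] args) {
        // "localhost" is only a placeholder; pass the HQ server name as the first argument
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7080;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```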

I have now tested for several days and have neither come to a conclusion nor found anything on the net.

Any suggestions would be welcome
Thx
Roadrunner
0 Kudos

If I got this right, the class used on line 99 of LatherHTTPClient is missing. If I'm right, that class is org.apache.commons.httpclient.HttpClient, which should come from the jar file /bundles/agent-4.0.0-EE-866/pdk/lib/commons-httpclient-3.1.jar (your path will be slightly different).

Maybe there's something wrong with the agent classpath. You could post more lines from the agent.log file. Maybe there are some errors when the agent starts.
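
One way to test the classpath theory outside the agent is a small loader check like the sketch below. It is only an illustration: the class name ClassCheck is made up, and it has to be run with the same classpath the agent uses for the result to mean anything. The default class name is the one suggested in this reply; other names can be passed as arguments.

```java
public class ClassCheck {
    /** Returns null if the class loads and initializes, otherwise a short description of the failure. */
    static String tryLoad(String className) {
        try {
            Class.forName(className);
            return null;
        } catch (ClassNotFoundException e) {
            // The class itself is not on the classpath
            return "not found: " + e.getMessage();
        } catch (LinkageError e) {
            // Covers NoClassDefFoundError and ExceptionInInitializerError,
            // e.g. when a dependency of the class is missing
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
                ? args
                : new String[] { "org.apache.commons.httpclient.HttpClient" };
        for (String name : names) {
            String problem = tryLoad(name);
            System.out.println(name + ": " + (problem == null ? "OK" : problem));
        }
    }
}
```

If the class is reported as OK here but the agent still fails, the agent is probably running with a different classpath or class loader than this check.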

Was this platform sparc or x86?
excowboy
Virtuoso

Hi,

Sending some more lines from your agent.log would be helpful. You could also increase the debug level.
Set the parameter agent.logLevel=DEBUG in your agent.properties and restart your HQ Agent.

Are you using the JRE shipped in the package hyperic-hq-installer-4.0.1-905-sparc-solaris.tgz, or another JRE?

Cheers,
Mirko
roadrunner99
Contributor

Thanks for your valuable answers.

We have a SPARC platform.

First of all I checked whether the httpclient.jar is present. Yes, it is.

As the JDK I use the one shipped with WebSphere (export HQ_JAVA_HOME=/opt/IBM/WebSphere/AppServer, from Fixpack 21).
I also tried the shipped JDK (I assume that when I don't set HQ_JAVA_HOME, the agent uses the built-in one). I also filled the variable with a bogus value to see what happens, and it gave an error.
So I added /bundles/agen????/pdk/lib to the classpath.

But with the DEBUG option I found something interesting. It seems that the agent wants to communicate with 127.0.0.1 (localhost). Might this be the problem, or is this only a listen port? When I ran setup I gave it the server address.

See the complete log excerpt from startup to the error:

2008-12-17 16:13:51,210 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:51,275 INFO [Thread-0] [ProductPluginManager] Loading plugin: xen-plugin.jar
2008-12-17 16:13:52,220 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:52,759 INFO [Thread-0] [ProductPluginManager] Loading plugin: zimbra-plugin.jar
2008-12-17 16:13:53,229 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:54,202 INFO [Thread-0] [AgentDaemon] Product Plugin Manager initalized
2008-12-17 16:13:54,214 INFO [Thread-0] [AgentCommandsServer] Registering Agent Commands Service with Agent Transport
2008-12-17 16:13:54,215 INFO [Thread-0] [AgentCommandsServer] Agent commands started up
2008-12-17 16:13:54,239 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:54,271 INFO [Thread-0] [AutoinventoryCommandsServer] Registering AI Commands Service with Agent Transport
2008-12-17 16:13:54,272 INFO [Thread-0] [AutoinventoryCommandsServer] Autoinventory Commands Server started up
2008-12-17 16:13:55,249 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:55,295 ERROR [Thread-2] [AutoinventoryCommandsServer] Unable to send autoinventory platform data to server, sleeping for 15 secs before retrying. Error: Unable to communicate with server -- provider not yet setup
2008-12-17 16:13:56,215 WARN [Thread-0] [CommandsServer] Agent certificate not found -- generating a new one
2008-12-17 16:13:56,254 INFO [Thread-0] [CommandsServer] Commands Server started up
2008-12-17 16:13:56,259 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:56,272 INFO [Thread-0] [ControlCommandsServer] Registering Control Commands Service with Agent Transport
2008-12-17 16:13:56,272 INFO [Thread-0] [ControlCommandsServer] Control Commands Server started up
2008-12-17 16:13:56,313 INFO [Thread-0] [SenderThread] Maximum metric batch size set to 500
2008-12-17 16:13:56,338 INFO [Thread-0] [TrackerThread] Event report batch size set to 100
2008-12-17 16:13:56,359 INFO [Thread-0] [MeasurementCommandsServer] Registering Measurement Commands Service with Agent Transport
2008-12-17 16:13:56,361 INFO [Thread-0] [MeasurementCommandsServer] Measurement Commands Server started up
2008-12-17 16:13:56,371 INFO [Thread-0] [LiveDataCommandsServer] Registering Live Data Commands Service with Agent Transport
2008-12-17 16:13:56,371 INFO [Thread-0] [LiveDataCommandsServer] Live Data Commands Server started up
2008-12-17 16:13:56,372 INFO [Thread-0] [AgentTransportLifecycleImpl] Agent is not using new transport.
2008-12-17 16:13:56,554 INFO [Thread-0] [AgentDaemon] Agent started successfully
2008-12-17 16:13:56,555 INFO [WrapperStartStopAppMain] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:56,641 INFO [WrapperStartStopAppMain] [AgentConnection] 127.0.0.1:2144 -> bizapp:getCAMServer
2008-12-17 16:13:57,269 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:13:57,323 ERROR [Thread-0] [SSLConnectionListener] Rejecting client from /127.0.0.1: Passed an invalid auth token (1229526734776-6859102251947936104-3224293828832927794)
2008-12-17 16:13:57,335 ERROR [Thread-0] [CommandListener] Failed handling new connection
org.hyperic.hq.agent.AgentConnectionException: Client from /127.0.0.1 unauthorized
at org.hyperic.hq.bizapp.agent.server.SSLConnectionListener.handleNewConn(SSLConnectionListener.java:122)
at org.hyperic.hq.bizapp.agent.server.SSLConnectionListener.getNewConnection(SSLConnectionListener.java:159)
at org.hyperic.hq.agent.server.CommandListener.listenLoop(CommandListener.java:157)
at org.hyperic.hq.agent.server.AgentDaemon.start(AgentDaemon.java:843)
at org.hyperic.hq.agent.server.AgentDaemon$RunnableAgent.run(AgentDaemon.java:925)
at java.lang.Thread.run(Unknown Source)
2008-12-17 16:14:10,298 ERROR [Thread-2] [AutoinventoryCommandsServer] Unable to send autoinventory platform data to server, sleeping for 22 secs before retrying. Error: Unable to communicate with server -- provider not yet setup
2008-12-17 16:14:32,807 ERROR [Thread-2] [AutoinventoryCommandsServer] Unable to send autoinventory platform data to server, sleeping for 33 secs before retrying. Error: Unable to communicate with server -- provider not yet setup
2008-12-17 16:15:06,565 ERROR [Thread-2] [AutoinventoryCommandsServer] Unable to send autoinventory platform data to server, sleeping for 50 secs before retrying. Error: Unable to communicate with server -- provider not yet setup
2008-12-17 16:15:52,379 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:15:57,191 ERROR [Thread-2] [AutoinventoryCommandsServer] Unable to send autoinventory platform data to server, sleeping for 75 secs before retrying. Error: Unable to communicate with server -- provider not yet setup
2008-12-17 16:16:26,433 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:getCAMServer
2008-12-17 16:16:26,478 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:createToken
2008-12-17 16:16:29,773 INFO [Thread-0] [SSLConnectionListener] Locking auth token
2008-12-17 16:16:29,985 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:setCAMServer
2008-12-17 16:16:30,032 INFO [Thread-0] [CommandsServer] Setting the HQ server to: http://10.249.180.44:7080/jboss-lather/JBossLather
2008-12-17 16:16:30,035 INFO [Thread-0] [AgentTransportLifecycleImpl] Stopping agent transport.
2008-12-17 16:16:30,043 INFO [Thread-5] [ConfigPopulateThread] Starting config populate thread
2008-12-17 16:16:30,046 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:getCAMServer
2008-12-17 16:16:30,113 ERROR [Thread-2] [AutoinventoryCommandsServer] Error starting scanner: java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError
at org.hyperic.lather.client.LatherHTTPClient.invoke(LatherHTTPClient.java:99)
at org.hyperic.hq.bizapp.client.AgentCallbackClient.invokeLatherCall(AgentCallbackClient.java:157)
at org.hyperic.hq.bizapp.client.AutoinventoryCallbackClient.aiSendReport(AutoinventoryCallbackClient.java:52)
at org.hyperic.hq.autoinventory.agent.server.AutoinventoryCommandsServer.scanComplete(AutoinventoryCommandsServer.java:331)
at org.hyperic.hq.autoinventory.ScanManager.scanComplete(ScanManager.java:308)
at org.hyperic.hq.autoinventory.Scanner.notifyScanComplete(Scanner.java:273)
at org.hyperic.hq.autoinventory.Scanner.start(Scanner.java:211)
at org.hyperic.hq.autoinventory.ScanManager.mainRunLoop(ScanManager.java:141)
at org.hyperic.hq.autoinventory.ScanManager.access$000(ScanManager.java:41)
at org.hyperic.hq.autoinventory.ScanManager$1.run(ScanManager.java:107)
2008-12-17 16:20:18,366 INFO [WrapperListener_stop_runner] [AgentConnection] 127.0.0.1:2144 -> agent:die
2008-12-17 16:20:18,415 INFO [WrapperListener_stop_runner] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:20:19,474 INFO [WrapperListener_stop_runner] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:20:20,534 INFO [WrapperListener_stop_runner] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-17 16:20:20,535 INFO [Thread-0] [AgentCommandsServer] Agent commands shut down
2008-12-17 16:20:20,535 INFO [Thread-0] [AutoinventoryCommandsServer] Autoinventory Commands Server shutting down
2008-12-17 16:20:21,044 INFO [Thread-0] [AutoinventoryCommandsServer] Autoinventory Commands Server shut down
2008-12-17 16:20:21,045 INFO [Thread-0] [CommandsServer] Commands Server shut down
2008-12-17 16:20:21,045 INFO [Thread-0] [ControlCommandsServer] Control Commands Server shut down
2008-12-17 16:20:21,045 INFO [Thread-0] [MeasurementCommandsServer] Measurement Commands Server shutting down
2008-12-17 16:20:21,045 INFO [SenderThread] [SenderThread] Measurement sender interrupted
2008-12-17 16:20:21,046 INFO [Thread-0] [MeasurementCommandsServer] Measurement Commands Server shut down
2008-12-17 16:20:21,058 INFO [Thread-0] [AgentDaemon] Agent shut down
0 Kudos

Was this a completely fresh agent installation, or was it an agent upgrade?

It seems that the token is wrong. If that's the case, try running the agent with the setup option.
0 Kudos
roadrunner99
Contributor

All installations fail.

I started the installation in a fresh environment. After agent setup it connects once to the server, I can add the agent to the resources list, and then that's it.

I also ran the setup many times, which recreates the token, with no success.

Someone told me to try the former agent version 3.2.6, but I get the same behavior.

In the end I think I have tried everything, but it will not connect.

By the way, I also checked the server and the connection to the agent again. This should work.

My only open question remains why it says it will connect to 127.0.0.1:2144. This shouldn't be, should it?
0 Kudos

Those lines related to the localhost IP are OK.

There is one thing I would like you to check. I mostly administer Solaris machines. Sometimes the setup process, which should happen when the agent is first started, ends in errors or hangs. I have never paid much attention to it because I usually don't run the manual setup.

So this is what I would like you to try:

1. Remove the agent.
2. Unpack it again (now you have a clean bundle).
3. Locate the conf/agent.properties file and configure it manually.

Change the values below to the ones you use and insert them into agent.properties. If you don't use SSL, comment out camSSLPort, uncomment camPort, and change camSecure to 'no'. Also select whether you want unidirectional or bidirectional communication (yes or no).
--clip--
agent.setup.camIP=your.server.com
#agent.setup.camPort=7080
agent.setup.camSSLPort=443
agent.setup.camSecure=yes
agent.setup.camLogin=your_login_username
agent.setup.camPword=your_login_password
agent.setup.agentIP=*default*
agent.setup.agentPort=*default*
#agent.setup.resetupTokens=no

##
## enables unidirectional communications between HQ Agent
## and HQ Server in HQ Enterprise Edition
##
agent.setup.unidirectional=yes
--clip--

4. Start the agent.
----

Before you do all this, you could check whether the setup process made all of these required parameters available. The purpose of this is to start a clean agent without the setup process.

Let us know what happens... This is a long shot, though.
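
The check suggested above (whether all of the parameters are present in agent.properties) can be sketched in Java as follows. This is a hedged illustration, not part of HQ: the class name AgentPropsCheck is invented, the key names are copied from the fragment in this post, and treating every one of them as required is an assumption.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AgentPropsCheck {
    // Keys copied from the fragment quoted in this post; "required" is an assumption
    static final String[] KEYS = {
        "agent.setup.camIP", "agent.setup.camSecure", "agent.setup.camLogin",
        "agent.setup.camPword", "agent.setup.agentIP", "agent.setup.agentPort",
        "agent.setup.unidirectional"
    };

    /** Returns the keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<String>();
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        // Which port key matters depends on whether SSL is used
        boolean secure = "yes".equalsIgnoreCase(props.getProperty("agent.setup.camSecure"));
        String portKey = secure ? "agent.setup.camSSLPort" : "agent.setup.camPort";
        if (props.getProperty(portKey) == null) {
            missing.add(portKey);
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "conf/agent.properties";
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        System.out.println("Missing keys: " + missingKeys(props));
    }
}
```

Commented-out lines such as #agent.setup.camPort=7080 are ignored by java.util.Properties, so a key that is still commented out shows up as missing.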
0 Kudos
roadrunner99
Contributor

I did it as you suggested.

Unfortunately it didn't work. See the log file excerpt at the end. I had to use port 7443 for secure communication, but I tested it and the communication should be OK.

But I found out something interesting. We are using Solaris zones. When I install an agent in the global zone, everything works fine. In non-global zones, no metrics are sent to the server.
Does anyone have experience with this? Is the operating system in a zone somehow different?


2008-12-18 14:41:52,298 ERROR [Thread-0] [CommandListener] Failed handling new connection
org.hyperic.hq.agent.AgentConnectionException: Client from /127.0.0.1 unauthorized
at org.hyperic.hq.bizapp.agent.server.SSLConnectionListener.handleNewConn(SSLConnectionListener.java:122)
at org.hyperic.hq.bizapp.agent.server.SSLConnectionListener.getNewConnection(SSLConnectionListener.java:159)
at org.hyperic.hq.agent.server.CommandListener.listenLoop(CommandListener.java:157)
at org.hyperic.hq.agent.server.AgentDaemon.start(AgentDaemon.java:843)
at org.hyperic.hq.agent.server.AgentDaemon$RunnableAgent.run(AgentDaemon.java:925)
at java.lang.Thread.run(Thread.java:595)
2008-12-18 14:41:52,837 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:getCAMServer
2008-12-18 14:41:53,329 INFO [main] [AgentConnection] 127.0.0.1:2144 -> agent:ping
2008-12-18 14:41:54,378 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:getCAMServer
2008-12-18 14:41:54,863 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:createToken
2008-12-18 14:41:56,030 INFO [Thread-0] [SSLConnectionListener] Locking auth token
2008-12-18 14:41:56,271 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:setCAMServer
2008-12-18 14:41:56,759 INFO [Thread-0] [CommandsServer] Setting the HQ server to: https://10.249.180.44:7443/jboss-lather/JBossLather
2008-12-18 14:41:56,780 INFO [Thread-0] [AgentTransportLifecycleImpl] Stopping agent transport.
2008-12-18 14:41:56,787 INFO [Thread-5] [ConfigPopulateThread] Starting config populate thread
2008-12-18 14:41:56,791 INFO [main] [AgentConnection] 127.0.0.1:2144 -> bizapp:getCAMServer
2008-12-18 14:41:57,004 ERROR [Thread-2] [AutoinventoryCommandsServer] Error starting scanner: java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError
at org.hyperic.lather.client.LatherHTTPClient.invoke(LatherHTTPClient.java:99)
at org.hyperic.hq.bizapp.client.AgentCallbackClient.invokeLatherCall(AgentCallbackClient.java:157)
at org.hyperic.hq.bizapp.client.AutoinventoryCallbackClient.aiSendReport(AutoinventoryCallbackClient.java:52)
at org.hyperic.hq.autoinventory.agent.server.AutoinventoryCommandsServer.scanComplete(AutoinventoryCommandsServer.java:331)
at org.hyperic.hq.autoinventory.ScanManager.scanComplete(ScanManager.java:308)
at org.hyperic.hq.autoinventory.Scanner.notifyScanComplete(Scanner.java:273)
at org.hyperic.hq.autoinventory.Scanner.start(Scanner.java:211)
at org.hyperic.hq.autoinventory.ScanManager.mainRunLoop(ScanManager.java:141)
at org.hyperic.hq.autoinventory.ScanManager.access$000(ScanManager.java:41)
at org.hyperic.hq.autoinventory.ScanManager$1.run(ScanManager.java:107)
2008-12-18 14:41:57,005 ERROR [Thread-5] [SystemErr] Exception in thread "Thread-5"
2008-12-18 14:41:57,007 ERROR [Thread-5] [SystemErr] java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
2008-12-18 14:41:57,008 ERROR [Thread-5] [SystemErr] at org.apache.commons.httpclient.HttpClient.<clinit>(HttpClient.java:65)
2008-12-18 14:41:57,008 ERROR [Thread-5] [SystemErr] at org.hyperic.lather.client.LatherHTTPClient.invoke(LatherHTTPClient.java:99)
2008-12-18 14:41:57,009 ERROR [Thread-5] [SystemErr] at org.hyperic.hq.bizapp.client.AgentCallbackClient.invokeLatherCall(AgentCallbackClient.java:157)
2008-12-18 14:41:57,009 ERROR [Thread-5] [SystemErr] at org.hyperic.hq.bizapp.client.MeasurementCallbackClient.getMeasurementConfigs(MeasurementCallbackClient.java:107)
2008-12-18 14:41:57,010 ERROR [Thread-5] [SystemErr] at org.hyperic.hq.measurement.agent.server.ConfigPopulateThread.run(ConfigPopulateThread.java:80
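
A note on reading the two traces above: the Thread-5 trace names the class that could not be found (org/apache/commons/logging/LogFactory) and shows it failing inside HttpClient.<clinit>, the static initializer of HttpClient. One plausible reading is that the message-less NoClassDefFoundError at LatherHTTPClient.java:99 is the follow-on error the JVM throws for every later use of a class whose static initialization has already failed. The self-contained sketch below demonstrates that JVM behavior with invented classes (InitFailureDemo, Broken). It simulates the failure with a runtime exception, so the first access reports ExceptionInInitializerError, whereas in the log the first failure is itself a NoClassDefFoundError.

```java
public class InitFailureDemo {
    /** Stand-in for a class whose static initializer cannot complete. */
    static class Broken {
        static final Object VALUE = fail();

        static Object fail() {
            throw new IllegalStateException("simulated missing dependency");
        }
    }

    /** Touches Broken and returns the simple name of the Throwable that results, or "ok". */
    static String touch() {
        try {
            Object value = Broken.VALUE; // triggers static initialization on first use
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First use runs the static initializer, which fails
        System.out.println("first access:  " + touch());
        // Later uses fail without running the initializer again
        System.out.println("second access: " + touch());
    }
}
```

If that reading is right, the thing to chase is why the commons-logging classes are not visible to the agent in the zone, rather than the HttpClient jar itself.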
0 Kudos

Gosh, this is getting really weird. I haven't had any problems with zones. Both zone types work OK, whole root and sparse root. Is the agent failing in all the zones you have, or just in this one specific zone?

You could also try starting the agent directly from the bundle (/bundles/agent-4.0.1-905/bin/agent.sh), in case the wrapper process is doing something stupid.
0 Kudos
excowboy
Virtuoso

Hi Roadrunner,

I think this is a known issue with running Hyperic HQ Agents in Solaris Zones. The problem is the global loopback device. Apparently Zones provide only one loopback device, so the first Agent to start binds to the loopback device, and the second, third, etc. can't bind because it's already in use.
I don't have a Zones test environment, but one idea is to use a different port for each Agent you start, e.g.
Global Zone Port 2144, Zone #1 Port 22144, Zone #2 Port 22145, etc.
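
Whether a port on the loopback address is already taken inside a given zone can be checked with a short bind test such as the sketch below. The class name BindCheck is made up for illustration; 2144 is the agent port seen in the logs in this thread. The agent in that zone has to be stopped first, otherwise its own listener occupies the port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class BindCheck {
    /** Returns true if a listening socket can be bound to addr:port right now. */
    static boolean canBind(InetAddress addr, int port) {
        try (ServerSocket socket = new ServerSocket(port, 1, addr)) {
            return true;
        } catch (IOException e) {
            // Typically "Address already in use"
            return false;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 2144;
        InetAddress loopback = InetAddress.getLoopbackAddress();
        System.out.println(loopback + ":" + port + " bindable: " + canBind(loopback, port));
    }
}
```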

HTH,
Mirko
0 Kudos

Mirko, this may not be the case. Every non-global zone has its own logical loopback address, so binding should not be the problem. I don't have any problems running multiple zones with agents installed, plus an agent in the global zone.

There is one specific case where we could theoretically expect some problems: if we need bindings in multiple zones and we are accessing IPs in other zones on the same physical hardware.

Most software can be installed in a zone. Surprisingly, all the software I've seen problems with in zones comes directly from Sun Microsystems. Doesn't that sound ridiculous? 🙂

Roadrunner, where is your HQ server installed? Is it in some other zone on the same hardware?
0 Kudos
excowboy
Virtuoso

Hi Janne,

Thanks for your comment. Sorry, I'm not very familiar with Zones and I don't know much about the network setup, but maybe it's a problem with ipfilter.
Do you have a special network setup for your Zone hosts?

Cheers,
Mirko
0 Kudos
roadrunner99
Contributor

Hello All,

We have tried all the hints you gave. Nothing works. Solaris specialists have also taken a look.

We will bring in a company to take a look at it.

Thanks for all your valuable support.

Regards
Roadrunner
0 Kudos

If and when you get this resolved, please let us know. I'm personally very interested to know what this issue is. If this indeed is a zone issue, I would like to know about it. We mostly use zones on the development side, where 95% of the hosts are zones.

There have been problems, but nothing we haven't been able to sort out. Nothing about HQ; the problems are mostly network-related issues. There are always problems with software, regardless of what OS you are using (well, problems in Bill Gates land are a pain in the ass). But Solaris is the best OS if you want to know what the problem is. The Groovy language is my new fiancée, but it used to be Solaris DTrace. I've solved many weird problems with it.
0 Kudos

> Do you have a special network setup for your Zone
> hosts ?

Well, kind of, but it has nothing to do with this issue. These tweaks are mostly related to different subnetworks that we use in zones but don't want to expose to the global zone. During the Solaris 10 life cycle, zones have evolved a lot. So an ancient Solaris 10 version could theoretically be the cause (I would not be surprised), but somehow I don't believe it.

Very weird case, though.
0 Kudos