VMware Cloud Community
edan_hyperic
Enthusiast

HHQ-2676 fix breaks pid checking on Solaris

According to http://jira.hyperic.com/browse/HHQ-2676, a fix was made to the 4.1.0 agent to ensure the PID is found by the wrapper script on Solaris in the case of really long ps "args." This works, but it breaks the testpid() function within the wrapper script, which expects $PSEXE to accept a -p argument.

Here are some test results. The first is a stop executed while the agent is running. It does stop the agent (most of the time, anyway), but the output suggests something went wrong:

solaris10sparc% ./bin/hq-agent.sh stop
Stopping HQ Agent...
/usr/ucb/ps: illegal option -- p
usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
/usr/ucb/ps: illegal option -- p
usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
Stopped HQ Agent.

In this case, the agent is already running, but the wrapper script thinks the PID is missing and starts the agent again. Rut roh, 'raggy.

solaris10sparc% ./bin/hq-agent.sh start
Starting HQ Agent.../usr/ucb/ps: illegal option -- p
usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
Removed stale pid file: /opt/hyperic/hq-agent/wrapper/sbin/../../wrapper/hq-agent.pid


To fix this, I changed the instances of "testpid" to "getpid" in the stopit() function. This way the Solaris-specific 'ps' command is used. I don't fully understand why testpid() exists in addition to getpid() in the script, as getpid() was already used to "test" whether the PID existed.
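For anyone who would rather not depend on which 'ps' flavor $PSEXE points at, here is a minimal sketch of a liveness check that avoids 'ps -p' altogether. It uses 'kill -0', which is a different technique from the getpid() swap described above, and pid_is_running is a hypothetical helper name, not something that exists in hq-agent.sh. One caveat: 'kill -0' also fails when the process belongs to another user, so this assumes the script runs as the agent's owner or as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of hq-agent.sh): succeed if the given PID
# refers to a live process that the current user is allowed to signal.
pid_is_running() {
    # Signal 0 delivers nothing; kill only reports whether the target exists
    # and may be signalled, so no ps option parsing is involved.
    kill -0 "$1" 2>/dev/null
}

# Example: check the current shell's own PID.
if pid_is_running "$$"; then
    echo "process $$ is running"
fi
```

Whether this is a safe replacement for testpid() in the wrapper is untested; it only shows a check that does not care whether 'ps' accepts -p.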

In my testing, it keeps a second agent from being started, and prevents the error messages from being printed on stop. I've attached a diff from the 4.1.2-1053 agent I downloaded for Solaris. Note, it also has some changes from "echo -n" to "printf" to try to get the message "Agent starting..." to print out correctly.

This is how it looks without changes:

solaris10sparc% ./bin/hq-agent.sh start
-n Starting HQ Agent...
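The literal "-n" in that output is what you would expect from an 'echo' that does not treat -n as an option, which is my understanding of the System V style echo on Solaris. A minimal sketch of the printf approach follows; print_no_newline is a hypothetical helper name used only for illustration, and the actual diff may do this differently.

```shell
#!/bin/sh
# Hypothetical helper: print a message without a trailing newline.
# printf with an explicit '%s' format behaves the same in every POSIX shell,
# unlike "echo -n", whose handling of -n varies between implementations.
print_no_newline() {
    printf '%s' "$1"
}

print_no_newline "Starting HQ Agent..."
# A later status message then lands on the same line.
echo " done"
```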

Even with these changes, it still doesn't print "Successful" after the agent starts up. But I wanted to at least post this to see whether anyone else has seen this behavior on Solaris, or whether there is a more obvious fix 🙂

Cheers.
1 Reply
excowboy
Virtuoso

Hi edan,

Thanks for your post and your fix. I have a Solaris 9 box running a 4.0.3 agent, and since I'm planning to upgrade to 4.1.2, I will verify what you've described.

Cheers,
Mirko