According to http://jira.hyperic.com/browse/HHQ-2676, a fix was made in the 4.1.0 agent so that the wrapper script finds the PID on Solaris even when the ps argument list ("args") is very long. That fix works, but it breaks the testpid() function in the wrapper script, which expects $PSEXE to accept a -p option. On Solaris, $PSEXE now points at /usr/ucb/ps, which rejects -p (see the output below).
Here are some test results. The first is a stop issued while the agent is running. It does stop the agent (most of the time, anyway), but something clearly went wrong:
solaris10sparc% ./bin/hq-agent.sh stop
Stopping HQ Agent...
/usr/ucb/ps: illegal option -- p
usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
/usr/ucb/ps: illegal option -- p
usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
Stopped HQ Agent.
In the next case, the agent is already running, but the wrapper script thinks the PID is missing and starts the agent again. Rut roh, 'raggy.
solaris10sparc% ./bin/hq-agent.sh start
Starting HQ Agent.../usr/ucb/ps: illegal option -- p
usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
Removed stale pid file: /opt/hyperic/hq-agent/wrapper/sbin/../../wrapper/hq-agent.pid
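For comparison, here is a minimal sketch of a testpid()-style check that copes with either flavor of ps. It is not the wrapper's actual code: the function name ps_has_pid is hypothetical, and it assumes $PSEXE holds the full path of the ps binary the wrapper picked (e.g. /usr/ucb/ps on Solaris). It relies on the usage message above, which shows /usr/ucb/ps taking a bare process number ([ num ]) instead of -p. If the real testpid() works along similar lines, a rejected -p would leave nothing but a usage message on stderr, which would make a live agent look dead and could explain the "Removed stale pid file" line.

```shell
#!/bin/sh
# Hypothetical testpid()-style check; NOT the code shipped in hq-agent.sh.
# Assumes PSEXE holds the full path of the ps binary the wrapper picked,
# e.g. /usr/ucb/ps on Solaris after the HHQ-2676 change.
# Returns 0 if ps lists the given PID, 1 otherwise.
ps_has_pid() {
    pid="$1"
    # Refuse an empty or non-numeric PID (e.g. an empty pid file).
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # A data line starts with optional spaces and then the PID; the header
    # line ("PID ...") never matches. stderr is discarded, so a usage error
    # leaves grep with no input and the check fails.
    case "$PSEXE" in
        */ucb/ps)
            # BSD-style ps: the PID is a bare argument ("ps [ num ]").
            "$PSEXE" "$pid" 2>/dev/null | grep "^ *$pid " >/dev/null ;;
        *)
            # SysV/POSIX ps: select the PID with -p.
            "$PSEXE" -p "$pid" 2>/dev/null | grep "^ *$pid " >/dev/null ;;
    esac
}
```

Matching the PID column, rather than trusting ps's exit status, keeps the check independent of how each ps variant reports a PID that does not exist.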
To fix the problem, I changed the calls to testpid() in the stopit() function to getpid(), so that the Solaris-specific ps command is used. I don't fully understand why testpid() exists alongside getpid() in the script, since getpid() was already being used to test whether the PID existed.
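On the question of what a separate PID test is for: one way to check whether a PID is alive without going through ps at all is kill -0, which delivers no signal and only reports whether the process exists and can be signalled. This is a swapped-in technique, not what hq-agent.sh does, and pidfile_is_live is a hypothetical name. It uses backticks rather than $(...) so it also runs under Solaris 10's Bourne /bin/sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of hq-agent.sh: succeed if the PID stored in
# a pid file belongs to a live process, without calling ps at all.
# Backticks instead of $(...) keep it runnable under Solaris 10's /bin/sh.
pidfile_is_live() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1      # missing or unreadable pid file
    pid=`cat "$pidfile"`
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or garbage contents
    esac
    # kill -0 sends no signal; it only checks that the process exists and
    # that we may signal it. Run as the agent's owner (or root): for another
    # user's process it fails with EPERM and this helper returns 1.
    kill -0 "$pid" 2>/dev/null
}
```

The trade-off is that kill -0 cannot tell whether the PID still belongs to the agent after a reboot or PID reuse; presumably that is why the wrapper matches ps output (including the long args HHQ-2676 deals with) instead.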
In my testing, the testpid()-to-getpid() change keeps a second agent from being started and stops the error messages from being printed on stop. I've attached a diff against the 4.1.2-1053 agent I downloaded for Solaris. Note that it also changes some "echo -n" calls to "printf" to try to get the "Starting HQ Agent..." message to print correctly.
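On the echo -n change: echo -n is not portable. By default, Solaris /bin/sh's echo follows SysV rules, under which -n is not an option and is printed literally, while printf behaves the same way everywhere. A minimal sketch, with hypothetical helper names (say_begin, say_end), of printing a start message and its status on one line:

```shell
#!/bin/sh
# Hypothetical helpers, not the wrapper's own functions: print a message
# without a trailing newline, then finish the line with a status word.
say_begin() {
    printf '%s' "$1"        # no newline, so the status lands on this line
}
say_end() {
    printf ' %s\n' "$1"     # one space, the status, then the newline
}

say_begin "Starting HQ Agent..."
say_end "Successful."
```

Passing the text as an argument to a fixed '%s' format (rather than as the format itself) keeps any % characters in a message from being interpreted by printf.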
Even with these changes, it still doesn't print "Successful" after the agent starts up. But I wanted to at least post the question to see whether anyone else has seen this behavior on Solaris, or whether there's a more obvious fix 🙂