10 Replies Latest reply: Mar 2, 2009 9:24 AM by jvalkeal_hyperic

    Multiple agents gets started and runs out of memory

    kishorerajput Hot Shot
      Hi,
      I have an agent running on the same box as the server. The agent starts successfully, but over time more agents are created automatically; each one keeps consuming memory, and eventually the machine runs out of memory.

      I have other agents running on other machines. They communicate with the server and run perfectly fine, and each of those machines has only one agent running. Does anyone have an idea why multiple agents get created for the agent on the same machine as the Hyperic server?

      Thanks,
      Kishore.
        • 1. Re: Multiple agents gets started and runs out of memory
          excowboy Master
          Hi,

          What OS is this? Which HQ Agent version are you running?

          Cheers,
          Mirko
          • 2. Re: Multiple agents gets started and runs out of memory
            kishorerajput Hot Shot
            Hi,
            The OS is: Solaris 10 on Sun SPARC
            Hyperic Agent: hyperic-agent-3.2.3-EE

            Thanks,
            Kishore.
            • 3. Re: Multiple agents gets started and runs out of memory
              excowboy Master
              Hi,

              Could you please upgrade your Agent to the latest 3.x version (3.2.6) or to 4.0.3 and report whether the error still occurs?

              Cheers,
              Mirko
              • 4. Re: Multiple agents gets started and runs out of memory
                kishorerajput Hot Shot
                Could you please send me a link that describes the step-by-step procedure for upgrading the Hyperic client?

                I will give it a try.

                Thanks.
                • 6. Re: Multiple agents gets started and runs out of memory
                  jvalkeal_hyperic Expert
                  Could this be related to a bug in the JRE that spawns extra HQ Java processes? For me it happened when the Solaris JRE forked to run external scripts. There are at least two support cases in JIRA for this issue, with workarounds.

                  It's hard to say without stack dumps from the JRE and the OS, though.
                  • 7. Re: Multiple agents gets started and runs out of memory
                    excowboy Master
                    Hi Janne,

                    Open-source (OS) users do not have access to JIRA support cases, so could you post a workaround?

                    Cheers,
                    Mirko
                    • 8. Re: Multiple agents gets started and runs out of memory
                      jvalkeal_hyperic Expert
                      This was the situation:
                      These are the processes shown by ps:
                      root 20175 1 0 Dec 18 ? 21:41 /opt/hyperic/hyperic-hq-agent-4.0.1-EE/wrapper/sbin/../../wrapper/sbin/wrapper-
                      root 20176 20175 0 Dec 18 ? 246:46 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 1771 20176 0 Dec 25 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 11942 20176 0 Jan 01 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 15521 20176 0 Jan 03 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 18470 20176 0 Jan 05 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 20349 20176 0 Jan 06 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 24932 20176 0 06:20:16 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
                      root 24007 20176 0 17:30:16 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..

                      As you can see, the original process (20176) was started by the wrapper. All the other stuck children were forked by this main process, which you can see by comparing the parent IDs.
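For anyone who wants to check for the same pattern, here is a minimal Python sketch (not part of HQ) that scans ps -ef style output and groups java processes whose parent is also a java process. The function names are made up for illustration, the sample lines are shortened from the listing above, and it assumes the PID and PPID are the second and third columns.

```python
def parse_ps_line(line):
    """Return (pid, ppid, is_java) for one `ps -ef` style line, or None."""
    parts = line.split()
    # Assumed column layout: UID PID PPID ... CMD
    if len(parts) < 4 or not parts[1].isdigit() or not parts[2].isdigit():
        return None  # header line or unexpected format
    is_java = any(tok == "java" or tok.endswith("/java") for tok in parts[3:])
    return int(parts[1]), int(parts[2]), is_java


def forked_java_children(ps_output):
    """Map each java parent PID to the java PIDs that list it as their parent."""
    procs = {}
    for line in ps_output.splitlines():
        parsed = parse_ps_line(line)
        if parsed is not None:
            pid, ppid, is_java = parsed
            procs[pid] = (ppid, is_java)

    children = {}
    for pid, (ppid, is_java) in procs.items():
        parent = procs.get(ppid)
        # Only report java processes whose parent is itself a java process.
        if is_java and parent is not None and parent[1]:
            children.setdefault(ppid, []).append(pid)
    return children


# Sample lines shortened from the ps listing in this post.
SAMPLE = """\
root 20175 1 0 Dec 18 ? 21:41 /opt/hyperic/hyperic-hq-agent-4.0.1-EE/wrapper/sbin/../../wrapper/sbin/wrapper-
root 20176 20175 0 Dec 18 ? 246:46 /usr/java/bin/java -Djava.compiler=NONE
root 1771 20176 0 Dec 25 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE
root 24932 20176 0 06:20:16 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE
"""

if __name__ == "__main__":
    print(forked_java_children(SAMPLE))  # prints {20176: [1771, 24932]}
```

A java parent with a growing list of java children that never exit is the symptom described here. The main agent process itself is not reported, because its parent is the wrapper rather than another java process.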

                      Snippets from jstack:
                      #/usr/jdk/jdk1.5.0_14/bin/jstack 20176
                      Thread t@58166: (state = IN_NATIVE)
                      - java.lang.UNIXProcess.waitForProcessExit(int) @bci=0 (Interpreted frame)
                      - java.lang.UNIXProcess.access$900(java.lang.UNIXProcess, int) @bci=2, line=17 (Interpreted frame)
                      - java.lang.UNIXProcess$2$1.run() @bci=17, line=86 (Interpreted frame)

                      #/usr/jdk/jdk1.5.0_14/bin/jstack 1771
                      Thread t@109: (state = IN_NATIVE)
                      - java.lang.UNIXProcess.forkAndExec(byte[], byte[], int, byte[], int, byte[], boolean, java.io.FileDescriptor, java.io.FileDescriptor, java.io.FileDescriptor) @bci=0 (Interpreted frame)
                      - java.lang.UNIXProcess.<init>(byte[], byte[], int, byte[], int, byte[], boolean) @bci=62, line=53 (Interpreted frame)
                      - java.lang.ProcessImpl.start(java.lang.String[], java.util.Map, java.lang.String, boolean) @bci=182, line=65 (Interpreted frame)
                      - java.lang.ProcessBuilder.start() @bci=112, line=451 (Interpreted frame)
                      - java.lang.Runtime.exec(java.lang.String[], java.lang.String[], java.io.File) @bci=16, line=591 (Interpreted frame)
                      - org.hyperic.util.exec.Execute.execute() @bci=16, line=316 (Interpreted frame)
                      - org.hyperic.hq.product.ExecutableProcess.collect() @bci=98, line=202 (Interpreted frame)
                      - org.hyperic.hq.product.Collector.run() @bci=41, line=562 (Interpreted frame)
                      - edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker) @bci=46, line=1061 (Interpreted frame)
                      - edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=575 (Interpreted frame)
                      - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)

                      Around the same time, the following appeared in agent.log:
                      2008-12-25 20:17:55,342 INFO [pool-1-thread-12] [Execute] waitFor() interrupted
                      2008-12-25 20:17:57,359 ERROR [pool-1-thread-12] [ExecutableProcess] [../../bundles/agent-4.0.1-EE-905/pdk/work/scripts/sendmail/hq-sendmail-stat]: Timeout
                      running [../../bundles/agent-4.0.1-EE-905/pdk/work/scripts/sendmail/hq-sendmail-stat ]
                      -------------------------------------------------

                      Workarounds are:
                      - See http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6276483 and apply workaround #2 (jre/lib/security/java.security) to your JRE
                      - Add plugins.exclude=ntp,sendmail to your agent.properties (that is, exclude the plugins that run external scripts)

                      Modifying java.security resolved my problems.
                      -----------------------------------------------------

                      This specific issue occurred on Solaris x86, but it may also happen on SPARC. I have seen these spawned processes on SPARC once as well. Unfortunately, I restarted the agent too quickly and forgot to save the jstack and pstack output from the processes, so I'm not sure whether it was the same problem.

                      It's a nasty issue with Java 1.5. It is only fixed in 1.6, and I believe Sun won't backport the fix to older JREs.
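To avoid losing the jstack and pstack output before a restart, the capture could be scripted. The following is only a rough Python sketch under some assumptions: jstack and pstack are on the PATH (otherwise pass full paths), and the function name, output directory and file naming are invented for this example.

```python
import subprocess
from pathlib import Path


def save_dumps(pid, out_dir, tools=("jstack", "pstack")):
    """Run each tool against pid and save its output as <out_dir>/<tool>-<pid>.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tool in tools:
        # Use only the tool's base name in the file name, so full paths also work.
        target = out_dir / f"{Path(tool).name}-{pid}.txt"
        try:
            result = subprocess.run(
                [tool, str(pid)], capture_output=True, text=True, timeout=60
            )
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of raising, so the other dumps are still taken.
            text = f"could not run {tool}: {exc}\n"
        target.write_text(text)
        written.append(target)
    return written


# Hypothetical usage before restarting the agent (PIDs taken from the ps listing):
#   for pid in (20176, 1771):
#       save_dumps(pid, "/var/tmp/hq-dumps")
```

Failures are written into the output file rather than raised, so a stuck or missing tool does not stop the remaining dumps from being collected.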
                      • 9. Re: Multiple agents gets started and runs out of memory
                        jvalkeal_hyperic Expert
                        Also, simply removing 'security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/sunpkcs11-solaris.cfg' from java.security will break the agent.

                        The JRE expects to find the default provider, which is the first one; this wasn't that clear in the workaround. So after removing security.provider.1, rename security.provider.2 to security.provider.1, security.provider.3 to security.provider.2, and so on.
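The renumbering can be done by hand, but a small script makes it harder to leave a gap. This is a minimal Python sketch, not an official tool: the function name is made up, and the provider entries after the first one in the sample are only illustrative, so check them against your own java.security before relying on it.

```python
import re

# Matches active (uncommented) provider entries such as security.provider.2=...
PROVIDER_RE = re.compile(r"^security\.provider\.(\d+)=(.*)$")


def drop_provider(lines, unwanted_substring):
    """Remove providers whose value contains unwanted_substring; renumber the rest from 1."""
    out = []
    next_index = 1
    for line in lines:
        match = PROVIDER_RE.match(line.strip())
        if match is None:
            out.append(line)  # comments and unrelated settings pass through unchanged
            continue
        value = match.group(2)
        if unwanted_substring in value:
            continue  # drop this provider entirely
        out.append(f"security.provider.{next_index}={value}")
        next_index += 1
    return out


# Illustrative input; only the first entry is quoted from this thread.
SAMPLE = [
    "security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/sunpkcs11-solaris.cfg",
    "security.provider.2=sun.security.provider.Sun",
    "# a comment line",
    "security.provider.3=sun.security.rsa.SunRsaSign",
]

if __name__ == "__main__":
    for entry in drop_provider(SAMPLE, "SunPKCS11"):
        print(entry)
```

Keep a backup copy of the original java.security, and write the result to a new file first so you can compare the two before swapping them.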
                        • 10. Re: Multiple agents gets started and runs out of memory
                          jvalkeal_hyperic Expert
                          I have now confirmed that this bug also happens on Solaris SPARC. The process dumps and thread dumps match those from Solaris x86 exactly.

                          I've applied the same workaround by modifying java.security. We'll see within a few days whether this fix works.