VMware Cloud Community
dominic7
Virtuoso

ESX 3.5.0 Update 2 [cimservera <defunct>]

Is anyone else getting a ton of defunct processes from cimservera after installing ESX 3.5.0 Update 2?

I've got a cluster, all rebuilt at Update 2, that is generating a lot of these. Unfortunately the output doesn't format well in here.
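
Since the raw ps output doesn't paste well, a simple count may be easier to share. Below is a minimal sketch, assuming the service console's ps accepts `-eo stat,comm` (standard procps behaviour); the helper name `count_defunct` is made up for illustration and is not part of ESX.

```shell
# Count zombie (defunct) processes for one command name.
# Expects `ps -eo stat,comm` style input on stdin: column 1 is the
# process state (zombies start with "Z"), column 2 is the command name.
count_defunct() {
    awk -v target="$1" '$1 ~ /^Z/ && $2 == target { n++ } END { print n + 0 }'
}

# Assumed usage on the service console:
#   ps -eo stat,comm | count_defunct cimservera
```

Running it a few times an hour apart should show whether the number is still climbing.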

124 Replies
jonb157
Enthusiast

This worked for us. In my /var/log/messages I was only seeing entries that correspond to the "Daily System Identification Task" in HP SIM. Fortunately ours only runs weekly, so there are not too many entries yet on our ESX hosts. I just disabled the scan task until HP/VMware comes up with a long-term solution. Good catch on this one, MalcO.

dominic7
Virtuoso

I don't really use SIM (it's managed by someone else), so I turned the port off on the firewall so I don't get any more defunct processes. So far it's been about 24 hours with no additional processes:

esxcfg-firewall -d CIMHttpsServer && service pegasus restart

Troy_Clavell
Immortal

I've been following this thread for quite some time and have been working with our HP rep. He sent the following email with the fix. I have not yet been able to test it in our environment. For those using SIM, hopefully this helps.

Just got word from an HP team that there is a patch kit for HP SIM that is supposed to include a fix to prevent queries to the ESX cimservera process to solve this issue. The cause seems to be a change in the cim providers of the latest ESX 3.5 U2 release and this patch changes the way the Insight Agents and SIM communicate with it. I tested this on my lab server and it took about 2 minutes but did re-start SIM.

Link to patch: http://h18013.www1.hp.com/products/servers/management/hpsim/dl_windows52sp1.html?jumpid=reg_R1002_US...

jonb157
Enthusiast

Will this apply if you are already running HP SIM 5.2 SP2? I ask because when I run the patch kit it says "HP SIM 5.2 not detected". I know it is SP2 because it says:

Build Version: C.05.02.02.00
Build date: 2008-07-04 10:23

Troy_Clavell
Immortal

jonb157 wrote: "Will this apply if you are already running HP SIM 5.2 SP2? I ask because when I run the patch kit it says 'HP SIM 5.2 not detected'. I know it is SP2 because it says:

Build Version: C.05.02.02.00
Build date: 2008-07-04 10:23"

I'm working with HP on this now, hopefully I'll have an answer soon.

Dollar
Enthusiast

Just for added info (and resolution/workaround clarification): Troy steered me to this thread from another thread. A big help. I am on Insight Manager 5.1 (Insight Addition) with every plug-in HP has ever released for the darn thing. Upgrading to 5.2 will be a pain, so I went in and disabled the scheduled jobs. The zombie/defunct processes have stopped. The only scheduled jobs I have remaining are Initial Data Collection and Initial Hardware Status polling. Not a big issue, since I am using SNMP and I will still receive hardware failure notifications via SNMP traps. I'll leave it this way until either VMware releases a patch or I can get the Insight Manager update scheduled (after someone else confirms it does indeed resolve the issue).

I also have a case open with VMware and will push for a patch.

Thanks for the help guys.

Troy_Clavell
Immortal

Here's some more fuel to add to the fire regarding the Insight Manager Agents on ESX:

http://kb.vmware.com/kb/1004771

To us... it's now clear, bye bye IM agents

jonb157
Enthusiast

FYI, I've now updated to 5.2 and also ran the hotfix patch kit successfully and... still no resolution. I forced a Daily Inventory scan, then checked my /var/log/messages, and the same error appeared. So any download you see on the HP SIM website will NOT fix the issue. We will have to wait for a new HP or VMware update to solve the Daily Inventory Scan task issue; I've disabled mine to stop the errors from being logged.

laurensdekoning
Contributor

I have exactly the same problem.

I've tried several things: I've upgraded the HP Management Agents to the latest version, 8.1.0, and I've updated/patched my ESX servers to the latest build (from 110268 to 113339).

Unfortunately, no luck. I also have several messages in my /var/log/messages that indicate some kind of authentication problem.

I don't understand why, but the account it says failed to authenticate is our HP Insight Manager service account. So I'm guessing it's the user that lets the ESX server talk to the SNMP service.

Sep 30 05:11:08 vm-srv01 cimservera[15644]: user "our-service-account" failed to authenticate
Sep 30 05:11:10 vm-srv01 cimservera[15645]: user "our-service-account" failed to authenticate
Sep 30 05:11:13 vm-srv01 cimservera[15646]: user "our-service-account" failed to authenticate
Sep 30 05:11:14 vm-srv01 cimservera[15647]: user "our-service-account" failed to authenticate
Sep 30 05:11:16 vm-srv01 cimservera[15648]: user "our-service-account" failed to authenticate
Sep 30 05:11:18 vm-srv01 cimservera[15649]: user "our-service-account" failed to authenticate
Sep 30 05:11:20 vm-srv01 cimservera[15650]: user "our-service-account" failed to authenticate
Sep 30 05:11:22 vm-srv01 cimservera[15651]: user "our-service-account" failed to authenticate
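
To see how many of these failures pile up, and for which account, something along the following lines may help. It is a rough sketch that assumes the syslog lines look exactly like the ones above; the function name `cim_auth_failures` is hypothetical.

```shell
# Summarise cimservera authentication failures from syslog lines on stdin.
# Prints one "<count> <user>" line per account, sorted by user name.
cim_auth_failures() {
    grep 'cimservera\[[0-9]*]: user ".*" failed to authenticate' |
        sed 's/.*user "\([^"]*\)" failed to authenticate.*/\1/' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Assumed usage on the ESX host:
#   cim_auth_failures < /var/log/messages
```

Comparing the timestamps of the matching lines with the HP SIM task schedule should show whether a particular scan is responsible.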

Daune_Mattoon
Contributor

Any update on this issue? This hit me this morning and my entire cluster failed. I basically stopped the pegasus service and kept it from starting on a restart.

I tried the HP SIM updates (I have DL3805's), but no change. I also updated to VirtualCenter 2.5 Update 3.0, but again no change. I have a ticket open with VMware.

Commands to stop the service and disable it on restart:

service pegasus stop

chkconfig --level 5 pegasus off

Daune_Mattoon
Contributor

chkconfig --level 3 pegasus off

Level 3 as well

DGI_Drift
Contributor

So if I run these commands,

service pegasus stop

chkconfig --level 5 pegasus off

will the defunct processes disappear?

Daune_Mattoon
Contributor

Hey,

This will stop the service from running and keep "Pegasus" from starting if the servers reboot. This is just a stopgap until the real fix comes out. Basically, the Pegasus service cannot handle certain messages correctly and fails, leaving defunct processes behind. In my network (80 VMs across 6 servers) I see about 10-15 defunct processes per hour; eventually they cause the servers to hang, and the servers must be restarted. VMotion does not even work. Below are the exact commands to run on each server to keep this from occurring until the fix comes out.

service pegasus stop

chkconfig --level 5 pegasus off

chkconfig --level 3 pegasus off

Good luck,

Dave
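
To double-check that pegasus is really off in every runlevel you care about, the `chkconfig --list` output can be filtered. This is a small sketch, assuming the usual Red Hat style output of the form `pegasus 0:off 1:off 2:on ...`; `levels_on` is an illustrative name, not a standard tool.

```shell
# Print the runlevels in which a service is still enabled.
# Expects one line of `chkconfig --list <service>` output on stdin.
levels_on() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-6]:on$/) {
                split($i, parts, ":")
                printf "%s%s", sep, parts[1]
                sep = " "
            }
    } END { print "" }'
}

# Assumed usage on the ESX host:
#   chkconfig --list pegasus | levels_on
```

An empty result means the service is disabled in all runlevels; any number printed is a level where it would still start.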

AndrewJarvis
Enthusiast

We have this too - have disabled the WBEM discovery in HP SIM for now - had to reboot two hosts yesterday grrrrrr ... is it me or are they getting a little lax nowadays?

stick2golf
Contributor

Yeah, this is a very big issue. I had to reboot 2 out of our 6 servers; HA and DRS were freaking out and VMotion was failing. 25 VMs were rebooted at 9:40am, not good.

Support from VMware was great, but it seems this issue should have been resolved by now via a patch instead of these workarounds.

larden
Contributor

Can someone post the current suggested workaround for us all? It would save calling support.

Thanks

VMware Rocks!

dominic7
Virtuoso

I don't know that anyone has received an official workaround from VMware. I have been disabling the firewall port for the CIMHttpsServer, which has stopped all the defunct processes and has been working well for a few months. I think disabling pegasus will break the hardware monitoring (not that it's super awesome right now). The only side effect of disabling the HTTPS port on the firewall is that some tools like the EMC ECC Agent don't work, though it turns out it barely works anyway.

esxcfg-firewall -d CIMHttpsServer && service pegasus restart
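
If the port needs to come back later (for example once a proper fix ships), the change can presumably be reversed with the `-e` (enable service) option of esxcfg-firewall, the counterpart of `-d` above. A hedged sketch follows; `restore_cim_port` and the `DRY_RUN` switch are hypothetical helpers added here for illustration, not part of ESX.

```shell
# Re-open the CIM HTTPS firewall service and restart pegasus,
# undoing the workaround above.
# Set DRY_RUN=1 to print the commands instead of running them.
restore_cim_port() {
    run=${DRY_RUN:+echo}
    $run esxcfg-firewall -e CIMHttpsServer && $run service pegasus restart
}

# Assumed usage:
#   DRY_RUN=1 restore_cim_port   # preview only
#   restore_cim_port             # actually apply
```

Previewing with DRY_RUN=1 first is a cheap way to confirm what will run before touching a production host.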

stick2golf
Contributor

I have a call with a VMware engineer today at 2:00 CST and will post any update on this issue.

1) Stop the pegasus service and keep it from starting when the ESX server is restarted:

service pegasus stop

chkconfig --level 5 pegasus off

chkconfig --level 3 pegasus off

2) Disable WBEM discovery in HP SIM (Note: This can be done as well, but if you perform #1 you don't need to do this)

HPSIM -> Options -> Protocol Settings -> Global Protocol Settings .... Uncheck WBEM

stick2golf
Contributor

Oh yeah, Step #1 was recommended by a VMware engineer, but I have not seen any "official" workaround posted by VMware.

EPL
Contributor

Are there any negatives to turning this service off?
