ESX 3.5.0 Update 2 [cimservera <defunct>]

dominic7 · ‎07-31-2008

Is anyone else getting a ton of defunct processes from cimservera after installing ESX 3.5.0 Update 2?

I've got a cluster that is generating a lot of these; it was all rebuilt at Update 2. Unfortunately the output doesn't format well in here.

cmadzela · ‎08-12-2008

A boatload of Zombies... here too!

0 Z root 8536 3673 0 79 0 - 0 nct> Aug05 ? 00:00:00
0 Z root 8537 3673 0 79 0 - 0 nct> Aug05 ? 00:00:00
0 Z root 8543 3673 0 78 0 - 0 nct> Aug05 ? 00:00:00
0 Z root 32350 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32351 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32352 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32353 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32354 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32355 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32356 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32357 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32391 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32397 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32398 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32399 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32400 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32401 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32402 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32403 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32404 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32405 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32406 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32407 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
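
For anyone who wants to see which parent is holding the zombies, here is a minimal sketch that tallies them per parent PID. It assumes the rows are in the long `ps -el` / `ps -elf` layout, where column 2 is the process state and column 5 is the parent PID, which is what the rows above appear to be. The function name `zombies_by_parent` is made up for illustration.

```shell
#!/bin/sh
# Sketch: tally zombie (state Z) processes per parent PID.
# Assumption: input rows use the long ps layout (column 2 = state,
# column 5 = parent PID), as the rows posted above appear to.
zombies_by_parent() {
    awk '$2 == "Z" { count[$5]++ }
         END { for (ppid in count) print ppid, count[ppid] }' | sort -k2,2nr
}

# Usage on a live host (not executed here):
#   ps -el | zombies_by_parent
```

Every row posted above has parent PID 3673, so feeding those 23 rows in would give a single line for that parent.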

Rabie · ‎08-13-2008

Yup same here, we are running HP DL 380 G5 Hardware with the 8.1.0 agents.

fabian_de1 · ‎08-13-2008

I've got the same error running ESX on HP DL 585. The problem is independent of the HP agents: I have several servers with and without HP agents, and all have the same problem. A pegasus restart kills the zombies, but I think there should be a better solution than using a cron job. Does anyone have any ideas?
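
If the cron workaround is used in the meantime, it may be gentler to restart only once the zombie count passes some limit. The sketch below is only that, a sketch: `THRESHOLD` is an arbitrary placeholder, the function names are made up, and the restart command in the comment is an assumption, since the post only says "pegasus restart". It also assumes `ps -el` style input where the command column contains "cimservera".

```shell
#!/bin/sh
# Assumption: THRESHOLD is a placeholder, not a recommended value.
THRESHOLD=${THRESHOLD:-500}

# Count zombie rows (state Z in column 2) whose line mentions cimservera.
count_cimservera_zombies() {
    awk '$2 == "Z" && /cimservera/ { n++ } END { print n + 0 }'
}

# Print "restart" when the count (arg 1) exceeds the limit (arg 2), else "ok".
restart_decision() {
    if [ "$1" -gt "$2" ]; then echo restart; else echo ok; fi
}

# Possible cron usage (restart command is assumed, not confirmed by the thread):
#   n=$(ps -el | count_cimservera_zombies)
#   [ "$(restart_decision "$n" "$THRESHOLD")" = restart ] && service pegasus restart
```

This does not fix the underlying leak; it only keeps the process table from filling up between restarts.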

Rabie · ‎08-14-2008

Mine was resolved with a reboot of my ESX nodes.

They seem to have started after updating to Update 2 and then installing the 8.1.0 HP agents; the agents restart the pegasus services (I think cimservera is part of it). I did reboot after installing the agents, and everything looked OK. But I had the issue on all my nodes, so all I can assume is that the issue was related...

Oops, spoke too soon. I see it's back.

Will have to log a call then.

EPL · ‎08-14-2008

I'm also in the same boat. I have the 7.91 version of the agents, and restarting pegasus seems to resolve the issue for a little while.

Has anyone called VMware support on this issue?

dominic7 · ‎08-14-2008

I don't use the agents at all, and continue to have the problem. I think the issue is entirely with VMware, though it seems all of us have HP hardware.

EPL · ‎08-15-2008

Anyone have an update on this?

MalcO · ‎08-27-2008

Installed the latest patches today but the problem is still there.

j_dubbs · ‎08-27-2008

We also have the same issue. Running Update 2 (just finished updating the whole cluster to all the latest patches available in Update Manager) and running the HP 8.10 agents. I do not recall seeing this with the 7.x agents, but then we were running Update 1 at that time. I am going to remove the HP agents and confirm whether the issue persists as in the other posts. If confirmed, a ticket will be opened with VMware.

dominic7 · ‎08-27-2008

When you open the SR, will you post it here? I'd like to be able to relate it when I call in. I've been off on paternity leave for some time and haven't yet opened an SR.

admin · ‎08-27-2008

Hi all

I am having the same issue. After 17 days there are about 32,000 of these processes. ESX has a maximum of roughly 32,000 PIDs, so once they are all used up you cannot SSH into the server or log in from the console, and the ESX server disconnects from VC.
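
As a rough back-of-the-envelope check on those numbers, the sketch below works out the implied leak rate and how long a host would last at that rate. The 32000 and 17 figures are the approximations from this post, not measurements, and the function names are made up for illustration.

```shell
#!/bin/sh
# Average leaked processes per day. Args: leaked_processes days
leak_rate_per_day() {
    echo $(( $1 / $2 ))
}

# Whole days until the PID space fills at a steady leak rate.
# Args: pid_limit current_process_count rate_per_day
days_until_exhaustion() {
    echo $(( ($1 - $2) / $3 ))
}

# About 32000 defunct processes after 17 days (integer division):
leak_rate_per_day 32000 17   # prints 1882
```

That is roughly 78 defunct processes per hour, so a periodic pegasus restart or a reboot only resets the clock rather than removing the cause.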

We also have HP servers with the HP agents loaded. Our Dell servers do not have this problem.

Have a look at your cron logs, /var/log/cron and cron.1; you might see that some of the jobs have not run. Also look in your /var/log/messages, where there are a lot of login failures.

We have logged a call with HP to have it resolved.

j_dubbs · ‎08-27-2008

That explains why all of our nodes went belly up around the same time. I am going to keep an eye on our process count (we have removed the HP agents from one box for now, to compare against a clean ESX install). Can you let us know if HP comes up with anything for you? Thanks.

admin · ‎08-27-2008

Hi

Will do. We have an SR open with VMware and a call open with HP. Give us 24 hours to report back.

dpomeroy · ‎08-27-2008

j_dubbs, are you running build 110268?

Don Pomeroy

VMware Communities User Moderator

EPL · ‎08-27-2008

Yes, all my servers are running 110268, on HP DL380s. It doesn't seem to matter what generation of 380 they are; both G4 and G5 have the same issue.

altonius · ‎08-27-2008

I have a few DL 585 G2 and G5s with 3.5 U2 (Build 110181) and Agent 8.1 installed. However, I am not getting any cim alerts. If I do a ps -A |grep cim, this is all I get:

ps -A |grep cim
4280 ? 01:10:12 cimserver
4410 ? 00:00:00 cimprovagt
4502 ? 00:00:00 cimprovagt
4505 ? 00:00:00 cimprovagt
4692 ? 00:16:54 cimprovagt
4708 ? 00:14:00 cimprovagt
4723 ? 00:05:21 cimprovagt
4730 ? 00:04:25 cimprovagt
4735 ? 00:09:03 cimprovagt
4740 ? 00:00:38 cimprovagt
4742 ? 00:00:00 cimprovagt
4750 ? 00:00:46 cimprovagt
4756 ? 00:01:18 cimprovagt
4760 ? 00:01:31 cimprovagt
4762 ? 00:01:29 cimprovagt
4765 ? 00:01:23 cimprovagt
4767 ? 00:00:40 cimprovagt
4771 ? 00:00:19 cimprovagt

These servers aren't really facing any real load at the moment so I'm not sure if server load is causing this.

MalcO · ‎08-28-2008

We have 3 DL580 G4 servers, all running the HP agents ver 8.1. Two of these are on 3.5 build 110268 and have the cimservera problem; the other is still on 3.5 build 64607 and doesn't have this issue. It's a pity there isn't a method to uninstall the VMware patches.

j_dubbs · ‎08-28-2008

I believe they were all running 110181 (basically Update 2 GA with the timebomb patch) and HP Agent 8.1. I finished upgrading all the hosts to 110268 last night and removed the HP agent from one host. I will monitor throughout the day and see whether the problem is resolved with the newer build of Update 2. I also have not seen this problem on our Dell servers (2950 III), only on our HP servers (BL460 and BL480). I will keep you posted if the issue still exists with the newer build. Cheers

admin · ‎08-28-2008

Hi

Have a look in /var/log/messages. I had a lot of errors from a user account on the HP CIM server trying to connect to the ESX server.

We then stopped the ESX server from being monitored, and there were no more defunct cimservera processes!

Also, we are at the latest ESX patch level and HP agents.