A boatload of Zombies... here too!
0 Z root 8536 3673 0 79 0 - 0 nct> Aug05 ? 00:00:00
0 Z root 8537 3673 0 79 0 - 0 nct> Aug05 ? 00:00:00
0 Z root 8543 3673 0 78 0 - 0 nct> Aug05 ? 00:00:00
0 Z root 32350 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32351 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32352 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32353 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32354 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32355 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32356 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32357 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32391 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32397 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32398 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32399 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32400 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32401 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32402 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32403 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32404 3673 0 79 0 - 0 nct> Aug06 ? 00:00:00
0 Z root 32405 3673 0 78 0 - 0 nct> Aug06 ? 00:00:00
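For reference, a listing like the one above (and the name of the parent process that is failing to reap its children) can be pulled with something like this sketch; the PPID 3673 is taken from the listing above and is specific to that host:

```shell
# List zombie ("Z" state) processes with their PID, parent PID, and name.
ps -eo stat,pid,ppid,comm | awk '$1 ~ /^Z/'

# Look up what the common parent actually is (3673 in the listing above;
# substitute whatever PPID you see on your own host).
ps -p 3673 -o comm= 2>/dev/null || echo "no such PID on this host"
```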
Yup, same here. We are running HP DL380 G5 hardware with the 8.1.0 agents.
I've got the same error running ESX on HP DL585. The problem is independent of the HP agents; I have several servers with and without the agents, and all have the same problem. Restarting pegasus kills the zombies, but I think there should be a better solution than a cron job. Anyone have any ideas?
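Pending a real fix, the cron workaround mentioned here could look something like this sketch. The threshold, the match on `cimservera`, and `service pegasus restart` as the restart command are assumptions; check your own init script name before using it:

```shell
#!/bin/sh
# Restart pegasus only when defunct cimservera processes pile up.
THRESHOLD=100   # arbitrary; tune for your host

# Zombies carry state "Z"; match the command name too, so unrelated
# zombies don't trigger a restart.
COUNT=$(ps -eo stat,comm | awk '$1 ~ /^Z/ && $2 == "cimservera"' | wc -l)

if [ "$COUNT" -gt "$THRESHOLD" ]; then
    logger "cimservera zombies: $COUNT, restarting pegasus"
    service pegasus restart
fi
```

Dropped into /etc/cron.hourly, this keeps the process table from filling up, but it is a band-aid, not a fix.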
Mine was resolved with a reboot of my ESX nodes.
They seem to have started after updating to Update 2 and then installing the 8.1.0 HP agents; the agents restart the pegasus services (I think cimserver is part of it). I did reboot after installing the agents, as everything looked OK, but I had the issue on all my nodes, so all I can assume is that the issue was related...
Oops, spoke too soon. I see it's back.
Will have to log a call then.
I'm also in the same boat. I have the 7.91 version of the agents, and restarting pegasus seems to resolve the issue for a little while.
Has anyone called VMware support on this issue?
I don't use the agents at all and continue to have the problem. I think the issue is entirely with VMware, though it seems all of us have HP hardware.
Anyone have an update on this?
Installed the latest patches today but the problem is still there.
We also have the same issue. Running Update 2 (just finished updating the whole cluster to all the latest patches available in Update Manager) and running the HP 8.10 agents. I do not recall seeing this with the 7.x agents, but then we were running Update 1 at that time. I am going to remove the HP agents and confirm whether the issue persists, as in the other posts. If confirmed, a ticket will be opened with VMware.
When you open the SR, will you post it here? I'd like to be able to
relate it when I call in. I've been off on paternity leave for some
time and haven't yet opened an SR.
Hi all
I am having the same issue. What happens after 17 days is that there are about 32,000 of these processes. ESX has a maximum of roughly 32,000 PIDs, so when they have all been used up, you cannot SSH into the server or log in from the console, and the ESX server disconnects from VirtualCenter.
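A quick way to check how close a host is getting to that ceiling is sketched below. On the ESX 3.x service console's 2.4-era kernel the compiled-in default is 32768, and the /proc path shown may not exist there; both details are assumptions to verify on your own host:

```shell
# Total entries in the process table; zombies still occupy a slot each,
# which is why the host eventually runs out of PIDs.
ps -e | sed 1d | wc -l

# The kernel's PID ceiling, where exposed. On an older service console
# this file may be absent and the compiled-in default (32768) applies.
cat /proc/sys/kernel/pid_max 2>/dev/null || echo 32768
```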
We also have HP servers with the HP agents loaded. Our Dell servers do not have this problem.
Have a look at your cron log (/var/log/cron and cron.1); you might see that some of the jobs have not run. Also look in your /var/log/messages. There are a lot of login failures.
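Checking for those two symptoms could look like this sketch. The paths are the standard service-console locations, and "authentication failure" is an assumed match string; check what your own /var/log/messages actually records:

```shell
# Which cron jobs ran recently (gaps suggest jobs that stopped running).
grep CMD /var/log/cron /var/log/cron.1 2>/dev/null | tail -5

# Count the login failures the poster mentions; the match string is an
# assumption -- adjust it to your own log format.
grep -ic "authentication failure" /var/log/messages 2>/dev/null || true
```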
We have logged a call with HP to have this resolved.
That explains why all of our nodes went belly up around the same time. I am going to keep an eye on our process count (we have removed the HP agents from one box for now, to compare against a virgin ESX install). Can you let us know if HP comes up with anything for you? Thanks.
Hi
Will do. We have an SR open with VMware and a call open with HP. Give us 24 hours to feed back.
Are you running this build, 110268?
Don Pomeroy
VMware Communities User Moderator
Yes, all my servers are running 110268, on HP DL380s. It doesn't seem to matter what generation of 380s they are; both G4 and G5 have the same issue.
I have a few DL585 G2s and G5s with 3.5 U2 (Build 110181) and Agent 8.1 installed. However, I am not getting any CIM alerts. If I do a ps -A | grep cim, this is all I get...
ps -A |grep cim
4280 ? 01:10:12 cimserver
4410 ? 00:00:00 cimprovagt
4502 ? 00:00:00 cimprovagt
4505 ? 00:00:00 cimprovagt
4692 ? 00:16:54 cimprovagt
4708 ? 00:14:00 cimprovagt
4723 ? 00:05:21 cimprovagt
4730 ? 00:04:25 cimprovagt
4735 ? 00:09:03 cimprovagt
4740 ? 00:00:38 cimprovagt
4742 ? 00:00:00 cimprovagt
4750 ? 00:00:46 cimprovagt
4756 ? 00:01:18 cimprovagt
4760 ? 00:01:31 cimprovagt
4762 ? 00:01:29 cimprovagt
4765 ? 00:01:23 cimprovagt
4767 ? 00:00:40 cimprovagt
4771 ? 00:00:19 cimprovagt
These servers aren't facing any real load at the moment, so I'm not sure whether server load is causing this.
We have three DL580 G4 servers, all running version 8.1 of the HP agents. Two of these are on 3.5 build 110268 and have the cimservera problem; the other is still on 3.5 build 64607 and doesn't have this issue. It's a pity there isn't a method to uninstall the VMware patches.
I believe they were all running 110181 (basically Update 2 GA with the timebomb patch) and HP Agent 8.1. I finished upgrading all the hosts to 110268 last night and removed the HP agent from one host. I will monitor throughout the day and see if the problem is resolved with the newer build of Update 2. I also have not seen this problem on our Dell servers (2950 III), only on our HP servers (BL460 and BL480). I will keep you posted if the issue still exists with the newer build. Cheers
Hi
Have a look in /var/log/messages. I had a lot of errors from a user account on the HP CIM server trying to connect to the ESX server.
We then stopped the ESX server from being monitored... and no more cimservera defunct processes!!
Also, we are at the latest ESX patch level and HP agents.