craigateb
Contributor

cimserver process high CPU usage on core 0

I have recently noticed an issue with two of our ESX host servers. Over the course of about a day, one of the cimserver processes ends up using 100% of CPU 0 on the ESX host. While this is happening, a reboot of the ESX host stalls and displays backtrace messages from the cimserver process. I have been able to restart Pegasus manually (service pegasus restart) to clear the hog process, but the issue returns the next day.

ESX hosts are Sun x4140 servers.

ESX version is 3.5.0 U3 Build 123630

9 Replies
Troy_Clavell
Immortal

You should try installing the patch listed in this KB article; it may help:

http://kb.vmware.com/kb/1006657

craigateb
Contributor

I have run all the updates on both of our ESX host servers using Update Manager. Both ESX hosts are now at ESX 3.5.0 build 143128. One of the hosts has already developed the cimserver issue again since the updates completed this morning. Core 0 is at 100% usage, and top shows that a cimserver process is responsible again. Once more, service pegasus restart got rid of the cimserver process, but it looks like the issue will keep coming back even with the updates.
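Since the restart clears the process each time, the manual workaround could be scripted until a real fix is available. The following is only a sketch, not a VMware-supplied tool: the function names, the 90% threshold, and the RESTART_CMD override are made up for illustration, and it assumes `ps -eo pid,pcpu,comm` behaves on the service console as it does on other Linux systems. Keep in mind that ps reports CPU use averaged over the life of the process, so a process that only recently started spinning will read lower than it does in top.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: restart Pegasus when cimserver is busy.
# RESTART_CMD and THRESHOLD are illustrative names, not ESX settings.
RESTART_CMD="${RESTART_CMD:-service pegasus restart}"
THRESHOLD="${THRESHOLD:-90}"

# Print the highest %CPU among cimserver processes.
# Reads "PID %CPU COMMAND" lines on stdin; prints 0 if none are found.
max_cimserver_cpu() {
    awk '$3 == "cimserver" && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}

# Run the restart command when the given %CPU is at or above THRESHOLD.
restart_if_busy() {
    cpu="$1"
    if awk -v c="$cpu" -v t="$THRESHOLD" 'BEGIN { exit !(c + 0 >= t + 0) }'; then
        echo "cimserver at ${cpu}% CPU, running: $RESTART_CMD"
        $RESTART_CMD
    else
        echo "cimserver at ${cpu}% CPU, below ${THRESHOLD}%"
    fi
}
```

From cron, this could be called as `restart_if_busy "$(ps -eo pid,pcpu,comm | max_cimserver_cpu)"`. It only hides the symptom; the underlying cimserver problem is still there.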

larjona
Enthusiast

I have the same situation on two ESX Server 3.5.0 build 143128 hosts.

The patch you mention is already applied:

esxupdate query

Installed software bundles:

Name                 Install Date       Summary
3.5.0-64607          21:06:12 02/21/08  Full bundle of ESX 3.5.0-64607
ESX350-200802305-SG  09:47:07 03/11/08  openssl security update
ESX350-200802303-SG  09:47:07 03/11/08  util-linux security update
ESX350-200802408-SG  09:47:07 03/11/08  Security Updates to the Python Package.
ESX350-200803209-UG  09:34:34 02/04/09  Update to the ESX Server Service Console
ESX350-200810201-UG  09:39:44 02/04/09  Updates VMkernel, Service Console, hostd
ESX350-200803212-UG  09:50:31 02/04/09  Update VMware qla4010/qla4022 drivers
ESX350-200803213-UG  09:51:50 02/04/09  Driver Versioning Method Changes
ESX350-200803214-UG  09:52:53 02/04/09  Update to Third Party Code Libraries
ESX350-200804405-BG  09:53:40 02/04/09  Update to VMware-esx-drivers-scsi-megara
ESX350-200805504-SG  09:54:28 02/04/09  Security Update to Cyrus SASL
ESX350-200805505-SG  09:55:12 02/04/09  Security Update to unzip
ESX350-200805506-SG  09:55:58 02/04/09  Security Update to Tcl/Tk
ESX350-200805507-SG  09:56:46 02/04/09  Security Update to krb5
ESX350-200805514-BG  09:57:35 02/04/09  Update to VMware-esx-drivers-net-e1000
ESX350-200808206-UG  09:59:07 02/04/09  Update to vmware-hwdata
ESX350-200808210-UG  09:59:49 02/04/09  Update to VMware-esx-drivers-net-ixgbe
ESX350-200808211-UG  10:00:31 02/04/09  Update to the tg3 Driver
ESX350-200808212-UG  10:01:20 02/04/09  Update to the MegaRAID SAS Driver
ESX350-200808215-UG  10:02:08 02/04/09  Update to the Emulex SCSI Driver
ESX350-200808218-UG  10:03:13 02/04/09  Security Update to Samba
ESX350-200808406-SG  10:05:02 02/04/09  Security Update to Perl
ESX350-200808407-BG  10:05:48 02/04/09  Updates Software QLogic FC Driver
ESX350-200808409-SG  10:06:37 02/04/09  Security Update to BIND
ESX350-200810203-UG  10:08:13 02/04/09  Updates MPT SCSI Driver
ESX350-200810204-UG  10:09:00 02/04/09  Updates bnx2x Driver for Broadcom
ESX350-200810205-UG  10:10:32 02/04/09  Updates CIM and Pegasus
ESX350-200810208-UG  10:13:10 02/04/09  Updates esxupdate documentation
ESX350-200810209-UG  10:14:01 02/04/09  Updates bnx2 Driver for Broadcom
ESX350-200810210-UG  10:24:53 02/04/09  Updates HP Storage Component Drivers
ESX350-200810212-UG  10:25:46 02/04/09  Updates VMkernel iSCSI Driver
ESX350-200810214-UG  10:26:43 02/04/09  Updated Time Zone Rules
ESX350-200810215-UG  10:28:10 02/04/09  Updates Web Access
ESX350-Update-02     10:28:24 02/04/09  ESX Server 3.5.0 Update 2
ESX350-Update01      10:28:34 02/04/09  ESX Server 3.5.0 Update 1
ESX350-Update03      10:28:44 02/04/09  ESX Server 3.5.0 Update 3
ESX350-200901402-SG  20:40:51 02/05/09  Security Update to ESX Scripts
ESX350-200811401-SG  20:43:48 02/05/09  Updates VMkernel, hostd, and Other RPMs
ESX350-200811406-SG  20:44:38 02/05/09  Security Update to bzip2
ESX350-200901406-BG  20:47:29 02/05/09  Updates Kernel Source and VMNIX
ESX350-200811408-BG  20:48:25 02/05/09  Updates QLogic Software Driver
ESX350-200901401-SG  20:50:16 02/05/09  Updates VMkernel, VMX, hostd etc
ESX350-200901404-BG  20:51:35 02/05/09  Updates VMware Tools
ESX350-200901405-BG  20:52:21 02/05/09  Updates lnxcfg
ESX350-200901407-BG  20:53:47 02/05/09  Updates Pegasus
ESX350-200901408-BG  20:54:45 02/05/09  Updates SATA Drivers
ESX350-200901409-SG  20:55:55 02/05/09  SNMP Security Update
ESX350-200901410-SG  20:56:48 02/05/09  Security Update for libxml2

Running service pegasus restart ends the process, but it will be back at 100% within a few days.
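With a listing this long, it can be easier to test for a specific bundle than to read through the output. A minimal sketch, assuming the bundle name is always the first whitespace-separated field of each line, as it is in the listing above; `has_bundle` is a made-up helper name, not part of esxupdate.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named bundle is the first field
# of any line of an `esxupdate query` listing supplied on stdin.
has_bundle() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, `esxupdate query | has_bundle ESX350-200901407-BG && echo installed` checks for the bundle whose summary reads "Updates Pegasus" in the listing.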

javella
Contributor

Say hello to running VMware ESX 3.5 on the Sun x64 Opteron family.

We have 7 X4600 M2 servers in the field. We started on 3.0.2 and have stepped through 3.5, U1, U2, and U3, with U4 coming soon. Throughout the whole adventure, 3.5 has given us cimserver/Pegasus problems, to the point where we have turned off the pegasus service altogether. I would highly advise doing the same. We've had so many support issues with VMware and these servers; at one point they told us we weren't on the HCL, although that turned out to be a documentation issue on their end. VMware has said this problem won't be fixed until U5, with a partial fix in U4. We've also had many PSODs that VMware couldn't explain. We are looking at maybe moving to the X4140s, since we're not using nearly all the bays on the X4600 and they have a newer motherboard chipset.
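For anyone wanting to turn the service off the same way, a rough sketch follows. It assumes the service console provides the usual Red Hat style `service` and `chkconfig` commands and that the init script is named pegasus, as the restart command earlier in the thread suggests. The `run` and `disable_pegasus` names and the DRY_RUN switch are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: stop Pegasus and keep it from starting at boot.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_pegasus() {
    run service pegasus stop     # stop the running CIM server now
    run chkconfig pegasus off    # assumption: removes it from the boot runlevels
}
```

Running `DRY_RUN=1; disable_pegasus` first shows what would be executed. With the CIM server stopped, anything that depends on it, such as hardware health reporting, will probably stop updating, so check what you lose before leaving it off.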

A side note about the X4600 M2: if you're not booting from SAN, make sure you monitor your local hard drives through syslog, because neither the ILOM nor the VI3 Health Status will monitor them for you. Sun told me that the LSI SAS controller doesn't talk to the ILOM yet; when I last spoke to them at VMworld Vegas 2008, they said maybe by the end of this year. The person I spoke to also said that this applies to their whole x64 family, but I haven't checked into that.
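One simple way to watch the drives through syslog is to scan the logs for disk-related error text. This is only a sketch: the patterns are guesses at typical Linux kernel messages, not a list confirmed by Sun or LSI, and `disk_errors` is a made-up name. The log path in the usage note is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical filter: print log lines that look like disk or SCSI trouble.
# Reads log text on stdin. The patterns are illustrative, not exhaustive.
disk_errors() {
    grep -i -E 'I/O error|SCSI error|medium error'
}
```

A cron job could run something like `disk_errors < /var/log/messages` and mail any output to an administrator. Tune the patterns against what the controller driver actually logs on your hosts.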

-ja

craigateb
Contributor

Thanks for the info. I have been leaning towards disabling the Pegasus service for a while now. It has taken Sun more than a month to get our support on VMware licenses worked out and it's still not resolved. Otherwise I would have logged a support ticket on the issue.

Luckily, no PSODs for us yet with the x4140. There have been some quirks, but nothing major. Occasionally the keyboard controller (probably via the ILOM console) freaks out and floods the system console with messages until it calms down. In the beginning, the x4140 with ESX installed would not talk to the 4 NVIDIA NICs in the server when a device (the LSI adapter) was in PCIe slot one. Windows Server talked to the NICs just fine no matter which slot the LSI adapter was in. Very weird behavior, but a BIOS update resolved the issue. Other than that, never try to update the SP BIOS while the server is running. Probably something I should have learned beforehand. Dell servers have taught me bad habits about updating the BIOS while the system is running.

Good info on the LSI controller and the ILOM. I'll have to look into that some.

plusque
Contributor

Thanks for this post.

We are currently using the same hardware and are having the same problem with the pegasus service.

So we are also waiting for VMware to fix this problem.

craigateb
Contributor

We contacted Sun on this issue. They said a partial update is in U4 of ESX and a full fix should be in U5. After increasing the memory given to the service console from 200MB to 800MB and going to U4, we have not seen this issue return.
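To confirm that a service console memory change took effect, the total can be read back from inside the console. A small sketch, assuming /proc/meminfo contains a `MemTotal:` line in kB as on other Linux systems; `memtotal_mb` is a made-up name. The reported figure is usually somewhat below the configured amount because the kernel reserves part of it.

```shell
#!/bin/sh
# Hypothetical helper: print total memory in whole MB.
# Reads /proc/meminfo-style text on stdin and uses the "MemTotal:" line.
memtotal_mb() {
    awk '$1 == "MemTotal:" { print int($2 / 1024) }'
}
```

Usage would be `memtotal_mb < /proc/meminfo`, run before and after the change.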

Craig

plusque
Contributor

Thanks, I'm going to apply the update next week. I wonder whether it will help! Regards

udaykumar-blr
Contributor

Hi,

Even with ESX 3.5 U4, this issue still exists. You just cannot escape it :).

We had the same issue with HP BL680c servers: many of the hosts hung and hit a PSOD simultaneously. On many of the hosts, I could see that cimserver was the culprit.
