VMware Cloud Community
sebster
Contributor

CPU peaks every 10 minutes, no VMs running

Hi,

I have 3 Dell 2950 machines running ESXi that show CPU utilization spikes of up to 75% in the performance log every 10 minutes, even when no VMs are running at all. I upgraded one of them today to the latest build (153840), but it still has this problem.

In the logs I also find the following messages (this is after a reboot and no virtual machines are running at all):

Mar 30 13:32:15 -- MARK --

Mar 30 13:32:42 vmkernel: 0:00:40:30.772 cpu4:1524)WARNING: UserThread: 406: Peer table full for sfcbd

Mar 30 13:32:42 vmkernel: 0:00:40:30.772 cpu4:1524)WARNING: World: vm 10004: 910: init fn user failed with: Out of resources!

Mar 30 13:32:42 vmkernel: 0:00:40:30.772 cpu4:1524)WARNING: World: vm 10004: 1775: WorldInit failed: trying to cleanup.

Mar 30 13:32:44 sfcb[1412]: Process "lsi_storage" PID is 10056

Mar 30 13:32:44 vmkernel: 0:00:40:33.194 cpu7:1524)WARNING: UserThread: 406: Peer table full for sfcbd

Mar 30 13:32:44 vmkernel: 0:00:40:33.194 cpu7:1524)WARNING: World: vm 10129: 910: init fn user failed with: Out of resources!

Mar 30 13:32:44 vmkernel: 0:00:40:33.194 cpu7:1524)WARNING: World: vm 10129: 1775: WorldInit failed: trying to cleanup.

Mar 30 13:33:15 ntpd[1322]: synchronized to 193.48.168.130, stratum 2

Does anybody know what could be wrong and what I could do to resolve this issue?

Regards,

Sebastiaan

0 Kudos
12 Replies
kjb007
Immortal

Are you running ESXi installable or embedded? Make sure your image is good: http://communities.vmware.com/message/1083040

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
0 Kudos
sebster
Contributor

On 2 of the 2950s it's the installable, on one of them it's embedded. All 3 of them are having this issue.

The one I upgraded to the latest version is an installable ESXi.

I saw the post you linked to, but since 2 of my servers don't use the embedded version I figured that couldn't be the issue. Also, reinstalling is kind of not an option (not an easy option anyway), considering the systems are running in production. And if I reinstall, I still don't know how the problem came about, and I can't reinstall my production systems on a regular basis.

Regards,

Sebastiaan

0 Kudos
kjb007
Immortal

What kind of storage are you using? Local or SAN? FC / iSCSI? How many disks/LUNs?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
0 Kudos
sebster
Contributor

Local storage, Dell PERC 6/i RAID controller with RAID5; the two configurations with installable have 5 disks (4+1 hot spare) with a total of 836GB; the other one has a Dell PERC 6/i with RAID10, 5 disks (4+1 hot spare), with a total of 1.36TB.

0 Kudos
RParker
Immortal

Also, reinstalling is kind of not an option (not an easy option anyway)

Odd, since your post shows 'no VMs running'. So that means you were able to bring the VMs down. And why isn't it an easy option? An install of ESX takes less than 15 minutes. I would try the console ESX 3.5 U3 version. We have a few 2950s and have never had a problem with any of them on any version of ESX.

0 Kudos
sebster
Contributor

Yes, that's on one of the servers, which is not in production yet. Sure, I can reinstall that one now (though I have to go to the datacenter 100km away every time I want to do that), but soon it'll be running production as well, complicating matters even more.

And reinstalling might fix it for now, but it gives no clue as to what caused the problem or how to avoid it in the future.

Regards,

Sebastiaan

0 Kudos
kjb007
Immortal

What about HT? If you have no VMs running, I would think the hardware agents may be doing something. Have you restarted the management agents during this period to see if the CPU spikes continue? Are they regular, or intermittent?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
0 Kudos
Dave_Mishchenko
Immortal

Have you tried running resxtop from the RCLI to see what process is causing the CPU spikes?

sebster
Contributor

I have rebooted the one machine, and also restarted the agents using

/sbin/services.sh restart

The other ESXi machines *do* run VMs, but they're also spiking every 10 minutes. One of them runs only one VM at the moment, and the machine performance graph shows the spikes while the VM performance graph does not. It's also very regular: *every* 10 minutes, it never skips.

Regards,

Sebastiaan 
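One way to check whether the "Peer table full for sfcbd" warnings line up with the 10-minute spikes is to pull their timestamps out of a saved copy of the log and look at the spacing between bursts. The following is only a rough sketch, not a VMware tool: the function names, the 60-second burst gap, and the year (syslog lines do not carry one) are all assumptions made for illustration, and it expects each log entry on a single line with English month abbreviations.

```python
import re
from datetime import datetime

# Matches vmkernel warning lines in the format shown in the first post, e.g.
# "Mar 30 13:32:44 vmkernel: 0:00:40:33.194 cpu7:1524)WARNING: UserThread: 406: Peer table full for sfcbd"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+vmkernel:.*WARNING:\s+(?P<msg>.*)$"
)


def warning_times(lines, needle="Peer table full", year=2009):
    """Return the timestamps of vmkernel warnings whose text contains `needle`.

    `year` is an assumption: the log lines themselves do not include one.
    """
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and needle in m.group("msg"):
            stamp = f"{year} {m.group('stamp')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def burst_starts(times, gap_seconds=60):
    """Group warnings that follow each other within `gap_seconds` into one burst
    and return the first timestamp of each burst."""
    starts = []
    previous = None
    for t in sorted(times):
        if previous is None or (t - previous).total_seconds() > gap_seconds:
            starts.append(t)
        previous = t
    return starts


def intervals_minutes(starts):
    """Return the spacing, in minutes, between consecutive burst starts."""
    return [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]
```

Feeding the lines of the log file to `warning_times`, then passing the result through `burst_starts` and `intervals_minutes`, gives the spacing between warning bursts. Values close to 10 would suggest the warnings and the CPU spikes share a cause; anything else would point away from sfcbd.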

0 Kudos
kjb007
Immortal

This appears to be a known issue. I'm not sure of the resolution, but there's a thread open already: http://communities.vmware.com/thread/166114

There is a workaround that involves adjusting the resource allocation for the CIM server.

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB