VMware Cloud Community
thor918
Enthusiast

CPU spikes on a PowerEdge 850

Hi there.

I just finished putting together a PowerEdge 850 with a PERC 5/i SAS RAID card and two SATA disks in RAID 1.

The system seems fine and everything is green in the health status. It has only 1 GB of RAM at the moment, but it runs just fine with that.

I just noticed that when I look at the performance window for my CPU, there are regular CPU spikes.

They seem to occur every 10 minutes:

http://home.no.net/thor918/vmware/spikes.jpg

Does anyone have a clue why I get these spikes, averaging 40%?

The spikes occur even when no virtual machines are running.
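
For anyone who wants to confirm the 10-minute pattern from exported performance data rather than by eye, here is a minimal sketch in Python. It assumes you already have the CPU samples as (seconds, percent) pairs; the sample data, the 30% threshold and the function names are made up for illustration and do not come from any VMware tool.

```python
def spike_times(samples, threshold):
    """Return the timestamps where usage rises from below to at-or-above the threshold."""
    times = []
    prev_above = False
    for t, pct in samples:
        above = pct >= threshold
        if above and not prev_above:
            times.append(t)
        prev_above = above
    return times


def intervals(times):
    """Seconds between consecutive spike starts."""
    return [b - a for a, b in zip(times, times[1:])]


# Hypothetical data: one sample per minute, idle at 3%, a 40% spike every 600 s.
samples = [(t, 40 if t % 600 == 0 else 3) for t in range(0, 1860, 60)]

starts = spike_times(samples, threshold=30)
print(starts)
print(intervals(starts))
```

If every interval comes out close to 600 seconds, the spikes are driven by something on a fixed timer rather than by VM load.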

52 Replies
zaphod777
Contributor

Hi,

I have the same problem on a Dell 2950. I did not try the resource allocation workaround, since I read about the firmware update.

Unfortunately, I cannot find any update dated 18th September.

I'm currently using ESX 3i update 2 (build 110271) on that machine. Which update/patch do you mean?

Thanks for your help!

cimdad
VMware Employee

Folks,

Could you mention whether you have any external storage attached? What kind? This would help VMware reproduce the problem.

Thanks.

sapro27
Contributor

Hi,

My Dell PE 2900 doesn't use external storage.

It has 3 SAS HDDs on an internal PERC 6/i RAID controller in RAID 5.

Reefcrazed
Contributor

I have the exact same spikes roughly every 10 minutes. My VMs are not doing any work, and the spikes do not appear in their performance logs; they show up on localhost.domain. I have been watching it for a week now and the spikes come regularly every 10 minutes. I am running ESXi Update 2. I cannot install Update 3 because I am running two Opteron 8347 processors and, for whatever reason, VMware decided not to let Update 3 install on a pair of Opteron 8347s.

joel_gibby
Contributor

We are experiencing the same behavior (25% CPU spikes every 10 minutes on the dot) on a Dell PE2970 connected to a Dell MD3000i via iSCSI. The system also takes a LONG time to boot (but eventually does) and seems to hang during some kind of iSCSI discovery.

sapro27
Contributor

I am seeing my guest systems hang too. At the time of the hangs I get these messages:

Dec 30 07:44:19 vmkernel: 0:00:09:25.960 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64813, status=bad0001, retval=bad0001

Dec 30 07:44:49 vmkernel: 0:00:09:55.966 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64832, status=bad0001, retval=bad0001

Dec 30 07:45:16 vmkernel: 0:00:10:22.970 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64841, status=bad0001, retval=bad0001

Dec 30 07:59:23 vmkernel: 0:00:24:29.163 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=209144, status=bad0001, retval=bad0001

Dec 30 08:01:35 vmkernel: 0:00:26:41.184 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216374, status=bad0001, retval=bad0001

Dec 30 08:04:14 vmkernel: 0:00:29:20.206 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216529, status=bad0001, retval=bad0001

Dec 30 08:08:38 vmkernel: 0:00:33:44.266 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216764, status=bad0001, retval=bad0001

Dec 30 08:09:23 vmkernel: 0:00:34:29.276 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216806, status=bad0001, retval=bad0001

Dec 30 08:09:53 vmkernel: 0:00:34:59.279 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216831, status=bad0001, retval=bad0001

Dec 30 08:11:26 vmkernel: 0:00:36:32.292 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216903, status=bad0001, retval=bad0001

Dec 30 08:11:50 vmkernel: 0:00:36:56.297 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216923, status=bad0001, retval=bad0001

Dec 30 08:12:08 vmkernel: 0:00:37:14.299 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216932, status=bad0001, retval=bad0001

Dec 30 08:12:38 vmkernel: 0:00:37:44.305 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216956, status=bad0001, retval=bad0001

Dec 30 08:14:17 vmkernel: 0:00:39:23.320 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=217018, status=bad0001, retval=bad0001
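
To see how the aborts are spaced in time, the lines above can be parsed with a short script. This is only a sketch in Python: the regular expression is fitted to the lines quoted in this post, the year is not in the log (2008 is an assumption), and the function names are illustrative.

```python
import re
from datetime import datetime

# Fitted to the vmkernel lines quoted above; other log formats will not match.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) vmkernel: .*"
    r"LinSCSI: \d+: Abort failed for cmd with serial=(?P<serial>\d+)"
)


def parse_aborts(lines, year=2008):
    """Return (timestamp, serial) for every 'Abort failed' line.

    The syslog stamp carries no year, so `year` is an assumption.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            stamp = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
            events.append((stamp, int(m.group("serial"))))
    return events


def gaps_in_seconds(events):
    """Seconds between consecutive abort events."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(events, events[1:])]


# First three lines from the log above.
sample = [
    "Dec 30 07:44:19 vmkernel: 0:00:09:25.960 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64813, status=bad0001, retval=bad0001",
    "Dec 30 07:44:49 vmkernel: 0:00:09:55.966 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64832, status=bad0001, retval=bad0001",
    "Dec 30 07:45:16 vmkernel: 0:00:10:22.970 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64841, status=bad0001, retval=bad0001",
]

events = parse_aborts(sample)
print(gaps_in_seconds(events))
```

Feeding in the whole log would show whether the aborts cluster around the same moments as the CPU spikes.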

cimdad
VMware Employee

The CPU spike problem has been identified to be caused by a misbehaving LSI driver. The problem has been resolved in the upcoming ESX 4.0 as well as ESX 3.5 Update 4. Unfortunately, I do not have the release schedule of when these releases will be available to the general public. Thanks for your patience.

joel_gibby
Contributor

Well that's good news (except for not knowing when it will be resolved). In our case it doesn't seem to be impacting performance too much, but we're only running 6 servers on our 2970 right now. Thanks for the info.

Joel

rexenov
Contributor

What is Update 4? I only know about Update 3.

GregoryVasilyev
Contributor

Here is my situation:

Two PowerEdge 2900 servers with PERC 6/i and RAID 5 SAS, purchased at the same time. One was put into production right away; the second sat in a box for about two months, so by the time I configured the second one there were updates for the BIOS, the RAID firmware, and ESXi itself, which I happily installed... :(

The first box runs fine. The second one (with updates) has CPU spikes every 10 minutes caused by the famous sfcb process.

1st (good) server's software versions:

RAID 6.0.3-0002, 1.11.82-0473 2008-06-06

BIOS 2.3.1 2008-04-29

ESXi 3.5.0, 113338

2nd (bad) server's software versions:

RAID 6.1.1-0047, 1.21.02-0528 2008-10-07

BIOS 2.5.0 2008-09-12

ESXi 3.5.0, 130755

I plan to do tests and revert each component's software version to see exactly which one contributes to the problem.
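
The revert plan can be written down as a small sketch in Python, changing one component at a time so that any change in behavior points at a single culprit. The version strings are copied from the lists above; the dictionary keys and function names are just illustrative.

```python
# Versions as listed above for the two servers.
good = {
    "RAID": "6.0.3-0002, 1.11.82-0473 2008-06-06",
    "BIOS": "2.3.1 2008-04-29",
    "ESXi build": "3.5.0, 113338",
}
bad = {
    "RAID": "6.1.1-0047, 1.21.02-0528 2008-10-07",
    "BIOS": "2.5.0 2008-09-12",
    "ESXi build": "3.5.0, 130755",
}


def differing_components(a, b):
    """Names of the components whose versions differ between two servers."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def single_revert_plans(good, bad):
    """One test configuration per differing component: the bad server
    with only that component reverted to the good server's version."""
    plans = []
    for name in differing_components(good, bad):
        config = dict(bad)
        config[name] = good[name]
        plans.append((name, config))
    return plans


for name, config in single_revert_plans(good, bad):
    print(f"Revert {name} only -> {config}")
```

All three components differ here, so that is three test runs; if none of them removes the spikes on its own, the next step would be reverting pairs.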

Thanks,

Gregory Vasilyev

gav@rochdale.com

sapro27
Contributor

I have the same configuration as yours. None of the updates from Dell have any effect on the CPU spikes, so you can save your time :(

The new patch from this weekend: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100666...

does not seem to help either.

You have three options:

  • At the current patch level, you can reduce the max. CPU for the sfcb process

  • Go back to 3.5.0 Update 2

  • Wait (and hope) for Update 4.

cu

Roland

aroudnev
Contributor

It looks like the problem IS NOT resolved in 3.5.0 Update 4. Maybe (I have not tried yet) a PERC firmware upgrade can help (though I am not 100% sure, because I don't see how I can check the exact ESXi/installable release).

I don't see this problem on the new Dell 2950, 2900 and 1950 servers, which come with Embedded ESXi. But it exists on our first Dell 2950 with an older CPU and firmware.

On the good side:

  • The problem does not have any serious impact: just set up a CPU limit and (maybe) decrease the priority. In the worst case, your health monitor will show old data, which is not a big deal.

  • The spikes are on a single CPU, so on 2 CPU / 2 core servers they don't have any performance impact.

I am more concerned about the syslog noise created by this thing. And the good point is that the problem doesn't exist on the newer Dell servers.

And, btw, the spikes appear ONLY after an ESXi update.

Kima95
Contributor

I have a similar problem. I get peaks about every 2 minutes. The peaks are generated by the console process (seen in esxtop) and are between 800 and 1500 MHz.

I have 2 Dell PowerEdge 2900 servers with ESX 3.5.0, 123630. Same problem on both servers! The VMFS volumes are on a SAN. VirtualCenter is not installed yet.

Decreasing the limit under system resource allocation works as a workaround, but I don't think it's a good idea to do this for the console?

EDIT: Seen with top: the kipmi0 command is consuming that much CPU! Probably the same thing as in http://communities.vmware.com/thread/186454
