cpu spikes on a poweredge 850 - Page 3

thor918 · ‎08-31-2008

hi there.

I just finished putting together a PowerEdge 850 with a PERC 5/i SAS RAID card and two SATA disks in RAID 1.

The system seems fine; everything is green in the health status. It has only 1 GB of RAM at the moment, but it runs just fine with that.

I just noticed that when I look at the performance window for my CPU, there are regular CPU spikes.

They seem to occur every 10 minutes.

http://home.no.net/thor918/vmware/spikes.jpg

Does anyone have any clue why I get these spikes, which average about 40%?

The spikes occur even if no virtual machines are running.

zaphod777 · ‎12-16-2008

Hi,

I have the same problem on a Dell 2950. I did not try the resource allocation workaround, as I had read about the firmware update.

Unfortunately, I cannot find any update dated 18th September.

I'm currently using ESX 3i update 2 (build 110271) on that machine. Which update/patch do you mean?

Thanks for your help!

cimdad · ‎12-17-2008

Folks,

Could you mention whether you have any external storage attached? What kind? This would help VMware reproduce the problem.

Thanks.

sapro27 · ‎12-18-2008

Hi,

My Dell PE 2900 doesn't use external storage.

It has 3 SAS HDDs on an internal PERC 6/i RAID controller in RAID 5.

Reefcrazed · ‎12-19-2008

I have the exact same spikes roughly every 10 minutes. My VMs are not doing any work and the spikes do not show up in their performance logs; the spikes are on localhost.domain. I have been watching it for a week now and the spikes are regular, every 10 minutes. I am running ESXi Update 2. I cannot install Update 3 because I am running two Opteron 8347 processors and, for whatever reason, VMware decided not to let Update 3 install on a pair of Opteron 8347s.

joel_gibby · ‎12-29-2008

We are experiencing the same behavior (25% CPU spikes every 10 minutes on the dot) on a Dell PE2970 connected to a Dell MD3000i via iSCSI. The system also takes a LONG time to boot up (but does eventually) and seems to hang during some kind of iSCSI discovery.

sapro27 · ‎12-30-2008

I am seeing my guest systems hang too. At the time of each hang I get this message:

Dec 30 07:44:19 vmkernel: 0:00:09:25.960 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64813, status=bad0001, retval=bad0001

Dec 30 07:44:49 vmkernel: 0:00:09:55.966 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64832, status=bad0001, retval=bad0001

Dec 30 07:45:16 vmkernel: 0:00:10:22.970 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=64841, status=bad0001, retval=bad0001

Dec 30 07:59:23 vmkernel: 0:00:24:29.163 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=209144, status=bad0001, retval=bad0001

Dec 30 08:01:35 vmkernel: 0:00:26:41.184 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216374, status=bad0001, retval=bad0001

Dec 30 08:04:14 vmkernel: 0:00:29:20.206 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216529, status=bad0001, retval=bad0001

Dec 30 08:08:38 vmkernel: 0:00:33:44.266 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216764, status=bad0001, retval=bad0001

Dec 30 08:09:23 vmkernel: 0:00:34:29.276 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216806, status=bad0001, retval=bad0001

Dec 30 08:09:53 vmkernel: 0:00:34:59.279 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216831, status=bad0001, retval=bad0001

Dec 30 08:11:26 vmkernel: 0:00:36:32.292 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216903, status=bad0001, retval=bad0001

Dec 30 08:11:50 vmkernel: 0:00:36:56.297 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216923, status=bad0001, retval=bad0001

Dec 30 08:12:08 vmkernel: 0:00:37:14.299 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216932, status=bad0001, retval=bad0001

Dec 30 08:12:38 vmkernel: 0:00:37:44.305 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=216956, status=bad0001, retval=bad0001

Dec 30 08:14:17 vmkernel: 0:00:39:23.320 cpu2:1062)LinSCSI: 3201: Abort failed for cmd with serial=217018, status=bad0001, retval=bad0001
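To see how these abort messages are spaced in time, the log lines can be parsed and the gaps between them computed. The following is only a minimal Python sketch: the regular expression is inferred from the lines quoted above (not from any VMware log specification), the function names are made up for illustration, and the year has to be supplied because the syslog timestamp does not carry one.

```python
import re
from datetime import datetime

# Layout inferred from the quoted "Abort failed" lines only (an assumption).
ABORT_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) vmkernel: .*"
    r"LinSCSI: \d+: Abort failed for cmd with serial=(?P<serial>\d+)"
)


def parse_aborts(lines, year=2008):
    """Return a list of (datetime, serial) for every 'Abort failed' line."""
    events = []
    for line in lines:
        match = ABORT_RE.match(line.strip())
        if match is None:
            continue  # not an abort message, skip it
        when = datetime.strptime(
            f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S"
        )
        events.append((when, int(match.group("serial"))))
    return events


def gaps_in_seconds(events):
    """Seconds elapsed between each pair of consecutive events."""
    times = [when for when, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# First three lines from the post above, copied verbatim.
sample = [
    "Dec 30 07:44:19 vmkernel: 0:00:09:25.960 cpu2:1062)LinSCSI: 3201: "
    "Abort failed for cmd with serial=64813, status=bad0001, retval=bad0001",
    "Dec 30 07:44:49 vmkernel: 0:00:09:55.966 cpu2:1062)LinSCSI: 3201: "
    "Abort failed for cmd with serial=64832, status=bad0001, retval=bad0001",
    "Dec 30 07:45:16 vmkernel: 0:00:10:22.970 cpu2:1062)LinSCSI: 3201: "
    "Abort failed for cmd with serial=64841, status=bad0001, retval=bad0001",
]

events = parse_aborts(sample)
print(gaps_in_seconds(events))  # gaps between the three sample lines
```

Feeding it the full log instead of the three sample lines would show whether the aborts cluster around the 10-minute spikes or arrive at irregular intervals.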

cimdad · ‎12-30-2008

The CPU spike problem has been identified to be caused by a misbehaving LSI driver. The problem has been resolved in the upcoming ESX 4.0 as well as ESX 3.5 Update 4. Unfortunately, I do not have the release schedule of when these releases will be available to the general public. Thanks for your patience.

joel_gibby · ‎12-30-2008

Well that's good news (except for not knowing when it will be resolved). In our case it doesn't seem to be impacting performance too much, but we're only running 6 servers on our 2970 right now. Thanks for the info.

Joel

rexenov · ‎01-20-2009

What is Update 4? I only know about Update 3.

GregoryVasilyev · ‎02-02-2009

Here is my situation:

Two PowerEdge 2900 servers with PERC6/i and RAID5 SAS purchased at the same time. One was put into production right away, the second sat in a box for about two months, so at the time I configured the second one there were updates for BIOS, RAID firmware, and ESXi itself, which I happily installed...

The first box runs fine. The second one (with updates) has CPU spikes every 10 minutes caused by the famous sfcb process.

1st (good) server's software versions:

RAID 6.0.3-0002, 1.11.82-0473 2008-06-06

BIOS 2.3.1 2008-04-29

ESXi 3.5.0, 113338

2nd (bad) server's software versions:

RAID 6.1.1-0047, 1.21.02-0528 2008-10-07

BIOS 2.5.0 2008-09-12

ESXi 3.5.0, 130755

I plan to do tests and revert each component's software version to see exactly which one contributes to the problem.
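The revert-one-component-at-a-time plan can be written down as a small Python sketch. The version strings are copied from the two lists above; the dictionary keys and function names are only illustrative, and the sketch assumes each component can be reverted independently of the others.

```python
# Version strings copied from the two server listings above.
good = {
    "RAID": "6.0.3-0002, 1.11.82-0473 2008-06-06",
    "BIOS": "2.3.1 2008-04-29",
    "ESXi": "3.5.0, 113338",
}
bad = {
    "RAID": "6.1.1-0047, 1.21.02-0528 2008-10-07",
    "BIOS": "2.5.0 2008-09-12",
    "ESXi": "3.5.0, 130755",
}


def changed_components(good, bad):
    """Names of the components whose versions differ between the servers."""
    return [name for name in good if good[name] != bad.get(name)]


def revert_plan(good, bad):
    """One test configuration per changed component: the bad server's
    versions with only that single component rolled back."""
    plan = []
    for name in changed_components(good, bad):
        config = dict(bad)          # start from the bad server
        config[name] = good[name]   # roll back exactly one component
        plan.append((name, config))
    return plan


for name, config in revert_plan(good, bad):
    print(f"Test with {name} reverted:", config)
```

If the spikes disappear in exactly one of these configurations, that component is the likely contributor; if they disappear in none, a combination of updates may be involved.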

Thanks,

Gregory Vasilyev

gav@rochdale.com

sapro27 · ‎02-02-2009

I have the same configuration as yours. None of the updates from Dell have any effect on the CPU spikes, so you can save yourself the time.

The new patch from this weekend: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100666...

does not seem to help either.

You have three options:

Reduce the maximum CPU for the sfcb process at the current patch level
Go back to 3.5.0 Update 2
Wait (and hope) for Update 4.

cu

Roland

aroudnev · ‎03-14-2009

It looks like the problem IS NOT resolved in 3.5.0 Update 4. Maybe a PERC firmware upgrade can help (I have not tried it yet), though I am not 100% sure, because I don't see how I can check the exact ESXi Installable release.

I don't see this problem on new Dell 2950, 2900, and 1950 servers, which come with Embedded ESXi. But it exists on our first Dell 2950, which has an older CPU and firmware.

On the good side:

The problem has no serious impact: just set a CPU limit and (maybe) decrease the priority. In the worst case your health monitor will have old data, which is not a big deal.

The spikes are on a single CPU, so on 2-CPU / 2-core servers they don't have any performance impact.

I am more concerned about the syslog noise created by this thing. Another good point: the problem doesn't exist on the newer Dell servers.

And, btw, the spikes appear ONLY after an ESXi update.

Kima95 · ‎03-26-2009

I have a similar problem. I get peaks about every 2 minutes. The peaks are generated by the console process (seen in esxtop) and are between 800 and 1500 MHz.

I have 2 Dell PowerEdge 2900 servers with ESX 3.5.0, 123630. Same problem on both servers!! The VMFS volumes are on a SAN. VirtualCenter is not installed yet.

Decreasing the limit under system resource allocation works as a workaround, but I don't think it's a good idea to do this for the console?

EDIT: Seen with top: the kipmi0 command is consuming that much CPU!! Probably the same thing as in http://communities.vmware.com/thread/186454