VMware Cloud Community
virtualxchange
Contributor

100% CPU on Exchange VM cured by vMotion

We've got ourselves a curious problem. We migrated our four-host cluster from ESX 4.0 to ESXi 4.1 (fresh install from HP media) and moved the hosts into a perimeter network. Our Exchange 2003 SP2 server now suddenly peaks after a day's use. Only a live vMotion settles the VM down. There's no ballooning, no resource pool restriction and no contention, and the other VMs on the host have no problem whatsoever when Exchange peaks. I'm having a hard time troubleshooting the machine.

Within the VM no single process is peaking, but the Task Manager performance graph flatlines at 100%. KB1001133 is not applicable here since the store is not peaking at all. The VM has 1 vCPU on a uniprocessor HAL. No CPU reservations. No memory problems. Only a live migration fixes it.

esxtop, busy vs. after migration:

State            ID      GID     NAME      NWLD  %USED   %RUN    %SYS  %WAIT   %RDY
busy             642719  642719  exchange  4     100.34  101.84  0.12  297.95  0.28
after migration  642719  642719  exchange  4     6.44    16.33   0.20  383.21  0.47
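To make the two esxtop samples above easier to compare, here is a minimal Python sketch that parses both rows and prints the per-counter difference. The column list is taken from the header quoted in the post; the helper names `parse_row` and `compare` are made up for illustration and are not part of any VMware tool.

```python
# Column names as they appear in the esxtop header quoted in the post.
COLUMNS = ["ID", "GID", "NAME", "NWLD", "%USED", "%RUN", "%SYS", "%WAIT", "%RDY"]
PERCENT_COLUMNS = COLUMNS[4:]


def parse_row(line):
    """Split one whitespace-separated esxtop row into a dict keyed by column."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    row["NWLD"] = int(row["NWLD"])
    for key in PERCENT_COLUMNS:
        row[key] = float(row[key])
    return row


def compare(busy_row, normal_row):
    """Return busy minus normal for every percentage counter."""
    return {k: round(busy_row[k] - normal_row[k], 2) for k in PERCENT_COLUMNS}


# The two rows exactly as posted.
busy = parse_row("642719 642719 exchange 4 100.34 101.84 0.12 297.95 0.28")
normal = parse_row("642719 642719 exchange 4 6.44 16.33 0.20 383.21 0.47")
delta = compare(busy, normal)

for name in PERCENT_COLUMNS:
    print(f"{name:>6}: busy={busy[name]:7.2f} normal={normal[name]:7.2f} delta={delta[name]:+7.2f}")
```

In both samples %RDY stays well below 1%, which fits the observation that there is no contention on the host: the vCPU really is running at about 100% rather than waiting to be scheduled.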

I temporarily removed VMware Tools, stopped the store and IIS, and disabled and uninstalled Data Protector and the antivirus. Neither a reboot nor a shutdown settles the VM down; only a live migration does. This happens once every day.

7 Replies
Deepeer
VMware Employee

>>Within the VM there's no process peaking but the taskmanager performance flatlines at 100%.

This means something within the VM is hogging the CPU.

Can you check whether the kernel is peaking? Task Manager -> View -> Show Kernel Times

You could also get performance diagnostic data using

http://www.microsoft.com/downloads/en/details.aspx?familyid=09115420-8c9d-46b9-a9a5-9bffcd237da2&dis...

virtualxchange
Contributor

There's no specific process responsible for the CPU hog. It just totals 100%.

Compare the hog state with normal use (after vMotion): hog.gif vs. normal.gif, respectively.

Are the kernel times during the CPU hog normal? Check kernel.gif.

I've found a way to replicate the CPU hog. When I run "C:\WINDOWS\system32\winmsd.exe /report C:\SysInfo.Txt" the VM hogs... like a pig, even after the winmsd job has finished. It keeps doing so until it's vMotioned, as stated in the subject. Since the winmsd job invokes "wmiprvse.exe", could it be a corrupt WMI repository? If so, I'll try "rundll32 wbemupgd, RepairWMISetup" after production hours.

Deepeer
VMware Employee

Also, you might want to check whether there is registry corruption.

virtualxchange
Contributor

Made a support call:

The VM had the /3GB switch in the boot.ini, and ESX 4.1 has some trouble with this (from the 14th of September!!). Fixed by setting the VM's CPU/MMU virtualization option to "Use Intel VT-x/AMD-V for instruction set virtualization and Intel EPT/AMD RVI for MMU virtualization".

The permanent fix for this issue has been designed and is included in ESX 4.1 Update 1

/edit: the /3GB CPU load problem we had is fixed in http://kb.vmware.com/kb/1027021. It's in ESXi410-201010401-SG.

If you start a Microsoft Windows Server 2003 32-bit virtual machine with /3GB switch defined in the boot.ini file on VMware ESXi 4.1, you might see the following symptoms:

  • Read or Write memory errors occur in the guest operating system.

  • A Remote Procedure Call (RPC) error is reported and the virtual machine is forced to reboot often.

  • A stop code of type 0x000000F4 occurs.

  • Microsoft .NET or Java applications might fail with memory errors.

  • The Microsoft Windows Event log might contain error messages similar to the following:
    Event Type: Error
    Event Source: .NET Runtime
    Event Category: None
    Description:.NET Runtime version 2.0.50727.3615 - Fatal Execution Engine Error (7A0979AE) (80131506)
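Since the trigger is the /3GB switch in boot.ini, a quick way to find affected guests is to scan each guest's boot.ini for that switch. The following is a minimal Python sketch, not an official tool: the function name `entries_with_3gb` and the sample boot.ini contents are made up for illustration.

```python
def entries_with_3gb(boot_ini_text):
    """Return the [operating systems] entries that carry the /3GB switch."""
    flagged = []
    in_os_section = False
    for raw in boot_ini_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track which section we are in; only OS entries carry switches.
            in_os_section = line.lower() == "[operating systems]"
            continue
        if in_os_section and line:
            # Switches are whitespace-separated and case-insensitive.
            if "/3gb" in line.lower().split():
                flagged.append(line)
    return flagged


# Hypothetical boot.ini contents, for illustration only.
SAMPLE_BOOT_INI = r"""
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003 (no 3GB)" /fastdetect
"""

for entry in entries_with_3gb(SAMPLE_BOOT_INI):
    print("uses /3GB:", entry)
```

On a real guest you would read C:\boot.ini and pass its text to the function instead of the sample string.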

morrisosu
Contributor

We are having a very similar problem with our VMware View cluster. We have Windows XP Pro SP3 32-bit guest VMs that randomly get stuck consuming around 35-40% CPU each and max out the host's CPU power, but when you vMotion the guests around, the issue immediately cures itself. You can even vMotion to another host and then right back to the same host and the CPU usage will normalize. We are running ESXi 4.1.0 #320137.

I am very open to suggestions!

Thanks so much,

Shane

morrisosu
Contributor

Has anyone run across the problem I mentioned above? I have Windows XP SP3 32-bit guests that get stuck with high CPU to the point where they consume all of the ESX host's available CPU, but if I vMotion the guest VMs the issue clears up. It happens randomly, and thus far I have not found what is causing the issue.

Shane

Speedbmp
Enthusiast

I have seen problems like this with View and other servers. Here is what fixed my problem:

Edit the settings of the virtual machine.

Click the Resources tab.

With CPU highlighted, make sure the Unlimited check box is checked.

Then click to highlight Memory and make sure Unlimited is checked there as well.

The problem I was having was that memory was being swapped out, ballooned, compressed, etc. Once I checked those boxes my problems went away.
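If you have many VMs, clicking through the settings dialog gets tedious, so the same check can be roughly scripted against the VM's .vmx file. This is only a sketch under assumptions: it assumes the CPU and memory limits are stored under the keys `sched.cpu.max` and `sched.mem.max` and that a missing key, "unlimited" or "-1" means no limit, so verify that against one of your own .vmx files first. The helper names and the sample contents are made up for illustration.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines of a .vmx file into a dict (keys lowercased)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def limited_resources(settings, keys=("sched.cpu.max", "sched.mem.max")):
    """Return the limit keys whose value is something other than unlimited.

    Assumption: an absent key, "unlimited" or "-1" all mean no limit is set.
    """
    limited = {}
    for key in keys:
        value = settings.get(key, "unlimited")
        if value.lower() not in ("unlimited", "-1"):
            limited[key] = value
    return limited


# Hypothetical .vmx fragment, for illustration only.
SAMPLE_VMX = '''
displayName = "view-desktop-01"
sched.cpu.max = "unlimited"
sched.mem.max = "2048"
'''

for key, value in limited_resources(parse_vmx(SAMPLE_VMX)).items():
    print(f"{key} is limited to {value}")
```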

Stephen
