Hi all,
I've been struggling with one particular host for a while and haven't been able to find the root cause. Here's what we have so far:
Physical:
1 host, 2 quad-core CPUs with HT (8 cores total, 16 logical due to HT), 24 GB RAM
HP Proliant DL380 G6
Virtual:
1x Windows 2008R2 DC/FS, 2 vCPU
1x Windows 2008R2 SQL, 4 vCPU
1x Windows 2008R2 Exchange 2010, 1 vCPU
1x Windows 2012R2 RDS, currently 8 vCPU
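For reference, here is a rough sketch of the vCPU-to-core arithmetic for this layout. The VM labels are just shorthand for the list above, and the ratios are only a quick sanity check on overcommit, not a diagnosis.

```python
# vCPU counts taken from the VM list above; the dictionary keys are
# shorthand labels, not the real VM names.
vcpus = {
    "DC/FS": 2,
    "SQL": 4,
    "Exchange 2010": 1,
    "RDS 2012R2": 8,
}

physical_cores = 8   # 2 quad-core CPUs
logical_cpus = 16    # with Hyper-Threading enabled

total_vcpus = sum(vcpus.values())

print(f"total vCPUs:            {total_vcpus}")
print(f"vCPU per physical core: {total_vcpus / physical_cores:.2f}")
print(f"vCPU per logical CPU:   {total_vcpus / logical_cpus:.2f}")
```

The total matches the 15 vCPUs that esxtop reports, so the host has fewer vCPUs allocated than logical CPUs.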
Recently, the old 2008R2 RDS was replaced with the new 2012R2 RDS. This ran well, nothing special, no complaints, only praise since it was quite a bit snappier than the previous one.
Then, after about a week, the ESXi host froze and we had to reboot it (I haven't figured out what happened yet; it could have been a power failure). The servers started and all seemed well, until the problems began.
Currently, the 2012R2 VM is maxing out at 100% CPU usage in Windows Task Manager, across all virtual cores. Something as simple as launching Chrome and Outlook at the same time is enough to trigger it. Performance on this VM is terrible. All the other VMs are affected too: extremely high CPU usage in Windows, maxing out even with simple tasks. But if I open vSphere and check the CPU usage, it is nowhere near maximum capacity. The RDS VM hasn't gone over 3000 MHz.
What we've tried so far:
- Reboot the host
- Reboot the VMs
- Changed the power management from the HP High Performance setting to OS Control, and set the power policy to High Performance in vSphere. I can see through esxtop that all cores are permanently in C0. (I couldn't check this before, because iLO 2 fails to display it when HP High Performance mode is on.)
- I've tried all sorts of vCPU assignments, from 1 vCPU/1 core all the way up to 8 sockets on the 2012R2 RDS VM. There is no resource pool and there are no reservations (I tried those; unfortunately, no effect).
- Shut down all VMs apart from one (to test whether there is some sort of CPU contention); still the same.
I think something went belly up when the host rebooted, but I can't figure out why. Apart from the insane CPU usage, the server now runs stable.
Any input is greatly appreciated.
esxtop output:
9:03:46pm up 1:52, 344 worlds, 4 VMs, 15 vCPUs; CPU load average: 0.19, 0.26, 0.22
PCPU USED(%): 2.5 2.6 2.5 3.0 4.1 2.1 2.7 1.9 2.9 5.7 2.2 2.4 1.9 1.7 2.3 2.4 AVG: 2.7
PCPU UTIL(%): 19 19 19 21 28 18 20 16 17 28 16 16 13 14 14 16 AVG: 19
CORE UTIL(%): 33 35 40 31 40 27 23 26 AVG: 32

   ID    GID  NAME     NWLD  %USED    %RUN  %SYS    %WAIT  %VMWAIT  %RDY   %IDLE  %OVRLP  %CSTP  %MLMTD  %SWPWT
 3785   3785  XXXXX      14  24.00  153.48  0.13  1243.64    11.57  2.75  633.36    0.81   0.00    0.00    0.00
10167  10167  XXXX01     10   6.87   54.83  0.11   944.21     3.31  0.95  341.84    0.21   0.00    0.00    0.00
 3361   3361  XXXXX01     7   3.21   13.37  0.05   686.25     0.23  0.35   87.04    0.00   0.00    0.00    0.00
 3349   3349  XXXXe01    10   1.64    8.89  0.06   990.63     0.28  0.50  191.21    0.02   0.00    0.00    0.00
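In case it helps anyone reading the numbers, here is a small sketch that parses the four VM rows of the esxtop snapshot above and pulls out the counters usually looked at for scheduling contention. The helper name `parse_rows` is made up for illustration. Reading low %RDY and zero %CSTP as "the VMs are not waiting on the scheduler" is the usual interpretation and is an assumption here, not something confirmed in this thread.

```python
# Column names and rows copied from the esxtop snapshot above.
HEADER = ("ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT "
          "%RDY %IDLE %OVRLP %CSTP %MLMTD %SWPWT").split()

ROWS = """\
3785 3785 XXXXX 14 24.00 153.48 0.13 1243.64 11.57 2.75 633.36 0.81 0.00 0.00 0.00
10167 10167 XXXX01 10 6.87 54.83 0.11 944.21 3.31 0.95 341.84 0.21 0.00 0.00 0.00
3361 3361 XXXXX01 7 3.21 13.37 0.05 686.25 0.23 0.35 87.04 0.00 0.00 0.00 0.00
3349 3349 XXXXe01 10 1.64 8.89 0.06 990.63 0.28 0.50 191.21 0.02 0.00 0.00 0.00
"""


def parse_rows(text):
    """Turn whitespace-separated esxtop rows into a list of dicts.

    Hypothetical helper: ID, GID and NAME stay as strings, every
    other column is converted to float.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(HEADER):
            continue  # skip anything that is not a full data row
        record = dict(zip(HEADER, fields))
        for key in HEADER[3:]:
            record[key] = float(record[key])
        records.append(record)
    return records


vms = parse_rows(ROWS)

# Per-group values; for a VM these cover all of its worlds together.
total_used = sum(vm["%USED"] for vm in vms)
max_rdy = max(vm["%RDY"] for vm in vms)
max_cstp = max(vm["%CSTP"] for vm in vms)

for vm in vms:
    print(f'{vm["NAME"]:<8} %USED={vm["%USED"]:6.2f} '
          f'%RDY={vm["%RDY"]:5.2f} %CSTP={vm["%CSTP"]:5.2f}')
print(f"sum of %USED: {total_used:.2f}")
print(f"highest %RDY: {max_rdy:.2f}, highest %CSTP: {max_cstp:.2f}")
```

From the hypervisor's side the host looks almost idle in this snapshot, which is what makes the 100% readings inside the guests so confusing.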