I'm experiencing an odd issue with two VMs registering high CPU demand but low CPU usage. In the interest of being thorough, I'll give a lot of detail about the host and VM configuration below, so as to avoid generic responses.
The vSphere cluster is composed of 10 UCS B200 M4 hosts, each with dual-socket E5-2680 v3 CPUs. Each CPU has 12 cores @ 2.5 GHz, for a total of 24 pCPUs per host (48 with HT, as reported by the hypervisor). Each host also has 384 GB of RAM. Hosts are running ESXi 6 U1. This cluster is dedicated to SQL VMs. As such, memory and CPU in this cluster are not over-committed in any way. There are several SQL VMs with different vCPU/vMem setups. We have plenty of spare capacity on the cluster.
The two VMs in question are 8 vCPU / 64 GB memory VMs running Windows Server 2012 R2 Datacenter and SQL 2012 Enterprise. Each VM sits on its own host at the moment, meaning it is the ONLY WORKLOAD on the host it lives on. We didn't change the sockets/cores settings in the VM settings, so they use the default 8 sockets / 1 core setup. We have verified the 8 vCPUs in the guest are in a single NUMA node. There are no reservations/limits set on the VMs for either CPU or memory. VMware Tools is running and current on both VMs. VM version level is 11 (ESXi 6.0 and later).
The issue we are experiencing is that even though CPU usage on the VMs averages 20-25 percent, CPU demand is pegged at 20 GHz (8 vCPUs x 2.5 GHz), so vRealize is alerting on it. No other VM in the cluster shows this behavior, including some with the same number of vCPUs configured and some with more. It's only these two VMs.
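To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The function and variable names are my own, and it assumes the usage percentage is measured against the VM's nominal 8 x 2.5 GHz allocation (turbo and hypervisor overhead are ignored).

```python
# Rough arithmetic only: nominal clocks, no turbo, no hypervisor overhead.
VCPUS = 8                 # vCPUs per VM (from the post)
CORE_GHZ = 2.5            # nominal E5-2680 v3 clock (from the post)
USAGE_PCT = (20, 25)      # observed average CPU usage range, in percent


def vm_allocation_ghz(vcpus, core_ghz):
    """Nominal CPU capacity the VM could consume, in GHz."""
    return vcpus * core_ghz


def usage_ghz(allocation_ghz, usage_pct):
    """Convert a usage percentage into GHz (assumes % is of the allocation)."""
    return allocation_ghz * usage_pct / 100


allocation = vm_allocation_ghz(VCPUS, CORE_GHZ)
used_low, used_high = (usage_ghz(allocation, p) for p in USAGE_PCT)
reported_demand = allocation  # demand is reported at the full allocation

print(f"Allocation / reported demand: {allocation} GHz")
print(f"Actual usage: {used_low} to {used_high} GHz")
print(f"Unexplained gap: {reported_demand - used_high} to {reported_demand - used_low} GHz")
```

In other words, roughly 15 to 16 GHz of reported demand is not showing up as usage.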
The first thing that came to mind was that power management was not allowing the host to give the VM all the CPU power it's requesting. However, I've verified this is not the case. Everything is set to High Performance, and I've confirmed it further by doing the following:
- If I spin up a stresslinux VM on the host with a vCPU configuration similar to these VMs, I can bring all the vCPUs to full utilization.
- On the actual VMs having the issue, I can spin up two instances of CPUSTRESS from Sysinternals and bring the vCPUs to 100% utilization (don't tell the DBA about this).
- Heck, even if SQL is NOT RUNNING, CPU demand still won't go down despite CPU usage being 1%.
So I don't understand why the CPU demand counter is pegged at the full 20 GHz allocated to the VM when CPU usage is clearly nowhere near that AND there's nothing preventing the host from giving the VM all the resources it demands. I've gone over every single setting I can think of and I'm not finding anything different on these VMs from other similarly configured ones NOT having the issue. Again, the VMs having this issue are sitting on separate hosts and are the only workload on them, meaning only 8 pCPUs out of 24 on each host are in use.
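For reference, the host-side headroom works out as follows. This is only a sketch with made-up names; it assumes nominal 2.5 GHz clocks, counts physical cores only (no HT), and ignores whatever the hypervisor itself consumes.

```python
# Host headroom estimate: physical cores at nominal clock, HT not counted.
SOCKETS = 2
CORES_PER_SOCKET = 12
CORE_GHZ = 2.5
VM_VCPUS = 8

host_pcpus = SOCKETS * CORES_PER_SOCKET        # physical cores per host
host_capacity_ghz = host_pcpus * CORE_GHZ      # nominal host CPU capacity
vm_max_ghz = VM_VCPUS * CORE_GHZ               # most the VM could ever consume
headroom_ghz = host_capacity_ghz - vm_max_ghz  # left over even if the VM is flat out

print(f"Host capacity: {host_capacity_ghz} GHz across {host_pcpus} pCPUs")
print(f"VM maximum:    {vm_max_ghz} GHz ({VM_VCPUS} of {host_pcpus} pCPUs)")
print(f"Headroom:      {headroom_ghz} GHz")
```

So even with the VM running flat out, two thirds of the host would still be idle, which is why contention doesn't look like a plausible explanation to me.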
Other things I've tried: moving the VMs to other hosts, and rebooting the VMs.
Any insights would be appreciated. At this point I'm simply out of ideas.