We are running a planned, CPU-intensive migration that needs high CPU usage on multiple VMs. It was going unusually slowly until we changed the CPU resource shares from Normal to High. (We don't have DRS enabled, and did this at the VM level.)
Prior to this switch, there was no CPU contention according to vCenter: all other VMs were at our normal state of near-idle CPU usage, and there was a fair amount of spare idle CPU % on each ESXi host. After flipping the 10 VMs involved in this load to High, we instantly saw a bump to normal/expected levels (i.e. the CPU usage reported both in vSphere and inside the Windows VMs went up, and the VMs started grabbing all available CPU).
Shouldn't these share settings only come into play if there is contention for resources? What could I be missing here?
Thanks,
Jaime
If you can reproduce this issue consistently, please contact VMware support.
Please provide more details about your:
- ESXi version information
- physical hardware and CPUs
- physical thread to vCPU ratios
- kind of workload/OSes/applications you're running
- order of magnitude of the improvement
Also have a look at the CPU ready time (%RDY) and the %CSTP counter. In general I agree with your assumption, but even if you have plenty of spare CPU cycles, contention can still occur if too many vCPUs ask to be scheduled at the same time on a smaller number of available physical cores/threads.
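To make the ready time numbers easier to compare, here is a minimal Python sketch. It assumes two things that are not stated in this thread: that esxtop reports %RDY for a VM as the sum over all of its vCPUs, and that the vCenter real-time charts report CPU ready as a summation in milliseconds over a 20-second sample. The function names are made up for illustration.

```python
def per_vcpu_ready_percent(group_ready_percent: float, vcpus: int) -> float:
    """Normalize a VM-level %RDY value (assumed to be summed over vCPUs) to a per-vCPU figure."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return group_ready_percent / vcpus


def ready_percent_from_summation(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation in milliseconds to a percentage of the sample interval.

    interval_s defaults to 20 s, the assumed real-time chart interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # A 6 vCPU VM showing 30 %RDY in esxtop is about 5 % ready per vCPU.
    print(per_vcpu_ready_percent(30.0, 6))
    # 1000 ms of ready time in a 20 s sample is 5 % of the interval.
    print(ready_percent_from_summation(1000.0))
```

With 6 vCPU VMs the per-vCPU figure is the one to watch, since the VM-level value can look alarming while each vCPU is only waiting a few percent of the time.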
Hi!
- esxi 5.1.0,799733
- ten hosts, all with dual socket that are each 4 or 6 core intel nehalem class or higher, all with HT enabled
- average of 30 VMs per host, anywhere from 1 to 6 vcpu
- mostly win2003/2008r2
- super varied workload overall, but this current migration job is 90% of cpu load: it's basically lots of OCR, PDF image enhancement, and file copying
- all VMs involved in this migration are 6 vcpu.
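For the thread-to-vCPU ratio asked about above, the figures in this list only allow bounds, not an exact number. A rough Python sketch of those bounds follows; it assumes each core exposes two logical threads with HT, and the helper names are invented for this example.

```python
def logical_threads(sockets: int, cores_per_socket: int, hyperthreading: bool = True) -> int:
    """Logical threads on a host, assuming 2 threads per core when HT is enabled."""
    return sockets * cores_per_socket * (2 if hyperthreading else 1)


def vcpu_ratio_bounds(vm_count: int, min_vcpus: int, max_vcpus: int, threads: int) -> tuple[float, float]:
    """Lowest and highest possible vCPU:thread ratio if every VM had min_vcpus or max_vcpus."""
    return (vm_count * min_vcpus / threads, vm_count * max_vcpus / threads)


if __name__ == "__main__":
    for cores in (4, 6):
        threads = logical_threads(sockets=2, cores_per_socket=cores)
        low, high = vcpu_ratio_bounds(vm_count=30, min_vcpus=1, max_vcpus=6, threads=threads)
        print(f"{cores}-core hosts: {threads} threads, ratio between {low}:1 and {high}:1")
```

The real ratio depends on the actual vCPU count of each VM, which is not listed here, so it falls somewhere inside these bounds.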
Here's esxtop with cpu share set to normal for VM co-nt-iap9:
And here it is when set to high:
%USED will bump up by about 40%, this is in line with what I saw last week. The other metrics look good...
Do you see the same behavior on a host that runs no or very few other VMs as well? Is the increase in %USED similar to what you observe inside the guest OS in terms of CPU utilization and application performance?
I can't explain why you see this, but to rule out issues with dynamic frequency adjustment from CPU power-saving features, can you check the physical hosts' power management settings in the BIOS? Set it to static high performance or to OS control, and in the latter case you may also want to set the ESXi power management policy to static high performance:
http://blogs.vmware.com/performance/2013/05/power-management-and-performance-in-esxi-5-1.html
Also I think a 4+ vCPU VM on a 4-core CPU physical server like in your screenshots (even with HT) may present some drawback due to NUMA constraints:
http://frankdenneman.nl/2010/02/03/sizing-vms-and-numa-nodes/
http://frankdenneman.nl/2010/09/13/esx-4-1-numa-scheduling/
Is it the same when you run it on a 6-core host? Is CPU hot-add enabled on the VM (this automatically disables wide NUMA)?
As far as I can guess, you have more vCPUs than pCPUs, and this is when the CPU scheduler has to arbitrate between the workloads. When you use "Normal" shares, all the VMs' CPUs are treated equally, which can hurt the performance of your "performing" machines. When you switch some machines to "High" shares, the workloads of those machines get more priority and more CPU time than the other machines. In your case this gives a better result on the "performing" machines, but your other machines will get even less CPU run time.
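To put numbers on that, here is a small Python sketch of proportional sharing. It assumes the vSphere default share values of 500, 1000 and 2000 per vCPU for Low, Normal and High, and it deliberately ignores reservations, limits and idle VMs (shares only matter between VMs that are actually competing for CPU). The VM names are hypothetical, and this is a simplification rather than a model of the real scheduler.

```python
# Assumed vSphere defaults: shares per vCPU for each level.
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}


def entitlement_fractions(vms: dict[str, tuple[int, str]]) -> dict[str, float]:
    """Split contended CPU between competing VMs in proportion to their shares.

    vms maps a VM name to (vcpu_count, share_level).
    """
    shares = {name: vcpus * SHARES_PER_VCPU[level] for name, (vcpus, level) in vms.items()}
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


if __name__ == "__main__":
    # Two busy 6 vCPU VMs on Normal split the contended CPU evenly.
    print(entitlement_fractions({"vm-a": (6, "normal"), "vm-b": (6, "normal")}))
    # Raising vm-a to High gives it two thirds, leaving one third for vm-b.
    print(entitlement_fractions({"vm-a": (6, "high"), "vm-b": (6, "normal")}))
```

The second case shows the trade-off: the High VM's gain comes directly out of what the Normal VM would otherwise be entitled to.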
The ready time in esxtop shows the percentage of the sample interval during which the VM was ready to run but waiting to be scheduled on a pCPU. You see it drop a little between the two screenshots, but this could be luck.
You could try putting the "performing" machine on a host with no (or less) CPU overcommit (shut down some VMs, lower the vCPU count on some VMs, or migrate to another host) and set the shares back to "Normal". I would guess that the performance will be better.
I hope this gives you a bit of a hint in the right direction.
Some useful information:
Reservations and CPU scheduling - frankdenneman.nl
https://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
Been unable to get super consistent results or find time to check bios.
Convinced it is indeed related to the number of VMs per host, just not sure exactly how.
Thanks for your help guys, will write back if I find some clear answers.
