VMware Cloud Community
ltbraswell
Contributor

vSphere vCPU scheduling imbalance

My group has been doing some testing comparing virtual to bare-metal performance with a particularly compute-intensive application.

Platform: Intel E5540 (Nehalem) quad-core, dual-socket. 48GB of 1066MHz memory (properly balanced)

OS/guest OS: RHEL 4.7 (for both bare-metal and virtual configs)

Configuration: Hyper-threading off for both ESX and bare-metal. NUMA enabled and verified on both ESX and RHEL 4.7 (bare-metal)
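
(In case it helps, a minimal sketch of how the NUMA layout can be verified on the bare-metal RHEL side; it assumes the numactl package is installed, and the job command below is only a placeholder for the actual simulation binary.)

    # show the NUMA topology the OS sees; on this box we expect 2 nodes with ~24GB each
    numactl --hardware

    # optional comparison point: run one instance bound to a single node
    # ("./model_job" is a placeholder)
    numactl --cpunodebind=0 --membind=0 ./model_job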

ESX version: 4.0u0 build 164009

ESX host power.CpuPolicy is set to "static"

The application is a single-threaded modeling/simulation type program. It is not I/O intensive at all, and each job contributes roughly 1 to the load average (i.e. it pegs a CPU per job). What we have been doing is running 1, 2, 4, and 8 simultaneous jobs and comparing the virtual performance to bare-metal physical performance with different combinations of VM vCPU counts (1, 2, 4 vCPU). We don't know a lot of details about the application, other than that when scaling from 1 to 8 jobs on bare-metal the performance of any one job degrades somewhat, so it seems to be memory constrained (not sure whether it is sensitive to last-level cache size or to raw RAM performance). The "model" size itself for each job is 1.5GB.

The observation concerns simultaneously running 2 and 4 of these applications, each in its own 2-vCPU virtual (so 2 and 4 virtuals respectively, but all running on one 2-socket physical host). More often than not in the 2-job case, we see the worlds representing these active vCPUs both running on one socket (with the other socket basically idle). For the 4-job case (though not as frequently), we have seen the load split 3/1 across the 2 sockets. When we run 8 jobs we get an even distribution of active vCPU worlds across the 2 sockets (4 on each).
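
For what it's worth, this is roughly the kind of data we look at to see the placement (a sketch using esxtop; the sample interval and count are arbitrary):

    # batch mode: capture 10 samples at 5-second intervals for offline review
    esxtop -b -d 5 -n 10 > pcpu-usage.csv

    # interactively, the CPU screen ('c') shows the active worlds and per-PCPU utilization
    esxtop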

The question is: why does ESX, when there are 2 active vCPUs, schedule those vCPUs on one socket more often than not? Same thing for 4 active vCPUs: lopsided 3/1 scheduling, though not as frequently. Our hypothesis is that this behavior is related to some power management function of the scheduler (i.e. it would rather load up one socket more fully so that the cores on the other socket have a chance to be frequency-scaled down or put in a halted state and therefore save power).

Ordinarily, for an application that is not sensitive to memory performance or last-level cache size, this might not be an issue, but our application apparently is sensitive to how the load is spread across sockets.

We can work around this lopsided scheduling by using affinity rules to pin virtuals to sockets and manually force the load to be spread, but we were hoping for some way to get it to balance the load without using affinity rules. Note that our hosts are running with power.CpuPolicy set to static.
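
For completeness, the workaround is just per-VM scheduling affinity in the .vmx (or the equivalent Scheduling Affinity setting in the vSphere Client). A sketch, assuming hyper-threading is off and physical CPUs 0-3 sit on one socket and 4-7 on the other (the numbering on a given host may differ, so check how cores map to sockets before pinning):

For the first 2-vCPU VM's .vmx (pin to the cores on socket 0):

    sched.cpu.affinity = "0,1,2,3"

For the second 2-vCPU VM (pin to socket 1):

    sched.cpu.affinity = "4,5,6,7"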

Any ideas?

Message was edited by: ltbraswell (changed subject to remove the reference to version 4)

1 Reply
vSeanClark
Enthusiast

Check this link out: http://communities.vmware.com/docs/DOC-5501 and look at "Cell Size". That's the reason you see VMs processing on one socket at a time. The only way to get a VM onto two sockets would be to have an 8-vCPU VM on a system with multiple quad-core sockets; in that case the cell size has to expand to run the 8-vCPU VM.

Please consider awarding points for "Correct" or "Helpful". :)

Sean Clark - vExpert, VCP - http://twitter.com/vseanclark - http://seanclark.us
