
I spend a great deal of time answering customers' questions about the scheduler.  Never have so many questions been asked about such an abstruse component over which so little user influence is possible.  But CPU scheduling is central to system performance, so VMware strives to provide as much information on the subject as possible.  In this blog entry, I want to point out a few nuggets of information on the CPU scheduler.  These four items answer 95% of the questions I get asked.


Item 1: ESX 4's Scheduler Better Uses Caches Across Sockets

On UMA systems with low load levels, virtual machine performance improves when each virtual CPU (vCPU) is placed on its own socket.  This is because providing each vCPU its own socket also gives it the entire cache on that CPU.  On page 18 of a recent paper on the scheduler written by Seongbeom Kim, a graph highlights the case where vCPU spreading improves performance.




The X-axis represents different combinations of VM and vCPU counts.  SPECjbb is memory intensive and shows large gains from additional CPU cache.  The cases that show dramatic benefit under the ESX 4.0 scheduler are those in which vCPUs were distributed across sockets.  Very large gains are possible in this somewhat uncommon case.


Item 2: Overuse of SMP Only Slows Consolidated Environments At Saturation

For years customers have asked me how many vCPUs they should give to their VMs.  The best guidance, "as few as possible", seems too vague to satisfy.  It remains the only correct answer, unfortunately.  But a recent experiment performed by Bruce Herndon's team sheds some light on this VM sizing question.


In this experiment we ran VMmark against VMs that were configured outside of VMmark specifications.  In one case some of the virtual machines were given too few vCPUs and in another they were given too many.  Because VMmark's workload is fixed, changing VM sizes does not alter the amount of work performed by the VMs.  In other words, the system's score does not depend on the VMs' vCPU count.  Until CPU saturation, that is.




Notice that the scores are similar for the under-sized, right-sized, and over-sized VMs.  Up until tile 10 (60 VMs) they are nearly identical.  There is a slight difference in processor utilization that begins to impact throughput (score) as the system runs out of CPU.  At that point the wasted cycles dedicated to unneeded vCPUs negatively impact system performance.  Two points I will call out from this work:


  • Sloppy VI admins who provide too many vCPUs need not worry about performance while their servers are under low load.  But performance will suffer when CPU utilization spikes.

  • The penalty of over-sizing VMs gets worse as the VMs get larger.  Using a 2-way VM is not that bad, but unneeded use of a 4-way VM when one or two processors suffice can cost up to 15% of your system throughput.  I presume that an unnecessary eight vCPUs would be criminal.


Item 3: ESX Has Not Strictly Co-scheduled Since ESX 2.5

I have documented ESX's relaxation of co-scheduling previously (Co-scheduling SMP VMs in VMware ESX Server).  But this statement cannot be repeated too frequently: ESX has not strictly co-scheduled virtual machines since version 2.5.  This means that ESX can place vCPUs from SMP VMs individually.  It is not necessary to wait for physical cores to be available for every vCPU before starting the VM.  However, as Item 2 pointed out, this does not give you free license to over-size your VMs.  Be frugal with your SMP VMs and assign vCPUs only when you need them.


Item 4: The Cell Construct Has Been Eliminated in ESX 4.0

In the performance best practices deck that I give at conferences I talk about the benefits of creating small virtual machines over large ones.  In versions of ESX up to ESX 3.5, the scheduler used a construct called a cell that would contain and lock CPU cores.  The vCPUs from a single VM could never span a cell.  With ESX 3.x's cell size of four, this meant that a VM's vCPUs never spanned multiple four-core sockets.  Consider this figure:


What this figure shows is that a four-way VM on ESX 3.5 can only be placed in two locations on this hypothetical two-socket configuration.  There are 12 combinations for a two-way VM and eight for a uniprocessor VM.  The scheduler has more opportunities to optimize VM placement when you provide it with smaller VMs.
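The placement counts in the figure follow from simple combinatorics.  Here is a minimal sketch, assuming the hypothetical host above: two four-core sockets, each forming one scheduler cell of size four.

```python
from math import comb

CELLS = 2            # two four-core sockets, each its own scheduler cell
CORES_PER_CELL = 4   # ESX 3.x cell size

def placements(vcpus):
    """Ways to place a VM's vCPUs on cores when they cannot span a cell:
    choose the cores within one cell, times the number of cells."""
    return CELLS * comb(CORES_PER_CELL, vcpus)

print(placements(4))  # 2  (four-way VM: only one placement per cell)
print(placements(2))  # 12 (two-way VM)
print(placements(1))  # 8  (uniprocessor VM)
```

As the counts show, shrinking the VM from four vCPUs to two multiplies the scheduler's placement choices sixfold.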


In ESX 4 we have eliminated the cell lock, so VMs can span multiple sockets, as Item 1 states.  Continue to think of this placement problem as a challenge to the scheduler that you can alleviate.  By choosing multiple, smaller VMs you free the scheduler to pursue opportunities to optimize performance in consolidated environments.

Just over a week ago I had the privilege of riding along with VMware's Professional Services Organization as they piloted a possible performance offering.  We are considering two possible services: one for performance troubleshooting and another for infrastructure optimization.  During this trip we piloted the troubleshooting service, focusing on the customer's disappointing experience with SQL Server's performance on vSphere.


If you have read my blog entries (SQL Server Performance Problems Not Due to VMware) or heard me speak, you know that SQL performance is a major focus of my work.  SQL Server is the most common source of performance discontent among our customers, yet 100% of the problems I have diagnosed were not due to vSphere.  When this customer described the problem, I knew this SQL Server issue was stereotypical of my many engagements:

"We virtualized our environment nearly a year ago and quickly determined that virtualization was not right for our SQL Servers.  Performance dropped by 75% and we know this is VMware's fault because we virtualized on much newer hardware on the exact same SAN.  We have since moved the SQL instance back to native."

Most professionals in the industry stop here, incorrectly bin this problem as a deficiency of virtualization, and move on with their deployments.  But I know that vSphere's abilities with SQL Server are phenomenal, so I expect to make every user happy with their virtual SQL deployment. I start by challenging the assumptions and trust nothing that I have not seen for myself.  Here are my first steps on the hunt for the source of the problem:


  1. Instrument the SQL instance that has been moved back to native to profile its resource utilization.  Do this by running Perfmon to collect stats on the database's memory, CPU, and disk usage.

  2. Audit the infrastructure and document the SAN configuration.  Primarily I will need RAID group and LUN configuration and an itemized list of VMDKs on each VMFS volume.

  3. Use esxtop and vscsiStats to measure resource utilization of important VMs under peak production load.
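For reference, the data collection in steps 1 and 3 can be scripted roughly as follows.  The counter names, sampling intervals, and durations here are illustrative choices, not the exact settings we used, and `<worldGroupID>` is a placeholder for the ID that `vscsiStats -l` reports for the VM of interest.

```shell
# Step 1 (on the native Windows host): drive Perfmon from the command line
# to log the database's CPU, memory, and disk usage every 15 seconds.
logman create counter sqlprofile -si 15 -c \
    "\Processor(_Total)\% Processor Time" \
    "\Memory\Available MBytes" \
    "\PhysicalDisk(_Total)\Disk Transfers/sec"
logman start sqlprofile

# Step 3 (on the ESX host): capture esxtop in batch mode during peak load
# (10-second samples for one hour), then gather per-VM virtual disk stats.
esxtop -b -d 10 -n 360 > esxtop-peak.csv

vscsiStats -l                     # list running VMs and their worldgroup IDs
vscsiStats -s -w <worldGroupID>   # start collecting for the VM of interest
vscsiStats -p all -w <worldGroupID> > vscsistats-peak.txt
```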


There are about a dozen other things that I could do here, but my experience in these issues is that I can find 90% of all performance problems with just these three steps.  Let me start by showing you the two RAID groups that were most important to the environment.  I have greatly simplified the process of estimating these groups' performance, but the rough estimate will serve for this example:


RAID Group                       Performance Estimate
A: RAID5 using 4 15K disks       4 x 200 = 800 IOPS
B: RAID5 using 7 10K disks       7 x 150 = 1050 IOPS
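These rough estimates come from a per-spindle heuristic, roughly 200 IOPS for a 15K disk and 150 IOPS for a 10K disk, multiplied by the number of disks in the group.  A minimal sketch of that back-of-the-envelope math:

```python
# Per-spindle IOPS heuristics behind the rough estimates above.
IOPS_PER_DISK = {"15K": 200, "10K": 150}

def raid_group_iops(disks, speed):
    """Back-of-the-envelope throughput: spindle count times per-disk IOPS.
    Deliberately ignores the RAID5 write penalty and array caching,
    just as the simplified estimate above does."""
    return disks * IOPS_PER_DISK[speed]

print(raid_group_iops(4, "15K"))  # 800 IOPS
print(raid_group_iops(7, "10K"))  # 1050 IOPS
```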

We found two SQL instances in their environment that were generating significant IO: one that had been moved back to native and one that remained in a virtual machine.  By using Perfmon for the native instance and vscsiStats for the virtual one, we documented the following demands during a one-hour window:


SQL Instance      Average IOPS
X (physical)
Y (virtual)

In the customer's first implementation of the virtual infrastructure, both SQL Servers, X and Y, were placed on RAID group A.  But in the native configuration SQL Server X was placed on RAID group B.  This meant that the storage bandwidth of the physical configuration was approximately 1850 IOPS.  In the virtual configuration the two databases shared a single 800 IOPS RAID volume.


It does not take a rocket scientist to realize that users are going to complain when a critical SQL Server instance goes from 1050 IOPS to 400.  And this was not news to the VI admin on-site, either.  What we found as we investigated further was that virtual disks requested by the application owners were used in unexpected and undocumented ways and frequently demanded more throughput than originally estimated.  In fact, through vscsiStats analysis (Using vscsiStats for Storage Performance Analysis), my contact and I were able to identify an "unused" VMDK with moderate sequential IO that we immediately recognized as log traffic.  Inspection of the application's configuration confirmed this.
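The arithmetic behind that drop is worth spelling out.  A sketch, assuming the two instances split the shared group's throughput roughly evenly:

```python
GROUP_A = 4 * 200   # 800 IOPS: the group both databases shared when virtual
GROUP_B = 7 * 150   # 1050 IOPS: SQL Server X's dedicated group when native

# Native configuration: X on group B, Y on group A.
native_bandwidth = GROUP_A + GROUP_B   # ~1850 IOPS of total storage bandwidth

# Virtual configuration: X and Y both contending for group A.
virtual_per_instance = GROUP_A // 2    # ~400 IOPS each

print(native_bandwidth)      # 1850
print(virtual_per_instance)  # 400
```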


Despite the explosion of VMware into the data center, we remain the new kid on the block.  As soon as performance suffers, the first reaction is to blame the new kid.  But next time you see a performance problem in your production environment, I urge you to look at the issue as a consolidation challenge, not a virtualization problem.  Follow the best practices you have been using for years and you can correct this problem without needing to call me and my colleagues into town.


Of course, if you want to fly us out to help you correct a specific problem or optimize your design, I promise we will make it worth your while.

Last week Chris Wolf moderated a debate on virtual platform performance between me and Simon Crosby, CTO of Citrix.  A recording of the debate was put online shortly after its conclusion.


Simon and I disagreed on a few issues and demonstrated different strategies in the discussion.  My goal in representing the fine efforts of our performance team was to show the audience VMware's commitment to product performance.  This commitment is demonstrated through a never-ending series of benchmark publications and continual product improvement.  In the years since I joined VMware we have quantified ESX's ability to serve web pages (SPECweb), enable massive numbers of database transactions (TPC-C, with disclaimers), and establish industry leadership in consolidated workloads (VMmark).  As we have released these and dozens of other numbers, Citrix has remained silent on its own product's performance.


I was pleased that the event's format gave me the opportunity to discuss our accomplishments.  My only regret was that I lacked the time to address the most important of several factual inaccuracies from Simon.  At one point in the discussion Simon claimed that VMmark is not run by anyone except VMware.  In fact, it is closer to the truth to say that VMmark is run by everyone except VMware.  A quick view of the VMmark results page will show results from every major server vendor, with no submissions from VMware.


Thanks to the Burton Group and Chris Wolf for letting me participate.  It was a pleasure.