VMware Cloud Community
sthompson500
Contributor

CPU usage doesn't match

I seem to recall from a VMware class that the CPU usage graph shown in Windows Task Manager should not be the main metric for judging a virtual machine's performance, but I can't find any good information or articles explaining why that is, or what I should be looking at instead to determine how well the machine is performing and whether it is receiving all the CPU resources it wants.

For example, I have an MS SQL server running Windows Server 2012 and SQL Server 2012. It has been assigned 32 vCPUs and 96 GB of RAM. When we open Windows Task Manager, we see CPU usage at 57%:

But in vSphere, the performance tab for this VM shows it below 50%:

Looking at the host as a whole, CPU and memory usage is fairly low too:

There are only 5 VMs on this entire host, and the rest are using nothing compared to this monster VM:

Granted, the difference between Windows Task Manager and vSphere isn't huge, but this is only a snapshot of the moment. From time to time Windows Task Manager will creep up to 60-70% CPU usage. This sends the developers into a panic, and they are getting to the point where they want more CPUs assigned to the VM because "Windows" says it's using all the CPU.

I actually believe it has too many CPUs assigned, but showing on paper why that might be is nearly impossible; I can't find any documentation or articles stating the best way to read these performance metrics.

We do have vCOPS, and it too says the VM is wasteful and doesn't need 32 vCPUs. But the report really is just that. Management and developers want to know why, and vCOPS does not explain this either.

What performance metric should I be paying attention to? Windows? vSphere? And why? Thanks for any help anyone can provide.

22 Replies
weinstein5
Immortal

Welcome to the Community. First off, the best way to see how well the VM is performing is from the users' perspective, and don't rely just on statements that it is slow; try to gather empirical data to show degradation, e.g. timing database searches or the population of data on a web page.
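
Something as simple as a scripted timer gives you that evidence. Here's a rough Python sketch; the URL and loop count are placeholders, not anything from a real environment:

# Collect simple response-time samples for a page backed by the SQL VM.
# URL and sample count are placeholders; substitute your own workload.
import time
import statistics
import urllib.request

URL = "http://appserver/report"  # hypothetical page to time
samples = []

for _ in range(20):
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    samples.append(time.perf_counter() - start)

print(f"median: {statistics.median(samples):.3f}s, "
      f"worst: {max(samples):.3f}s")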


Windows Task Manager really measures the percentage of the CPU resources allocated to the VM that are in use, so what you are seeing is the VM using ~60% of the CPU cycles being provided to it. The vCenter graphs show that roughly 50% of the allocated CPU cycles are still available, so as the CPU load increases, the ESXi host will allocate more cycles to the VM.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
MKguy
Virtuoso

The comparison is moot, simply because you are comparing two entirely different measuring intervals here.

The vCenter performance graph in realtime mode uses a 20-second measuring interval, meaning every CPU usage value on that graph represents the average over the previous 20 seconds.

Obviously the Windows Task Manager graph fluctuates a lot more, because it updates every second (or roughly every 3 seconds if you set the update speed to Low).

Needless to say, you will rarely see close to identical values from the two unless the actual CPU load stays constant at the same level for some time.
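
If it helps to see why, here is a toy Python example with made-up numbers showing how the same bursty workload looks on a 1-second graph versus 20-second averages:

# Synthetic per-second CPU load: idling near 30% with short bursts to
# 90%. vCenter's realtime graph reports one average per 20 s interval.
import random

random.seed(1)
per_second = [90 if random.random() < 0.2 else 30 for _ in range(60)]
per_20s = [sum(per_second[i:i + 20]) / 20 for i in range(0, 60, 20)]

print("1-second samples: min", min(per_second), "max", max(per_second))
print("20-second averages:", [round(v, 1) for v in per_20s])
# The 1 s trace swings between 30 and 90, while the 20 s averages stay
# much closer to the ~42% long-run mean.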

-- http://alpacapowered.wordpress.com
sthompson500
Contributor

So the overall feeling, then, is to add more CPUs to this VM? Am I reading the replies correctly?

Titanomachia
Enthusiast

How many CPUs are on the host? 57% to me doesn't indicate a CPU issue; maybe CPU contention, but adding more will not help. Do you know it is CPU related? The storage may be the issue.

The best thing to do is SSH to the ESXi host and run esxtop. There, view the ready time for the VM, and also view the storage latency stats, IO, etc. Post these metrics so we can get closer to the issue.

JPM300
Commander

Hey sthompson500,

Your host shows that you have 32 CPUs, but how does it actually get to 32: is it 4 sockets with 8-core CPUs? The reason I ask is because of how NUMA works and the goal of keeping VMs within NUMA boundaries to fine-tune performance. We do a fair amount of SQL virtualization and recently ran a fresh test on a bunch of 2 x 8-core systems with 256 GB of memory on each host. We ran Dell's Benchmark Factory on the physical host running only SQL and created a set of baselines with industry-standard tests (transactions per second, etc.; Benchmark Factory has them all baked in). We then P2V'ed the SQL system over to a temp host and installed ESXi 5.5u1 on the host that previously ran SQL. We then moved the P2V'ed SQL VM back over to this new host and ran the tests again. We found about a 10% overhead when virtualizing the system compared to the physical system that had full rein of the server before. Another thing we found was that going past 8 vCPUs yielded very little extra performance, so running the tests with 8 vCPUs performed just as well as 16 vCPUs. This is mainly because of the way the VMware scheduler works and NUMA on your system.

Each server has NUMA boundaries. In a nutshell, when you open the case, each CPU gets a bank of memory assigned to it. In the case I just described, we had a 2-socket, 8-core system, so each socket gets a bank of memory; each socket got 128 GB, since the system had 256 GB total. Now, if you keep a VM within a NUMA node, you get the best results, as instructions between the CPU and memory are VERY fast. However, if we built a VM with 16 vCPUs and 200 GB of memory, it would sometimes have to send instructions to the bank of memory in the other NUMA node. This is still fast, don't get me wrong, but it's not as fast and can hurt performance. Also, with more vCPUs to schedule, it's harder on the VMware scheduler.
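
If you want a quick sanity check of whether a given VM fits inside one NUMA node, the test is trivial. A Python sketch using the illustrative host from my example (2 sockets x 8 cores, 256 GB), so adjust the constants for your hardware:

# Does a VM fit entirely within one NUMA node? Illustrative numbers:
# a 2-socket, 8-core, 256 GB host has two nodes of 8 cores / 128 GB.
def fits_in_numa_node(vm_vcpus, vm_mem_gb, node_cores, node_mem_gb):
    """True if the VM can be placed entirely within a single NUMA node."""
    return vm_vcpus <= node_cores and vm_mem_gb <= node_mem_gb

NODE_CORES, NODE_MEM_GB = 8, 128

print(fits_in_numa_node(8, 96, NODE_CORES, NODE_MEM_GB))    # True: stays local
print(fits_in_numa_node(16, 200, NODE_CORES, NODE_MEM_GB))  # False: spans nodes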

So depending on how your host is set up, you can try playing with that a little and see what the results are like. You can also do some SQL fine-tuning to help a little, but it all depends on the SQL environment. With all that said, it's not impossible to scale up VM size to support LARGE SQL, Oracle, SAP, etc. type VMs; however, it usually becomes more costly and harder to achieve the performance metrics you want. In most cases it's better to break these LARGE VMs down into one or two smaller VMs and spread the load out, which allows the instruction requests to process faster and gives the VM scheduler an easier time. If this can't be done, there are ways of making LARGE VMs scale up and work; it's just a little trickier, as you also have to worry about failover, which becomes much harder with very LARGE VMs, among other issues.

Like other people have said, I would go and watch esxtop for a while during peak load or during a test and see if any of the numbers surpass these:

http://www.yellow-bricks.com/esxtop/

If they do, you can find the contention point and work through it to get the performance you want.

With all that said, in the past I have also had some issues with vCOPS and the way it does its waste tracking. You can fine-tune it a LOT to try to get it as precise as possible, but because of the way it tracks resources versus how Windows tracks them, you still need to do a per-VM investigation of the VMs it flags as very wasteful.

A good example of this is SQL and how VMware tracks memory. VMware tracks memory on a VM in three ways: Consumed, Granted, and Active. Consumed and Granted are pretty much the same thing; essentially how much memory the VM has been given. Active tracks how many active page accesses are happening in the VM, so essentially active memory calls. This is an effective way of tracking memory, but it has some flaws. What if your Windows OS is caching a bunch of memory for peak periods but not actively using it? The VM's active memory will show as low and wasteful. So you look at this and cut the memory back by a few GB, and then when the next spike or working process occurs, your users and developers complain it's laggy or slow, because what was once cached is now swapping to disk.

Another good example is how SQL caches memory. It caches pretty much 98% of whatever it can get its greedy hands on, but to see what it is actually actively using within that cached bank of memory, you have to use different metrics. It also depends on what your SQL box is doing: if it isn't constantly storing indexes or procedures in memory to be reused, it won't actively use all of the memory it has cached. There are ways in SQL to find out how much of the cached memory is actively used, and this can help you reduce how much memory you are actually allocating to the VM. If you're allocating 90 GB of memory to the VM but you know SQL only ever uses 20% of that at the heaviest workloads, you can reduce it greatly.
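
As a concrete example of checking what SQL actually wants versus what it has grabbed, you can compare its "Total Server Memory" and "Target Server Memory" counters. A sketch in Python with pyodbc; the server name and driver string are placeholders for your site:

# Compare what SQL Server has actually committed ("Total") against
# what it would like to have ("Target"). Connection details are
# placeholders; adjust server/driver/auth for your environment.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlvm01;"
    "Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("""
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Total Server Memory (KB)',
                           'Target Server Memory (KB)')
""")
for name, kb in cursor.fetchall():
    print(f"{name.strip()}: {kb / 1024 / 1024:.1f} GB")
# If Total sits well below what the VM has been granted (and below
# Target), the instance is not even trying to use all of it.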

I hope this has helped. If you have any questions, please feel free to ask.

***Edit***

One thing I also forgot to mention: I have seen some rare occasions on large VMs, or very CPU-heavy VMs, where hyper-threading was hurting performance. This is because of what hyper-threading does: it essentially splits a core in half to present more cores, or threads, to process requests. So if you have an 8-core system and you turn on hyper-threading, it will show 16 cores. This isn't a true 16 cores, however, as each pair of threads shares the same core's cache, and this can sometimes be harmful when large requests come in. The best analogy I have read to explain this is: say you have a road; what is the best way to get more traffic down it? You draw a line down the middle and have traffic going side by side. But what if a semi comes along that can't fit on either side of the road because of the new line? Now the semi has to wait until the road is clear before it can move its cargo down the road. So sometimes large requests feel laggy with hyper-threading, and as soon as you turn it off, this artificial lag goes away. This is very situational, though, and in most cases hyper-threading will help performance, so it's a test-and-see kind of situation. Either way, when sizing an environment I don't take hyper-threading into consideration; I look at it as a perk or bonus if we can leverage it.

NealeC
Hot Shot

As Titanomachia suggests, adding CPUs is only recommended if you are seeing a high CPU ready (%RDY) value in esxtop.

However, looking at your Windows Task Manager, it appears you've added 32 vCPUs (I just read your description, and that's correct).

So straight away I would say that is your problem.

In the physical world, more cores/CPUs is good and means more resources.

But in a VM that's not the case: the ESXi host can only schedule CPU time for that VM when it has 32 cores free.

So, bad analogy time:

Think of a post office with 32 counters (I know, imagine them having that many staff!!!), but for some weird reason you and 31 friends turn up and can only be served simultaneously.

So until all 32 counters are empty, you have to wait.

I would try halving your vCPU count to 16 to see if that improves performance. You may well find that the host can schedule time for the VM more easily if you're not asking it to find a slot with 32 cores available.
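
To put toy numbers on the analogy, here is a little Python simulation. Fair warning: ESXi actually uses relaxed co-scheduling, so real behaviour is less brutal than this strict "all counters at once" model, but the trend (wider VMs wait longer for a slot) is the point:

# Toy model of strict co-scheduling: a VM with N vCPUs runs only when
# N physical cores are free at the same time. Real ESXi relaxes this,
# but the wider-VMs-wait-longer trend still holds.
import random

random.seed(42)
HOST_CORES = 32

def blocked_fraction(vcpus, trials=10_000):
    """Fraction of attempts where fewer than `vcpus` cores are free."""
    blocked = 0
    for _ in range(trials):
        busy = random.randint(0, HOST_CORES)  # cores busy with other work
        if HOST_CORES - busy < vcpus:
            blocked += 1
    return blocked / trials

for width in (4, 8, 16, 32):
    print(f"{width:2d} vCPUs: blocked on {blocked_fraction(width):.0%} of attempts")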

-------------- If you found this or any other answer useful please consider the use of the Helpful or Correct buttons to award points. Chris Neale VCIX6-NV;vExpert2014-17;VCP6-NV;VCP5-DCV;VCP4;VCA-NV;VCA-DCV;VTSP2015;VTSP5;VTSP4 http://www.chrisneale.org http://www.twitter.com/mrcneale
NealeC
Hot Shot

What JPM300 said :)

-------------- If you found this or any other answer useful please consider the use of the Helpful or Correct buttons to award points. Chris Neale VCIX6-NV;vExpert2014-17;VCP6-NV;VCP5-DCV;VCP4;VCA-NV;VCA-DCV;VTSP2015;VTSP5;VTSP4 http://www.chrisneale.org http://www.twitter.com/mrcneale
Titanomachia
Enthusiast

Good point on NUMA.

Open a browser and navigate to https://<hostname_or_IP>/folder

There, go to your VM's home folder and open vmware.log. Search for "numa" and "numaHost" and post the content. This will tell us whether vNUMA is optimal for this VM.
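
If you'd rather script it than click through the page, something like this should work; the host, datastore, VM folder and credentials are all placeholders for your environment:

# Pull vmware.log through the host's /folder datastore browser and
# print the NUMA-related lines. All names/credentials are placeholders.
import requests
from requests.auth import HTTPBasicAuth

url = "https://esxi01/folder/sqlvm01/vmware.log"
params = {"dcPath": "ha-datacenter", "dsName": "datastore1"}

resp = requests.get(url, params=params,
                    auth=HTTPBasicAuth("root", "password"),
                    verify=False)  # most ESXi hosts use a self-signed cert
resp.raise_for_status()

for line in resp.text.splitlines():
    if "numa" in line.lower():
        print(line)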

JPM300
Commander

Here is a quick blog post about checking the vmware.log file for the NUMA settings: Checking the vNUMA Topology | VMware vSphere Blog - VMware Blogs

However, if you could post the vmware.log, that would be great so we can take a look. You can also check this in esxtop, as outlined in the Yellow Bricks article I linked earlier:

Metric: N%L (memory screen) | Threshold: 80 | If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and "remotely" uses memory via the "interconnect". Check "GST_ND(X)" to find out which NUMA nodes are used.
King_Robert
Hot Shot

Have you checked that VMware Tools is configured properly on this VM?

sthompson500
Contributor

@JPM300 - First, thanks for the detailed reply; I really appreciate it and found it informative.

Our host is 4 sockets with 8 cores each, giving us 32 physical cores and 64 logical with hyper-threading. While I get the gist of your overall message, the problem isn't convincing ME that 32 CPUs likely aren't needed; it's convincing management and the developers. They are stuck looking only at Windows Task Manager and seeing it use 60-70% CPU. This is currently our "slow" time, and we expect to be busier in the next month or two. Their train of thought is "if it's at 70% now, we're going to need a lot more CPUs when we are busy in a month".

I wholeheartedly believe we don't need 32 CPUs assigned to the VM. But maybe with help from here I can either prove that point or be shown why I'm wrong.

@NealeC - That's the thing: the host is really underworked right now, with only one real VM on it; the other 3-4 VMs on the host aren't doing anything (as you can see in the screenshots above). Also in the screenshots above you can see our ready time summation average is 1,500 ms.

@Titanomachia - Thanks for the tip; I never knew about https://ipaddress/folder. I did open the file, and here are the contents regarding NUMA. I will say this first: while I have heard of NUMA, I don't fully understand it, nor do I know how to find the optimal settings...

NUMA contents of vmware.log:

2014-06-15T12:05:21.600Z| vmx| I120: numa: Setting.vcpu.maxPerVirtualNode=8 to match cpuid.coresPerSocket
2014-06-15T12:05:21.600Z| vmx| I120: numa: VCPU 0: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 1: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 2: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 3: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 4: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 5: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 6: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 7: VPD 0 (PPD 0)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 8: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 9: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 10: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 11: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 12: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 13: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 14: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 15: VPD 1 (PPD 1)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 16: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 17: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 18: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 19: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 20: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 21: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 22: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 23: VPD 2 (PPD 2)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 24: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 25: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 26: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 27: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 28: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 29: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 30: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numa: VCPU 31: VPD 3 (PPD 3)
2014-06-15T12:05:21.601Z| vmx| I120: numaHost: 4 virtual nodes, 4 virtual sockets, 4 physical domains

@King_Robert - Yes, VMware Tools is installed and current.

Titanomachia
Enthusiast

What CPUs does the ESXi host have? How many sockets and cores?

Have you configured the VM's CPUs as 8 cores per socket across four sockets?

sthompson500
Contributor

@Titanomachia - 4 sockets, 8 cores each. The host CPUs are Intel E5-4640s. The VM vCPU setup is 4 sockets and 8 cores.

Titanomachia
Enthusiast

How many other VMs are you running on this host, and what are their CPU allocations? I'd be interested to see the ready time for this VM. You can view this in the performance tab in vCenter under CPU, or via esxtop through SSH.

sthompson500
Contributor

@Titanomachia - I'm not trying to be rude, and I do appreciate the replies, but literally everything you've asked for is shown in the very first post of this thread. :|

JPM300
Commander

From what you posted from the vmware.log, it looks like NUMA is okay. You can also double-check it in esxtop with the metric I posted above. With that said, from the earlier screenshot the CPU ready time looks okay, as it's still under 10%, which is kind of the universal baseline. However, there were a few spikes in that screenshot where the CPU ready time climbed higher than you would like to see.
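
If you ever need to verify that layout again (or on other VMs) without eyeballing the log, those "VCPU n: VPD x (PPD y)" lines are easy to parse. A minimal Python sketch; the log path is a placeholder:

# Group vCPUs by their virtual NUMA node (VPD) from vmware.log lines
# like "... numa: VCPU 12: VPD 1 (PPD 1)". Path is a placeholder.
import re
from collections import defaultdict

nodes = defaultdict(list)
with open("vmware.log") as log:
    for line in log:
        m = re.search(r"VCPU (\d+): VPD (\d+)", line)
        if m:
            nodes[int(m.group(2))].append(int(m.group(1)))

for vpd, vcpus in sorted(nodes.items()):
    print(f"virtual node {vpd}: vCPUs {min(vcpus)}-{max(vcpus)} ({len(vcpus)} total)")
# For the log posted above this prints 4 nodes of 8 vCPUs each, matching
# the "4 virtual nodes, 4 virtual sockets" numaHost line.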

I hear what you're saying about getting the devs/management past the "more CPU is better" mentality, but much like government, just throwing more money at a problem doesn't always fix the problem. :) I have struggled with this a lot in the past as well, but the proof is in the results.

The only way to get around this is to do some testing, which is unfortunate, as it sounds like the VM is already in production, meaning you'll need some quick outage windows. One thing you can do: if your Windows Server is 2008 R2 Enterprise or Datacenter, you can turn on CPU/memory hot add. That way, when you reduce the CPU/memory for testing, if you don't get the results you're looking for, you can hot-add more memory/CPU back in without a reboot. If you're not running an OS version that allows hot add, it will require more outage windows.

I would get a baseline of how long your SQL processes take to complete, then compare after you change the resources around. If a popular query takes 2 minutes on 32 vCPUs but 1 minute 30 seconds on 16, the proof is there despite the CPU % inside Windows Task Manager.
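
To make that concrete, a bare-bones timing loop in Python with pyodbc; the connection string and query are placeholders, so substitute one of your real workload queries:

# Time a representative query a few times so before/after comparisons
# are wall-clock numbers rather than Task Manager percentages.
# Connection string and query are placeholders.
import time
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlvm01;"
    "Trusted_Connection=yes;"
)

QUERY = "EXEC dbo.PopularReport"  # hypothetical: use a real workload query

timings = []
for _ in range(5):
    start = time.perf_counter()
    conn.cursor().execute(QUERY).fetchall()
    timings.append(time.perf_counter() - start)

print("runs:", [f"{t:.1f}s" for t in timings], f"best: {min(timings):.1f}s")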

This may be a silly question, but what is your greatest obstacle: is it that the VM just isn't getting the performance numbers you would like, or is it just the dev team / management worrying that the server is consistently using 70% CPU?

Here are some PDFs you can use to help defend your case:

http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

page 19

https://communities.vmware.com/servlet/JiveServlet/previewBody/21181-102-1-28328/vsphere-oversubscri...

http://www.zdnet.com/virtual-cpus-the-overprovisioning-penalty-of-vcpu-to-pcpu-ratios-4010025185/

Message was edited by: JPM300

Titanomachia
Enthusiast
Enthusiast

Apologies, I missed the last two screenshots. The issue is ready time: its average is over 1,600 ms, which is very high and is the result of over-provisioning the vCPUs. Are you able to power the other VMs off to test?
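
For reference, here is the usual conversion from that CPU Ready summation value (milliseconds) to a percentage, using the 20-second realtime chart interval mentioned earlier in the thread; pass the vCPU count if you want a per-vCPU figure:

# Convert the vCenter "CPU Ready" summation (ms) into %RDY.
# 1,600 ms is the figure from the screenshots; 20 s is the realtime
# chart interval.
def ready_pct(ready_ms, interval_s=20, vcpus=1):
    """VM-wide %RDY by default; pass vcpus for a per-vCPU figure."""
    return ready_ms / (interval_s * 1000) / vcpus * 100

print(f"VM-wide : {ready_pct(1600):.1f}%")            # 8.0% across the VM
print(f"per vCPU: {ready_pct(1600, vcpus=32):.2f}%")  # 0.25% per vCPU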

sthompson500
Contributor

@JPM300 - I haven't had a chance to read the links, but I will certainly take a look!

"but much like the government just throwing more money at a problem doesn't always fix the problem" - well actually to make matters even harder, this IS a government machine. **sigh**

About hot add: I believe one of the DBAs found information that even with hot add enabled, when we add more CPUs to the VM, the OS WILL take advantage of the added CPUs, but the SQL processes will not until SQL is restarted, which equals an outage.

The greatest obstacle, as bluntly as possible, is that the developers are not communicating; I have no idea how or what they are testing. What I've managed to pull from them is that they are running some kind of benchmark, seeing metrics of some kind, and believing things should be faster. That sums up exactly what I know, and as I'm sure you'll see, it doesn't help me, and I'm sure it doesn't help you either. That said, the developers are going to management stating they need more CPU. Management and devs both see the "high" CPU usage in Windows, agree, and now want to throw more resources at it. Is this in production? Yes, of course it is.

I will get some esxtop stats, and if my request to be present the next time they run benchmarks is approved, I'll capture esxtop stats then too.

weinstein5
Immortal

sthompson500 wrote:

It's convincing management and the developers it likely isn't. They are stuck looking only at Windows Task Manager and seeing it use 60-70% CPU. This is currently our "slow" time, and we expect to be busier in the next month or two. Their train of thought is "if it's at 70% now, we're going to need a lot more CPUs when we are busy in a month".

I wholeheartedly believe we don't need 32 CPUs assigned to the VM. But maybe with help from here I can either prove that point or be shown why I'm wrong.

Educating management and end users is always the challenge. They need to understand that a VM pulls resources from a shared pool and will only pull what it needs. In your case, the VM is not being constrained by a lack of resources on the ESXi host; it is only using ~50% of what is assigned to it, and if the VM needs more CPU cycles, they will be delivered, up to the limit of what is assigned (i.e. 32 cores at 2.399 GHz). Currently it is receiving the equivalent of 32 cores at ~1.2 GHz, and the OS is indicating that it is using all of this.
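
Putting that arithmetic in one place (these are the figures quoted in this thread, so treat them as illustrative):

# Back-of-envelope version of the numbers above: ~50% usage of
# 32 cores at 2.399 GHz each (figures quoted in this thread).
cores, core_ghz, usage = 32, 2.399, 0.50

allocated_ghz = cores * core_ghz       # ~76.8 GHz ceiling for the VM
consumed_ghz = allocated_ghz * usage   # ~38.4 GHz actually consumed
per_core_ghz = core_ghz * usage        # ~1.2 GHz per core, as stated

print(f"ceiling {allocated_ghz:.1f} GHz, consuming {consumed_ghz:.1f} GHz "
      f"(~{per_core_ghz:.1f} GHz per core)")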

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful