Doesn't seem to be any contention. Any ideas why it seems to be limited to just 50% of the available CPU?
Now you see one of the MANY reasons why multi-CPU VMs are a WASTE! This is what we try to illustrate, thanks for proving our point!
It is the APPS that drive CPU usage. If the apps are not multi-threaded or SMP-aware, you can't take advantage of extra CPUs unless the apps are SPECIFICALLY written for it.
A VM does a great job of demonstrating how inept apps really are....
Yeah, you will find that whatever app you're running on that box can only take advantage of 2 threads, which is 50% of the CPU resources you have allocated.
In addition to this (and no, I was not smoking anything at the time), I have had a VM that did not appear to be able to consume more than 50% of the CPU. I had to cold power the VM off and then on again to get it back to normal.
It would seem that your app has only 2 busy threads, hence the hard 50% limit. Be careful with 4-vCPU guests, as they can cause CPU scheduling delays on many hosts. In general the recommendation is to allocate at most half the number of physical processor cores to any one guest, but even then, two such guests would leave no free CPU resource for underlying ESX functions (iSCSI, NFS, memory management, etc.).
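As a back-of-the-envelope sketch (the numbers here are illustrative, not measurements from this thread): a compute-bound app that can only keep N threads runnable at once tops out at N divided by the vCPU count of the guest's total CPU.

```python
def max_cpu_utilization(busy_threads, vcpus):
    """Upper bound on the fraction of total guest CPU a compute-bound
    app can reach when it can only keep `busy_threads` threads
    runnable at once."""
    return min(busy_threads, vcpus) / vcpus

# 2 busy threads on a 4-vCPU guest -> at most 50% overall CPU
print(max_cpu_utilization(2, 4))  # 0.5
```

This matches the symptom exactly: 2 runnable threads on 4 vCPUs shows up as an overall cap of 50%, no matter how much headroom the host has.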
Please award points to any useful answer.
Thanks everyone who has responded so far. I really appreciate the help.
When we watch Task Manager we can see 4 different processes at the top of the list, together using only about 50% of the CPU. With 4 processes, we would expect it to fully utilize all 4 processors, near 100% usage. This all came about as a result of my customer not being pleased with his VM performance. He then installed the application on an older physical server, and it completes this simulation in less than half the time the VM takes. For testing purposes we moved the VM to a standalone ESX host to be certain we weren't seeing contention with other VMs for CPU resources. Is it valid to think that because we see four processes related to this application concurrently running in Task Manager, we would see all 4 vCPUs in action and not be limited to 50% usage by VMware or Windows? It is strange to me to see all 4 vCPUs at or about 50% used for the duration of the test...
The ESX host is a Dell PowerEdge R710: dual-socket quad-core processors with hyperthreading and VT enabled, Intel E5540 @ 2.53 GHz.
Have you tried giving it a higher cpu share or reservation?
What do the CPU ready values show?
Check for ballooning memory too.
You may want to consider turning hyperthreading off; VMware may be putting your vCPUs on the same core and hyperthreading them, effectively cutting your allocation in half.
CPU affinity as a test may help.
I would like to hear your results.
However, over-allocation of CPU resources is a problem when you get into multiple VMs. The more vCPUs a VM has, the harder it is to schedule on a busy host, especially if the vCPU allocation is even close to the number of physical cores.
I have app folks who always wanted 4-8 vCPUs on an 8-core host with 10 VMs. Always drives me crazy. When we got the 24-core blades, CPU ready plummeted.
At the moment this is the only VM running on that ESX host, so I would think shares would not come into play. I am not using reservations, and the resource limit is set to unlimited.
%RDY counters are very low to none
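For anyone comparing vCenter's "CPU Ready" chart against esxtop's %RDY: vCenter real-time charts report ready time as a summation in milliseconds over a 20-second sample, so it has to be converted to a percentage before comparing. A small sketch of the usual conversion (the 20-second interval applies to the real-time chart; historical rollups use longer intervals):

```python
def cpu_ready_percent(ready_ms, sample_interval_s=20):
    """Convert a vCenter 'CPU Ready' summation value (milliseconds
    accumulated in one sample) into a percentage, roughly comparable
    to esxtop's %RDY. vCenter real-time charts use 20 s samples."""
    return ready_ms / (sample_interval_s * 1000) * 100

# 1000 ms of ready time within a 20 s real-time sample -> 5% ready
print(cpu_ready_percent(1000))  # 5.0
```

With %RDY effectively at zero here, the scheduler is giving the VM every cycle it asks for, which points the investigation inside the guest.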
I thought about turning hyperthreading off, but in the end we would like to get the most out of this ESX server. It would also mean we would need to turn it off on every host in the cluster in case the VM was migrated (although it may be a good exercise for a data point). Is there a way to tell how much time a vCPU actually spends running as a hyperthread? Might be worth a test just to verify. The weird thing is that the VM seems unable to push the CPU, whether logical or physical, beyond 50%, and in fact it is almost exactly 50%.
I am not sure what CPU affinity settings to make. I would hope the scheduler is already trying to use all cores across multiple sockets (there are only two sockets).
Yeah, I hate going with SMP VMs; it adds complexity and reduces my consolidation ratios.
I suggested affinity so you would have an alternative to shutting down and disabling hyperthreading as a test. You can set affinity to spread the vCPUs across distinct physical cores as evenly as possible, forcing the load to spread out, and see what happens.
As for the customer putting their app on another box where it runs fine... that's a different install. I wonder if there are any app settings getting in the way.
This is a real-world example I had a few years ago. An Oracle server was running 2 copies of a database (different data for 2 different business units, but the same app twice). One DB was running slow, the other was fine. The vendor darn near flat-out refused to support the app on a VM, and kept insisting that we run physical. Since management was not going to purchase hardware, I spent a long time playing with resources. While I was able to make some minor improvements with settings such as hyperthreading off, isolating it on a host, etc., nothing ever changed dramatically. I had to press back on the app guys as to why one DB seemed OK and the other did not... After nearly a YEAR, the DBA and vendor started going through the DB structure and found that a table had been added to the badly running DB that was not in use. After fixing that, it ran fine. Not exactly your problem, but it shows that application and OS configuration can make a difference.
Also, if your CPU ready is very low, that is good: the VM is getting all the CPU it is asking for. If the VM is only asking for that much, then I would look inside the VM itself.
We pull out perfmon a lot in VMs. You might have to dig a little deeper into the guest itself.
The Oracle example is very sadly so very common in the Oracle application vendor space. Horrid work ethics and mindset, and idiotic pricing with it.
When monitoring the guest OS via perfmon, we see a large proportion of CPU time spent in privileged mode. It is actually fairly high on the physical box as well, but much higher on the VM: roughly 80% of CPU time is spent in privileged mode on the VM, versus around 50% on the physical server. There is very little paging occurring. I am having trouble understanding why so much time is being spent in non-user mode. It is possible that a lot of floating-point calculations are being done. Does a VM get direct access to the floating-point unit, or does the hypervisor trap it and do software translation? If so, does that count toward privileged mode or user mode?
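The user/privileged split can also be measured per-workload rather than system-wide. A minimal sketch of the idea using Python's `os.times()` (the `syscall_heavy` workload is a made-up illustration, not this thread's app; on Windows, perfmon's "% Privileged Time" counter gives the equivalent system-wide view):

```python
import os

def system_time_fraction(workload):
    """Run `workload` and return the fraction of this process's CPU
    time spent in kernel (privileged) mode rather than user mode."""
    t0 = os.times()
    workload()
    t1 = os.times()
    user = t1.user - t0.user
    system = t1.system - t0.system
    total = user + system
    return system / total if total else 0.0

def syscall_heavy():
    # Each os.write() is a system call, so kernel time accumulates.
    fd = os.open(os.devnull, os.O_WRONLY)
    for _ in range(50000):
        os.write(fd, b"x")
    os.close(fd)

print(f"privileged-mode fraction: {system_time_fraction(syscall_heavy):.0%}")
```

If the heavy privileged time tracks a particular process like this, it usually points at system calls (often I/O) rather than raw computation; pure floating-point math executes as ordinary user-mode instructions.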
How fast is your storage? Your performance issue may not be CPU at all.
When we first started with VMware, most VMs were not disk-I/O intensive; then we tried to virtualize one server that was, and it ran like a dog. We ended up doing massive I/O benchmarks and several SAN upgrades to get performance near what physical hardware can give.
If most of the CPU time is in privileged mode, it could be waiting on disk I/O (not paging, but file access). I saw this kind of thing with a database once; it turned out processes were waiting for the I/O to complete before moving on.
Disk queue length, disk reads/s, disk writes/s, % disk time: any of these can really slow you down.
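Those counters are related by Little's law: the average number of outstanding I/Os equals the I/O arrival rate times the average latency. A quick sketch (the numbers are made up for illustration):

```python
def avg_disk_queue_length(iops, avg_latency_s):
    """Little's law: average outstanding I/Os = arrival rate x
    average time each I/O spends in the system."""
    return iops * avg_latency_s

# e.g. 500 IOPS at 20 ms average latency sustains a queue depth of 10
print(avg_disk_queue_length(500, 0.020))  # 10.0
```

So a persistently high disk queue length in perfmon means either the app is issuing I/O faster than the storage completes it, or per-I/O latency is high; either way the CPU ends up waiting.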
I had to use a tool called Iometer and spent a bunch of time getting our system cranked up.
From what you've posted before, if CPU is at 50% and ready times are low, it doesn't sound like a CPU problem. Now you've mentioned privileged time, and that sounds like your processor is waiting for something. It took me a while to realize that high CPU times may have nothing to do with executing program instructions, and more to do with running the 'check loop' looking for the information requested from disk and other sources.