I'd say reduce the VM to 2 vCPUs and see whether the CPU ready times go down and performance on the guest improves.
Can you explain why this would improve performance?
Thanks for the advice; however, %CSTP is at 0. Any other ideas?
As I said earlier, decrease it to 2 vCPUs to see if that helps.
First, did you expand the VM in esxtop to see the individual vmm %RDY values? (Hit 'e', type the vmid, and hit Enter.)
The reason having more vCPUs than needed is bad is that ESX schedules CPU time for a virtual machine as a whole, not on a per-vCPU basis. So if your VM is idling along and needs to run a simple process that uses only 1 thread (and this will happen a lot), it signals ESX that it needs CPU time. ESX then has to wait for 4 cores to become available so that it can run that 1 thread on one core, while the other 3 are taken out of the pool for no reason and no other VMs can use them until the VM's CPU time is up.
If you only have 1 VM with 4 vCPUs and a handful of others using 1 vCPU, then you probably won't see much of a difference. But as you add more 2- and 4-vCPU guests, you won't be able to run as many VMs on the same host.
While there are some server roles that will need multiple vCPUs (like a SQL Server with multiple production DBs or a couple of heavily used DBs), it is best practice to create a new VM with only 1 vCPU, watch it over a period of time (6 weeks or so), and see if it is pegging CPU usage a lot. If so, assign a second vCPU and watch it again.
If you treat your VMs the same way you build out physical systems, you will quickly lose many of the benefits and cost reductions that can be achieved through virtualization.
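To make the argument concrete, here is a toy Python model of the strict co-scheduling behaviour described in this post. It is not ESX's actual scheduler: the function names are hypothetical and the trace of free cores per tick is made-up data. It only shows that a VM which must grab all of its vCPUs' cores at once waits far more often, and strands idle cores when it runs a single thread.

```python
# Toy model (hypothetical, not the real ESX scheduler) of strict co-scheduling:
# a VM can only be dispatched when free cores >= its vCPU count, even if just
# one of its vCPUs actually has work to do.

def ticks_waiting(free_cores_per_tick, vcpus):
    """Count ticks in which a VM with `vcpus` vCPUs cannot be dispatched
    because fewer than `vcpus` physical cores are free at the same time."""
    return sum(1 for free in free_cores_per_tick if free < vcpus)

def cores_idled(vcpus, busy_threads):
    """Cores held by the VM but doing no work under strict co-scheduling."""
    return vcpus - min(busy_threads, vcpus)

# Made-up trace: number of idle physical cores at each scheduling tick.
trace = [1, 3, 2, 4, 1, 2, 3, 4]
print(ticks_waiting(trace, 1))  # 1-vCPU VM: waits on 0 ticks
print(ticks_waiting(trace, 4))  # 4-vCPU VM: waits on 6 ticks
print(cores_idled(4, 1))        # single-threaded task on 4 vCPUs: 3 cores idle
```

Later replies in this thread point out that relaxed co-scheduling (ESX 3.x and later) softens this behaviour, so treat the model as an illustration of the strict case only.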
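The "start at 1 vCPU, watch, then add one" rule can be sketched as a small decision function. This is a minimal sketch, assuming CPU usage samples (in %) collected over the observation window; the thresholds `PEG_THRESHOLD` and `PEG_FRACTION` are hypothetical values chosen for illustration, not VMware guidance.

```python
# Minimal sketch of the right-sizing rule above.
# Both thresholds are hypothetical assumptions, not VMware guidance.
PEG_THRESHOLD = 90.0   # % CPU usage counted as "pegged" (assumption)
PEG_FRACTION = 0.10    # share of samples that must be pegged (assumption)

def recommend_vcpus(current_vcpus, usage_samples):
    """Return the vCPU count to try next, given CPU usage samples (%)
    collected over the observation window (e.g. about 6 weeks)."""
    if not usage_samples:
        return current_vcpus  # no data yet: keep watching
    pegged = sum(1 for u in usage_samples if u >= PEG_THRESHOLD)
    if pegged / len(usage_samples) > PEG_FRACTION:
        return current_vcpus + 1  # add one vCPU, then watch again
    return current_vcpus

print(recommend_vcpus(1, [20, 35, 40, 30, 25]))  # 1: not pegging, stay at 1
print(recommend_vcpus(1, [92, 97, 99, 40, 95]))  # 2: pegging, try a second vCPU
```

The function only ever adds one vCPU at a time, matching the advice to re-observe after each change rather than jumping straight to 4.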
That's not entirely true, Danm66. Relaxed co-scheduling was introduced in 3.x, which basically means that if an application is single-threaded, the scheduler detects an idle loop on the other vCPUs, deschedules them, and makes them available to other worlds. Some articles to read for more info on this topic:
The answer to the topic starter's question might indeed be as simple as reducing the number of vCPUs for this particular VM. This will, more than likely, reduce ready time and improve performance. The reason is that reducing the number of vCPUs increases the total number of scheduling options. 3.5 works with so-called scheduler cells, and if you have, for instance, 8 cores and a 4-vCPU VM, you will only have two options to schedule it: either Cell 1 or Cell 2. If other VMs are occupying even one core in each cell, you will have to wait, and that's probably the %RDY you are seeing.
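The cell argument can be illustrated with a toy Python model: all vCPUs of a VM must land inside a single 4-core cell, and we count how many ways the VM could be placed right now. Counting core combinations per cell is a simplification chosen for illustration (not how the VMkernel actually ranks placements), the function names are hypothetical, and the busy-core layout is made up.

```python
from math import comb

# Toy model of ESX 3.5 scheduler cells as described above: every vCPU of a VM
# must be scheduled inside one cell. The 4-core cell size matches the example
# in the post; counting core combinations is an illustrative simplification.

def cells(total_cores, cell_size=4):
    """Split core IDs 0..total_cores-1 into consecutive cells."""
    return [range(start, min(start + cell_size, total_cores))
            for start in range(0, total_cores, cell_size)]

def placement_options(total_cores, vcpus, busy_cores=(), cell_size=4):
    """Count ways to place all `vcpus` on free cores within a single cell."""
    busy = set(busy_cores)
    options = 0
    for cell in cells(total_cores, cell_size):
        free = sum(1 for core in cell if core not in busy)
        options += comb(free, vcpus)  # comb() returns 0 when vcpus > free
    return options

print(placement_options(8, 4))                     # 2: Cell 1 or Cell 2
print(placement_options(8, 4, busy_cores=[1, 6]))  # 0: must wait (%RDY)
print(placement_options(8, 2, busy_cores=[1, 6]))  # 6: a 2-vCPU VM still fits
```

With one busy core in each cell, the 4-vCPU VM has no option at all, while a 2-vCPU VM still has several, which is the intuition behind dropping to 2 vCPUs.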
VMware Communities User Moderator | VCP | VCDX
True, but real-world experience has taught me that you can't rely on that, just as one of the links implies. I consider it somewhat of a gamble to rely on it. Best practice is still not to configure more than you need. Cell boundaries can affect this too. It might be less of an issue as 6- and 8-core processors become more prevalent, but then the availability of 6- and 8-way VMs will grow, too.
The main issue comes down to education and taking the time to figure out what you need for a VM, instead of just handing out resources without verifying. I guess what I sense is that many administrators (not pointing the finger at mkielman or anyone else in particular) think that virtualization has relieved them of many of their duties and tasks, when it has only removed or changed some and added or amplified others. <stepping down from soapbox>
I didn't know you could expand CPU utilization further. Thanks for the tip! Anyway, I did this for the problematic VM and found that vmm0:server through vmm3:server all have a %RDY between 5% and 10%. The 4 vcpus all have a %RDY below 1%.
The %WAIT for all vcpus is close to 100%, and for the vmms it is between 35% and 50%.
The users are experiencing very slow performance in their application, despite the OS metrics indicating CPU utilization is <50%.
Thank you for your explanation about CPUs. I didn't realize it worked that way. The problem is, I have no idea where the bottleneck is, because there was a performance problem with 2 virtual CPUs and there still is one with 4.
Are the %RDY and %WAIT values a concern?
All, I appreciate your help and can assure you I am not trying to relieve myself of my sysadmin duties by deploying virtual machines. Simply put, my users are complaining of a performance issue that I cannot correlate to any OS-related metrics (memory, CPU, disk), as they all look fine. I am now trying to dig deeper and just need assistance with this. The problem is that I don't have time to read through all the documentation thoroughly and haven't been to VMware training, so I am relying on people such as yourselves to provide guidance. I've got to figure out where the bottleneck is, and if I can't, the users will demand physical hardware.
That said, I will reduce the number of processors to 2 and see if the %RDY values change. If not, I will be back. Also, I still need assistance tracking down the %WAIT.
If I am reading the article http://www.yellow-bricks.com/2008/07/07/multiple-virtual-cpu-vms/ correctly, the bottom section mentions using %CSTP to determine if there are co-scheduling problems. In my case, %CSTP is 0. Doesn't this mean I do not have a co-scheduling problem?
My comment about "shirking" admin tasks was not pointed at you, just an observation of a general tendency I've seen lately. As sysadmins, we all want to do as little work as possible, if only so we can accomplish more in the limited hours of a work day, and some people take that too far. So, please don't take it personally. With that out of the way...
As I understand it, in esxtop, when you expand the vmid, the vmms are the actual "working" processes that associate a vCPU with a core. I've always been told that sustained >5% per vmm is cause for concern. The usual workaround/fix is to get rid of as many extra vCPUs as possible across all VMs on a host.
Task Manager showing only 50% CPU usage doesn't rule out multiple vCPUs as the cause of the performance issues. While you are in esxtop, check the disk, memory, and network stats. Disk latency is the next biggest cause of performance issues. What do KAVG and DAVG look like?
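The ">5% sustained per vmm" rule of thumb can be applied mechanically to a set of %RDY samples. This is a minimal sketch, assuming you have already collected per-world %RDY samples (for example, by noting values from the expanded esxtop view over several refreshes). Treating "sustained" as "every sample in the window" is an assumption, the "vcpu0" world name is hypothetical, and the numbers are illustrative values in the ranges reported in this thread, not real captures.

```python
RDY_THRESHOLD = 5.0  # % ready per vmm world: the rule of thumb quoted above

def sustained_high_ready(samples_by_world, threshold=RDY_THRESHOLD):
    """Return world names whose %RDY exceeded `threshold` in every sample.
    Interpreting 'sustained' as 'every sample in the window' is an assumption."""
    return sorted(name for name, samples in samples_by_world.items()
                  if samples and min(samples) > threshold)

# Illustrative values in the ranges reported in this thread (not real captures).
samples = {
    "vmm0:server": [6.1, 7.4, 9.8],
    "vmm1:server": [5.2, 8.0, 6.6],
    "vmm2:server": [4.1, 5.5, 7.0],
    "vmm3:server": [5.9, 6.3, 9.1],
    "vcpu0": [0.4, 0.7, 0.5],  # hypothetical world name
}
print(sustained_high_ready(samples))
# ['vmm0:server', 'vmm1:server', 'vmm3:server']
```

Using the minimum rather than the average avoids flagging a world that only spiked briefly, which fits the "sustained" wording.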
My apologies for the delay. KAVG/cmd is stable at 0.02 ms and DAVG/cmd is anywhere from 5 to 20 ms. These numbers look good, right?
Just to clarify your earlier statement: the >5% refers to %RDY, correct?
If KAVG is under 0.5 ms, then you aren't queuing up any commands in the kernel; i.e., ESX is sending commands out as soon as it gets them. So, yeah, those numbers are OK.
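A small Python sketch of the check described in this reply, using the numbers reported above. It assumes the esxtop relationship GAVG/cmd ≈ KAVG/cmd + DAVG/cmd (latency seen by the guest is kernel time plus device time); the 0.5 ms KAVG cut-off is the rule of thumb from this reply, and the function names are hypothetical.

```python
KAVG_QUEUE_THRESHOLD_MS = 0.5  # rule of thumb from the reply above

def kernel_queuing(kavg_ms, threshold=KAVG_QUEUE_THRESHOLD_MS):
    """True if KAVG suggests the VMkernel is holding commands in its queue."""
    return kavg_ms >= threshold

def guest_latency_ms(kavg_ms, davg_ms):
    """Approximate per-command latency seen by the guest (GAVG = KAVG + DAVG)."""
    return kavg_ms + davg_ms

# Values reported in this thread: KAVG/cmd ~0.02 ms, DAVG/cmd 5-20 ms.
for davg in (5.0, 20.0):
    print(kernel_queuing(0.02), round(guest_latency_ms(0.02, davg), 2))
# False 5.02
# False 20.02
```

The check deliberately says nothing about whether 5 to 20 ms of device latency is acceptable; that depends on the storage behind the datastore.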