Re: Guest CPU %RDY >30 & %WAIT > 1000

mkielman · ‎02-01-2010

All -

I have a guest with a %RDY >30 and a %MLMTD = 0. It also maintains a %WAIT of about 1000 (%IDLE is approx. 270). Using the thresholds listed at this site: http://www.yellow-bricks.com/2010/01/05/esxtop-valuesthresholds/ I am not seeing any disk latency to worry about, and all the "MCTL" counters are 0 except MCTLMAX, which I understand is nothing to worry about. The problem is, the end users are complaining of periodic performance issues and the OS counters are not pointing to anything unusual EXCEPT that CPU is roughly around 30% with a Proc Queue of about 5 per processor. This guest is configured with 4 procs.

Any assistance would be greatly appreciated.

Thanks!

Troy_Clavell · ‎02-01-2010

I say reduce vCPU to 2, then see if the CPU ready times go down and performance is better on the guest.

mkielman · ‎02-01-2010

Can you explain why this would improve performance?

Troy_Clavell · ‎02-01-2010

Here, I'll let VMware explain it, but I think your CPU %RDY is a bit high.

http://kb.vmware.com/kb/1005362

mkielman · ‎02-02-2010

Thanks for the advice, however, %CSTP is at 0. Any other ideas?

Troy_Clavell · ‎02-02-2010

"Any other ideas?"

As I said earlier, decrease to 2 vCPUs to see if that helps.

danm66 · ‎02-02-2010

First, did you expand the VM in esxtop to see the individual vmm rdy's? (hit 'e' then the vmid# and hit enter)

The reason having more vcpus than needed is bad is that ESX schedules CPU time for a virtual machine as a whole and not on a per vcpu basis, so if your vm is idling along and needs to perform a simple process that only uses 1 thread (and this will happen a lot), it signals ESX that it needs cpu time. Then ESX has to wait for 4 cores to become available so that it can run that 1 thread on one core while the other 3 are taken out of the pool for no reason and no other vm's can use them until the VM's cpu time is up.

If you only have 1 vm with 4 vcpu's and a handful of others using 1 vcpu, then you probably won't see much of a difference. But when you have more 2 and 4 vcpu guests, you won't be able to run as many vm's on the same host.

While there are some server roles that will need multiple vcpus (like a sql server with multiple production db's or a couple of heavily used db's), it is a best practice to create a new vm with only 1 vcpu, watch it over a period of time (6 weeks or so) and see if it is pegging cpu usage a lot. If so, assign a second vcpu and watch it again.

If you treat your vm's the same way you build out physical systems, you will quickly lose the many benefits and cost reductions that can be achieved through virtualization.

depping · ‎02-02-2010

That's not entirely true, Danm66. Relaxed Co-Scheduling has been available since 3.x, and this basically means that if an application is single threaded, the scheduler will detect an idle loop on the other vCPUs, deschedule them, and make them available to other worlds. Some articles to read for more info on this topic:

http://www.yellow-bricks.com/2008/07/07/multiple-virtual-cpu-vms/

http://www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf

http://www.yellow-bricks.com/2009/01/29/re-esxtop-drilldown-jason-boche/

The answer to the topic starter's question might indeed be as simple as reducing the number of vCPUs for this particular VM. This will, more than likely, reduce ready time and improve performance. The reason for this is that by reducing the number of vCPUs you increase the total number of scheduling options. 3.5 works with so-called scheduler cells, and if you have for instance 8 cores and a 4 vCPU VM you will only have two options to schedule: either Cell 1 or Cell 2. If any of the other VMs occupies even a single core in a cell, you will have to wait, and that's probably the %RDY you are seeing.
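To put rough numbers on the scheduling-options point, here is a minimal Python sketch. It assumes a deliberately simplified model: cells of 4 cores, a VM that must fit entirely inside one cell, and every vCPU needing its own free core at the same moment. The function name `placement_options` is made up for illustration, and the real scheduler is more flexible than this, so treat the counts as an intuition aid only.

```python
from math import comb


def placement_options(total_cores: int, cell_size: int, vcpus: int) -> int:
    """Count the distinct core sets a VM could occupy inside a single cell.

    Simplified model (an assumption, not the actual ESX algorithm):
    all vCPUs of the VM must be placed on different cores of one cell.
    """
    if vcpus > cell_size:
        return 0  # the VM cannot fit inside one cell at all
    cells = total_cores // cell_size
    # Each cell offers C(cell_size, vcpus) ways to pick the cores.
    return cells * comb(cell_size, vcpus)


for n in (4, 2):
    print(f"{n} vCPU: {placement_options(8, 4, n)} placement options")
# 4 vCPU: 2 placement options
# 2 vCPU: 12 placement options
```

With 4 vCPUs on an 8-core host, one busy core in each cell is enough to leave no option at all in this model, while a 2 vCPU VM still has several.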

Duncan

VMware Communities User Moderator | VCP | VCDX

danm66 · ‎02-03-2010

True, but real-world experience has taught me that you can't rely on that, just like one of the links implies. I consider it somewhat of a gamble if you are relying upon it. Best practice is still not to configure for more than you need. Cell boundaries can affect this too. It might be less of an issue as 6 & 8 core processors become more prevalent, but then the availability of 6 & 8-way vm's will grow, too.

The main issue comes down to education and taking the time to figure out what you need for a VM, instead of just giving out resources without verifying. I guess what I sense is that many administrators (not pointing the finger at mkielman or anyone else in particular) think that virtualization has relieved them from many of their duties and tasks when it has only removed/changed some and added/amplified others. <stepping down from soapbox>

mkielman · ‎02-04-2010

I didn't know you could expand CPU utilization further. Thanks for the tip! Anyway, I did this for the problematic VM and found that vmm0:server - vmm3:server all have a %RDY between 5-10%. The 4 vcpus all have a %RDY < 1.

The %WAIT time for all vcpus is close to 100% and for the vmms it is between 35-50%.

The users are experiencing very slow performance in their application, despite the OS metrics indicating CPU utilization is <50%.

Thank you for your explanation about CPUs. I didn't realize it worked in that way. The problem is, I have no idea where the bottleneck is because there was a performance problem with 2 virtual CPUs and now with 4 virtual CPUs.

Are the %RDY and %WAIT values a concern?
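For what it's worth, the group-level %RDY shown before expanding a VM should be roughly the sum of the per-world values, so per-vmm figures of 5-10% are consistent with the >30 from the first post. A minimal Python sketch of that arithmetic, assuming the group figure is a plain sum over the worlds listed; the sample values are hypothetical picks from the reported ranges, and the vcpu world names are invented for illustration:

```python
def group_ready(world_ready: dict) -> float:
    """Sum per-world %RDY values into a group-level figure."""
    return sum(world_ready.values())


# Hypothetical values inside the ranges reported in the thread:
# vmm worlds between 5 and 10, vcpu worlds below 1.
sample = {
    "vmm0:server": 7.5,
    "vmm1:server": 8.0,
    "vmm2:server": 6.5,
    "vmm3:server": 9.0,
    "vcpu-0:server": 0.5,  # world name is an assumption
    "vcpu-1:server": 0.5,
    "vcpu-2:server": 0.5,
    "vcpu-3:server": 0.5,
}

total = group_ready(sample)
busiest = max(sample, key=sample.get)
print(f"group %RDY ~ {total:.1f}, busiest world: {busiest}")
# group %RDY ~ 33.0, busiest world: vmm3:server
```

So a group value above 30 on a 4 vCPU guest does not mean any single world is waiting 30% of the time; it is the per-world numbers that should be compared against a threshold.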

mkielman · ‎02-04-2010

All, I appreciate your help and can assure you I am not trying to shed my sys administrator duties as a result of deploying virtual machines. Simply put, my users are complaining of a performance issue which I cannot correlate to any OS related metrics (memory, CPU, disk) as they all look fine. I am now trying to dig deeper and just need assistance with this. The problem here is that I don't have time to read through all the documentation thoroughly and haven't been to VMware training, so I am relying on people such as yourselves to provide guidance. I've got to figure out where the bottleneck is and if I can't, the users will demand physical hardware.

That said, I will reduce the number of processors to 2 and see if the %RDY values change. If not, I will be back. Also, I still need assistance tracking down the %WAIT.

Megan

mkielman · ‎02-04-2010

If I am reading the article http://www.yellow-bricks.com/2008/07/07/multiple-virtual-cpu-vms/ correctly, the bottom section mentions using %CSTP to determine if there are co-scheduling problems. In my case, %CSTP is 0. Doesn't this mean I do not have a co-scheduling problem?

danm66 · ‎02-04-2010

Megan,

My comment about "shirking" admin tasks was not pointed at you, just an observation of general tendency that I've seen as of late... As sysadmins, we all want to do as little work as possible - at the least so we can accomplish more during the limited hours we have in a work day - and some people take that too far. So, please don't take that personally. With that out of the way...

As I understand it, in esxtop, when you expand the vmid, it's the vmm's that are the actual "working" processes that associate a vcpu to a core. I've always been told that sustained >5% per vmm is cause for concern. The usual workaround/fix is to get rid of as many extra vcpu's as possible across all vm's on a host.

Task Manager at only 50% for cpu usage doesn't rule out that multiple vcpus are causing the performance issues. While you are in esxtop, check out the disk, memory and network stats. Disk latency is the next biggest cause of performance issues. What do the kavg and davg look like?

mkielman · ‎02-10-2010

My apologies for the delay. The KAVG/cmd is stable at .02 and the DAVG/cmd is anywhere from 5-20. These numbers look good, right?

Just to clarify your statement below, the >5% is referring to %RDY, correct?

danm66 · ‎02-10-2010

If kavg is under .5, then you aren't queuing up any commands in the kernel; i.e. ESX is sending commands out as soon as it gets them. So, yeah, those numbers are ok.
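The two rules of thumb mentioned in this thread (sustained %RDY above 5 per vmm world, kavg under .5) can be written down as a quick check. This is only a sketch: the limits are the informal values quoted here, not official thresholds, and the function name and sample readings are hypothetical.

```python
RDY_PER_VMM_LIMIT = 5.0  # percent per vmm world, rule of thumb from this thread
KAVG_LIMIT = 0.5         # ms per command, rule of thumb from this thread


def flag_concerns(vmm_ready: dict, kavg: float) -> list:
    """Return human-readable warnings for readings beyond the rules of thumb."""
    concerns = []
    for world, rdy in vmm_ready.items():
        if rdy > RDY_PER_VMM_LIMIT:
            concerns.append(f"{world}: %RDY {rdy:.1f} is above {RDY_PER_VMM_LIMIT:.1f}")
    if kavg >= KAVG_LIMIT:
        concerns.append(f"KAVG/cmd {kavg:.2f} suggests commands are queuing in the kernel")
    return concerns


# Hypothetical readings in the ranges reported above.
readings = {"vmm0:server": 7.5, "vmm1:server": 8.0, "vmm2:server": 6.5, "vmm3:server": 9.0}
for line in flag_concerns(readings, kavg=0.02):
    print(line)
```

With these inputs only the four %RDY lines are printed; the kavg of .02 passes.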

mkielman · ‎02-10-2010

Ok, so it sounds like you are maintaining that the %RDY is probably related to multiple vCPUs despite %CSTP remaining at 0?

danm66 · ‎02-10-2010

Yeah, I guess it's worth a shot. I don't think we ever asked what the historical performance stats were like for the vm, or what you see within the guest when users are complaining about performance, e.g. whether processor usage is high or memory usage is peaking.
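One way to get history out of esxtop is batch mode, e.g. `esxtop -b -d 10 -n 360 > esxtop.csv`, and then lining up the %RDY samples with the times users complain. Below is a rough Python sketch for pulling one VM's ready time out of such a file. It assumes perfmon-style column headers of the form `\\host\Group Cpu(id:vmname)\% Ready`; check the header row of your own capture, since the exact naming here is an assumption. The host name `esx01`, group id `1234`, and the sample rows are invented.

```python
import csv
import io


def ready_series(csv_text: str, vm_name: str, counter: str = "% Ready") -> list:
    """Return (timestamp, value) pairs for the VM's group-level counter."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Find the first "Group Cpu" column that belongs to this VM and counter.
    col = next(
        i for i, name in enumerate(header)
        if "Group Cpu(" in name
        and f":{vm_name})" in name
        and name.endswith("\\" + counter)
    )
    return [(row[0], float(row[col])) for row in reader if row]


# Tiny invented capture, only to show the expected layout.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1234:server)\% Used","\\esx01\Group Cpu(1234:server)\% Ready"',
    r'"02/10/2010 14:00:00","48.20","31.50"',
    r'"02/10/2010 14:00:10","35.00","12.25"',
])

for stamp, value in ready_series(SAMPLE, "server"):
    print(stamp, value)
```

For a real capture, read the file contents with `open("esxtop.csv").read()` and pass them in instead of `SAMPLE`.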

mkielman · ‎02-10-2010

That is exactly the problem. The guest doesn't report any performance bottlenecks (via perfmon) when the client complains of performance issues.

danm66 · ‎02-10-2010

Well, I suppose it could be that the host is just CPU constrained, regardless of how many vcpu's this guest has, which would explain the high RDY but the low CSTP. For that conclusion to be reinforced, though, I would expect you to see the pcpu's at a fairly high utilization.

The hard part of this performance game is that you can't always be sure it's an ESX limitation. Just today a coworker was telling me about a client that was complaining about web serving performance. A comparable physical system delivered web content 4x quicker than the virtual one, until my coworker tried using the IP address in the URL instead of the hostname. 90 min. of troubleshooting and looking at performance stats only to find a DNS issue!
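A quick way to rule out that kind of name-resolution delay is to time the lookup for the hostname against the bare IP address. A minimal Python sketch using the standard `socket.getaddrinfo` call; the helper name `time_lookup` is made up, and `localhost` / `127.0.0.1` are placeholders for the real server name and address:

```python
import socket
import time


def time_lookup(host: str, port: int = 80, resolver=socket.getaddrinfo):
    """Return (seconds_elapsed, succeeded) for resolving host."""
    start = time.perf_counter()
    try:
        resolver(host, port)
        ok = True
    except socket.gaierror:
        ok = False  # the name did not resolve at all
    return time.perf_counter() - start, ok


# Placeholders: substitute the application's hostname and its IP address.
for target in ("localhost", "127.0.0.1"):
    elapsed, ok = time_lookup(target)
    print(f"{target}: {elapsed * 1000:.1f} ms, resolved={ok}")
```

If the hostname takes seconds while the IP literal returns almost instantly, the slowness is in name resolution rather than in the VM or the host.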

So, in that vein, I would suggest trying 2 vcpu or lighten up the load on the host, just to rule that out and if it doesn't help, then what else can you tell us about the server and how the users access it?

mkielman · ‎02-11-2010

The pcpu's are right around 40-50% each. Would you consider that constrained? As you said, I am going to go ahead and move the VM to a less utilized host and if the %RDY remains high, I will decrease the number of vcpus. If you think of anything else, let me know!

I did have a lengthy meeting with the group experiencing the performance issues and it turns out they have their application installed in a non-standard config in which multiple roles are installed on the same server when they are supposed to be split out. What we are going to do for now is provide them with enough virtual machines (staying mindful of CPU allocation) so they can properly divide the workloads. Once we have reconfigured the environment we will revisit any issues which arise. Due to the lack of metrics to support resource constraint on the guest, I am hopeful the problem lies within the applications themselves. Unfortunately, HP refuses to provide support due to their non-standard configuration.

Thank you!!