VMware Cloud Community
gdewulf18480
Contributor

%WAIT - %IDLE: What Is Average for a VM?

I have been doing some research on VM performance problems and worked with VMware support, who found that there is possibly an I/O bottleneck on a VM I am experiencing performance issues with. The technician found that %WAIT - %IDLE for this VM averages around 200%, and said this was abnormally high. I forgot to ask him what normal was, so I wanted to post here to find out what other people are seeing. Is this a typical number? With the system doing nothing right now, I'm seeing 579 for %WAIT and 372 for %IDLE. The tech was able to confirm that no I/O queueing was occurring in the VMkernel or on the HBA controller.

Some posts with typical values would help. Thanks.

Accepted Solutions
FredPeterson
Expert

gdewulf18480 wrote: "Supposedly the amount of time in I/O wait state is the difference %WAIT - %IDLE. I was asking if anyone had numbers for this difference. Does anyone have some?"

Q: How do I know the VCPU world is waiting for I/O events?

A: %WAIT - %IDLE can give you an estimate of how much CPU time is spent waiting for I/O events. This is an estimate only, because the world may be waiting for resources other than I/O. Note that this should only be done for VMM worlds, not other kinds of worlds, because VMM worlds best represent the guest's behavior. For disk I/O, another alternative is to read the disk latency stats, which we will explain in the disk section.

%WAIT by itself cannot be used. You have to expand the VM and look at the individual vCPU %WAIT values, then subtract each vCPU's %IDLE. You can't do it from the initial esxtop CPU view, which shows a single line per VM. If the tech you spoke to did not explain this, he doesn't know the tool as well as he should.

I have plenty of VMs whose %WAIT - %IDLE never drops below 300%, and they have no problems.

This is an example I just grabbed from a NimSoft server:

ID    GID   NAME      NWLD   %USED    %RUN   %SYS   %WAIT   %RDY   %IDLE
100   100   DHSNMS1      5  192.63  193.80   0.24  291.44   0.03    1.04

%USED is 192; in other words, both vCPUs are running at nearly 100%.

%IDLE is 1.04 and %WAIT is 291.44, so according to the wait - idle math (about 290%) I have some seriously bad I/O issues going on. But that is simply not the case, and by expanding the VM I can see that.

ID     GID   NAME              NWLD   %USED   %RUN   %SYS   %WAIT   %RDY   %IDLE
5665   100   vmware-vmx           1    0.09   0.11   0.00  100.00   0.01    0.00
5667   100   vmassistant.566      1    0.62   0.65   0.00   99.63   0.00    0.00
5703   100   mks:DHSNMS1          1    0.01   0.01   0.00  100.00   0.00    0.00
5704   100   vcpu-0:DHSNMS1       1   32.35  32.88   0.28   67.28   0.11   67.14
5705   100   vcpu-1:DHSNMS1       1   31.58  32.05   0.00   68.11   0.12   67.85

Unfortunately the values changed because esxtop refreshed when I expanded the view, which is annoying, but anyway. Here %WAIT is 67 and 68 for the two vCPUs, while the other three worlds sit at 100% (which is completely normal), and %IDLE is 0, 0, 0, 67 and 67.8. Doing the math per vCPU gives a wait - idle of basically 0, i.e. very little I/O wait on the vCPUs. But if you add up all the %WAIT values and all the %IDLE values, you get about 435 and 135, and the difference is 300, which is exactly the three worlds that are always at 100%.
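To make the arithmetic above concrete, here is a small Python sketch that reproduces it from the numbers in this post. It assumes, as in this output, that the VMM worlds are the ones named vcpu-N:<vm>; the function names are made up for illustration. It shows that subtracting summed %IDLE from summed %WAIT (whether from the collapsed row or from all expanded rows) is dominated by the helper worlds, while doing it per vCPU gives close to zero.

```python
# (world name, %WAIT, %IDLE) copied from the esxtop output above.
# Collapsed single-line view of the VM.
GROUP_ROW = ("DHSNMS1", 291.44, 1.04)

# Expanded per-world view (captured a refresh later, so it does not
# sum exactly to the collapsed row).
EXPANDED_ROWS = [
    ("vmware-vmx", 100.00, 0.00),
    ("vmassistant.566", 99.63, 0.00),
    ("mks:DHSNMS1", 100.00, 0.00),
    ("vcpu-0:DHSNMS1", 67.28, 67.14),
    ("vcpu-1:DHSNMS1", 68.11, 67.85),
]


def is_vcpu_world(name):
    # Assumption: VMM (vCPU) worlds are named "vcpu-N:<vm>", as in this output.
    return name.startswith("vcpu-")


def naive_io_wait(rows):
    """%WAIT - %IDLE summed over every world, helper worlds included."""
    return sum(w for _, w, _ in rows) - sum(i for _, _, i in rows)


def per_vcpu_io_wait(rows):
    """%WAIT - %IDLE computed separately for each vCPU world only."""
    return {name: round(w - i, 2) for name, w, i in rows if is_vcpu_world(name)}


if __name__ == "__main__":
    print("collapsed row:", round(GROUP_ROW[1] - GROUP_ROW[2], 2))
    print("summed expanded rows:", round(naive_io_wait(EXPANDED_ROWS), 2))
    print("per vCPU:", per_vcpu_io_wait(EXPANDED_ROWS))
```

Running it gives roughly 290 for the collapsed row, roughly 300 for the summed expanded rows, and well under 1% for each vCPU.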

So there is no way we can give you a single "normal" number; we can only help you understand how the values need to be interpreted.

I/O wait is almost always going to be disk anyway (network or user input is unlikely), so looking at the esxtop disk stats is a far better way to determine whether there is an I/O problem.

9 Replies
FredPeterson
Expert

Are you sure about the 579 and 272 numbers? What version of ESX are you on?

RParker
Immortal

gdewulf18480 wrote: "The technician found that %WAIT - %IDLE on average for this VM is around 200%."

It's supposed to be high; that means it's not doing anything, hence WAIT / IDLE. It's the same as the idle CPU% on your computer: system idle SHOULD be high (around 99%) if no other processes are running.

1 CPU = 100%, so you must have 4 vCPUs in this VM (or at least 2).

RParker
Immortal

FredPeterson wrote: "Are you sure about the 579 and 272 numbers? What version of ESX are you on?"

I see 300% consistently for %WAIT on ESX 4.0. It doesn't sound logical, but that's what it is...

FredPeterson
Expert

Yeah, same here, though it depends on your vCPU count and %USED.

In v4+, %WAIT by default includes 3 worlds that are pretty much always at 100%, so for any calculation using %WAIT to be meaningful you need to take this into account, or just subtract the 300%. So with a single vCPU at 100% used, your %WAIT will always be 300; if it's any higher, you do in fact have I/O wait going on. If the VM were using 50% of that single vCPU, %WAIT would be 350. With two vCPUs and 50% used, %WAIT will be 400: 500% is the "doing nothing" number, but here one CPU is doing 100%, i.e. 50% of the total.
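A rough sketch of the baseline reasoning above in Python. This is only a model of the rule of thumb in this post, not a VMware formula: it assumes exactly three helper worlds at ~100% %WAIT (the vmware-vmx, vmassistant and mks worlds) and treats every vCPU percentage point not spent in %USED as %WAIT, ignoring %RDY and other states, so expect real numbers to be off by a few percent. The function names are hypothetical.

```python
# Hypothetical model of the rule of thumb above, not a VMware formula.
# Each world contributes at most 100% per sample; the helper worlds sit at
# ~100% %WAIT, and any vCPU time not in %USED is assumed to show up as %WAIT.
HELPER_WORLDS = 3  # assumption for ESX 4.x, per the post above


def expected_wait_with_no_io(vcpus, total_used_pct):
    """%WAIT expected for the whole VM when there is no I/O wait at all."""
    if not 0 <= total_used_pct <= 100 * vcpus:
        raise ValueError("total %USED must be between 0 and 100 * vCPUs")
    return 100 * HELPER_WORLDS + (100 * vcpus - total_used_pct)


def excess_wait(observed_wait_pct, vcpus, total_used_pct):
    """%WAIT above the no-I/O baseline; a hint of I/O (or other) wait."""
    return observed_wait_pct - expected_wait_with_no_io(vcpus, total_used_pct)
```

With this model, expected_wait_with_no_io(1, 100) is 300, (1, 50) is 350, (2, 0) is 500 and (2, 100) is 400, matching the numbers in the post; excess_wait() is the part that might be I/O wait.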

It's only been a couple of weeks for me on v4.1 (previously 3.5), but weren't there only two processes in 3.5 that were always at 100%? I can't remember... that's how often I looked at it.

Wait values are funny. A high value means either nothing is going on or there is I/O wait going on. You need more information to accurately gauge what the number should be.

gdewulf18480
Contributor

Yeah, I'm sure of these numbers. I'm running vSphere 4.0 Update 2 and a Windows 2008 SP2 VM with 4 vCPUs and 8 GB RAM. The hardware underneath is a single Dell PowerEdge R900 with just this one VM running on it.

gdewulf18480
Contributor

Supposedly the amount of time in I/O wait state is the difference %WAIT - %IDLE. I was asking if anyone had numbers for this difference. Does anyone have some?

PduPreez
VMware Employee

Check out http://www.yellow-bricks.com/esxtop/

All the important values to look at are explained there, along with what numbers are good.

Hope this is what you are looking for

Regards

gdewulf18480
Contributor

Thanks Fred. Awesome explanation and it answers my question. Thanks again!
