Supposedly the amount of time spent in the I/O wait state is the difference %WAIT - %IDLE. I was asking if anyone had the numbers for this difference. Does anyone have some?
Q: How do I know the VCPU world is waiting for I/O events?
A: %WAIT - %IDLE gives an estimate of how much CPU time is spent waiting for I/O events. It is only an estimate, because the world may be waiting for resources other than I/O. Note that this should only be done for VMM worlds, not the other kinds of worlds, because VMM worlds best represent guest behavior. For disk I/O, another alternative is to read the disk latency stats, which we explain in the disk section.
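To make the relationship concrete, here is a minimal Python sketch of the estimate; the function name and sample values are illustrative, not part of esxtop:

```python
def estimated_io_wait(pct_wait: float, pct_idle: float) -> float:
    """Rough estimate of CPU time spent waiting for I/O events.

    %WAIT includes idle time, so the idle portion must be subtracted
    first. The remainder can still include waits on resources other
    than I/O, so treat the result as an upper-bound estimate only.
    """
    return pct_wait - pct_idle

# A vCPU (VMM) world reporting %WAIT = 67.28 and %IDLE = 67.14 is
# spending roughly 0.14% of its time waiting, at most on I/O.
print(round(estimated_io_wait(67.28, 67.14), 2))
```

Applied per VMM world, this is the whole formula; the subtlety discussed below is which worlds you apply it to.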
%WAIT by itself cannot be used; you have to expand the group and look at the individual vCPU %WAIT values, subtracting each one's idle time. You cannot do it from the initial CPU view in esxtop, which shows a single line per VM. If the tech you spoke to did not explain this, he doesn't know what he should.
I have plenty of VMs that never drop below 300% when I take %WAIT - %IDLE, and I do not have any problems.
This is an example I just grabbed of a NimSoft server:
ID  GID NAME    NWLD %USED  %RUN   %SYS %WAIT  %RDY %IDLE
100 100 DHSNMS1 5    192.63 193.80 0.24 291.44 0.03 1.04
%USED is 192 - in other words, both CPUs are being used at nearly 100%.
The %IDLE is 1.04 and the %WAIT is 291.44 - according to the %WAIT - %IDLE math, I have some super serious I/O issues going on. But that is simply not the case, and by expanding the group I can see that.
ID   GID NAME            NWLD %USED %RUN  %SYS %WAIT  %RDY %IDLE
5665 100 vmware-vmx      1    0.09  0.11  0.00 100.00 0.01 0.00
5667 100 vmassistant.566 1    0.62  0.65  0.00 99.63  0.00 0.00
5703 100 mks:DHSNMS1     1    0.01  0.01  0.00 100.00 0.00 0.00
5704 100 vcpu-0:DHSNMS1  1    32.35 32.88 0.28 67.28  0.11 67.14
5705 100 vcpu-1:DHSNMS1  1    31.58 32.05 0.00 68.11  0.12 67.85
Unfortunately the values changed because esxtop cycled when I expanded the group, which is annoying, but anyway. Here you see %WAIT is 67 and 68 on the two vCPU worlds respectively, with the other three worlds at 100% (which is completely normal), and %IDLE being 0, 0, 0, 67.14 and 67.85. So doing the math on the vCPU worlds themselves shows a %WAIT - %IDLE of basically 0 - that is, very little I/O wait going on with the CPUs. But if you add up all the %WAIT and %IDLE values, you get about 435 and about 135, and the difference is about 300 - which comes from the three worlds that are always at 100%.
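You can check that arithmetic numerically. A small sketch using the %WAIT and %IDLE values from the expanded output above; filtering on the `vcpu-` name prefix is my assumption about how to pick out the VMM worlds:

```python
# (name, %WAIT, %IDLE) for each world of the VM, from the expanded view.
worlds = [
    ("vmware-vmx",      100.00, 0.00),
    ("vmassistant.566",  99.63, 0.00),
    ("mks:DHSNMS1",     100.00, 0.00),
    ("vcpu-0:DHSNMS1",   67.28, 67.14),
    ("vcpu-1:DHSNMS1",   68.11, 67.85),
]

# Naive math on the whole group: sums every world, including the helper
# worlds that legitimately sit in %WAIT near 100%.
group_estimate = sum(w for _, w, _ in worlds) - sum(i for _, _, i in worlds)

# Per-vCPU math: only the VMM worlds, each one individually.
vcpu_estimate = sum(w - i for name, w, i in worlds if name.startswith("vcpu-"))

print(round(group_estimate, 2))  # about 300: looks like a huge I/O problem
print(round(vcpu_estimate, 2))   # about 0.4: almost no I/O wait at all
```

The roughly 300-point gap between the two numbers is exactly the three non-vCPU worlds that idle at 100% %WAIT, which is why the group-level subtraction is misleading.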
So there is no way we can answer your question; we can only help you understand how the values need to be interpreted.
I/O wait states are almost always going to be disk anyway; it is unlikely to be network or user input. So looking at the esxtop disk stats is a far better way of determining whether there is an I/O problem.