I'm using esxtop (on our ESX 3.0.1 system) to monitor various things like physical and logical CPU usage, free memory, and NIC traffic on the physical machine.
Right now, esxtop (in batch mode) returns nearly 4600 counters on our machine. (We have more than 30 virtual machines running right now.)
I've read in a number of places that we should pay attention to the '% Ready' metrics. However, there are nearly 240 different counters that end with '% Ready'. Some are part of a 'Group Cpu', and the rest are all for various 'vcpu-' items.
For example, for one particular machine (display-named 2003test2), we have 6 different ones with '% Ready':
esxhost\Group Cpu(72:2003test2)\% Ready
esxhost\Vcpu(72:2003test2:1406:vmware-vmx)\% Ready
esxhost\Vcpu(72:2003test2:1407:vmm0:2003test2)\% Ready
esxhost\Vcpu(72:2003test2:1408:vmware-vmx)\% Ready
esxhost\Vcpu(72:2003test2:1409:mks:2003test2)\% Ready
esxhost\Vcpu(72:2003test2:1410:vcpu-0:2003test2)\% Ready
Obviously, for virtual machines with more than 1 CPU, we'd also have 'vcpu-1', as well as a few other items.
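For what it's worth, the counter names listed above all follow a host\Object(instance)\Counter pattern, so a script can sort them into per-VM and per-world buckets. Below is a minimal Python sketch, assuming that pattern holds for every column of interest. The function names parse_counter and ready_counters are made up for illustration and are not part of esxtop.

```python
import re

# Assumed layout: optional leading backslashes, host, object, "(instance)", counter.
COUNTER_RE = re.compile(
    r"^\\*(?P<host>[^\\]+)\\(?P<object>[^(\\]+)\((?P<instance>.*)\)\\(?P<counter>.+)$"
)


def parse_counter(name):
    """Split one counter name into its parts, or return None if it does not fit."""
    match = COUNTER_RE.match(name.strip())
    return match.groupdict() if match else None


def ready_counters(names):
    """Return (group_instances, vcpu_instances) for all '% Ready' counters."""
    groups, worlds = [], []
    for name in names:
        parsed = parse_counter(name)
        if parsed is None or parsed["counter"] != "% Ready":
            continue
        if parsed["object"] == "Group Cpu":
            groups.append(parsed["instance"])
        elif parsed["object"] == "Vcpu":
            worlds.append(parsed["instance"])
    return groups, worlds
```

Fed the six names above, this would put "72:2003test2" in the first list and the five Vcpu instances in the second.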
So here's the question -- which '% Ready' should we be watching? It's not obvious from any of the documentation which fields are which.
And while we're on the subject, I notice that 'Group Cpu' typically seems to return more realistic-looking figures than any of the 'Vcpu' items. For example, watching '% Used' for both 'Group Cpu' and 'Vcpu ... vcpu-0' on a particular machine seems to show very little activity on the Vcpu, while Group Cpu shows much higher numbers. (In our case, we don't have any sort of resource pools configured, so each virtual machine has its own Group Cpu, as far as I can tell.)
Ideas?????
Thanks!
Dan
Hi Dan,
Having only run esxtop in interactive mode, I'm not best placed to comment on the output when it's run in batch mode...
That being said, the 'right' % RDY value to look for is the one shown when you press 'c' (for CPU) in esxtop while running it interactively. Presented this way, it's a per-VM value.
The % RDY value increases when the VM is waiting to execute a thread across multiple vCPUs - both cores/pCPUs have to be available to the VM simultaneously, and on a host with few cores and many multi-vCPU VMs this can limit performance. The name stems from the VM essentially being ready (but unable) to execute the instruction across multiple CPUs.
If you don't have multi-vCPU VMs this value is probably of much less interest to you.
Hope this helps,
Al
Hi Dan,
Each virtual machine has several helper "threads" associated with it. In the esxtop CPU stats, each such "thread" is called a Vcpu, and the Group Cpu corresponds to the entire VM. So one Vcpu entry shows the statistics for a single "thread", while one Group Cpu entry shows the statistics for the entire VM, which is the sum of all its Vcpu statistics.
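To illustrate the relationship between the two, here is a rough Python sketch that adds up the Vcpu '% Ready' values for each group and places the total next to the Group Cpu value. It assumes one batch-mode sample has already been loaded into a dict of counter name to value, and that the first colon-separated field of the instance (72 in Dan's example) identifies the group. The helper names and the numbers in the sample are invented for illustration.

```python
from collections import defaultdict


def split_counter(name):
    """Return (object, instance, counter) for a 'host\\Object(instance)\\Counter' name."""
    _host, rest = name.lstrip("\\").split("\\", 1)
    obj_and_instance, counter = rest.rsplit("\\", 1)
    obj, _, instance = obj_and_instance.partition("(")
    if instance.endswith(")"):
        instance = instance[:-1]
    return obj, instance, counter


def ready_by_group(sample):
    """Map group id -> {'group': Group Cpu value, 'vcpu_sum': sum of Vcpu values}."""
    totals = defaultdict(lambda: {"group": None, "vcpu_sum": 0.0})
    for name, value in sample.items():
        obj, instance, counter = split_counter(name)
        if counter != "% Ready":
            continue
        group_id = instance.split(":", 1)[0]  # assumed: leading field is the group id
        if obj == "Group Cpu":
            totals[group_id]["group"] = value
        elif obj == "Vcpu":
            totals[group_id]["vcpu_sum"] += value
    return dict(totals)


# Made-up values, only to show the shape of the input.
sample = {
    r"esxhost\Group Cpu(72:2003test2)\% Ready": 2.0,
    r"esxhost\Vcpu(72:2003test2:1406:vmware-vmx)\% Ready": 0.5,
    r"esxhost\Vcpu(72:2003test2:1410:vcpu-0:2003test2)\% Ready": 1.5,
}
print(ready_by_group(sample))
```

If the two numbers for a group differ noticeably on real data, that would suggest one of the assumptions above does not hold for your esxtop version.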
Which entity to focus on depends on your needs. I believe you can start with the Group Cpu statistics, and narrow down to the Vcpu statistics when the Group Cpu statistics indicate a performance problem.
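That top-down approach could be scripted roughly as follows. This is only a sketch: it assumes the batch output is CSV with the counter names in the header row and one row per sample, and the 10 percent threshold is a placeholder rather than a recommended value. The names average_ready and drill_down are hypothetical.

```python
import csv
import io
from statistics import mean

READY_THRESHOLD = 10.0  # placeholder cut-off in percent; tune it for your host


def average_ready(csv_text):
    """Average every '% Ready' column across all sample rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    averages = {}
    for col, name in enumerate(header):
        if name.endswith("\\% Ready"):
            averages[name] = mean(float(row[col]) for row in samples)
    return averages


def drill_down(averages, threshold=READY_THRESHOLD):
    """For each Group Cpu over the threshold, list its Vcpu entries, highest first."""
    report = {}
    for name, value in averages.items():
        if "\\Group Cpu(" not in name or value <= threshold:
            continue
        # "Group Cpu(72:2003test2)" pairs with Vcpu names starting "Vcpu(72:2003test2:"
        instance = name.split("\\Group Cpu(", 1)[1].rsplit(")\\", 1)[0]
        prefix = "\\Vcpu(" + instance + ":"
        worlds = [(n, v) for n, v in averages.items() if prefix in n]
        report[name] = sorted(worlds, key=lambda item: item[1], reverse=True)
    return report
```

Calling drill_down(average_ready(text)) on the contents of a batch-mode capture would return only the VMs whose averaged Group Cpu '% Ready' is above the threshold, each with its Vcpu entries ranked from highest to lowest.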
If you are interested in "Ready Time", you may check out the white paper "Ready Time Observations" from http://www.vmware.com/pdf/esx3_ready_time.pdf
I hope that answers your question.
Best,
-Zhelong
Hi Zhelong,
I am guessing you are from the performance group at VMware?
Nice pointer on "Ready Time". Can you please point to any other docs that explain the less-familiar items reported by esxtop, like "costop"? Thanks.
The best places for descriptions of the esxtop statistics are the esxtop man page and the ESX user's manual. Our "Resource Management Guide" (http://www.vmware.com/pdf/esx_resource_mgmt.pdf) is another helpful document.
There are a few statistics that we do not explain in the documents, because they are intended to be used by our engineers to debug performance problems. If they become useful to users, we will add them to the related documents as well.
"costop" is one of these internal statistics. To the best of my knowledge, it has something to do with SMP scheduling.
Best,
-Zhelong