The following table of vCenter (VC) performance counters lists the counters with a description of their purpose. This page has been updated for vSphere 4, so the counter levels will differ slightly on older versions of VC.
Remember, with the exception of ready time, statistic levels one and two are the only ones needed for 99% of the performance monitoring and analysis out there. Don't spend many of your own cycles worrying about levels three and four!
For information on enabling VC to display and archive these counters see the Understanding vCenter Performance Statistics article.
Before you continue, you should know that all total count metrics reported by VC are reported over the sample window. When you're looking at live stats, this sample window is 20 seconds. When you're looking at archive stats, it will depend on the interval duration. That duration could be five minutes, 30 minutes, two hours, or one day.
This causes a lot of confusion when comparing esxtop results to live VC results to archived VC results. As an example, ready time might be reported as 10% in esxtop. In live VC results, this amount of ready time would be reported as 2000 ms (10% of the 20s window). In one-day archive results, the same amount would be reported as 30,000 ms (10% of the five-minute interval duration). All of these numbers reflect the same amount of ready time.
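The conversion between a percentage and the millisecond figure that VC reports can be sketched in a few lines of Python. This only illustrates the arithmetic above; the function names and constants are made up for the example and are not part of any VMware tool or API.

```python
# Hypothetical helpers illustrating the sample-window arithmetic; not a VMware API.

LIVE_WINDOW_S = 20            # live (real-time) sample window, in seconds
DAY_ARCHIVE_INTERVAL_S = 300  # five-minute interval duration of the one-day archive


def ready_percent_to_ms(percent, window_seconds):
    """Ready time in ms that corresponds to `percent` of the sample window."""
    window_ms = window_seconds * 1000
    return percent * window_ms / 100


def ready_ms_to_percent(ready_ms, window_seconds):
    """Share of the sample window (in percent) represented by `ready_ms`."""
    window_ms = window_seconds * 1000
    return ready_ms * 100 / window_ms


if __name__ == "__main__":
    print(ready_percent_to_ms(10, LIVE_WINDOW_S))              # 2000.0
    print(ready_percent_to_ms(10, DAY_ARCHIVE_INTERVAL_S))     # 30000.0
    print(ready_ms_to_percent(30000, DAY_ARCHIVE_INTERVAL_S))  # 10.0
```

Normalizing every summation counter to a percentage of its own window makes live and archived values directly comparable.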
Level | Counter name in API | Description | Units |
1 | cpu.ready.summation | Ready time is the time spent waiting for CPU(s) to become available in the past update interval. | millisecond |
1 | cpu.usagemhz.average | The CPU utilization. The maximum possible value here is the frequency of the processors times the number of cores. As an example, a VM using 4000 MHz on a system with four 2 GHz processors is using 50% of the CPU (4000 / (4 * 2000) = 0.5) | megaHertz |
1 | cpu.usage.average | The CPU utilization. This value is reported with 100% representing all processor cores on the system. As an example, a 2-way VM using 50% of a four-core system is completely using two cores. | percent |
2 | cpu.reservedCapacity.average | CPU Reserved Capacity | megaHertz |
2 | cpu.idle.summation | CPU Idle | millisecond |
2 | cpu.swapwait.summation | Swap wait time is time that the world spent waiting for memory to be swapped in. When the VM is waiting for memory, it is not doing work. | millisecond |
3 | cpu.system.summation | System time is the time spent in VMkernel during the last update interval. This does not include guest code execution. | millisecond |
3 | cpu.wait.summation | Wait time is the time spent waiting for hardware or VMkernel thread locks during the last update interval. | millisecond |
3 | cpu.extra.summation | CPU extra is the time above the statically calculated entitlement. Entitlement is the share of processing time that a VM should get as a result of its vCPU count and assigned shares. You should not use or care about this counter in any of your own analysis. | millisecond |
3 | cpu.used.summation | CPU Used | millisecond |
3 | cpu.guaranteed.latest | Guaranteed time is reported as the amount of the reservation time that the VM used in the past update interval. As an example, if 2000 MHz have been reserved for the VM on a four-way, 2 GHz host, that's 25% of the CPU resource. In a 20s update interval, there are 80,000 ms available on this four-way system. That means 20,000 ms of time has been reserved. If a VM used only half of its available cycles, the guaranteed time is 10,000 ms. | millisecond |
4 | cpu.usage.none | CPU Usage (None) | percent |
4 | cpu.usage.minimum | CPU Usage (Minimum) | percent |
4 | cpu.usage.maximum | CPU Usage (Maximum) | percent |
4 | cpu.usagemhz.none | CPU Usage in MHz (None) | megaHertz |
4 | cpu.usagemhz.minimum | CPU Usage in MHz (Minimum) | megaHertz |
4 | cpu.usagemhz.maximum | CPU Usage in MHz (Maximum) | megaHertz |
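The worked examples in the cpu.usagemhz.average and cpu.guaranteed.latest rows can be reproduced with a short Python sketch. The helper names are hypothetical, and the 20-second default is the live sample window; this is just the arithmetic from the descriptions, not how VC computes the counters internally.

```python
# Hypothetical helpers reproducing the CPU table's worked examples.

def usage_percent(used_mhz, cores, core_mhz):
    """Percent of total host CPU capacity represented by `used_mhz`."""
    return used_mhz * 100 / (cores * core_mhz)


def reserved_ms(reserved_mhz, cores, core_mhz, window_seconds=20):
    """Milliseconds of CPU time covered by a MHz reservation in one window."""
    available_ms = cores * window_seconds * 1000   # e.g. 4 * 20,000 = 80,000 ms
    return available_ms * reserved_mhz / (cores * core_mhz)


if __name__ == "__main__":
    # A VM using 4000 MHz on a host with four 2 GHz processors
    print(usage_percent(4000, cores=4, core_mhz=2000))      # 50.0

    # 2000 MHz reserved on a four-way, 2 GHz host, 20s update interval
    reserved = reserved_ms(2000, cores=4, core_mhz=2000)
    print(reserved)                                         # 20000.0
    print(reserved / 2)  # VM used half of its reservation  # 10000.0
```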
Level | Counter name in API | Description | Units |
1 | mem.consumed.average | The amount of machine memory that is in use by the VM. While a VM may have been configured to use 4 GB of RAM, as an example, it might have only touched half of that. Of the 2 GB left, half of that might be saved from memory sharing. That would result in 1 GB of consumed memory. | kiloBytes |
1 | mem.overhead.average | The memory used by the VMkernel to maintain and execute the VM. | kiloBytes |
1 | mem.swapinrate.average | The swap in rate reports the rate at which a VM's memory is being swapped in from disk. | kiloBytesPerSecond |
1 | mem.swapoutrate.average | The swap out rate reports the rate at which a VM's memory is being swapped out to disk. | kiloBytesPerSecond |
1 | mem.usage.average | The percentage of memory used as a percent of all available machine memory. Available for host and VM. | percent |
1 | mem.vmmemctl.average | The amount of memory currently claimed by the balloon driver. This is not a performance problem, per se, but represents the host starting to take memory from less needful VMs for those with large amounts of active memory. But if the host is ballooning, check swap rates (swapin and swapout) which would be indicative of performance problems. | kiloBytes |
2 | mem.granted.average | The amount of memory that was granted to the VM by the host. Memory is not granted to the VM until it is touched one time, and granted memory may be swapped out or ballooned away if the VMkernel needs the memory. | kiloBytes |
2 | mem.active.average | The amount of memory used by the VM in the past small window of time. This is the "true" number of how much memory the VM currently has need of. Additional, unused memory may be swapped out or ballooned with no impact to the guest's performance. | kiloBytes |
2 | mem.shared.average | The average amount of shared memory. Shared memory represents the entire pool of memory from which sharing savings are possible. The amount of memory that this has been condensed to is reported in shared common memory. So, total saving due to memory sharing equals shared memory minus shared common memory. | kiloBytes |
2 | mem.zero.average | The amount of zero pages in the guest. Zero pages are not represented in machine memory so this results in 100% savings when mapping from the guest to the machine memory. | kiloBytes |
2 | mem.unreserved.average | Memory Unreserved (Average) | kiloBytes |
2 | mem.swapused.average | The amount of swap memory currently in use. A large amount of swap memory is not a performance problem. This could be memory that the guest doesn't need. Check the swap rates (swapin, swapout) to see if the guest is actively in need of more memory than is available. | kiloBytes |
2 | mem.swapunreserved.average | Memory Swap Unreserved (Average) | kiloBytes |
2 | mem.sharedcommon.average | The average amount of shared common memory. Shared memory represents the entire pool of memory from which sharing savings are possible. The amount of memory that this has been condensed to is reported in shared common memory. So, total saving due to memory sharing equals shared memory minus shared common memory. | kiloBytes |
2 | mem.heap.average | Memory Heap (Average) | kiloBytes |
2 | mem.heapfree.average | Memory Heap Free (Average) | kiloBytes |
2 | mem.state.latest | Memory State | number |
2 | mem.swapped.average | Memory Swapped (Average) | kiloBytes |
2 | mem.swaptarget.average | Memory Swap Target (Average) | kiloBytes |
2 | mem.swapin.average | The rate at which memory is being swapped in from disk. A large number here represents a problem with lack of memory and a clear indication that performance is suffering as a result. | kiloBytes |
2 | mem.swapout.average | The rate at which memory is being swapped out to disk. A large number here represents a problem with lack of memory and a clear indication that performance is suffering as a result. | kiloBytes |
2 | mem.vmmemctltarget.average | Memory Balloon Target (Average) | kiloBytes |
2 | mem.sysUsage.average | Memory Used by vmkernel | kiloBytes |
2 | mem.reservedCapacity.average | Memory Reserved Capacity | megaBytes |
4 | mem.usage.none | Memory Usage (None) | percent |
4 | mem.usage.minimum | Memory Usage (Minimum) | percent |
4 | mem.usage.maximum | Memory Usage (Maximum) | percent |
4 | mem.granted.none | Memory Granted (None) | kiloBytes |
4 | mem.granted.minimum | Memory Granted (Minimum) | kiloBytes |
4 | mem.granted.maximum | Memory Granted (Maximum) | kiloBytes |
4 | mem.active.none | Memory Active (None) | kiloBytes |
4 | mem.active.minimum | Memory Active (Minimum) | kiloBytes |
4 | mem.active.maximum | Memory Active (Maximum) | kiloBytes |
4 | mem.shared.none | Memory Shared (None) | kiloBytes |
4 | mem.shared.minimum | Memory Shared (Minimum) | kiloBytes |
4 | mem.shared.maximum | Memory Shared (Maximum) | kiloBytes |
4 | mem.zero.none | Memory Zero (None) | kiloBytes |
4 | mem.zero.minimum | Memory Zero (Minimum) | kiloBytes |
4 | mem.zero.maximum | Memory Zero (Maximum) | kiloBytes |
4 | mem.unreserved.none | Memory Unreserved (None) | kiloBytes |
4 | mem.unreserved.minimum | Memory Unreserved (Minimum) | kiloBytes |
4 | mem.unreserved.maximum | Memory Unreserved (Maximum) | kiloBytes |
4 | mem.swapused.none | Memory Swap Used (None) | kiloBytes |
4 | mem.swapused.minimum | Memory Swap Used (Minimum) | kiloBytes |
4 | mem.swapused.maximum | Memory Swap Used (Maximum) | kiloBytes |
4 | mem.swapunreserved.none | Memory Swap Unreserved (None) | kiloBytes |
4 | mem.swapunreserved.minimum | Memory Swap Unreserved (Minimum) | kiloBytes |
4 | mem.swapunreserved.maximum | Memory Swap Unreserved (Maximum) | kiloBytes |
4 | mem.sharedcommon.none | Memory Shared Common (None) | kiloBytes |
4 | mem.sharedcommon.minimum | Memory Shared Common (Minimum) | kiloBytes |
4 | mem.sharedcommon.maximum | Memory Shared Common (Maximum) | kiloBytes |
4 | mem.heap.none | Memory Heap (None) | kiloBytes |
4 | mem.heap.minimum | Memory Heap (Minimum) | kiloBytes |
4 | mem.heap.maximum | Memory Heap (Maximum) | kiloBytes |
4 | mem.heapfree.none | Memory Heap Free (None) | kiloBytes |
4 | mem.heapfree.minimum | Memory Heap Free (Minimum) | kiloBytes |
4 | mem.heapfree.maximum | Memory Heap Free (Maximum) | kiloBytes |
4 | mem.swapped.none | Memory Swapped (None) | kiloBytes |
4 | mem.swapped.minimum | Memory Swapped (Minimum) | kiloBytes |
4 | mem.swapped.maximum | Memory Swapped (Maximum) | kiloBytes |
4 | mem.swaptarget.none | Memory Swap Target (None) | kiloBytes |
4 | mem.swaptarget.minimum | Memory Swap Target (Minimum) | kiloBytes |
4 | mem.swaptarget.maximum | Memory Swap Target (Maximum) | kiloBytes |
4 | mem.swapin.none | Memory Swap In (None) | kiloBytes |
4 | mem.swapin.minimum | Memory Swap In (Minimum) | kiloBytes |
4 | mem.swapin.maximum | Memory Swap In (Maximum) | kiloBytes |
4 | mem.swapout.none | Memory Swap Out (None) | kiloBytes |
4 | mem.swapout.minimum | Memory Swap Out (Minimum) | kiloBytes |
4 | mem.swapout.maximum | Memory Swap Out (Maximum) | kiloBytes |
4 | mem.vmmemctl.none | Memory Balloon (None) | kiloBytes |
4 | mem.vmmemctl.minimum | Memory Balloon (Minimum) | kiloBytes |
4 | mem.vmmemctl.maximum | Memory Balloon (Maximum) | kiloBytes |
4 | mem.vmmemctltarget.none | Memory Balloon Target (None) | kiloBytes |
4 | mem.vmmemctltarget.minimum | Memory Balloon Target (Minimum) | kiloBytes |
4 | mem.vmmemctltarget.maximum | Memory Balloon Target (Maximum) | kiloBytes |
4 | mem.overhead.none | Memory Overhead (None) | kiloBytes |
4 | mem.overhead.minimum | Memory Overhead (Minimum) | kiloBytes |
4 | mem.overhead.maximum | Memory Overhead (Maximum) | kiloBytes |
4 | mem.consumed.none | Memory Consumed (None) | kiloBytes |
4 | mem.consumed.maximum | Memory Consumed (Maximum) | kiloBytes |
4 | mem.consumed.minimum | Memory Consumed (Minimum) | kiloBytes |
4 | mem.sysUsage.none | Memory Used by vmkernel | kiloBytes |
4 | mem.sysUsage.maximum | Memory Used by vmkernel | kiloBytes |
4 | mem.sysUsage.minimum | Memory Used by vmkernel | kiloBytes |
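Two of the rules of thumb in the memory table lend themselves to a short Python sketch: sharing savings (shared minus shared common) and the advice to check swap rates whenever ballooning or swap usage shows up. The function names, the sample figures, and the "anything above zero" threshold are assumptions made for illustration; the table gives no numeric thresholds.

```python
# Hypothetical helpers; the zero threshold and sample values are assumptions.

def sharing_savings_kb(shared_kb, shared_common_kb):
    """Total saving due to memory sharing: shared minus shared common."""
    return shared_kb - shared_common_kb


def memory_hint(balloon_kb, swap_in_rate_kbps, swap_out_rate_kbps):
    """Rough triage following the table's advice: swap rates matter most."""
    if swap_in_rate_kbps > 0 or swap_out_rate_kbps > 0:
        return "active swapping: performance is probably suffering"
    if balloon_kb > 0:
        return "ballooning only: not a problem per se, keep an eye on swap rates"
    return "no ballooning or active swapping"


if __name__ == "__main__":
    # Made-up example: 2 GB of shared guest memory condensed to 1 GB
    print(sharing_savings_kb(2 * 1024 * 1024, 1 * 1024 * 1024))
    print(memory_hint(balloon_kb=512000, swap_in_rate_kbps=0, swap_out_rate_kbps=0))
    print(memory_hint(balloon_kb=512000, swap_in_rate_kbps=40, swap_out_rate_kbps=0))
```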
Level | Counter name in API | Description | Units |
1 | disk.maxTotalLatency.latest | The highest reported total latency (device and kernel times) in the sample window. | millisecond |
1 | disk.usage.average | Average disk throughput over the sample period. | kiloBytesPerSecond |
2 | disk.read.average | Average disk throughput due to read operations over the sample period. | kiloBytesPerSecond |
2 | disk.write.average | Average disk throughput due to write operations over the sample period. | kiloBytesPerSecond |
2 | disk.commands.summation | Disk Commands Issued | number |
2 | disk.commandsAborted.summation | The number of aborts that have occurred in the last window of time. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time (as defined by the guest OS or application). | number |
2 | disk.busResets.summation | Disk Bus Resets | number |
2 | disk.deviceReadLatency.average | Device read latency. This is the time the physical device from the HBA to the platter takes to service an IO request. | millisecond |
2 | disk.kernelReadLatency.average | Kernel read latency. This is the time the VMkernel takes to service an IO. This is the time between the guest OS and the device. | millisecond |
2 | disk.totalReadLatency.average | Total read latency. The sum of the device and kernel read latencies. | millisecond |
2 | disk.queueReadLatency.average | Queue Read Latency | millisecond |
2 | disk.deviceWriteLatency.average | Device write latency. This is the time the physical device from the HBA to the platter takes to service an IO request. | millisecond |
2 | disk.kernelWriteLatency.average | Kernel write latency. This is the time the VMkernel takes to service an IO. This is the time between the guest OS and the device. | millisecond |
2 | disk.totalWriteLatency.average | Total write latency. The sum of the device and kernel write latencies. | millisecond |
2 | disk.queueWriteLatency.average | Queue Write Latency | millisecond |
2 | disk.deviceLatency.average | Physical Device Command Latency | millisecond |
2 | disk.kernelLatency.average | Kernel Disk Command Latency | millisecond |
2 | disk.queueLatency.average | Queue Command Latency | millisecond |
3 | disk.numberRead.summation | The number of IO read operations in the previous sample period. Note that these operations may be variable sized up to 64 KB. | number |
3 | disk.numberWrite.summation | The number of IO write operations in the previous sample period. Note that these operations may be variable sized up to 64 KB. | number |
3 | disk.totalLatency.average | This is the average total latency over the sample window. Total latency is the sum of kernel and device latency for both read and write commands. | millisecond |
3 | disk.write.average | Disk Write Rate | kiloBytesPerSecond |
4 | disk.usage.none | Disk Usage (None) | kiloBytesPerSecond |
4 | disk.usage.minimum | Disk Usage (Minimum) | kiloBytesPerSecond |
4 | disk.usage.maximum | Disk Usage (Maximum) | kiloBytesPerSecond |
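As a sketch of how the disk counters relate to each other: total latency is the sum of device and kernel latency, and the per-window operation counts can be turned into an operations-per-second figure by dividing by the sample window. The helper names and sample values are hypothetical, and the 20-second default applies to live stats only.

```python
# Hypothetical helpers; sample values are made up for illustration.

def total_latency_ms(device_ms, kernel_ms):
    """Total latency as described in the table: device plus kernel latency."""
    return device_ms + kernel_ms


def io_per_second(number_read, number_write, window_seconds=20):
    """Convert per-window read/write operation counts into an IOPS figure."""
    return (number_read + number_write) / window_seconds


if __name__ == "__main__":
    print(total_latency_ms(device_ms=8, kernel_ms=2))            # 10
    print(io_per_second(number_read=3000, number_write=1000))    # 200.0
```

For archived stats, pass the interval duration of the archive as `window_seconds` instead of the 20-second default.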
Level | Counter name in API | Description | Units |
1 | net.usage.average | Network Usage (Average) | kiloBytesPerSecond |
2 | net.droppedRx.summation | The number of received packets that were dropped over the sample period. | number |
2 | net.droppedTx.summation | The number of transmitted packets that were dropped over the sample period. | number |
2 | net.received.average | Average network throughput for received traffic. | kiloBytesPerSecond |
2 | net.transmitted.average | Average network throughput for transmitted traffic. | kiloBytesPerSecond |
3 | net.packetsRx.summation | Network Packets Received | number |
3 | net.packetsTx.summation | Network Packets Transmitted | number |
4 | net.usage.none | Network Usage (None) | kiloBytesPerSecond |
4 | net.usage.minimum | Network Usage (Minimum) | kiloBytesPerSecond |
4 | net.usage.maximum | Network Usage (Maximum) | kiloBytesPerSecond |
Level | Counter name in API | Description | Units |
1 | sys.uptime.latest | Uptime | second |
1 | sys.heartbeat.summation | Heartbeat | number |
1 | clusterServices.cpufairness.latest | CPU Fairness | number |
1 | clusterServices.memfairness.latest | Memory Fairness | number |
1 | clusterServices.effectivecpu.average | Effective CPU Resources | megaHertz |
1 | clusterServices.effectivemem.average | Effective Memory Resources | megaBytes |
1 | clusterServices.failover.latest | Current failover level | number |
3 | sys.resourceCpuUsage.average | Resource CPU Usage (Average) | megaHertz |
3 | managementAgent.memUsed.average | Memory Used (Average) | kiloBytes |
3 | managementAgent.swapUsed.average | Memory Swap Used (Average) | kiloBytes |
3 | managementAgent.swapIn.average | Memory Swap In (Average) | kiloBytesPerSecond |
3 | managementAgent.swapOut.average | Memory Swap Out (Average) | kiloBytesPerSecond |
3 | rescpu.actav1.latest | CPU Active (1 min. average) | percent |
3 | rescpu.actpk1.latest | CPU Active (1 min. peak) | percent |
3 | rescpu.runav1.latest | CPU Running (1 min. average) | percent |
3 | rescpu.actav5.latest | CPU Active (5 min. average) | percent |
3 | rescpu.actpk5.latest | CPU Active (5 min. peak) | percent |
3 | rescpu.runav5.latest | CPU Running (5 min. average) | percent |
3 | rescpu.actav15.latest | CPU Active (15 min. average) | percent |
3 | rescpu.actpk15.latest | CPU Active (15 min. peak) | percent |
3 | rescpu.runav15.latest | CPU Running (15 min. average) | percent |
3 | rescpu.runpk1.latest | CPU Running (1 min. peak) | percent |
3 | rescpu.maxLimited1.latest | CPU Throttled (1 min. average) | percent |
3 | rescpu.runpk5.latest | CPU Running (5 min. peak) | percent |
3 | rescpu.maxLimited5.latest | CPU Throttled (5 min. average) | percent |
3 | rescpu.runpk15.latest | CPU Running (15 min. peak) | percent |
3 | rescpu.maxLimited15.latest | CPU Throttled (15 min. average) | percent |
3 | rescpu.sampleCount.latest | Group CPU Sample Count | number |
3 | rescpu.samplePeriod.latest | Group CPU Sample Period | millisecond |
4 | sys.resourceCpuUsage.none | Resource CPU Usage (None) | megaHertz |
4 | sys.resourceCpuUsage.maximum | Resource CPU Usage (Maximum) | megaHertz |
4 | sys.resourceCpuUsage.minimum | Resource CPU Usage (Minimum) | megaHertz |
Great job Scott!
Hi Scott, the 20 second interval means
"the average of the last 20 seconds" or
"the value on a particular second, with interval taken every 20 second"?
The 20-second interval means that the values recorded were accrued or averaged over 20 seconds. So, when "ready time" reports a number of 2000 ms, it means that for 2000 ms of the previous 20,000 ms sample period the vCPU was ready to run and not getting resources.
Scott,
It’s a great document and I am using it (along with your other docs) all the time…
Just one quick question: is there any doc available that gives some kind of guidance for the most important metrics? Something like this: under normal conditions, parameter xxx should be no more than 123; metric yyy should never exceed limit 321, otherwise…
Thanks,
olegarr
No, there is no document today that provides this guidance.
However, this is a great question and one that we've been pondering a bit lately. There is a good deal of demand for guidance from VMware on thresholds for these metrics to advise customers of "yellow" and "red" levels for these counters. We're looking into building something like this now but would like to back it with a deep investigation using data from real deployments. It'll take us some time.
Any news about guidance from VMware on thresholds? I'm looking for this kind of data.
Scott, some clarifications:
cpu.usagemhz.average
You write: "The CPU utilization. The maximum possible value here is the frequency of the processors times the number of cores. As an example, a VM using 4000 MHz on a system with four 2 GHz processors is using 50% of the CPU (4000 / (4 * 2000) = 0.5)"
I'd say: "The CPU utilization. The maximum possible value for a single VM is the frequency of the processors times the number of vCPUs of the VM. The maximum possible value of the sum of cpu.usagemhz of all VMs on one ESX host is the frequency of the processors times the number of cores of that host."
cpu.usage.average
You write: "The CPU utilization. This value is reported with 100% representing all processor cores on the system. As an example, a 2-way VM using 50% of a four-core system is completely using two cores."
I'd say: "The CPU utilization in percent. Example: Assume you have an ESX host with 8 cores at 2 GHz each; this means the host has a capacity of 16 GHz. For non-hyperthreaded systems this is 100%. So if each VM is running at 100% utilization the corresponding MHz values add up to this capacity. For hyperthreaded systems the 100% mark of the ESX host is 1.5 times higher (VMware assumes these systems are 1.5 times more powerful). I have not seen this. I hardly see VMs with more than 75% utilization on hyperthreaded systems, even when they are running at maximum load. For this reason I would be careful with percentages, and rather stick to the MHz values reported. This also explains why you set reservations and limits in MHz and not in %."
Hi, is there an update for vSphere 5?
The official SDK (http://pubs.vmware.com/vsphere-50/index.jsp?topic=/com.vmware.wssdk.apiref.doc_50/right-pane.html) is a bit brief, and sometimes does not explain.
Scott,
You are saying that vc samples for 20 seconds?
I was always led to believe it was a snapshot in time, taken every 20 seconds, stored in a flat file on the hosts and held in RAM in VC.
I also understood that the flat file would hold 60 minutes worth of snapshots, resulting in 180 datapoints per metric.
Kevin
VC reports performance counters no more frequently than every 20s. Those stats are what are shown in the real-time panel. They are kept in the VC DB for an hour, I believe. Then they are rolled up (summed, averaged, etc.) for a longer period but at greater intervals.
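To make the numbers in this exchange concrete, here is a rough Python sketch of the data-point count and of a roll-up from 20-second samples to five-minute values. The `roll_up` function is hypothetical and only mimics the "summed, averaged" idea described above; it is not VC's actual roll-up implementation.

```python
# Hypothetical sketch of rolling 20-second samples up to five-minute values.

SAMPLE_SECONDS = 20


def samples_in(seconds):
    """How many 20-second samples fit into a period of `seconds`."""
    return seconds // SAMPLE_SECONDS


def roll_up(samples, group_size, how):
    """Combine each full group of samples by 'summation' or 'average'."""
    rolled = []
    for start in range(0, len(samples) - group_size + 1, group_size):
        total = sum(samples[start:start + group_size])
        rolled.append(total if how == "summation" else total / group_size)
    return rolled


if __name__ == "__main__":
    print(samples_in(60 * 60))   # 180 data points per metric in one hour
    per_interval = samples_in(5 * 60)
    print(per_interval)          # 15 samples per five-minute interval

    # Fifteen live ready-time samples of 2000 ms sum to one archived value
    print(roll_up([2000] * per_interval, per_interval, "summation"))  # [30000]
    # An averaged counter such as a usage percentage keeps its scale
    print(roll_up([50] * per_interval, per_interval, "average"))      # [50.0]
```

The summation case shows why 2000 ms of ready time in the live view and 30,000 ms in a five-minute archive interval describe the same load.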
Yes.
It's a shame the sampling frequency can't be increased. As you know, esxtop can get down to 2 second refreshes.
It is a shame. In fact, esxtop can go faster than that. How fast can you hit the spacebar?
resxtop not so good.
Concerning polling intervals, notice the roll-ups from 'Real Time' to 'Day' to 'Week' to 'Month':
see page 12 of vSphere Monitoring and Performance
What am I missing? When I run get-stattype for a VM it only returns
cpu.usage.average
cpu.usagemhz.average
cpu.ready.summation
mem.usage.average
mem.swapinRate.average
mem.swapoutRate.average
mem.vmmemctl.average
mem.consumed.average
mem.overhead.average
disk.maxTotalLatency.latest
net.usage.average
sys.uptime.latest
sys.heartbeat.summation
cpu.cpuentitlement.latest
mem.mementitlement.latest
disk.used.latest
disk.used.latest
disk.used.latest
disk.used.latest
disk.used.latest
disk.provisioned.latest
disk.unshared.latest
Cpu.usagemhz.maximum and minimum are not available. Why would this be?
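One plausible reading, going by the level column in the tables above, is that the list only contains counters at or below the statistics level VC is configured for: everything returned here that appears in the tables is level 1, while the minimum and maximum roll-ups are level 4. The Python sketch below illustrates that filtering with a handful of rows copied from the CPU table. The function name is hypothetical, and the assumption that a statistics level N collects every counter of level N or lower is mine, not something stated in the article.

```python
# A few rows from the CPU table above: counter name -> level.
COUNTER_LEVELS = {
    "cpu.ready.summation": 1,
    "cpu.usagemhz.average": 1,
    "cpu.usage.average": 1,
    "cpu.idle.summation": 2,
    "cpu.used.summation": 3,
    "cpu.usagemhz.minimum": 4,
    "cpu.usagemhz.maximum": 4,
}


def collected_at(stat_level):
    """Counters with a level at or below the configured statistics level.

    Assumption: level N includes everything from levels 1 through N.
    """
    return sorted(name for name, level in COUNTER_LEVELS.items()
                  if level <= stat_level)


if __name__ == "__main__":
    print(collected_at(1))  # the three level 1 counters only
    print("cpu.usagemhz.maximum" in collected_at(1))  # False
    print("cpu.usagemhz.maximum" in collected_at(4))  # True
```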
This sentence needs to be updated --> "This page has been updated for vSphere 4, so the counter levels will differ slightly on older versions of VC."
Also, could we have this updated to 5.5 please?
Thanks from Singapore.
e1
I agree with last post. This type of documentation should be delivered with each release.
Anyone from VMware listening?
This documentation has been formalized within our regular tech docs here:
Thank you very much Mark!
Amir
Quick question for you Mark. I'm assisting with the development of the vSphere monitoring capability of our proprietary appliance. I posted this question and am curious if you have some thoughts on the matter:
After doing some initial research, I'm concluding that the most prudent route would be to use perfManager to query the realtime stats every 5 minutes. It seems that only a few managed objects have quickstats and that would be a significant limitation.
We are a VMware partner and I'm also wondering if we could engage VMware for API assistance on this development project as well.
Thank you,
Amir