vCenter Performance Counters

Introduction

The following table of vCenter (VC) performance counters lists the  counters with a description of their purpose.  This page has been  updated for vSphere 4, so the counter levels will differ slightly on  older versions of VC.

Remember, with the exception of ready time, statistic levels one and two  are the only ones needed for 99% of the performance monitoring and  analysis out there.  Don't spend many of your own cycles worrying about  levels three and four!

For information on enabling VC to display and archive these counters see the Understanding vCenter Performance Statistics article.

Understanding vCenter Measurement Windows

Before you continue, you should know that all total count metrics  reported by VC are reported over the sample window.  When you're looking  at live stats, this sample window is 20 seconds.  When you're looking  at archive stats, it will depend on the interval duration.  That  duration could be five minutes, 30 minutes, two hours, or one day.

This causes a lot of confusion when comparing esxtop results to live VC results to archived VC results. As an example, ready time might be reported as 10% in esxtop. In live VC results this amount of ready time would be reported as 2000 ms (10% of the 20 s window). In one-day archive results, the same amount would be reported as 30,000 ms (10% of the five-minute interval duration). All of these numbers reflect the same amount of ready time.
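The conversion between these views is plain arithmetic. Here is a minimal Python sketch of it; the function names are illustrative only and are not part of any VMware API, and the window lengths are the ones quoted above.

```python
def ready_ms_to_percent(ready_ms, interval_seconds):
    """Convert a ready-time summation (ms) into a percentage of the sample window."""
    return ready_ms * 100 / (interval_seconds * 1000)


def ready_percent_to_ms(percent, interval_seconds):
    """Convert a ready-time percentage into milliseconds for a given sample window."""
    return percent * interval_seconds * 1000 / 100


# Live stats use a 20 s window; one-day archive stats use five-minute (300 s) intervals.
print(ready_percent_to_ms(10, 20))    # 2000.0
print(ready_percent_to_ms(10, 300))   # 30000.0
print(ready_ms_to_percent(2000, 20))  # 10.0
```

Dividing by the window length before comparing values is what makes esxtop, live, and archived numbers comparable.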

CPU Statistics

Level | Counter name in API | Description | Units
1 | cpu.ready.summation | Ready time is the time spent waiting for CPU(s) to become available in the past update interval. | millisecond
1 | cpu.usagemhz.average | The CPU utilization. The maximum possible value here is the frequency of the processors times the number of cores. As an example, a VM using 4000 MHz on a system with four 2 GHz processors is using 50% of the CPU (4000 / (4 * 2000) = 0.5). | megaHertz
1 | cpu.usage.average | The CPU utilization. This value is reported with 100% representing all processor cores on the system. As an example, a 2-way VM using 50% of a four-core system is completely using two cores. | percent
2 | cpu.reservedCapacity.average | CPU Reserved Capacity | megaHertz
2 | cpu.idle.summation | CPU Idle | millisecond
2 | cpu.swapwait.summation | Swap wait time is time that the world spent waiting for memory to be swapped in. When the VM is waiting for memory, it is not doing work. | millisecond
3 | cpu.system.summation | System time is the time spent in VMkernel during the last update interval. This does not include guest code execution. | millisecond
3 | cpu.wait.summation | Wait time is the time spent waiting for hardware or VMkernel thread locks during the last update interval. | millisecond
3 | cpu.extra.summation | CPU extra is the time above the statically calculated entitlement. Entitlement is the share of processing time that a VM should get as a result of its vCPU count and assigned shares. You should not use or care about this counter in any of your own analysis. | millisecond
3 | cpu.used.summation | CPU Used | millisecond
3 | cpu.guaranteed.latest | Guaranteed time is reported as the amount of the reservation time that the VM used in the past update interval. As an example, if 2000 MHz have been reserved for the VM on a four-way, 2 GHz host, that's 25% of the CPU resource. In a 20 s update interval, there are 80,000 ms available on this four-way system. That means 20,000 ms of time has been reserved. If a VM used only half of its available cycles, the guaranteed time is 10,000 ms. | millisecond
4 | cpu.usage.none | CPU Usage (None) | percent
4 | cpu.usage.minimum | CPU Usage (Minimum) | percent
4 | cpu.usage.maximum | CPU Usage (Maximum) | percent
4 | cpu.usagemhz.none | CPU Usage in MHz (None) | megaHertz
4 | cpu.usagemhz.minimum | CPU Usage in MHz (Minimum) | megaHertz
4 | cpu.usagemhz.maximum | CPU Usage in MHz (Maximum) | megaHertz
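The two worked examples in the table (cpu.usagemhz.average and cpu.guaranteed.latest) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as the table describes it; the function and parameter names are made up for illustration, and it uses the host-wide capacity (core frequency times core count) as the 100% mark.

```python
def usagemhz_to_percent(usage_mhz, core_mhz, num_cores):
    """Express a MHz usage value as a percentage of total host capacity."""
    return usage_mhz * 100 / (core_mhz * num_cores)


def guaranteed_ms(reserved_mhz, core_mhz, num_cores, interval_seconds, fraction_used):
    """Guaranteed time: the part of the reserved time actually used in the interval."""
    available_ms = num_cores * interval_seconds * 1000            # 80,000 ms on a four-way host in 20 s
    reserved_ms = available_ms * reserved_mhz / (core_mhz * num_cores)
    return reserved_ms * fraction_used


# A VM using 4000 MHz on four 2 GHz cores.
print(usagemhz_to_percent(4000, 2000, 4))      # 50.0

# 2000 MHz reserved on a four-way 2 GHz host, 20 s interval, half of the reservation used.
print(guaranteed_ms(2000, 2000, 4, 20, 0.5))   # 10000.0
```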


Memory Statistics

Level | Counter name in API | Description | Units
1 | mem.consumed.average | The amount of machine memory that is in use by the VM. While a VM may have been configured to use 4 GB of RAM, as an example, it might have only touched half of that. Of the 2 GB left, half of that might be saved from memory sharing. That would result in 1 GB of consumed memory. | kiloBytes
1 | mem.overhead.average | The memory used by the VMkernel to maintain and execute the VM. | kiloBytes
1 | mem.swapinrate.average | The swap in rate reports the rate at which a VM's memory is being swapped in from disk. | kiloBytesPerSecond
1 | mem.swapoutrate.average | The swap out rate reports the rate at which a VM's memory is being swapped out to disk. | kiloBytesPerSecond
1 | mem.usage.average | Memory used as a percentage of all available machine memory. Available for host and VM. | percent
1 | mem.vmmemctl.average | The amount of memory currently claimed by the balloon driver. This is not a performance problem, per se, but represents the host starting to take memory from less needful VMs for those with large amounts of active memory. But if the host is ballooning, check swap rates (swapin and swapout), which would be indicative of performance problems. | kiloBytes
2 | mem.granted.average | The amount of memory that was granted to the VM by the host. Memory is not granted to the VM until it is touched once, and granted memory may be swapped out or ballooned away if the VMkernel needs the memory. | kiloBytes
2 | mem.active.average | The amount of memory used by the VM in the past small window of time. This is the "true" number of how much memory the VM currently has need of. Additional, unused memory may be swapped out or ballooned with no impact to the guest's performance. | kiloBytes
2 | mem.shared.average | The average amount of shared memory. Shared memory represents the entire pool of memory from which sharing savings are possible. The amount of memory that this has been condensed to is reported in shared common memory. So, total saving due to memory sharing equals shared memory minus shared common memory. | kiloBytes
2 | mem.zero.average | The amount of zero pages in the guest. Zero pages are not represented in machine memory, so this results in 100% savings when mapping from the guest to the machine memory. | kiloBytes
2 | mem.unreserved.average | Memory Unreserved (Average) | kiloBytes
2 | mem.swapused.average | The amount of swap memory currently in use. A large amount of swap memory is not a performance problem. This could be memory that the guest doesn't need. Check the swap rates (swapin, swapout) to see if the guest is actively in need of more memory than is available. | kiloBytes
2 | mem.swapunreserved.average | Memory Swap Unreserved (Average) | kiloBytes
2 | mem.sharedcommon.average | The average amount of shared common memory. Shared memory represents the entire pool of memory from which sharing savings are possible. The amount of memory that this has been condensed to is reported in shared common memory. So, total saving due to memory sharing equals shared memory minus shared common memory. | kiloBytes
2 | mem.heap.average | Memory Heap (Average) | kiloBytes
2 | mem.heapfree.average | Memory Heap Free (Average) | kiloBytes
2 | mem.state.latest | Memory State | number
2 | mem.swapped.average | Memory Swapped (Average) | kiloBytes
2 | mem.swaptarget.average | Memory Swap Target (Average) | kiloBytes
2 | mem.swapin.average | The rate at which memory is being swapped in from disk. A large number here represents a problem with lack of memory and a clear indication that performance is suffering as a result. | kiloBytes
2 | mem.swapout.average | The rate at which memory is being swapped out to disk. A large number here represents a problem with lack of memory and a clear indication that performance is suffering as a result. | kiloBytes
2 | mem.vmmemctltarget.average | Memory Balloon Target (Average) | kiloBytes
2 | mem.sysUsage.average | Memory Used by vmkernel | kiloBytes
2 | mem.reservedCapacity.average | Memory Reserved Capacity | megaBytes
4 | mem.usage.none | Memory Usage (None) | percent
4 | mem.usage.minimum | Memory Usage (Minimum) | percent
4 | mem.usage.maximum | Memory Usage (Maximum) | percent
4 | mem.granted.none | Memory Granted (None) | kiloBytes
4 | mem.granted.minimum | Memory Granted (Minimum) | kiloBytes
4 | mem.granted.maximum | Memory Granted (Maximum) | kiloBytes
4 | mem.active.none | Memory Active (None) | kiloBytes
4 | mem.active.minimum | Memory Active (Minimum) | kiloBytes
4 | mem.active.maximum | Memory Active (Maximum) | kiloBytes
4 | mem.shared.none | Memory Shared (None) | kiloBytes
4 | mem.shared.minimum | Memory Shared (Minimum) | kiloBytes
4 | mem.shared.maximum | Memory Shared (Maximum) | kiloBytes
4 | mem.zero.none | Memory Zero (None) | kiloBytes
4 | mem.zero.minimum | Memory Zero (Minimum) | kiloBytes
4 | mem.zero.maximum | Memory Zero (Maximum) | kiloBytes
4 | mem.unreserved.none | Memory Unreserved (None) | kiloBytes
4 | mem.unreserved.minimum | Memory Unreserved (Minimum) | kiloBytes
4 | mem.unreserved.maximum | Memory Unreserved (Maximum) | kiloBytes
4 | mem.swapused.none | Memory Swap Used (None) | kiloBytes
4 | mem.swapused.minimum | Memory Swap Used (Minimum) | kiloBytes
4 | mem.swapused.maximum | Memory Swap Used (Maximum) | kiloBytes
4 | mem.swapunreserved.none | Memory Swap Unreserved (None) | kiloBytes
4 | mem.swapunreserved.minimum | Memory Swap Unreserved (Minimum) | kiloBytes
4 | mem.swapunreserved.maximum | Memory Swap Unreserved (Maximum) | kiloBytes
4 | mem.sharedcommon.none | Memory Shared Common (None) | kiloBytes
4 | mem.sharedcommon.minimum | Memory Shared Common (Minimum) | kiloBytes
4 | mem.sharedcommon.maximum | Memory Shared Common (Maximum) | kiloBytes
4 | mem.heap.none | Memory Heap (None) | kiloBytes
4 | mem.heap.minimum | Memory Heap (Minimum) | kiloBytes
4 | mem.heap.maximum | Memory Heap (Maximum) | kiloBytes
4 | mem.heapfree.none | Memory Heap Free (None) | kiloBytes
4 | mem.heapfree.minimum | Memory Heap Free (Minimum) | kiloBytes
4 | mem.heapfree.maximum | Memory Heap Free (Maximum) | kiloBytes
4 | mem.swapped.none | Memory Swapped (None) | kiloBytes
4 | mem.swapped.minimum | Memory Swapped (Minimum) | kiloBytes
4 | mem.swapped.maximum | Memory Swapped (Maximum) | kiloBytes
4 | mem.swaptarget.none | Memory Swap Target (None) | kiloBytes
4 | mem.swaptarget.minimum | Memory Swap Target (Minimum) | kiloBytes
4 | mem.swaptarget.maximum | Memory Swap Target (Maximum) | kiloBytes
4 | mem.swapin.none | Memory Swap In (None) | kiloBytes
4 | mem.swapin.minimum | Memory Swap In (Minimum) | kiloBytes
4 | mem.swapin.maximum | Memory Swap In (Maximum) | kiloBytes
4 | mem.swapout.none | Memory Swap Out (None) | kiloBytes
4 | mem.swapout.minimum | Memory Swap Out (Minimum) | kiloBytes
4 | mem.swapout.maximum | Memory Swap Out (Maximum) | kiloBytes
4 | mem.vmmemctl.none | Memory Balloon (None) | kiloBytes
4 | mem.vmmemctl.minimum | Memory Balloon (Minimum) | kiloBytes
4 | mem.vmmemctl.maximum | Memory Balloon (Maximum) | kiloBytes
4 | mem.vmmemctltarget.none | Memory Balloon Target (None) | kiloBytes
4 | mem.vmmemctltarget.minimum | Memory Balloon Target (Minimum) | kiloBytes
4 | mem.vmmemctltarget.maximum | Memory Balloon Target (Maximum) | kiloBytes
4 | mem.overhead.none | Memory Overhead (None) | kiloBytes
4 | mem.overhead.minimum | Memory Overhead (Minimum) | kiloBytes
4 | mem.overhead.maximum | Memory Overhead (Maximum) | kiloBytes
4 | mem.consumed.none | Memory Consumed (None) | kiloBytes
4 | mem.consumed.maximum | Memory Consumed (Maximum) | kiloBytes
4 | mem.consumed.minimum | Memory Consumed (Minimum) | kiloBytes
4 | mem.sysUsage.none | Memory Used by vmkernel | kiloBytes
4 | mem.sysUsage.maximum | Memory Used by vmkernel | kiloBytes
4 | mem.sysUsage.minimum | Memory Used by vmkernel | kiloBytes
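As a rough illustration of the memory arithmetic described in the table, the following Python sketch computes the sharing savings (shared minus shared common) and walks through the 4 GB consumed-memory example. The helper names and the sample counter values are hypothetical and chosen only to make the arithmetic visible.

```python
KB_PER_GB = 1024 * 1024


def sharing_savings_kb(shared_kb, sharedcommon_kb):
    """Total saving from memory sharing: mem.shared minus mem.sharedcommon."""
    return shared_kb - sharedcommon_kb


def consumed_kb(touched_kb, saved_by_sharing_kb):
    """Machine memory in use: touched memory less whatever sharing saved."""
    return touched_kb - saved_by_sharing_kb


# Hypothetical counter values: 800,000 KB shared, condensed to 200,000 KB.
print(sharing_savings_kb(800_000, 200_000))       # 600000

# The example from the table: 4 GB configured, half touched, half of that saved by sharing.
configured = 4 * KB_PER_GB
touched = configured // 2
saved = touched // 2
print(consumed_kb(touched, saved) // KB_PER_GB)   # 1
```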


Disk Statistics

Level | Counter name in API | Description | Units
1 | disk.maxTotalLatency | The highest reported total latency (device and kernel times) in the sample window. | millisecond
1 | disk.usage.average | Average disk throughput over the sample period. | kiloBytesPerSecond
2 | disk.read.average | Average disk throughput due to read operations over the sample period. | kiloBytesPerSecond
2 | disk.write.average | Average disk throughput due to write operations over the sample period. | kiloBytesPerSecond
2 | disk.commands.summation | Disk Commands Issued | number
2 | disk.commandsAborted.summation | The number of aborts that have occurred in the last window of time. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time (as defined by the guest OS or application). | number
2 | disk.busResets.summation | Disk Bus Resets | number
2 | disk.deviceReadLatency.average | Device read latency. This is the time the physical device, from the HBA to the platter, takes to service an IO request. | millisecond
2 | disk.kernelReadLatency.average | Kernel read latency. This is the time the VMkernel takes to service an IO. This is the time between the guest OS and the device. | millisecond
2 | disk.totalReadLatency.average | Total read latency. The sum of the device and kernel read latencies. | millisecond
2 | disk.queueReadLatency.average | Queue Read Latency | millisecond
2 | disk.deviceWriteLatency.average | Device write latency. This is the time the physical device, from the HBA to the platter, takes to service an IO request. | millisecond
2 | disk.kernelWriteLatency.average | Kernel write latency. This is the time the VMkernel takes to service an IO. This is the time between the guest OS and the device. | millisecond
2 | disk.totalWriteLatency.average | Total write latency. The sum of the device and kernel write latencies. | millisecond
2 | disk.queueWriteLatency.average | Queue Write Latency | millisecond
2 | disk.deviceLatency.average | Physical Device Command Latency | millisecond
2 | disk.kernelLatency.average | Kernel Disk Command Latency | millisecond
2 | disk.queueLatency.average | Queue Command Latency | millisecond
3 | disk.numberRead.summation | The number of IO read operations in the previous sample period. Note that these operations may be variable sized up to 64 KB. | number
3 | disk.numberWrite.summation | The number of IO write operations in the previous sample period. Note that these operations may be variable sized up to 64 KB. | number
3 | disk.totalLatency.average | This is the average total latency over the sample window. Total latency is the sum of kernel and device latency for both read and write commands. | millisecond
3 | disk.write.average | Disk Write Rate | kiloBytesPerSecond
4 | disk.usage.none | Disk Usage (None) | kiloBytesPerSecond
4 | disk.usage.minimum | Disk Usage (Minimum) | kiloBytesPerSecond
4 | disk.usage.maximum | Disk Usage (Maximum) | kiloBytesPerSecond
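Because the read and write counts are totals over the sample window, they need to be divided by the window length before they can be read as a rate. The Python sketch below shows that step, along with the device-plus-kernel sum used for total latency. The function names and sample values are illustrative assumptions, not VMware API calls or measured data.

```python
def iops(number_read, number_write, interval_seconds):
    """Average IO operations per second from the read/write summation counters."""
    return (number_read + number_write) / interval_seconds


def total_latency_ms(device_latency_ms, kernel_latency_ms):
    """Total latency as the sum of device and kernel latency."""
    return device_latency_ms + kernel_latency_ms


# Hypothetical live sample (20 s window): 3000 reads and 1000 writes.
print(iops(3000, 1000, 20))      # 200.0

# Hypothetical latencies: 8 ms at the device, 2 ms in the VMkernel.
print(total_latency_ms(8, 2))    # 10
```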


Network Statistics

Level | Counter name in API | Description | Units
1 | net.usage.average | Network Usage (Average) | kiloBytesPerSecond
2 | net.droppedRx.summation | The number of received packets that were dropped over the sample period. | number
2 | net.droppedTx.summation | The number of transmitted packets that were dropped over the sample period. | number
2 | net.received.average | Average network throughput for received traffic. | kiloBytesPerSecond
2 | net.transmitted.average | Average network throughput for transmitted traffic. | kiloBytesPerSecond
3 | net.packetsRx.summation | Network Packets Received | number
3 | net.packetsTx.summation | Network Packets Transmitted | number
4 | net.usage.none | Network Usage (None) | kiloBytesPerSecond
4 | net.usage.minimum | Network Usage (Minimum) | kiloBytesPerSecond
4 | net.usage.maximum | Network Usage (Maximum) | kiloBytesPerSecond


Other Statistics

Level | Counter name in API | Description | Units
1 | sys.uptime.latest | Uptime | second
1 | sys.heartbeat.summation | Heartbeat | number
1 | clusterServices.cpufairness.latest | CPU Fairness | number
1 | clusterServices.memfairness.latest | Memory Fairness | number
1 | clusterServices.effectivecpu.average | Effective CPU Resources | megaHertz
1 | clusterServices.effectivemem.average | Effective Memory Resources | megaBytes
1 | clusterServices.failover.latest | Current failover level | number
3 | sys.resourceCpuUsage.average | Resource CPU Usage (Average) | megaHertz
3 | managementAgent.memUsed.average | Memory Used (Average) | kiloBytes
3 | managementAgent.swapUsed.average | Memory Swap Used (Average) | kiloBytes
3 | managementAgent.swapIn.average | Memory Swap In (Average) | kiloBytesPerSecond
3 | managementAgent.swapOut.average | Memory Swap Out (Average) | kiloBytesPerSecond
3 | rescpu.actav1.latest | CPU Active (1 min. average) | percent
3 | rescpu.actpk1.latest | CPU Active (1 min. peak) | percent
3 | rescpu.runav1.latest | CPU Running (1 min. average) | percent
3 | rescpu.actav5.latest | CPU Active (5 min. average) | percent
3 | rescpu.actpk5.latest | CPU Active (5 min. peak) | percent
3 | rescpu.runav5.latest | CPU Running (5 min. average) | percent
3 | rescpu.actav15.latest | CPU Active (15 min. average) | percent
3 | rescpu.actpk15.latest | CPU Active (15 min. peak) | percent
3 | rescpu.runav15.latest | CPU Running (15 min. average) | percent
3 | rescpu.runpk1.latest | CPU Running (1 min. peak) | percent
3 | rescpu.maxLimited1.latest | CPU Throttled (1 min. average) | percent
3 | rescpu.runpk5.latest | CPU Running (5 min. peak) | percent
3 | rescpu.maxLimited5.latest | CPU Throttled (5 min. average) | percent
3 | rescpu.runpk15.latest | CPU Running (15 min. peak) | percent
3 | rescpu.maxLimited15.latest | CPU Throttled (15 min. average) | percent
3 | rescpu.sampleCount.latest | Group CPU Sample Count | number
3 | rescpu.samplePeriod.latest | Group CPU Sample Period | millisecond
4 | sys.resourceCpuUsage.none | Resource CPU Usage (None) | megaHertz
4 | sys.resourceCpuUsage.maximum | Resource CPU Usage (Maximum) | megaHertz
4 | sys.resourceCpuUsage.minimum | Resource CPU Usage (Minimum) | megaHertz
Comments

Great job Scott!

Hi Scott, the 20 second interval means

"the average of the last 20 seconds" or

"the value on a particular second, with interval taken every 20 second"?

The 20-second interval means that the values recorded were accrued or averaged over 20 seconds. So, when "ready time" reports a number of 2000 ms, it means that for 2000 ms of the previous 20,000 ms sample period the vCPU was ready to run and not getting resources.

Scott,

It’s a great document and I am using it (along with your other docs) all the time…

Just 1 quick question: is there any doc available that will give some kind of guidance for most important metrics? Something like that: in normal condition parameter xxx should be not more than 123; metric yyy never should exceed limit 321, otherwise…

Thanks,

olegarr

No, there is no document today that provides this guidance.

However, this is a great question and one that we've been pondering a bit lately. There is a good deal of demand for guidance from VMware on thresholds for these metrics to advise customers of "yellow" and "red" levels for these counters. We're looking into building something like this now but would like to back it with a deep investigation using data from real deployments. It'll take us some time.

Any news about guidance from VMware on thresholds? I'm looking for this kind of data.

Scott, some clarifications:

cpu.usagemhz.average

You write: "The CPU utilization. The maximum possible value here is the frequency of the processors times the number of cores. As an example, a VM using 4000 MHz on a system with four 2 GHz processors is using 50% of the CPU (4000 / (4 * 2000) = 0.5)"

I'd say: "The CPU utilization. The maximum possible value for a single VM is the frequency of the processors times the number of vCPUs of the VM. The maximum possible value of the sum of cpu.usagemhz of all VMs on one ESX host is the frequency of the processors times the number of cores of that host."

cpu.usage.average

You write: "The CPU utilization. This value is reported with 100% representing all processor cores on the system. As an example, a 2-way VM using 50% of a four-core system is completely using two cores."

I'd say: "The CPU utilization in percent. Example: Assume you have an ESX host with 8 cores at 2 GHz each; this means the host has a capacity of 16 GHz. For non-hyperthreaded systems this is 100%. So if each VM is running at 100% utilization the corresponding MHz values add up to this capacity. For hyperthreaded systems the 100% mark of the ESX host is 1.5 times higher (VMware assumes these systems are 1.5 times more powerful). I have not seen this. I hardly see VMs with more than 75% utilization on hyperthreaded systems, even when they are running at maximum load. For this reason I would be careful with percentages, and rather stick to the MHz values reported. This also explains why you set reservations and limits in MHz and not in %."

Hi, is there an update for vSphere 5?

The official SDK (http://pubs.vmware.com/vsphere-50/index.jsp?topic=/com.vmware.wssdk.apiref.doc_50/right-pane.html) is a bit brief, and sometimes does not explain.

Scott,

You are saying that vc samples for 20 seconds?

I was always led to believe it was a snapshot in time, taken every 20 seconds, stored in a flat file on the hosts and held in RAM in VC.

I also understood that the flat file would hold 60 minutes worth of snapshots, resulting in 180 datapoints per metric.

http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.dcadmin.doc_41/vc_p...

Kevin

VC reports performance counters no more frequently than every 20 s. Those stats are what are shown in the real-time panel. They are kept in the VC DB for an hour, I believe. Then they are rolled up (summed, averaged, etc.) for a longer period but at greater intervals.

Yes.

It's a shame the sampling frequency can't be increased. As you know, esxtop can get down to 2 second refreshes.

It is a shame. In fact, esxtop can go faster than that. How fast can you hit the spacebar?

resxtop not so good.

Concerning polling intervals, notice the roll-ups from 'Real Time' to 'Day' to 'Week' to 'Month':

see page 12 of vSphere Monitoring and Performance

http://pubs.vmware.com/vsphere-50/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-50-monitor...

What am I missing? When I run get-stattype for a VM it only returns

cpu.usage.average

cpu.usagemhz.average

cpu.ready.summation

mem.usage.average

mem.swapinRate.average

mem.swapoutRate.average

mem.vmmemctl.average

mem.consumed.average

mem.overhead.average

disk.maxTotalLatency.latest

net.usage.average

sys.uptime.latest

sys.heartbeat.summation

cpu.cpuentitlement.latest

mem.mementitlement.latest

disk.used.latest

disk.used.latest

disk.used.latest

disk.used.latest

disk.used.latest

disk.provisioned.latest

disk.unshared.latest

Cpu.usagemhz.maximum and minimum are not available. Why would this be?

This sentence needs to be updated --> "This page has been updated for vSphere 4, so the counter levels will differ slightly on older versions of VC."

Also, could we have this updated to 5.5 please?

Thanks from Singapore.

e1

I agree with last post.  This type of documentation should be delivered with each release.

Anyone from VMware listening?

This documentation has been formalized within our regular tech docs here:

vSphere 5.5 Documentation Center

Thank you very much Mark!

Amir

Quick question for you Mark.  I'm assisting with the development of the vSphere monitoring capability of our proprietary appliance.   I posted this question and am curious if you have some thoughts on the matter:

vSphere API - Performance Monitoring Question - Should I use PerformanceManager or QuickStats to q...

After doing some initial research, I'm concluding that the most prudent route would be to use perfManager to query the realtime stats every 5 minutes.  It seems that only a few managed objects have quickstats and that would be a significant limitation.

We are a VMware partner and I'm also wondering if we could engage VMware for API assistance on this development project as well.

Thank you,

Amir

Version history
Revision 1 of 1. Last update: 05-29-2008 05:15 PM