VirtualCenter (VC) is the entry point for virtual platform management but is used less frequently than esxtop for performance analysis. On the surface, VC appears insufficient for performance analysis, but this is not necessarily the case. By default, VirtualCenter collects a reduced set of performance counters to minimize the data maintained in VC's database. The set of counters maintained by VC can be modified, and detailed analysis can be performed based on those counters. This document provides the details necessary for understanding and enabling VC's performance monitoring capabilities.
Refer to Performance Monitoring and Analysis for information on using these counters.
Our stats infrastructure has a lot of counters but our documentation has traditionally been quite thin in terms of descriptions. I got so sick of asking what stats are available at what stats level that I decided to start this page. Obviously it needs to be made more readable, but hopefully it is a start.
Remember that stats in VC are generally organized into 2 archival categories:
The basic flow is this: an ESX host stores statistics at 20s granularity for a period of 1 hour. Therefore, one can view the past-hour stats for a host/VM using the Host Client, or using the VI client attached to VirtualCenter. ESX also aggregates the statistics into past-day statistics and stores them for up to 1 day. These past-day statistics are sent to VC periodically and then stored in the database. The database is responsible for periodically rolling these past-day stats up into 30-minute past-week stats, then rolling the past-week stats up into past-month stats, and so on. Because past-day, past-week, and past-month stats are stored in the database, I call them "archived" stats.
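The rollup flow described above can be sketched in a few lines. This is only an illustration: the function name roll_up is hypothetical, and the sketch assumes an "average" rollup, whereas individual counters may be rolled up differently (for example as a sum or a maximum).

```python
from statistics import mean


def roll_up(samples, source_interval_s, target_interval_s, combine=mean):
    """Combine fixed-interval samples into coarser buckets.

    Hypothetical helper, not part of any VMware API. `combine` is the
    rollup function (average by default; an assumption for this sketch).
    Trailing samples that do not fill a whole bucket are dropped.
    """
    if target_interval_s % source_interval_s != 0:
        raise ValueError("target interval must be a multiple of the source interval")
    per_bucket = target_interval_s // source_interval_s
    full = len(samples) - len(samples) % per_bucket
    return [combine(samples[i:i + per_bucket]) for i in range(0, full, per_bucket)]


# One hour of made-up 20s samples: 180 values.
realtime = [float(i % 15) for i in range(180)]

# 20s samples rolled up to 5-minute values: 15 samples per bucket.
past_day = roll_up(realtime, 20, 300)

# 5-minute values rolled up to 30-minute values: 6 samples per bucket.
past_week = roll_up(past_day, 300, 1800)
```

One hour of real-time data yields 12 five-minute values, which in turn yield 2 thirty-minute values, which is why the archived intervals take so much less space in the database.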
Statistics level is a means of organizing statistics for archiving purposes. It's worth noting that only stats levels one and two are useful for deployment performance monitoring and analysis. Levels three and four provide granularity and visibility that is useful only for developers.
The concept of "stats level" applies only to the archived stats: we only store a stat in the database if we are at the appropriate stats level for that particular statistic. Non-archived stats are unaffected by stats level. In other words, every metric listed below is collected at 20s granularity and stored on the ESX host for 1 hour. However, unless VC is set to the stats level appropriate to that statistic, we will not store the data in the database or roll up the stat into a past-day stat on the ESX host. You can specify the stats level independently for each of the archiving intervals. In other words, you might want to store level 4 stats for up to 1 day, but level 3 stats for 1 week.
In practice, we use stats level to vary the level of detail for statistics that are archived. At stats level 1, we have pretty coarse-grained stats, while stats level 4 contains very detailed statistics, and also includes statistics for various instances (e.g., for each NIC of a VM).
There are 3 important calls that I often use for stats (please refer to the SDK documentation for more information):
Let me give a concrete example of stats level.
Suppose I want to know the value of mem.consumed.maximum for a given VM. This is the maximum amount of machine memory allocated to a VM (including overhead memory) over a specified interval. As shown below, this is a "level 4" statistic. This means that if I've set the stats level to 4 for past-day stats and then formulate a QuerySpec that asks for the value of this data 20 minutes ago at "past-day" granularity (i.e., at 5-minute granularity), then I will get a value. If the stats level is 2 for past-day (5-minute granularity) statistics, however, then such a query will not return a value, because it is a level-4 stat and only level-2/level-1 stats are being stored at 5-minute granularity. In contrast, even if the stats level is 1, if I formulate a QuerySpec with 20s (i.e., "real-time" or "past hour") as the interval of collection, I will get this value, because this data is stored for up to one hour at 20s granularity no matter what the stats level.
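The rule in this example can be written down as a small check. This is a sketch of the logic only, not SDK code: is_stat_available and the configured_levels mapping are hypothetical names, and the mapping simply pairs each archive interval (in seconds) with the stats level configured for it.

```python
REALTIME_INTERVAL_S = 20  # real-time ("past hour") sampling interval


def is_stat_available(counter_level, interval_s, configured_levels):
    """Return True if a query at `interval_s` would find data for a counter.

    counter_level:     stats level of the counter (1 to 4)
    interval_s:        interval requested in the query, in seconds
    configured_levels: hypothetical dict of archive interval (seconds) to
                       the stats level configured for that interval
    """
    if interval_s == REALTIME_INTERVAL_S:
        # Real-time stats are kept for one hour regardless of stats level.
        return True
    if interval_s not in configured_levels:
        raise ValueError(f"no archive interval of {interval_s}s is configured")
    # Archived stats are stored only if the interval's level is high enough.
    return counter_level <= configured_levels[interval_s]


MEM_CONSUMED_MAXIMUM_LEVEL = 4  # level of mem.consumed.maximum, per the text

# Past-day (5-minute) stats at level 4: the value is archived.
at_level_4 = is_stat_available(MEM_CONSUMED_MAXIMUM_LEVEL, 300, {300: 4})

# Past-day stats at level 2: the value is not archived.
at_level_2 = is_stat_available(MEM_CONSUMED_MAXIMUM_LEVEL, 300, {300: 2})

# Real-time query with the stats level at 1: the value is still available.
realtime = is_stat_available(MEM_CONSUMED_MAXIMUM_LEVEL, 20, {300: 1})
```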
Understanding the update interval is a key component to understanding the performance statistics. The Virtual Infrastructure Client (VIC) displays live stats at a 20s update frequency. Archived stats are updated at the frequency of their archive interval (for example, every 5 minutes for past-day stats). This is key to understanding the relative amounts of data presented by VC.
For instance, a ready time of 1,000 ms in the VIC's live stats graph translates into 5% ready time (1,000 / 20,000). The same percentage of ready time at a five-minute archival frequency would appear as 15,000 ms (5% of 300,000 ms).
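The arithmetic above generalizes to any interval. The helper names below are made up for illustration; the only assumption is that the ready value is a total in milliseconds accumulated over one sample interval, as in the example.

```python
def ready_percent(ready_ms, interval_s):
    """Convert ready time (ms) accumulated over one interval into a percentage."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


def ready_ms_for_percent(percent, interval_s):
    """Inverse conversion: ready time (ms) that corresponds to a percentage."""
    return percent * interval_s * 1000.0 / 100.0


# Live stats: 1,000 ms of ready time in a 20s sample.
live_percent = ready_percent(1000, 20)

# The same percentage expressed over a five-minute (300s) archived sample.
archived_ms = ready_ms_for_percent(live_percent, 300)
```

Comparing percentages rather than raw milliseconds avoids misreading an archived graph as showing fifteen times more ready time than the live graph.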
For a list of all counters, see the vCenter Performance Counters page.
This is useful information.
Would it be possible here or in the counters list to start providing definitions for some of the key performance counters? I am doing some CPU accounting and have struggled to understand the relationships between the following CPU counters: usage, usagemhz, system, wait, ready, extra, used, and guaranteed. This article: http://kb.vmware.com/kb/1002356 provided a good start, but I still have questions. The counter definitions in the Programming Guide are self-referencing so not useful.
Some specific questions:
- Is usage % a percentage of multiple potential PCPUs, if the number of VCPUs > 1? Which of the time components represented by the msec counters are included in this % "busy"?
- What does a CPU time unit of MHz mean? I'm used to that metric as a clock rate, not a consumption metric.
- Which of the counters (system, wait, ready, extra, guaranteed) are also included in the counter "used"?
- What is a short definition for the counters "extra" and "guaranteed"?
Hello,
I included information in this document and the VirtualCenter Performance Counters page to answer your questions. Note that wait, extra, guaranteed and system are level three counters that provide you no information to guide your own monitoring and analysis work.
Scott