<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:clearspace="http://www.jivesoftware.com/xmlns/clearspace/rss" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>VMware Communities : Document List - Performance &amp; VMmark</title>
    <link>http://communities.vmware.com/community/vmtn/general/performance?view=documents</link>
    <description>Latest Documents in Performance &amp; VMmark</description>
    <language>en</language>
    <pubDate>Thu, 15 Oct 2009 21:09:30 GMT</pubDate>
    <generator>Clearspace 1.10.12 (http://jivesoftware.com/products/clearspace/)</generator>
    <dc:date>2009-10-15T21:09:30Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>IOmega ix4-200d IOmeter results (100MB/Full Network)</title>
      <link>http://communities.vmware.com/docs/DOC-10925</link>
      <description />
      <pubDate>Thu, 15 Oct 2009 21:07:59 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10925</guid>
      <dc:date>2009-10-15T21:07:59Z</dc:date>
      <clearspace:dateToText>1 month, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Advanced Networking Performance Options</title>
      <link>http://communities.vmware.com/docs/DOC-10892</link>
      <description>Some of the advanced networking options available in vSphere 4.0 are reviewed in this paper. Many of these options control trade-offs between latency, throughput, CPU utilization, and reliability (e.g., dropped packets). It is not possible to optimize all of these at the same time, so option defaults are chosen to be suitable for the vast majority of applications. These options are provided to meet the stricter requirements of other applications. Advanced options often have subtle side effects, or merely move an issue from one area to another. Therefore it is recommended that VMware Support be engaged before changing such options, especially for production machines.&lt;br /&gt;
&lt;br /&gt;
There are over 100 options that can be set under Configuration &amp;rarr; Advanced Settings &amp;rarr; Net. Of these, the ones listed below are most likely to be useful for tuning networking performance. Many of the others are for internal testing or enable unreliable features.&lt;br /&gt;
&lt;br /&gt;
All of the options listed here take integer values. For the &amp;ldquo;Boolean&amp;rdquo; ones only the default value is shown: 0 for &amp;ldquo;false&amp;rdquo;, and 1 for &amp;ldquo;true&amp;rdquo;. Other parameters are shown with their default, minimum, and maximum values.&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Parameter Name&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;(Default, Minimum, Maximum)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MaxPortRxQueueLen&lt;/td&gt;
&lt;td&gt;(80, 1, 500)&lt;/td&gt;
&lt;td&gt;Maximum length of the Rx queue for virtual ports whose clients support queueing. Consider increasing it if Rx packet drops are seen in the port connected to a VM. Relevant only for e1000 vNICs used with Fault Tolerance (FT) and VLANs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MaxNetifTxQueueLen&lt;/td&gt;
&lt;td&gt;(500, 1, 1000)&lt;/td&gt;
&lt;td&gt;Maximum length of the Tx queue for the physical NICs. Increase if Tx packet drops are seen in uplink port to the pNIC.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GuestTxCopyBreak&lt;/td&gt;
&lt;td&gt;(64, 60, 4294967295)&lt;/td&gt;
&lt;td&gt;Transmitted packet headers smaller than this size (in bytes) are copied rather than mapped. This option has more security and functionality implications than performance implications.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmxnetTxCopySize&lt;/td&gt;
&lt;td&gt;(256, 0, 4294967295)&lt;/td&gt;
&lt;td&gt;Transmits smaller than this size (in bytes) are copied rather than mapped. Copying costs CPU but puts less pressure on the Tx queue and doesn&amp;rsquo;t require completion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmxnetWinUDPTxFullCopy&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable full copy of Windows vmxnet UDP Tx packets. Might disable to save CPU, especially for jumbo frames, at the cost of risking more packet drops.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NetTxDontClusterSize&lt;/td&gt;
&lt;td&gt;(0, 0, 8192)&lt;/td&gt;
&lt;td&gt;Tx packets smaller than this size (in bytes) are transmitted immediately (coalescing options are overruled for these packets). Used to ensure good latency for small packets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceTxTimeout&lt;/td&gt;
&lt;td&gt;(4000, 1, 4294967295)&lt;/td&gt;
&lt;td&gt;The coalesce timeout in microseconds, or effectively the maximum latency without transmitting. Smaller values can reduce packet latency at the cost of CPU. Risky to go below 1000.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceDefaultOn&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable dynamic coalescing. Disable to test if issues are related to coalescing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceHandlerPcpu&lt;/td&gt;
&lt;td&gt;(1, 0, 128)&lt;/td&gt;
&lt;td&gt;pCPU that coalesce timeout handler runs on. May be important to set this if VM CPU pinning is used.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceTxQDepthCap&lt;/td&gt;
&lt;td&gt;(40, 0, 80)&lt;/td&gt;
&lt;td&gt;Maximum number of &amp;ldquo;normalized&amp;rdquo; Tx packets to coalesce. Reduce if Tx coalescing appears to be too aggressive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceRxQDepthCap&lt;/td&gt;
&lt;td&gt;(40, 0, 80)&lt;/td&gt;
&lt;td&gt;Maximum number of &amp;ldquo;normalized&amp;rdquo; Rx packets to coalesce. Reduce if Rx coalescing appears to be too aggressive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vmxnetThroughputWeight&lt;/td&gt;
&lt;td&gt;(0, 0, 255)&lt;/td&gt;
&lt;td&gt;How far to favor Tx throughput for vmxnet 2 &amp;#38; 3. &amp;ldquo;0&amp;rdquo; is dynamic, otherwise this is a weight where a lower value favors latency and a higher value favors throughput.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TcpipHeapSize&lt;/td&gt;
&lt;td&gt;(24, 24, 120)&lt;/td&gt;
&lt;td&gt;Initial size of the TCP/IP module heap in megabytes. May need to increase if there are many vmkernel connections (NFS, iSCSI, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TcpipDefLROMaxLength&lt;/td&gt;
&lt;td&gt;(16000, 1, 65535)&lt;/td&gt;
&lt;td&gt;Maximum length for the LRO aggregated packet for vmkernel connections. Increasing this reduces the number of acknowledgments, which improves efficiency but may increase latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1000TxZeroCopy&lt;/td&gt;
&lt;td&gt;(0)&lt;/td&gt;
&lt;td&gt;If disabled, UDP and non-TSO Tx packets are copied for e1000.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1000TxTsoZeroCopy&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;If enabled, TSO Tx packets are not copied for e1000.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1000IntrCoalesce&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable interrupt coalescing for e1000. Disabling can improve latency at the expense of CPU.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MaxPktRxListQueue&lt;/td&gt;
&lt;td&gt;(3500, 0, 200000)&lt;/td&gt;
&lt;td&gt;Maximum number of packets queued in vmkernel. Increasing this can reduce the number of dropped packets but at the cost of increased vmkernel memory and queuing latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vmxnet3RSSHashCache&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable RSS hash cache for vmxnet3 in Windows guests.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmklnxLROEnabled&lt;/td&gt;
&lt;td&gt;(0)&lt;/td&gt;
&lt;td&gt;Enable large packets for recent Linux guests with vmxnet 2 &amp;#38; 3. Most likely to benefit hosts with a small number of VMs with few sessions each, where each session has a heavy Rx load (more than 1 MB/sec). This is an experimental feature and has not been tested extensively.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmklnxLROMaxAggr&lt;/td&gt;
&lt;td&gt;(6, 0, 24)&lt;/td&gt;
&lt;td&gt;Maximum aggregation count in number of packets for vmklinux LRO.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
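&lt;br /&gt;
As a rough illustration of how the (Default, Minimum, Maximum) column can be used, the following Python sketch checks a proposed value against the documented range before a setting is changed. The helper name check_net_option and the small subset of options are assumptions made for this example; they are not part of any VMware tool.&lt;br /&gt;
&lt;br /&gt;
```python
# (default, minimum, maximum) values copied from the table above.
NET_OPTION_RANGES = {
    "MaxPortRxQueueLen": (80, 1, 500),
    "MaxNetifTxQueueLen": (500, 1, 1000),
    "CoalesceTxTimeout": (4000, 1, 4294967295),
    "TcpipHeapSize": (24, 24, 120),
}


def check_net_option(name, value):
    """Return a list of warnings for a proposed value (hypothetical helper)."""
    _default, minimum, maximum = NET_OPTION_RANGES[name]
    warnings = []
    # range() membership is a constant-time check for integers.
    if value not in range(minimum, maximum + 1):
        warnings.append("%s=%d is outside [%d, %d]" % (name, value, minimum, maximum))
    # The table notes that CoalesceTxTimeout is risky below 1000 microseconds.
    if name == "CoalesceTxTimeout" and 1000 > value:
        warnings.append("CoalesceTxTimeout below 1000 is described as risky")
    return warnings


if __name__ == "__main__":
    for option, proposed in [("MaxPortRxQueueLen", 128), ("TcpipHeapSize", 16)]:
        print(option, proposed, check_net_option(option, proposed) or "ok")
```
&lt;br /&gt;
An empty list means the value is inside the documented range. It does not mean the change is advisable, so the recommendation to involve VMware Support still applies.&lt;br /&gt;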
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <pubDate>Wed, 14 Oct 2009 16:58:33 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10892</guid>
      <dc:date>2009-10-14T16:58:33Z</dc:date>
      <clearspace:dateToText>3 weeks, 1 day ago</clearspace:dateToText>
    </item>
    <item>
      <title>rename_vm.sh</title>
      <link>http://communities.vmware.com/docs/DOC-10857</link>
      <description />
      <category domain="http://communities.vmware.com/tags?communityID=2427">rename</category>
      <category domain="http://communities.vmware.com/tags?communityID=2427">vm</category>
      <pubDate>Wed, 07 Oct 2009 14:00:46 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10857</guid>
      <dc:date>2009-10-07T14:00:46Z</dc:date>
      <clearspace:dateToText>1 month, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>clone_vm2.sh</title>
      <link>http://communities.vmware.com/docs/DOC-10856</link>
      <description />
      <category domain="http://communities.vmware.com/tags?communityID=2427">cloning</category>
      <category domain="http://communities.vmware.com/tags?communityID=2427">vmware</category>
      <pubDate>Wed, 07 Oct 2009 13:57:43 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10856</guid>
      <dc:date>2009-10-07T13:57:43Z</dc:date>
      <clearspace:dateToText>1 month, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>VAM Doc</title>
      <link>http://communities.vmware.com/docs/DOC-10718</link>
      <description>This document is about business marketing.</description>
      <pubDate>Mon, 14 Sep 2009 12:52:51 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10718</guid>
      <dc:date>2009-09-14T12:52:51Z</dc:date>
      <clearspace:dateToText>2 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Memory Performance Chart Metrics in the vSphere Client</title>
      <link>http://communities.vmware.com/docs/DOC-10398</link>
      <description>The vSphere Client exposes several memory performance statistics for users to identify VM memory usage.&lt;br /&gt;
&lt;br /&gt;
Some of the important memory performance metrics follow. Each metric name appears under the &lt;b&gt;Measurement&lt;/b&gt; column of the Performance Chart Legend, as shown in the following screenshot:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/6421/Performance_Charts_VM.png" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/6421/Performance_Charts_VM.png" class="jive-image"  /&gt; &lt;br /&gt;
&lt;p /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Active:&lt;/b&gt; The amount of guest physical memory that is being used by the VM. Active memory may be different from what is seen inside the guest operating system. This is because the guest operating system generally has a more precise view about what memory is &amp;ldquo;active&amp;rdquo; than the hypervisor because it knows when applications allocate or deallocate memory. In addition, the sampling technique used by ESX often takes time to converge, so the memory usage measured in the guest operating system may be more accurate when the workload memory usage is fluctuating.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Shared:&lt;/b&gt; The amount of guest physical memory shared through transparent page sharing. This includes the memory shared with other VMs and the memory shared within the VM.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Consumed:&lt;/b&gt; The amount of host physical memory allocated to the VM, accounting for savings from memory sharing with other VMs. When multiple VMs share a host memory region, each VM is charged a portion of that memory in proportion to its share of the total references to it. For example, if a VM has 100MB of host memory shared equally with three other VMs, its Consumed memory counts only 25MB. If the 100MB is shared only within the VM, its Consumed memory counts the full 100MB.  &lt;br clear="all" /&gt;  &lt;br clear="all" /&gt; Note that for a host that is not memory overcommitted, the Consumed memory represents a &amp;ldquo;high water mark&amp;rdquo; of the memory usage by the VM. It is possible that the VM was actively using a large amount of host physical memory in the past but no longer is. Because host memory is not overcommitted, the Consumed memory will not be shrunk through ballooning or swapping. Hence, the Consumed memory could be much higher than the Active memory when host memory is not overcommitted.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Granted:&lt;/b&gt; The amount of guest physical memory currently backed by host physical memory. Due to memory sharing, the Granted memory is greater than or equal to the Consumed memory. For instance, if a guest allocates 100MB of memory that contains only zeroes, then once all the zeroed pages are shared, the VM&amp;rsquo;s Granted memory is 100MB but its Consumed memory is only 4KB.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Overhead:&lt;/b&gt; The extra host physical memory used by ESX to run a VM. The Overhead memory has two components: 1) System-wide overhead from VMkernel; 2) Additional overhead for each VM, including the space reserved for the VM frame buffer and various virtualization data structures. Since the Overhead memory always resides in host memory, ESX must reserve memory for it. Thus a VM&amp;rsquo;s memory reservation has two individual components: user-specified memory reservation and overhead memory reservation. For example, if the user specifies a 1GB reservation and the Overhead memory for the VM is 100MB, the VM&amp;rsquo;s memory reservation when powered on would be 1.1GB.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Balloon:&lt;/b&gt; The amount of guest physical memory that is currently reclaimed through the balloon driver.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Swapped:&lt;/b&gt; The amount of guest physical memory swapped out to the VM&amp;rsquo;s swap device by ESX.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Swapped in rate:&lt;/b&gt; The rate at which the host physical memory is being swapped in from the host swap device.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Swapped out rate:&lt;/b&gt; The rate at which the host physical memory is being swapped out to the host swap device.&lt;/li&gt;
&lt;/ul&gt;</description>
      <pubDate>Thu, 23 Jul 2009 22:05:34 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10398</guid>
      <dc:date>2009-07-23T22:05:34Z</dc:date>
      <clearspace:dateToText>4 months, 6 days ago</clearspace:dateToText>
    </item>
    <item>
      <title>Performance Troubleshooting for VMware vSphere 4 and ESX 4.0</title>
      <link>http://communities.vmware.com/docs/DOC-10352</link>
      <description>Performance problems can arise in any computing environment. Complex application behaviors, changing demands, and shared infrastructure can lead to problems arising in previously stable environments. Troubleshooting performance problems requires an understanding of the interactions between the software and hardware components of a computing environment. Moving to a virtualized computing environment adds new software layers and new types of interactions that must be considered when troubleshooting performance problems. &lt;br /&gt;
&lt;br /&gt;
The attached document is the first installment in a guide covering performance troubleshooting in a vSphere environment. It uses a guided approach to lead the reader through the observable manifestations of complex hardware/software interactions in order to identify specific performance problems. For each problem covered, it includes a discussion of the possible root-causes and solutions. Topics covered include performance problems arising from issues in the CPU, memory, storage, and network subsystems, as well as in the VM and ESX host configuration.  Guidance is given on relevant performance metrics to observe using the vSphere Client and esxtop in order to isolate specific performance issues.  &lt;br /&gt;
&lt;br /&gt;
This first installment of &lt;i&gt;Performance Troubleshooting for VMware vSphere 4&lt;/i&gt; covers performance troubleshooting on a single VMware ESX 4.0 host. It focuses on the most common performance problems which affect an ESX host. Future updates will add more detailed performance information, including troubleshooting information for more advanced problems and multi-host vSphere deployments.&lt;br /&gt;
&lt;br /&gt;
This is a living document. Reader comments, questions, and suggestions are encouraged.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">performance</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">performance_issues</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">troubleshooting</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">slow</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">problem</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vsphere_performance</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vsphere</category>
      <pubDate>Mon, 13 Jul 2009 14:03:44 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10352</guid>
      <dc:date>2009-07-13T14:03:44Z</dc:date>
      <clearspace:dateToText>2 months, 1 week ago</clearspace:dateToText>
      <clearspace:replyCount>4</clearspace:replyCount>
    </item>
    <item>
      <title>Storage Workload Characterization and Consolidation in Virtualized Environments</title>
      <link>http://communities.vmware.com/docs/DOC-10104</link>
      <description />
      <category domain="http://communities.vmware.com/tags?communityID=2629">oracle</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">exchange</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vscsistats</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">san</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">nfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vmfs</category>
      <pubDate>Wed, 03 Jun 2009 00:21:48 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10104</guid>
      <dc:date>2009-06-03T00:21:48Z</dc:date>
      <clearspace:dateToText>5 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server</title>
      <link>http://communities.vmware.com/docs/DOC-10084</link>
      <description />
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">fc</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">san</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">fibre</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">nas</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">nfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vmfs</category>
      <pubDate>Mon, 01 Jun 2009 22:53:39 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10084</guid>
      <dc:date>2009-06-01T22:53:39Z</dc:date>
      <clearspace:dateToText>5 months, 4 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Using vscsiStats for Storage Performance Analysis</title>
      <link>http://communities.vmware.com/docs/DOC-10095</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
esxtop is a great tool for performance analysis of all types.  However, with only latency and throughput statistics, esxtop will not provide the full picture of the storage profile.  Furthermore, esxtop only provides latency numbers for Fibre Channel and iSCSI storage.  Latency analysis of NFS traffic is not possible with esxtop.&lt;br /&gt;
&lt;br /&gt;
Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats.  vscsiStats collects and reports counters on storage activity.  Its data is collected at the virtual SCSI device level in the kernel.  This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol.  The following data are reported in histogram form:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;IO size&lt;/li&gt;
&lt;li&gt;Seek distance&lt;/li&gt;
&lt;li&gt;Outstanding IOs&lt;/li&gt;
&lt;li&gt;Latency (in microseconds)&lt;/li&gt;
&lt;li&gt;More!&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Running vscsiStats&lt;/h1&gt;
vscsiStats collection and analysis requires two steps:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Start statistics collection.&lt;/li&gt;
&lt;li&gt;View accrued statistics.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Documentation on the command-line parameters is available by running '/usr/lib/vmware/bin/vscsiStats -h'.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Starting and Stopping vscsiStats Collection&lt;/h2&gt;
The tool is started with the following command:&lt;br /&gt;
&lt;pre class="jive-pre"&gt;&lt;code class="jive-code jive-plain"&gt;/usr/lib/vmware/bin/vscsiStats -s -w &amp;lt;world_group_id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
This command starts the process that will accrue statistics.  The world group ID must be set to a running virtual machine.  The running VMs' IDs can be obtained by running '/usr/lib/vmware/bin/vscsiStats -l'.&lt;br /&gt;
&lt;br /&gt;
After about 30 minutes vscsiStats will stop running.  If the analysis is needed for a longer period, the start command above should be repeated within this window.  That will defer the timeout and termination by another 30 minutes.&lt;br /&gt;
&lt;br /&gt;
Since results are accrued and reported out in summary, the histograms will include data since collection was started.  To reset all counters to zero, run '/usr/lib/vmware/bin/vscsiStats -r'.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Viewing Statistics&lt;/h2&gt;
Counters are displayed by using the following command:&lt;br /&gt;
&lt;pre class="jive-pre"&gt;&lt;code class="jive-code jive-plain"&gt;/usr/lib/vmware/bin/vscsiStats -p &amp;lt;histo_type&amp;gt; [-c]
&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
The histogram type is used to specify either all of the statistics or one group of them.  Options include all, ioLength, seekDistance, outstandingIOs, latency, interarrival.&lt;br /&gt;
&lt;br /&gt;
Results can be produced in a more compact comma-delimited list by adding the optional "-c" flag shown above.&lt;br /&gt;
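&lt;br /&gt;
The commands described in this section can be collected into a small Python sketch that only builds argument lists; it does not run anything. The function names, the example world group ID, and the 25-minute renewal interval are assumptions made for illustration. The flags and histogram types are the ones listed above.&lt;br /&gt;
&lt;br /&gt;
```python
VSCSISTATS = "/usr/lib/vmware/bin/vscsiStats"
HISTO_TYPES = ("all", "ioLength", "seekDistance",
               "outstandingIOs", "latency", "interarrival")


def start_cmd(world_group_id):
    """Argument list that starts (or extends) collection for one VM."""
    return [VSCSISTATS, "-s", "-w", str(world_group_id)]


def print_cmd(histo_type="all", comma_delimited=False):
    """Argument list that prints one histogram group, optionally with -c."""
    if histo_type not in HISTO_TYPES:
        raise ValueError("unknown histogram type: " + histo_type)
    cmd = [VSCSISTATS, "-p", histo_type]
    if comma_delimited:
        cmd.append("-c")
    return cmd


def renewal_offsets(total_minutes, interval_minutes=25):
    """Minutes (from the first start) at which to re-issue start_cmd.

    The 25-minute interval is an assumption chosen to stay inside the
    roughly 30-minute window described above.
    """
    return list(range(0, total_minutes, interval_minutes))


if __name__ == "__main__":
    # 12345 is a made-up world group ID; real IDs come from 'vscsiStats -l'.
    print(" ".join(start_cmd(12345)))
    print(" ".join(print_cmd("latency", comma_delimited=True)))
    print(renewal_offsets(90))
```
&lt;br /&gt;
On an ESX host the resulting lists could be passed to subprocess.run, with the start command re-issued at each offset to keep collection alive for longer captures.&lt;br /&gt;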
&lt;br /&gt;
&lt;h1&gt;Using vscsiStats Results&lt;/h1&gt;
&lt;h2&gt;Use Case 1: Identifying Sequential IO&lt;/h2&gt;
Storage arrays can process sequential IO much faster than random IO.  You can therefore improve the performance of a sequential workload by placing it on a dedicated LUN to allow the array to optimize access.  vscsiStats can help you identify your sequential workloads even if you don't understand anything about the application in the VM.&lt;br /&gt;
&lt;br /&gt;
Take the following graph as an example, which I generated by running '/usr/lib/vmware/bin/vscsiStats -p seekDistance':&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5910/random_write_histo.png" alt="random_write_histo.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5910/random_write_histo.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
This graph shows that most of the commands are being issued a great distance from the previous command.  It looks like all of the commands were 50,000 or more logical blocks away from the previous command.  When I looked at the raw data, I saw that over 99% of the commands were more than 128 blocks away from the previous command.  That's random access if I've ever seen it.  Here's the opposite example:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5908/sequential_write_histo.png" alt="sequential_write_histo.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5908/sequential_write_histo.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
In this case the logical block number (LBN) of each command is most frequently exactly one larger than the previous command.  That's the signature of a heavily sequential workload.  It shouldn't surprise you to learn that both of these profiles were generated by Iometer using random and sequential writes, respectively. &lt;br /&gt;
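&lt;br /&gt;
The same reading of the seek-distance data can be sketched in Python. This is a minimal sketch under simplifying assumptions: each histogram entry is a (representative seek distance in blocks, count) pair, the sample numbers are invented, and the 50% cut-off for calling a workload random is my own choice. It does not parse real vscsiStats output.&lt;br /&gt;
&lt;br /&gt;
```python
def fraction_beyond(histogram, threshold=128):
    """Share of commands issued more than `threshold` blocks from the previous one.

    histogram: list of (signed seek distance in blocks, command count) pairs.
    """
    total = sum(count for _, count in histogram)
    if total == 0:
        return 0.0
    far = sum(count for distance, count in histogram if abs(distance) > threshold)
    return far / total


def classify(histogram, cutoff=0.5):
    """Label a workload; the 0.5 cutoff is an assumption, not a vscsiStats rule."""
    return "random" if fraction_beyond(histogram) > cutoff else "sequential"


# Invented sample data shaped like the two graphs above.
RANDOM_LIKE = [(-500000, 480), (-50000, 15), (1, 5), (50000, 20), (500000, 480)]
SEQUENTIAL_LIKE = [(1, 900), (8, 50), (64, 30), (5000, 20)]

if __name__ == "__main__":
    for name, histo in [("first", RANDOM_LIKE), ("second", SEQUENTIAL_LIKE)]:
        print(name, fraction_beyond(histo), classify(histo))
```
&lt;br /&gt;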
&lt;br /&gt;
&lt;h2&gt;Use Case 2: Optimizing for IO Sizes&lt;/h2&gt;
The IO size is an important characteristic of storage profiles.  A variety of best practices have been provided by storage vendors to enable customers to tune their storage to a particular IO size.  As an example, it may make sense to optimize an array's stripe size to its average IO size.  vscsiStats can provide a histogram of IO sizes to help this process.  The following graph was generated by '/usr/lib/vmware/bin/vscsiStats -p ioLength':&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5911/io_size_4k.png" alt="io_size_4k.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5911/io_size_4k.png');return false;"/&gt; &lt;br /&gt;
&lt;br /&gt;
From these results I can see that about a quarter of the commands were smaller than 4k.  About half of the commands were exactly 4k in size.  The minute number of remaining IOs were larger than 4k.  This signature is typical of a VMDK formatted with 4k blocks that supports OS and application execution.  The storage array should be optimized for 4k blocks if this disk's performance is a priority. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Use Case 3: Storage Latency Analysis (Including NFS!)&lt;/h2&gt;
esxtop is a terrific tool for latency-based storage analysis.  Fibre Channel and iSCSI HBAs have device and kernel latencies in esxtop's storage panel.  Software iSCSI initiators will show up as vmhba32 (ESX 3.5 and earlier) and vmhba33 (ESX 4.0 and later).  But esxtop does not provide latency statistics for NFS stores.&lt;br /&gt;
&lt;br /&gt;
Because vscsiStats collects its results where the guest interacts with the hypervisor, it is unaware of the storage implementation.  Latency statistics can be collected for all storage configurations with this tool.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5917/latency.png" alt="latency.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5917/latency.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
The above graph shows that the server in my office with a single direct-attached SCSI disk is performing as I would expect.  About half of all the operations are completing in under 5 ms.  The other half take 5-15 ms to complete.  A few commands took longer than 15 ms, but the number is so small that it doesn't concern me.  Similar results can be seen with NFS arrays.&lt;br /&gt;
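&lt;br /&gt;
To make the reading of the latency graph concrete, here is a minimal Python sketch that splits a latency histogram into the three bands discussed above. It assumes each entry is an (upper limit in microseconds, count) pair; the sample counts are invented to resemble the graph and are not measured data.&lt;br /&gt;
&lt;br /&gt;
```python
def latency_shares(histogram):
    """Return the shares of commands completing in under 5 ms, 5-15 ms, and over 15 ms.

    histogram: list of (bucket upper limit in microseconds, command count) pairs.
    """
    total = sum(count for _, count in histogram)
    if total == 0:
        return (0.0, 0.0, 0.0)
    under_5ms = sum(c for limit, c in histogram if 5000 >= limit)
    to_15ms = sum(c for limit, c in histogram if 15000 >= limit > 5000)
    over_15ms = total - under_5ms - to_15ms
    return (under_5ms / total, to_15ms / total, over_15ms / total)


# Invented sample resembling the direct-attached SCSI disk described above.
SAMPLE = [(1000, 200), (5000, 290), (15000, 500), (30000, 10)]

if __name__ == "__main__":
    print(latency_shares(SAMPLE))
```
&lt;br /&gt;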
&lt;br /&gt;
&lt;h1&gt;vscsiStats on ESXi&lt;/h1&gt;
vscsiStats can be installed on ESXi hosts after putting the host into tech support mode.  More information on this process is available on &lt;a class="jive-link-external" href="http://vpivot.com/2009/10/21/vscsistats-for-esxi/"&gt;Scott's blog on the subject on vPivot&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Additional Resources&lt;/h1&gt;
My colleagues Ajay Gulati, Chethan Kumar, and Irfan Ahmad presented at VPACT 09 &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10104" title="This paper presents a workload characterization study of three top-tier enterprise applications using VMware ESX server hypervisor. We further separate out different components (for example data, index and redo log in a database) of these workloads to understand their behavior in isolation.  We find that most workloads show highly random access patterns. Next, we study the impact of storage consolidation on workloads (both random and sequential) and their burstiness."&gt;Storage Workload Characterization and Consolidation in Virtualized Environments&lt;/a&gt;.  This paper serves as an excellent example of vscsiStats in action.&lt;br /&gt;
&lt;br /&gt;
I learned vscsiStats by reviewing Irfan's VMworld 2007 presentation (&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10084" title="The presentation deck delivered by Irfan Ahmad at VMworld 2007.  This details a powerful storage analysis tool that has been packaged since ESX 3.5."&gt;vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server&lt;/a&gt;) and playing with the tool.  Check out his presentation if you'd like more detail.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vscsistats</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">nfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">san</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">nas</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vmfs</category>
      <pubDate>Fri, 29 May 2009 22:28:27 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10095</guid>
      <dc:date>2009-05-29T22:28:27Z</dc:date>
      <clearspace:dateToText>1 month, 6 days ago</clearspace:dateToText>
      <clearspace:replyCount>9</clearspace:replyCount>
    </item>
    <item>
      <title>Meet the Engineer Series: VMware Performance Advancements</title>
      <link>http://communities.vmware.com/docs/DOC-10070</link>
      <description>&lt;b&gt;&lt;i&gt;Real people, real faces, discussing VMware vSphere topics...&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;hr /&gt;
In this video, VMware's Chief Performance Architect discusses why you should seriously consider virtualizing all of your applications on VMware:&lt;br /&gt;
&lt;br /&gt;
{youtube}&lt;a class="jive-link-external" href="http://www.youtube.com/watch?v=I-W0ZWm5Jf4"&gt;http://www.youtube.com/watch?v=I-W0ZWm5Jf4&lt;/a&gt;{youtube}</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">performance_advancements</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualizing_applications</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu_architecture</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">binary_translation</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">hyper_assist</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">io_stack</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vsphere_performance</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vsphere_videos</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vmware_meet_the_engineer</category>
      <pubDate>Thu, 21 May 2009 23:41:52 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-10070</guid>
      <dc:date>2009-05-21T23:41:52Z</dc:date>
      <clearspace:dateToText>6 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>ESX Monitor Modes</title>
      <link>http://communities.vmware.com/docs/DOC-9882</link>
      <description>VMware has supported Intel's and AMD's virtualization assist since 2006.  Long before then we were using an all-software approach that we call binary translation (BT).  With the benefit of years of development and optimization, BT outperformed the early versions of hardware assist.  But as hardware assist evolved, using these new features became more attractive.&lt;br /&gt;
&lt;br /&gt;
Because our support for hardware assist is rich and BT is heavily optimized, the monitor can benefit from using either technology in different situations.  The following tables detail the defaults in ESX 4.0, which can be changed through VM settings if desired. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Monitor Defaults with Intel Processors &lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;VM Configuration&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Core-i7 (Nehalem)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;45nm Core2 with VT-x&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;65nm Core2 with VT-x and FlexPriority&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;65nm Core2 with VT-x and No FlexPriority&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;P4 with VT-x&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;EM64T without VT-x&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;No EM64T&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FT enabled&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;64-bit guests&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VMI enabled&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenServer, UnixWare, OS/2&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Linux and 32-bit FreeBSD&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All other 32-bit guests&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
(*) When we use BT on an Intel system with VT-x capability, we dynamically switch to VT-x if the guest enters long mode.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Monitor Defaults with AMD Processors &lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Configuration&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Barcelona, Phenom, and Newer&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;AMD64 pre-Barcelona&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;No AMD64&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FT enabled&lt;/td&gt;
&lt;td&gt;AMD-V + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;64-bit guests&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VMI enabled&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenServer, UnixWare, OS/2&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Linux and 32-bit FreeBSD&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All other 32-bit guests&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
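The AMD table above can also be read as a simple lookup from guest configuration and processor generation to the default monitor mode.  The following Python sketch only restates the table; the dictionary layout and the function name are hypothetical and are not an ESX interface.&lt;br /&gt;
&lt;pre&gt;
```python
# Column order of the AMD table, left to right.
AMD_CPUS = ("Barcelona, Phenom, and Newer", "AMD64 pre-Barcelona", "No AMD64")

# One entry per table row; values follow the column order in AMD_CPUS.
AMD_DEFAULTS = {
    "FT enabled": ("AMD-V + SPT", "Not runnable", "Not runnable"),
    "64-bit guests": ("AMD-V + RVI", "BT + SPT", "Not runnable"),
    "VMI enabled": ("BT + SPT", "BT + SPT", "BT + SPT"),
    "OpenServer, UnixWare, OS/2": ("AMD-V + RVI", "BT + SPT", "BT + SPT"),
    "32-bit Linux and 32-bit FreeBSD": ("AMD-V + RVI", "BT + SPT", "BT + SPT"),
    "32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008":
        ("AMD-V + RVI", "BT + SPT", "BT + SPT"),
    "Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris":
        ("BT + SPT", "BT + SPT", "BT + SPT"),
    "All other 32-bit guests": ("AMD-V + RVI", "BT + SPT", "BT + SPT"),
}


def amd_default_monitor(configuration, cpu):
    """Return the default monitor mode listed in the table for this row and column."""
    return AMD_DEFAULTS[configuration][AMD_CPUS.index(cpu)]


print(amd_default_monitor("64-bit guests", "Barcelona, Phenom, and Newer"))
print(amd_default_monitor("VMI enabled", "No AMD64"))
```
&lt;/pre&gt;
The same layout would work for the Intel table, with seven columns instead of three.&lt;br /&gt;
&lt;br /&gt;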
&lt;h2&gt;Legend&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;VT-x: Intel's virtualization hardware assist.&lt;/li&gt;
&lt;li&gt;EPT: &lt;i&gt;Extended Page Tables.&lt;/i&gt;  Intel's on-board, virtualization-aware memory management unit (MMU).&lt;/li&gt;
&lt;li&gt;EM64T: Intel's 64-bit extensions to the x86 architecture.&lt;/li&gt;
&lt;li&gt;SPT: &lt;i&gt;Shadow page tables.&lt;/i&gt;  ESX's software memory management unit (i.e., not EPT or RVI).&lt;/li&gt;
&lt;li&gt;BT: &lt;i&gt;Binary translation.&lt;/i&gt;  ESX's software virtualization capability (i.e., not VT-x or AMD-V).&lt;/li&gt;
&lt;li&gt;AMD-V: AMD's virtualization hardware assist.&lt;/li&gt;
&lt;li&gt;RVI: &lt;i&gt;Rapid Virtualization Indexing.&lt;/i&gt;  AMD's on-board, virtualization-aware memory management unit (MMU).&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">monitor</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esx</category>
      <pubDate>Tue, 28 Apr 2009 20:40:05 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-9882</guid>
      <dc:date>2009-04-28T20:40:05Z</dc:date>
      <clearspace:dateToText>5 months, 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Storage Performance: VMFS and Protocols</title>
      <link>http://communities.vmware.com/docs/DOC-9696</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
VMware's customers are always asking us about the storage stack.  Without exception, the two most common questions about our storage system performance are:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Which storage protocol performs best?&lt;/li&gt;
&lt;li&gt;Does VMFS scale to meet the demands of many servers and VMs?&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
This document covers a few of the points needed to answer these two questions.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Storage Protocols&lt;/h1&gt;
VMware published a &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/storage_protocol_perf.pdf"&gt;paper comparing storage protocols&lt;/a&gt; in 2008.  This paper detailed the two key characteristics of ESX's storage stack:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;The hypervisor is easily able to drive the storage connection to link speed.&lt;/li&gt;
&lt;li&gt;Configurations where protocol management happens in the HBA (Fibre Channel and HW iSCSI) are more CPU efficient.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
On the first point, consider the following graph, taken from page three of the paper:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5618/protocol_throughput.png" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5618/protocol_throughput.png" class="jive-image"  /&gt;  &lt;br /&gt;
&lt;br /&gt;
Note that in this case all four test cases drive the storage to link speed.  That's 2 Gb/s with the Fibre Channel HBA and 1 Gb/s with the other three.  In short, if throughput is your goal, make decisions based on link speed.  If you check through the rest of the paper, you'll see that response time is similar for all of the configurations, as well.  But you will see slight differences in throughput in some of the protocols.&lt;br /&gt;
&lt;br /&gt;
This brings us to the second point from above: less work is done by the CPU when protocol management can be off-loaded to the HBA.  This means that hosts using FC and HW iSCSI HBAs will have additional CPU cycles left for the VMs' work.  It can also explain the slight differences in throughput in the other graphs in the paper.  The efficiency results quoted in the paper are here:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5619/protocol_efficiency.png" alt="protocol_efficiency.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5619/protocol_efficiency.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
The increased overheads of running software iSCSI or NFS are due to the VMkernel managing those protocols.  It's worth noting that the proliferation of iSCSI in the enterprise has led VMware to spend considerable effort to improve the efficiency of SW iSCSI.  Expect its efficiency to improve dramatically in upcoming releases. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VMFS Scalability&lt;/h1&gt;
Many in the industry erroneously believe that VMFS won't scale as storage demands grow.  Often SCSI reservations and disk locking are cited as the technical-sounding but vaguely-supported reason for this claim.  It's worth sampling data from our &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;scalable storage performance paper&lt;/a&gt; to debunk this myth.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5623/vmfs_scalability.png" alt="vmfs_scalability.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5623/vmfs_scalability.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
This chart is a favorite in our world-wide tours as we address VMFS scalability.  It was first introduced in a VMFS scalability blog article that went live in February of 2008.  It shows the results of using 64 hosts to generate a variety of traffic on a single VMFS volume.  And it's a wealth of information on VMFS and storage access patterns.  For instance:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The aggregate number of random writes, in cyan in the middle, maintains perfectly flat linear scalability as the host count grows from 1 to 64.&lt;/li&gt;
&lt;li&gt;The aggregate number of random reads is initially limited by the few disks being accessed but ultimately matches the throughput of random writes as many disks come to bear to serve the large number of random reads.&lt;/li&gt;
&lt;li&gt;The sequential write activity, which highlights the strengths of today's arrays, demonstrates the largest total throughput, which drops only slightly as the array manages so many connections.&lt;/li&gt;
&lt;li&gt;But the sequential read activity drops off dramatically as hosts are added.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
This last example showing degradation in aggregate sequential read capability is an artifact of the workload that is very important to database administrators: multiple sequential reads approximate random activity.  Why is this?  As many hosts request more and more sequential data, the array interleaves these requests to maintain response times.  This means that the sequential accesses get "shuffled" which results in a random access pattern.&lt;br /&gt;
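&lt;br /&gt;
The "shuffling" effect can be illustrated with a toy model.  The Python sketch below assumes that each host reads its own contiguous region and that the array serves one block per host in strict round-robin order; both are simplifying assumptions, and the function name and block numbers are hypothetical.  It reports the fraction of requests that land directly after the previously served block.&lt;br /&gt;
&lt;pre&gt;
```python
def sequential_fraction(num_streams, blocks_per_stream=1000):
    """Fraction of requests that directly follow the previously served block."""
    # Each stream reads its own contiguous region; the regions are far apart.
    positions = [stream * 1000000 for stream in range(num_streams)]
    previous = None
    sequential = 0
    total = 0
    for _ in range(blocks_per_stream):
        # Round-robin service: one block from each stream in turn.
        for stream in range(num_streams):
            block = positions[stream]
            positions[stream] += 1
            if previous is not None:
                total += 1
                if block == previous + 1:
                    sequential += 1
            previous = block
    return sequential / total


for streams in (1, 2, 8, 64):
    print(streams, sequential_fraction(streams))
```
&lt;/pre&gt;
In this model a single stream is fully sequential, while two or more interleaved streams never produce back-to-back adjacent blocks.  A real array softens this with caching and read-ahead, so the model shows the trend rather than actual throughput.&lt;br /&gt;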
&lt;br /&gt;
In short, VMFS has no scalability problems as many hosts drive tremendous amounts of traffic to a single volume.  If the data isn't convincing enough, consider the following: there are no SCSI reservations used during normal data access.  This means that there are no scalability limitations as a result of virtual machine storage access.  A word of caution, though: the file system is locked during administrative operations that change the metadata on the volume.  This means that virtual machine creation or destruction will result in file system locks.  Perform these operations outside of peak hours.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vmfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">fc</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">nfs</category>
      <pubDate>Fri, 13 Mar 2009 00:12:37 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-9696</guid>
      <dc:date>2009-03-13T00:12:37Z</dc:date>
      <clearspace:dateToText>7 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for IBM Lotus Domino</title>
      <link>http://communities.vmware.com/docs/DOC-9671</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
This page provides the best practices for virtualizing IBM Lotus Domino using VMware Infrastructure. This list is based on my &lt;a class="jive-link-external" href="http://www.vmworld.com/docs/DOC-2252"&gt;VMworld 2008 session EA2348&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;General Recommendations &lt;/h1&gt;
Use newer hardware &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Supports latest hardware assist technologies, larger on-processor cache&lt;/li&gt;
&lt;li&gt;64-bit may not perform better with older hardware&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Use VMware ESX, which uses a bare-metal (hypervisor) architecture &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;You can start with VMware ESXi - the free version.&lt;/li&gt;
&lt;li&gt;Do not use VMware Workstation or VMware Server, which use the hosted architecture.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
VMware ESX allows you the choice of virtualization technology best suited for your workload &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Hardware Assist (AMD, Intel) (both CPU and MMU virtualization) if your hardware supports it&lt;/li&gt;
&lt;li&gt;Paravirtualization (if you use SLES for your Domino deployment)&lt;/li&gt;
&lt;li&gt;Binary Translation&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Migrate to latest version of ESX &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;E.g. ESX 3.5 defaults to 2nd Generation Hardware Assist if available, has several I/O Performance improvements&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Lotus Domino: Plan to migrate to version 8.0 &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Significant performance improvements, especially in disk I/O&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Provide Redundancy to the ESX host &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Power supplies, HBAs, NICs, Network and SAN switches&lt;/li&gt;
&lt;li&gt;E.g. NIC teaming, HBA multi-pathing&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Leverage VMotion, Storage VMotion, DRS and HA for higher Domino availability &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VM configuration &lt;/h1&gt;
64-bit OS recommended &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;VI3 supports all x86 OSs that Domino supports: Windows, SLES, RHEL&lt;/li&gt;
&lt;li&gt;Higher memory limits in a 64-bit OS help cache more data and thus avoid disk I/O. This reduces response times and hence increases the number of users that can be supported&lt;/li&gt;
&lt;li&gt;Increase VM memory when running in 64-bit guest OS&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
64-bit may not perform better with older hardware &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;E.g. 64-bit Windows more sensitive to onboard L2/L3 chip caches&lt;/li&gt;
&lt;li&gt;Microsoft reports 10-15% degradation with older hardware&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Guest Operating System: &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Windows: Use 2003 SP2
&lt;ul&gt;
&lt;li&gt;Microsoft eliminated most APIC TPR accesses, which improves performance when virtualized&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Linux: Use 2.6.18-53.1.4 kernel or later to use divider patch
&lt;ul&gt;
&lt;li&gt;Some older Linux versions have a 1 kHz timer rate&lt;/li&gt;
&lt;li&gt;Put divider=10 on the end of the kernel line in grub.conf and reboot&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
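&lt;br /&gt;
The grub.conf change for the divider option can be sketched as follows.  This is a minimal Python illustration, not a VMware tool: the function name and the sample grub.conf contents (kernel file name, root device) are hypothetical, and it assumes a legacy GRUB file in which boot entries use lines beginning with the word kernel.&lt;br /&gt;
&lt;pre&gt;
```python
def add_divider(grub_conf_text, divider=10):
    """Append divider=N to every kernel line that has no divider option yet."""
    option = "divider=%d" % divider
    result = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        is_kernel_line = bool(words) and words[0] == "kernel"
        has_divider = any(word.startswith("divider=") for word in words)
        if is_kernel_line and not has_divider:
            line = line.rstrip() + " " + option
        result.append(line)
    return "\n".join(result)


# Hypothetical boot entry, for illustration only.
sample = """title Example Linux
    root (hd0,0)
    kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=/dev/VolGroup00/LogVol00
    initrd /initrd-2.6.18-53.1.4.el5.img"""
print(add_divider(sample))
```
&lt;/pre&gt;
Only the kernel line gains the option, and running the function a second time leaves the text unchanged.  A reboot is still required for the setting to take effect.&lt;br /&gt;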
&lt;br /&gt;
VM Time Synchronization &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Use VMware Tools time synchronization within the virtual machine&lt;/li&gt;
&lt;li&gt;Enable ESX server NTP daemon to sync with external stratum NTP source (VMware Knowledge Base ID# 1339)&lt;/li&gt;
&lt;li&gt;Disable OS Time Service
&lt;ul&gt;
&lt;li&gt;Windows: w32time service&lt;/li&gt;
&lt;li&gt;Linux: NTP daemon&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Storage &lt;/h1&gt;
Storage configuration is absolutely critical; most performance problems are traced to it &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Number of spindles, RAID configuration, drive speed, controller cache settings, queue depths &amp;ndash; all make a big difference&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Align partitions &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;For VMFS, use Virtual Center to create partitions&lt;/li&gt;
&lt;li&gt;Check &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_partition_align.pdf"&gt;Recommendations for Aligning VMFS Partitions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Use separate, dedicated LUNs for OS/Domino, data and transaction logs &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Separate the IO at physical disk level, not simply logical LUNs&lt;/li&gt;
&lt;li&gt;Make sure these LUNs have enough spindles to support the IO demands&lt;/li&gt;
&lt;li&gt;Fewer spindles or too many VMDK files on single VMFS LUN can substantially increase disk IO latencies&lt;/li&gt;
&lt;li&gt;Check &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;Scalable Storage Performance&lt;/a&gt; to understand the details&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
RAID configuration &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;RAID 1+0 for Data, RAID 0 for Log&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Cache settings &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Set the write policy to "write back" and the read policy to "read ahead"&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Queue Depths &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Increase to 255&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Storage Protocol: Fibre Channel or iSCSI &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Link speed typically limits the scalability, NOT VMware ESX&lt;/li&gt;
&lt;li&gt;Link speed is maintained up to 32 virtual machines for each storage connection option&lt;/li&gt;
&lt;li&gt;For details check &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/storage_protocol_perf.pdf"&gt;Comparison of Storage Protocol Performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Storage Partition: VMFS or RDM &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;VMFS is recommended
&lt;ul&gt;
&lt;li&gt;Leverage templates and quick provisioning&lt;/li&gt;
&lt;li&gt;Fewer LUNs means you don&amp;rsquo;t have to watch Heap&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Performance difference between VMFS and RDM not significant
&lt;ul&gt;
&lt;li&gt;For details check &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf"&gt;Performance Characterization of VMFS and RDM Using a SAN&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
VI3 supports latest storage technologies: leverage these if you have already invested or plan to invest &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Fibre channel &amp;ndash; 8Gbps connectivity&lt;/li&gt;
&lt;li&gt;iSCSI &amp;ndash; 10GigE network connectivity, Jumbo Frames&lt;/li&gt;
&lt;li&gt;Infiniband support&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Virtual CPUs &lt;/h1&gt;
The number of vCPUs per VM depends on the number of users to be supported &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Start with uni-processor, may be enough&lt;/li&gt;
&lt;li&gt;Try not to over-provision vCPUs in the guest&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Verify CPU compatibility for VMotion &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Memory &lt;/h1&gt;
Increasing memory to avoid disk I/O is the most effective technique to improve performance &lt;br /&gt;
More available memory = more Lotus Domino Cache &lt;br /&gt;
&lt;br /&gt;
Increase NSF_DbCache_Maxentries value &lt;br /&gt;
&lt;br /&gt;
Leverage the higher 64 GB per-VM memory limit supported in VI 3.5 when using a 64-bit guest OS for Domino &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;64-bit OSs can take advantage of larger memory limits for file caching&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Leverage NUMA optimizations in VI3 &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;When using NUMA, try to fit the VM within a single node to avoid latencies accessing memory on remote nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Networking &lt;/h1&gt;
Use dedicated NICs based on the network traffic &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;E.g. separate NICs for mail and replication traffic&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Use NIC Teaming &amp;#38; VLAN Trunking &lt;br /&gt;
&lt;br /&gt;
Use Enhanced VMXNET driver with TSO and Jumbo Frames support &lt;br /&gt;
&lt;br /&gt;
Enable TCP transmit coalescing &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Check the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/specweb_perf_final.pdf"&gt;SPECweb paper&lt;/a&gt; for details.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Co-located VMs outperform physical 1Gbps network speed &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Resource Management &lt;/h1&gt;
Use proportional and absolute mechanisms to control VM priorities &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Shares, reservations, and limits for CPU and memory&lt;/li&gt;
&lt;li&gt;Shares for virtual disks&lt;/li&gt;
&lt;li&gt;Traffic shaping for network&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Migration is faster, resulting in better load balancing, when using &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Smaller VMs&lt;/li&gt;
&lt;li&gt;Smaller memory reservations for VMs&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Affinity rules for VM placement &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;E.g. Directory, Mail Server VMs on the same ESX host&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Deployment &lt;/h1&gt;
Virtualization Assessment &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Capacity Planner&lt;/li&gt;
&lt;li&gt;Benchmark against Information Warehouse&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Easy migration &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;VMware Converter &amp;ndash; both hot and cold cloning&lt;/li&gt;
&lt;li&gt;Start with RDM to point to existing data/transaction log LUNs, but move to VMFS later&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Easier change management and quicker provisioning &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Templates and clones for easy provisioning&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">performance</category>
      <pubDate>Mon, 09 Mar 2009 22:19:19 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-9671</guid>
      <dc:date>2009-03-09T22:19:19Z</dc:date>
      <clearspace:dateToText>8 months, 5 days ago</clearspace:dateToText>
    </item>
    <item>
      <title>Interpreting esxtop Statistics</title>
      <link>http://communities.vmware.com/docs/DOC-9279</link>
      <description>&lt;h1&gt;Table of Contents&lt;/h1&gt;
&lt;b&gt;Section 1. Introduction&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;Section 2. CPU&lt;/b&gt;&lt;br /&gt;
&lt;i&gt;Section 2.1 Worlds and Groups&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Section 2.2 Global Statistics&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Section 2.3 World Statistics&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Section 3. Memory&lt;/b&gt;&lt;br /&gt;
&lt;i&gt;Section 3.1 Machine Memory and Guest Physical Memory&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Section 3.2 Global Statistics&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Section 3.3 Group Statistics&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Section 4 Disk&lt;/b&gt;&lt;br /&gt;
&lt;i&gt;Section 4.1 Adapter, Device, VM screens&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Section 4.2 Disk Statistics&lt;/i&gt;&lt;br /&gt;
Section 4.2.1 I/O Throughput Statistics&lt;br /&gt;
Section 4.2.2 Latency Statistics&lt;br /&gt;
Section 4.2.3 Queue Statistics&lt;br /&gt;
Section 4.2.4 Error Statistics&lt;br /&gt;
Section 4.2.5 PAE Statistics&lt;br /&gt;
Section 4.2.6 Split Statistics&lt;br /&gt;
&lt;i&gt;Section 4.3 Batch Mode Output&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Section 5 Network&lt;/b&gt;&lt;br /&gt;
&lt;i&gt;Section 5.1 Port&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Section 5.2 Port Statistics&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;Section 6. Interrupt&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;Section 7. Batch Mode&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Section 1. Introduction&lt;/h1&gt;
Esxtop allows monitoring and collection of data for all system resources: CPU, memory, disk and network. When used interactively, this data can be viewed on different types of screens; one each for CPU statistics, memory statistics, network statistics and disk adapter statistics. In addition to the disk adapter statistics available in earlier versions, disk statistics at the device and VM level are also available starting with ESX 3.5. Starting with ESX 4.0, esxtop has an interrupt statistics screen. In batch mode, data can be redirected to a file for offline use. &lt;br /&gt;
&lt;br /&gt;
Many esxtop statistics are computed as rates, e.g. CPU statistics %USED. A rate is computed based on the refresh interval, the time between successive snapshots. For example, &lt;i&gt;%USED = ( CPU used time at snapshot 2 - CPU used time at snapshot 1 ) / time elapsed between snapshots&lt;/i&gt;. The default refresh interval can be changed by the command line option "&lt;i&gt;-d&lt;/i&gt;", or the interactive command &lt;i&gt;'s'&lt;/i&gt;. The return key can be pressed to force a refresh. &lt;br /&gt;
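&lt;br /&gt;
As a rough illustration of the rate computation above, the following Python sketch derives a %USED-style value from two snapshots. The function name, the snapshot tuples, and the sample numbers are hypothetical and are not part of esxtop; the ratio is multiplied by 100 only to express it as a percentage.&lt;br /&gt;
&lt;pre&gt;
```python
def percent_used(snap1, snap2):
    """Return CPU usage between two snapshots as a percentage.

    Each snapshot is a hypothetical (timestamp_seconds, cpu_used_seconds) pair.
    """
    t1, used1 = snap1
    t2, used2 = snap2
    elapsed = t2 - t1
    if elapsed == 0:
        raise ValueError("snapshots must be taken at different times")
    return 100.0 * (used2 - used1) / elapsed


# Two snapshots taken 5 seconds apart; 2 seconds of CPU time were used in between.
print(percent_used((100.0, 40.0), (105.0, 42.0)))
```
&lt;/pre&gt;
A longer refresh interval averages the same counter difference over more elapsed time, so short bursts of activity show up as smaller values.&lt;br /&gt;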
&lt;br /&gt;
In each screen, data is presented at different levels of aggregation. It is possible to drill down to expanded views of this data. Each screen provides different expansion options. &lt;br /&gt;
&lt;br /&gt;
It is possible to choose all or only some of the fields for which data is collected. When esxtop is used interactively, the order in which the selected fields are displayed can also be changed.&lt;br /&gt;
&lt;br /&gt;
In the following sections, this document will describe the esxtop statistics shown by each screen and their usage.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Section 2. CPU &lt;/h1&gt;
&lt;h2&gt;Section 2.1 Worlds and Groups &lt;/h2&gt;
Esxtop uses worlds and groups as the entities to show CPU usage. A &lt;b&gt;world&lt;/b&gt; is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A &lt;b&gt;group&lt;/b&gt; contains multiple worlds.&lt;br /&gt;
&lt;br /&gt;
Let's use a VM as an example. A powered-on VM has a corresponding group, which contains multiple worlds. In ESX 4.0, there is one vcpu (hypervisor) world corresponding to each VCPU of the VM. The guest activities are represented mostly by the vcpu worlds. (In ESX 3.5, esxtop shows a vmm world and a vcpu world for each VCPU. The guest activities are represented mostly by the vmm worlds.) Besides the vcpu worlds, there are other assisting worlds, such as an MKS world and a VMX world. The MKS world assists mouse/keyboard/screen virtualization. The VMX world assists the vcpu worlds (the hypervisor). The usage of the VMX world is outside the scope of this document. In ESX 4.0, there is only one vmx world. (In ESX 3.5, there are two vmx worlds for each VM.)&lt;br /&gt;
&lt;br /&gt;
There are other groups besides VM groups. Let's go through a few examples: &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The "idle" group is the container for the idle worlds, each of which corresponds to one PCPU.&lt;/li&gt;
&lt;li&gt;The "system" group contains the VMKernel system worlds.&lt;/li&gt;
&lt;li&gt;The "helper" group contains the helper worlds that assist VMKernel operations.&lt;/li&gt;
&lt;li&gt;In classic ESX, the "console" group is for the console OS, which runs ESX management processes. In ESXi, these management processes run as user worlds directly on VMKernel. So, on an ESXi box you will see many more groups than on classic ESX, but no "console" group.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Note that groups can be organized in a hierarchical manner in ESX. However, esxtop shows the groups that contain worlds in a flat form. A more detailed discussion of groups is out of the scope of this document. &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why can't we find any vmm worlds for a VM in ESX 4.0?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Before ESX 4.0, each VCPU had two worlds, "vmm" and "vcpu". In ESX 4.0, the CPU scheduler merges their statistics into one vcpu world. So, the CPU stats do not show vmm worlds. This is not a problem.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 2.2 Global Statistics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"up time"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The elapsed time since the server has been powered on.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"number of worlds"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The total number of worlds on ESX Server.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"CPU load average"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The arithmetic mean of CPU loads over the last 1 minute, 5 minutes, and 15 minutes, based on 6-second samples. CPU load accounts for the run time and ready time of all the groups on the host. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"PCPU(%)"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage CPU utilization per physical CPU.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if PCPU% is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It means that you are using a lot of CPU resource. (a) If all of the PCPUs are near 100%, it is possible that you are overcommitting your CPU resource. Check %RDY of the groups in the system to verify CPU overcommitment. Refer to %RDY below. (b) If some PCPUs stay near 100% but others do not, there might be an imbalance issue. Monitor the system for a few minutes to verify whether the same PCPUs stay at ~100% CPU. If so, check the VM CPU affinity settings.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"used total"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Sum( PCPU(%) ) / number of PCPUs&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"LCPU(%)"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage CPU utilization per logical CPU. The CPU used percentages for the logical CPUs belonging to a package add up to 100%. This line is displayed only if hyper-threading is present and enabled.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"CCPU(%)"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Percentages of total CPU time as reported by the ESX Service Console. "us" is for percentage user time, "sy" is for percentage system time, "id" is for percentage idle time and "wa" is for percentage wait time. "cs/sec" is for the context switches per second recorded by the ESX Service Console.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What's the difference between CCPU% and the console group stats?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: CCPU% is measured by the COS. The "console" group CPU stats are measured by VMKernel. The stats are related, but not the same.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 2.3 World Statistics&lt;/h2&gt;
A group statistic is the sum of the corresponding world statistic over all the worlds contained in that group. So, this section focuses on worlds. You may apply the descriptions to groups as well, unless stated otherwise. &lt;br /&gt;
&lt;br /&gt;
ESX can make use of Hyperthreading technology, so the performance counters take Hyperthreading into consideration as well. To simplify this document, however, we will ignore HT-related issues. Please refer to the "Resource Management Guide" for more details.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%USED"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage physical CPU time accounted to the world. If a system service runs on behalf of this world, the time spent by that service (i.e. %SYS) should be charged to this world. If not, the time spent (i.e. %OVRLP) should not be charged against this world. See notes on %SYS and %OVRLP.&lt;br /&gt;
&lt;br /&gt;
%USED = %RUN + %SYS - %OVRLP&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Is it possible that %USED of a world is greater than 100%?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Yes, if the system service runs on a different PCPU for this world. It may happen when your VM has heavy I/O.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: For an SMP VM, why does VCPU 0 have higher CPU usage than others?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The system services are accounted to VCPU 0. You may see higher %USED on VCPU 0 than on the others, although the run time (%RUN) is balanced across all the VCPUs. This is not a CPU scheduling problem; it is only the way VMKernel does the CPU accounting.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What is the maximum %USED for a VM group?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The group stat is the sum over the worlds. So, the maximum %USED = NWLD * 100%, where NWLD is the number of worlds in the group.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Typically, worlds other than VCPU worlds are waiting for events most of the time and do not cost many CPU cycles. Among all the worlds, VCPU worlds best represent the guest. Therefore, %USED for a VM group usually does not exceed Number of VCPUs * 100%.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %USED of a VM is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The VM is using a lot of CPU resource. You may expand the group to see which worlds are using most of it.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%SYS"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time spent by system services on behalf of the world. The possible system services are interrupt handlers, bottom halves, and system worlds. &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %SYS is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It usually means that your VM has heavy I/O.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Are %USED and %SYS similar to user time and system time in Linux?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: No. They are totally different. For Linux OS, user (system) time for a process is the time spent in user (kernel) mode. For ESX, %USED is for the accounted time and %SYS is for the system service time.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%OVRLP"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time spent by system services on behalf of other worlds. In more detail, let's use an example. &lt;br /&gt;
&lt;br /&gt;
When World 'W1' is running, a system service 'S' interrupts 'W1' and services World 'W2'. The time spent by 'S', annotated as 't', is included in the run time of 'W1'. We use %OVRLP of 'W1' to show this time. This time 't' is accounted to %SYS of 'W2', as well. &lt;br /&gt;
&lt;br /&gt;
Again, let's take a look at "%USED = %RUN + %SYS - %OVRLP". For 'W1', 't' is included in %RUN and %OVRLP, not in %SYS. By subtracting %OVRLP from %RUN, we do not account 't' in %USED of 'W1'. For 'W2', 't' is included in %SYS, not in %RUN or %OVRLP. By adding %SYS, we account 't' to %USED of 'W2'.&lt;br /&gt;
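&lt;br /&gt;
The 'W1'/'W2' example above can be sketched in a few lines of Python. The percentages are made up for illustration: 'W1' is scheduled for 60% of the interval, of which 5% is the time 't' that service 'S' spent working for 'W2'.&lt;br /&gt;

```python
def used_percent(run, sys_pct, ovrlp):
    """%USED = %RUN + %SYS - %OVRLP, all values in percent."""
    return run + sys_pct - ovrlp


# Hypothetical values: t = 5% of the interval.
t = 5.0
w1_used = used_percent(run=60.0, sys_pct=0.0, ovrlp=t)  # t is removed from W1
w2_used = used_percent(run=30.0, sys_pct=t, ovrlp=0.0)  # t is charged to W2

print(w1_used)  # 55.0
print(w2_used)  # 35.0
```

The time 't' moves from 'W1' to 'W2', so the sum of %USED over both worlds still equals the sum of their %RUN.&lt;br /&gt;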
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %OVRLP of a VM is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It usually means the host has heavy I/O. So, the system services are busy handling I/O. Note that %OVRLP of a VM group may or may not be spent on behalf of this VM. It is the sum of %OVRLP for all the worlds in this group.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%RUN"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of total scheduled time for the world to run. &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What is the difference between %USED and %RUN?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: %USED = %RUN + %SYS - %OVRLP. (%USED takes care of the system service time.) Details above.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %RUN of a VM is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The VM is using lots of CPU resource. It does not necessarily mean the VM is under resource constraint. Check the description of %RDY below, for determining CPU contention.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%RDY"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time the world was ready to run. &lt;br /&gt;
&lt;br /&gt;
A world in a run queue is waiting for the CPU scheduler to let it run on a PCPU. %RDY accounts for the percentage of this time. So, for a single world, it is always smaller than 100%.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know CPU resource is under contention?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: %RDY is a main indicator. But, it is not sufficient by itself.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;If a "CPU Limit" is set in a VM's resource settings, the VM will be deliberately held back from being scheduled on a PCPU once it has used up its allocated CPU resource. This may happen even when there are plenty of free CPU cycles. The time during which the scheduler deliberately holds the VM back is shown by "%MLMTD", which will be described next. Note that %RDY includes %MLMTD. So, for CPU contention, we will use "%RDY - %MLMTD". If "%RDY - %MLMTD" is high, e.g., larger than 20%, you may be experiencing CPU contention.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;What is the recommended threshold? Well, it depends. As a starting point, try 20%. If the application speed in your VM is acceptable, you may tolerate a higher threshold. Otherwise, use a lower one.&lt;/i&gt;&lt;br /&gt;
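&lt;br /&gt;
The check described above can be written down as a small Python sketch. The function name and the sample values are hypothetical; the 20% default is only the starting point suggested above, not a fixed rule.&lt;br /&gt;

```python
def cpu_contention(rdy, mlmtd, threshold=20.0):
    """Return the ready time not caused by a CPU limit and whether it
    exceeds the chosen threshold (all values in percent)."""
    ready_without_limit = rdy - mlmtd
    return ready_without_limit, ready_without_limit > threshold


# Hypothetical VM A: high %RDY, but almost all of it comes from a CPU limit.
print(cpu_contention(rdy=35.0, mlmtd=30.0))  # (5.0, False)

# Hypothetical VM B: the same %RDY with hardly any %MLMTD.
print(cpu_contention(rdy=35.0, mlmtd=2.0))   # (33.0, True)
```

Both VMs show the same %RDY, but only the second one points to CPU contention; for the first one, the "CPU limit" setting is the thing to look at.&lt;br /&gt;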
&lt;br /&gt;
&lt;i&gt;Q: How do we break down 100% for the world state times?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: A world can be in different states, either scheduled to run, ready to run but not scheduled, or not ready to run (waiting for some events).&lt;/i&gt; &lt;br /&gt;
&lt;i&gt;100% = %RUN + %RDY + %CSTP + %WAIT.&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;Check the description of %CSTP and %WAIT below.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %RDY of a VM is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It means the VM is possibly under resource contention. Check "%MLMTD" as well. If "%MLMTD" is high, you may raise the "CPU limit" setting for the VM. If "%RDY - %MLMTD" is high, the VM is under CPU contention.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%MLMTD"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time the world was ready to run but deliberately wasn't scheduled because that would violate the "CPU limit" settings.&lt;br /&gt;
&lt;br /&gt;
Note that %MLMTD is included in %RDY.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %MLMTD of a VM is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The VM cannot run because of the "CPU limit" setting. If you want to improve the performance of this VM, you may increase its limit. However, keep in mind that it may reduce the performance of others.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%CSTP"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time the world spent in ready, co-deschedule state. This co-deschedule state is only meaningful for SMP VMs. Roughly speaking, ESX CPU scheduler deliberately puts a VCPU in this state, if this VCPU advances much farther than other VCPUs.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %CSTP is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It usually means the VM workload does not use VCPUs in a balanced fashion. The VCPU with high %CSTP is used much more often than the others. Do you really need all those VCPUs? Do you pin the guest application to the VCPUs?&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%WAIT"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time the world spent in wait state.&lt;br /&gt;
&lt;br /&gt;
This %WAIT is the total wait time, i.e., the time the world spends waiting for some VMKernel resource. It includes I/O wait time, idle time, and time spent waiting for other resources. Idle time is presented as %IDLE.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know the VCPU world is waiting for I/O events?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: %WAIT - %IDLE can give you an estimate of how much CPU time is spent waiting for I/O events. This is an estimate only, because the world may be waiting for resources other than I/O. Note that we should only do this for VCPU worlds (VMM worlds in ESX 3.5), not for the other kinds of worlds, because these worlds best represent the guest behavior. For disk I/O, an alternative is to read the disk latency stats, which we will explain in the disk section.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know the VM group is waiting for I/O events?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: For a VM, there are other worlds besides the VCPUs, such as an MKS world and a VMX world. Most of the time, these other worlds are waiting for events. So, you will see ~100% %WAIT for those worlds. If you want to know whether the guest is waiting for I/O events, it is better to expand the group and analyze the VCPU worlds as stated above.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Since %IDLE makes no sense to the worlds other than VCPUs, we may use the group stats to estimate the guest I/O wait by "%WAIT - %IDLE - 100% * (NWLD - NVCPU)". Here, NWLD is the number of worlds in the group; NVCPU is the number of VCPUs. This is a very rough estimate, due to two reasons. (1) The world may be waiting for resources other than I/O. (2) We assume the other assisting worlds are not active, which may not be true.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Again, for disk I/O, another alternative is to read the disk latency stats which we will explain in the disk section.&lt;/i&gt;&lt;br /&gt;
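&lt;br /&gt;
The group-level estimate "%WAIT - %IDLE - 100% * (NWLD - NVCPU)" given above can be sketched as follows. The group values are invented for illustration, and the result carries the same two caveats listed above.&lt;br /&gt;

```python
def guest_io_wait_estimate(group_wait, group_idle, nwld, nvcpu):
    """Rough guest I/O wait for a VM group, in percent.

    Assumes every non-VCPU world in the group waits ~100% of the time.
    """
    return group_wait - group_idle - 100.0 * (nwld - nvcpu)


# Hypothetical 2-VCPU VM group with 4 worlds (2 vcpu, 1 VMX, 1 MKS):
# group %WAIT = 330, group %IDLE = 100.
print(guest_io_wait_estimate(330.0, 100.0, nwld=4, nvcpu=2))  # 30.0
```

Here the two assisting worlds account for 200% of the group %WAIT, the idle loops for another 100%, and the remaining 30% is the rough I/O wait estimate across both VCPUs.&lt;br /&gt;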
&lt;br /&gt;
&lt;i&gt;Q: Why do I always see a high %WAIT for VMX/mks worlds?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: This is normal. It means there is not much activity on them.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why do I see a high %WAIT for a VM group?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: For a VM, there are other worlds besides the VCPUs, such as an MKS world and VMX worlds. These worlds are waiting for events most of the time.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%IDLE"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time the VCPU world is in the idle loop. Note that %IDLE is included in %WAIT. Also note that %IDLE only makes sense for VCPU worlds. The other worlds do not have idle loops, so %IDLE is zero for them.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%SWPWT"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of time the world is waiting for the ESX VMKernel to swap memory. The %SWPWT (swap wait) time is included in the %WAIT time. This is a new statistic added in ESX 4.0.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why do I see a high %SWPWT for a VM group?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The VM is swapping memory.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Section 3. Memory &lt;/h1&gt;
&lt;h2&gt;Section 3.1 Machine Memory and Guest Physical Memory&lt;/h2&gt;
It is important to note that some statistics refer to guest physical memory while others refer to machine memory. "&lt;b&gt;Guest physical memory&lt;/b&gt;" is the virtual-hardware physical memory presented to the VM. "&lt;b&gt;Machine memory&lt;/b&gt;" is actual physical RAM in the ESX host. Let's use the following figure to explain. In the figure, two VMs are running on an ESX host, where each block represents 4 KB of memory and each color represents a different set of data on a block.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9279-1-4857/memory.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9279-1-4857/memory.JPG" class="jive-image"  /&gt; &lt;br /&gt;
&lt;br /&gt;
Inside each VM, the guest OS maps virtual memory to its physical memory. The ESX VMKernel maps the guest physical memory to machine memory. Due to ESX Page Sharing technology, guest physical pages with the same content can be mapped to the same machine page.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 3.2 Global Statistics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;"&lt;b&gt;MEM overcommit avg&lt;/b&gt;"&lt;/li&gt;
&lt;/ul&gt;
Average memory overcommit level in 1-min, 5-min, 15-min (EWMA).&lt;br /&gt;
&lt;br /&gt;
Memory overcommit is the ratio of total requested memory and the "managed memory" minus 1. VMKernel computes the total requested memory as a sum of the following components: (a) VM configured memory (or memory limit setting if set), (b) the user world memory, (c) the reserved overhead memory. (Overhead memory will be discussed in more detail for "OVHD" and "OVHDMAX" in Section 3.3.)&lt;br /&gt;
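&lt;br /&gt;
As a rough illustration of this definition, the sketch below computes the overcommit level from the three components (a), (b) and (c). All names and sizes are hypothetical, and the function returns an instantaneous value, whereas esxtop reports averages over 1, 5 and 15 minutes.&lt;br /&gt;

```python
def memory_overcommit(vm_configured_mb, user_world_mb, overhead_reserved_mb,
                      managed_mb):
    """Total requested memory divided by managed memory, minus 1."""
    requested_mb = sum(vm_configured_mb) + user_world_mb + overhead_reserved_mb
    return requested_mb / managed_mb - 1.0


# Hypothetical host: three VMs configured with 8 GB, 4 GB and 4 GB,
# 1 GB of user world memory, 1 GB of reserved overhead memory,
# and 16 GB of managed memory.
level = memory_overcommit([8192, 4096, 4096], 1024, 1024, managed_mb=16384)
print(level)  # 0.125
```

A value of 0.125 means that 12.5% more memory has been requested than the host manages; a value of 0 or below means there is no overcommitment.&lt;br /&gt;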
&lt;br /&gt;
"managed memory" will be defined in "VMKMEM" section.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if overcommit is not 0?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It means that the total requested guest physical memory is more than the machine memory available. This is fine, because ballooning and page sharing allow memory overcommit.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;This metric does not necessarily mean that you will have performance issues. Use "SWAP" and "MEMCTL" to find whether you are experiencing memory problems.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What's the meaning of overcommit?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: See above description for details. Roughly speaking, it reflects the ratio of requested memory and the available memory.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"PMEM" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The machine memory statistics for the host.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"total"&lt;/b&gt;: the total amount of machine memory in the server. It is the machine memory reported by BIOS.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"cos"&lt;/b&gt; : the amount of machine memory allocated to the ESX Service Console. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"vmk"&lt;/b&gt; : the amount of machine memory being used by the ESX VMKernel. "vmk" includes kernel code section, kernel data and heap, and other VMKernel management memory. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"other"&lt;/b&gt;: the amount of machine memory being used by everything other than the ESX Service Console and ESX VMKernel. "other" contains not only the memory used by VM but also the user worlds that run directly on VMKernel.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"free"&lt;/b&gt; : the amount of machine memory that is free.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is total not the same as RAM size plugged in my memory slots?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: This is because some memory range is not available for use. It is fine, if the difference is small. If the difference is big, there might be some hardware issue. Check your BIOS.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why can't I find the cos part?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: COS is only available in classic ESX. You are using ESXi.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I break down the total memory?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: total = cos + vmk + other + free&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Which one contains the memory used by VMs?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: "other" contains the machine memory that backs guest physical memory of VMs. Note that "other" also includes the overhead memory.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know my "free" memory is low? Is it a problem if it is low?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: You could use the "state" field, which will be explained next, to see whether the free memory is low. Basically, it is fine if you do not experience memory swapping or ballooning. Check "SWAP" and "MEMCTL" to find whether you are experiencing memory problems.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"VMKMEM" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The machine memory statistics for VMKernel. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"managed"&lt;/b&gt;: the total amount of machine memory managed by VMKernel. VMKernel "managed" memory can be dynamically allocated for VM, VMKernel, and User Worlds. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"minfree"&lt;/b&gt;: the minimum amount of machine memory that VMKernel would like to keep free. This is because VMKernel needs to keep some amount of free memory for critical uses. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"rsvd"&lt;/b&gt; : the amount of machine memory that is currently reserved. "rsvd" is the sum of three parts: (a) the reservation setting of the groups; (b) the overhead reservation of the groups; (c) "minfree". &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"ursvd"&lt;/b&gt; : the amount of machine memory that is currently unreserved. It is the memory available for reservation.&lt;br /&gt;
&lt;br /&gt;
Please note that VM admission control is done at the resource pool level. So, this statistic is not used directly by admission control. "ursvd" can be used as a system-level indicator. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"state"&lt;/b&gt; : the free memory state. Possible values are high, soft, hard and low. The memory "state" is "high" if the free memory is greater than or equal to 6% of "total" - "cos". It is "soft" at 4%, "hard" at 2%, and "low" at 1%. So, high implies that the machine memory is not under any pressure and low implies that the machine memory is under pressure. &lt;br /&gt;
&lt;br /&gt;
While the host's memory state is not used to determine whether memory should be reclaimed from VMs (that decision is made at the resource pool level), it can affect what mechanisms are used to reclaim memory if necessary. In the high and soft states, ballooning is favored over swapping. In the hard and low states, swapping is favored over ballooning.&lt;br /&gt;
&lt;br /&gt;
Please note that "minfree" is part of "free" memory; while "rsvd" and "ursvd" memory may or may not be part of "free" memory. "reservation" is different from memory allocation.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is "managed" memory less than the sum of "vmk", "other" and "free" in the PMEM line? Is it normal?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It is normal, just the way we do accounting. A more precise definition for "managed" is the free memory after VMKernel initialization. So, this amount of memory can be dynamically allocated for use of VMs, VMKernel, and user worlds. "managed" = "some part of vmk" + "other" + "free".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;So, "managed" &amp;lt; "vmk" + "other" + "free". Or, in an equivalent form, "managed" &amp;lt; "total" - "cos".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I break down the managed memory in terms of reservation?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: "managed" = "rsvd" + "ursvd" + "vmkernel usage"&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;VMKernel machine memory manager needs to use some part of memory, which should not be subject to reservation, so, it is not in "rsvd", nor in "ursvd". In the above equation, we put this part under "vmkernel usage". Unfortunately, it is not shown directly in esxtop.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Note that the vmkernel usage in managed memory is part of "vmk".&lt;/i&gt;&lt;br /&gt;
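&lt;br /&gt;
Since "vmkernel usage" is not shown directly, it can be derived by rearranging the equation above. The following sketch does this for a made-up VMKMEM line; the function name and the values are hypothetical.&lt;br /&gt;

```python
def vmkernel_usage_mb(managed_mb, rsvd_mb, ursvd_mb):
    """Managed memory that is neither reserved nor available for reservation."""
    return managed_mb - rsvd_mb - ursvd_mb


# Hypothetical VMKMEM line (MB): managed = 15000, rsvd = 6200, ursvd = 8500.
print(vmkernel_usage_mb(15000, 6200, 8500))  # 300
```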
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if "ursvd" is low?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: VMKernel admission control prohibits a VM PowerOn operation if it cannot meet the memory reservation of that VM. The memory reservation includes the reservation setting, a.k.a. "min", and the monitor overhead memory reservation. Note that even if "min" is not set, VMKernel still needs to reserve some amount of memory for monitor uses.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;So, it is possible that even though you have enough free memory, a new VM cannot power on due to the violation of memory reservation.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why do I fail admission control even though "ursvd" is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The VM admission control is done at resource pool level. Please check the "min" setting of all its parent resource pools.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is "managed" greater than the sum of "rsvd" and "ursvd"? Is it normal?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It is normal. See above question. VMKernel may use some of the managed memory. It is not accounted in "rsvd" and "ursvd".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What is the meaning of "state"?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: See the description of "state" above.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know my ESX box is under memory pressure?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It is usually safe to say the ESX box is under memory pressure if "state" is "hard" or "low". But you also need to check "SWAP" and "MEMCTL" to find whether you are experiencing memory problems. Basically, if there is not enough free memory and ESX is swapping or ballooning, the ESX box is under memory pressure.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Note that ballooning does not have as big performance hit as swapping does. Ballooning may cause guest swapping. ESX swapping means host swapping.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Also note that a VM may be swapping or ballooning even though there is enough free memory. This is due to the reservation setting.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"COSMEM" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The memory statistics reported by the ESX Service Console. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"free"&lt;/b&gt; : the amount of idle machine memory.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"swap_t"&lt;/b&gt;: the total swap configured.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"swap_f"&lt;/b&gt;: the amount of swap free.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"r/s"&lt;/b&gt; : the rate at which memory is swapped in from disk.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"w/s"&lt;/b&gt; : the rate at which memory is swapped out to disk.&lt;br /&gt;
&lt;br /&gt;
Note that these stats essentially come from the COS proc nodes.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if I see a high r/s or w/s?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Your console OS is swapping. It is highly likely that your COS free memory is low. You may either configure more memory for COS and restart your ESX box, or stop some programs running inside your COS.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why can't I see this COSMEM line?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: You are using ESXi not classic ESX.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"NUMA" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The ESX NUMA statistics. For each NUMA node there are two statistics: (1) the "total" amount of machine memory managed by ESX; (2) the amount of machine memory currently "free".&lt;br /&gt;
&lt;br /&gt;
Note that ESX NUMA scheduler optimizes the uses of NUMA feature to improve guest performance. Please refer to "Resource Management Guide" for details.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why can't I see this NUMA line?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: You are not using a NUMA machine, or your BIOS disables it.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is the sum of NUMA memory not equal to "total" in the PMEM line?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: The PMEM "total" is the memory reported by BIOS, while the NUMA "total" is the memory managed by VMKernel machine memory manager. There are two major parts of memory seen by BIOS but not given to machine memory manager: (1) COS uses, and (2) VMKernel uses during early initialization.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;So, Sum("NUMA total") &amp;lt; "PMEM total" - "cos".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Note that the free memory on all the nodes can be added up as the "free" memory in the PMEM line.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"PSHARE" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The ESX page-sharing statistics. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"shared"&lt;/b&gt;: the amount of guest physical memory that is being shared.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"common"&lt;/b&gt;: the amount of machine memory that is common across World(s).&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"saving"&lt;/b&gt;: the amount of machine memory that is saved due to page-sharing.&lt;br /&gt;
&lt;br /&gt;
The monitor maps guest physical memory to machine memory. VMKernel selects to map guest physical pages with the same content to the same machine page. In other words, those guest physical pages are sharing the same machine page. This kind of sharing can happen within the same VM or among the VMs.&lt;br /&gt;
&lt;br /&gt;
Since each VM's "shared" memory measures guest physical memory, the host's "shared" memory may be larger than the total amount of machine memory if memory is overcommitted. "saving" illustrates the effectiveness of page sharing for saving machine memory.&lt;br /&gt;
&lt;br /&gt;
"shared" = "common" + "saving".&lt;br /&gt;
&lt;br /&gt;
Note that esxtop only shows the pshare stats for VMs, excluding the pshare stats for user worlds. &lt;br /&gt;
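&lt;br /&gt;
To make the relation "shared" = "common" + "saving" concrete, here is a small Python sketch that derives the three values from a mapping of guest physical pages to machine pages. The mapping is invented, and the sketch assumes that a machine page counts toward "common" only when two or more guest physical pages map to it; the real VMKernel accounting may differ in detail.&lt;br /&gt;

```python
from collections import Counter

PAGE_KB = 4  # assumed page size, matching the 4 KB blocks in the figure


def pshare_stats_kb(guest_to_machine):
    """Return shared/common/saving in KB for a guest-to-machine page mapping."""
    refs = Counter(guest_to_machine.values())
    # Assumption: a machine page is shared when 2 or more guest pages map to it.
    shared_pages = sum(count for count in refs.values() if count > 1)
    common_pages = sum(1 for count in refs.values() if count > 1)
    saving_pages = shared_pages - common_pages
    return {
        "shared": shared_pages * PAGE_KB,
        "common": common_pages * PAGE_KB,
        "saving": saving_pages * PAGE_KB,
    }


# Hypothetical mapping: (VM name, guest physical page) to machine page number.
mapping = {
    ("vm1", 0): 100,
    ("vm1", 1): 101,
    ("vm2", 0): 100,  # same content as vm1 page 0
    ("vm2", 1): 100,  # sharing can also happen within one VM
    ("vm2", 2): 102,
}
print(pshare_stats_kb(mapping))  # {'shared': 12, 'common': 4, 'saving': 8}
```

Three guest physical pages are backed by one machine page, so 12 KB of guest physical memory is shared, 4 KB of machine memory is common, and 8 KB of machine memory is saved.&lt;br /&gt;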
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SWAP" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The ESX swap usage statistics. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"curr"&lt;/b&gt; : the current swap usage. This is the total swapped machine memory of all the groups. So, it includes VMs and user worlds.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"target"&lt;/b&gt;: the expected swap usage. This is the total swap target of all the groups. So, it includes VMs and user worlds.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"r/s"&lt;/b&gt; : the rate at which machine memory is swapped in from disk.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"w/s"&lt;/b&gt; : the rate at which machine memory is swapped out to disk.&lt;br /&gt;
&lt;br /&gt;
Note that swap here is host swap, not guest swap inside the VM.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if "curr" is not the same as "target"?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It means ESX will swap memory to meet the swap target. Note that the actual swapping is done at the group level. So, you should check "SWCUR" and "SWTGT" for each group. We will discuss this in the next section.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Is it bad if "r/s" is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Yes, it is very bad. This usually means that you have memory resource contention. Because swapin is synchronous, it will hurt guest performance a lot.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Do two things: (1) Check your "free" memory or "state" as mentioned above. If free memory is low, you need to move VMs to other hosts or add more memory to the host. (2) If free memory is not low, check the resource settings of your VMs or user worlds. You may have set a low "limit", which causes swapping.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Is it bad if "w/s" is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Yes, it is also very bad. This usually means that you have memory resource contention. Take the same actions as mentioned above.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MEMCTL" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The memory balloon statistics. &lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"curr"&lt;/b&gt; : the total amount of physical memory reclaimed by the balloon driver. This is the total memory ballooned by the VMs.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"target"&lt;/b&gt;: the expected total amount of ballooned memory. This is the sum of the balloon targets of the VMs.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;"max"&lt;/b&gt; : the maximum amount of physical memory reclaimable.&lt;br /&gt;
&lt;br /&gt;
Note that ballooning may or may not lead to guest swapping, which is decided by the guest OS.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if "curr" is not the same as "target"?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It means ESX will balloon memory to meet the balloon target. Note that the actual ballooning is done for the VM group. So, you should check "MCTLSZ" and "MCTLTGT" for each group. We will discuss this in the next section.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know the host is ballooning memory?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: If "curr" is changing, the host is ballooning. Since ballooning is done at the VM level, a better way is to monitor "MCTLSZ" for each group. We will discuss this in the next section.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Is it bad if we have lots of ballooning activities?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Usually it is fine. Ballooning tends to take unused memory from one VM and make it available to others. The possible side effects are (a) reducing the memory cache used by the guest OS, and (b) guest swapping. In either case, it may hurt guest performance. Please note that (a) and (b) may or may not happen, depending on the workload inside the VM.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;On the other hand, under memory contention, ballooning is much better than swapping in terms of performance.&lt;/i&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 3.3 Group Statistics&lt;/h2&gt;
Esxtop shows the groups that use memory managed by VMKernel memory scheduler. These groups can be used for VMs or purely for user worlds running directly on VMKernel. You may see many pure user world groups on ESXi, not on classic ESX.&lt;br /&gt;
&lt;br /&gt;
Tip: use 'V' command to show only the VM groups.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MEMSZ" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
For a VM, it is the amount of configured guest physical memory.&lt;br /&gt;
&lt;br /&gt;
For a user world, it includes not only the virtual memory that is backed by the machine memory, but also the reserved backing store size.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I break down "MEMSZ" of a VM?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: A VM's guest physical memory can be mapped to machine memory, reclaimed by the balloon driver, swapped to disk, or never touched. Guest physical memory can be "never touched" because (1) the VM has never used it since power on, or (2) it was reclaimed by the balloon driver earlier but has not been used since the balloon driver last released it. This part of memory is not measured directly by VMKernel.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
"MEMSZ" = "GRANT" + "MCTLSZ" + "SWCUR" + "never touched"&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Please refer to "GRANT", "MCTLSZ", "SWCUR".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
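The breakdown above can be turned into a quick calculation. The following Python sketch derives the "never touched" portion from the other three counters. The function name and the sample values are hypothetical and are not taken from a real host.&lt;br /&gt;

```python
def never_touched_mb(memsz, grant, mctlsz, swcur):
    """Return the part of MEMSZ not covered by GRANT, MCTLSZ, or SWCUR (all in MB)."""
    # Rearranged from: MEMSZ = GRANT + MCTLSZ + SWCUR + "never touched"
    return memsz - grant - mctlsz - swcur


# Hypothetical VM: 4096 MB configured, 2500 MB granted, 300 MB ballooned, 100 MB swapped
print(never_touched_mb(4096, 2500, 300, 100))
```

Because "never touched" is not measured directly by VMKernel, a result computed this way is only as accurate as the counters it is derived from.&lt;br /&gt;
&lt;br /&gt;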
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"GRANT" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
For a VM, it is the amount of guest physical memory granted to the group, i.e., mapped to machine memory. The overhead memory, "OVHD", is not included in "GRANT". The shared memory, "SHRD", is part of "GRANT". This statistic was added to esxtop in ESX 4.0.&lt;br /&gt;
&lt;br /&gt;
The consumed machine memory for the VM, not including the overhead memory, can be estimated as "GRANT" - "SHRDSVD". Please refer to "SHRDSVD".&lt;br /&gt;
&lt;br /&gt;
For a user world, it is the amount of virtual memory that is backed by machine memory.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is "GRANT" less than "MEMSZ"?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Some guest physical memory has never been used, or is reclaimed by balloon driver, or is swapped out to the VM swap file. Note that this kind of swap is host swap, not the guest swap by the guest OS.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
"MEMSZ" = "GRANT" + "MCTLSZ" + "SWCUR" + "never touched"&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know how much machine memory is consumed by this VM?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: "GRANT" accounts for guest physical memory. It may not be the same as the mapped machine memory, due to page sharing.&lt;/i&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;The consumed machine memory can be estimated as "GRANT" - "SHRDSVD". Please note that this is an estimate. Please refer to "SHRDSVD".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Note that overhead memory, "OVHD", is not part of the above consumed machine memory.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SZTGT" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The amount of machine memory to be allocated. (TGT is short for "target".) Note that "SZTGT" includes the overhead memory for a VM.&lt;br /&gt;
&lt;br /&gt;
This is an internal counter, which is computed by the ESX memory scheduler. Usually, there is no need to worry about this. Roughly speaking, "SZTGT" of all the VMs is computed based on the resource usage, available memory, and the "limit/reservation/shares" settings. This computed "SZTGT" is compared against the current memory consumption plus overhead memory for a VM to determine the swap and balloon targets, so that VMKernel can balloon or swap the appropriate amount of memory to meet its memory demand. Please refer to "Resource Management Guide" for details.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How come my "SZTGT" is larger than "MEMSZ"?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: "SZTGT" includes the overhead memory, while "MEMSZ" does not. So, it is possible for "SZTGT" to be larger than "MEMSZ".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I use "SZTGT"?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: This is an internal counter. You don't need to use it.&lt;/i&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;This counter is used to determine future swapping and ballooning activities. Check "SWTGT" and "MCTLTGT".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"TCHD" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The amount of guest physical memory recently used by the VM, which is estimated by VMKernel statistical sampling.&lt;br /&gt;
&lt;br /&gt;
VMKernel estimates active memory usage for a VM by sampling a random subset of the VM's memory resident in machine memory to detect the number of memory reads and writes. VMKernel then scales this number by the size of VM's configured memory and averages it with previous samples. Over time, this average will approximate the amount of active memory for the VM.&lt;br /&gt;
&lt;br /&gt;
Note that ballooned memory is considered inactive, so, it is excluded from "TCHD".&lt;br /&gt;
&lt;br /&gt;
Because sampling and averaging takes time, "TCHD" won't be exact, but becomes more accurate over time. &lt;br /&gt;
&lt;br /&gt;
VMKernel memory scheduler charges the VM by the sum of (1) the "TCHD" memory and (2) idle memory tax. This charged memory is one of the factors that memory scheduler uses for computing the "SZTGT".&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What is the difference between "TCHD" and working set estimate by guest OS?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: "TCHD" is the working set estimated by VMKernel. This number may be different from the guest's working set estimate. Sometimes the difference may be big, because (1) the guest OS uses a different working set estimation algorithm, and (2) the guest OS has a different view of active guest physical memory, due to ballooning and host swapping.&lt;/i&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How is "TCHD" used?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: "TCHD" is a working set estimate, which indicates how actively the VM is using its memory. See above for the internal use of this counter.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%ACTV"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Percentage of active guest physical memory, current value.&lt;br /&gt;
&lt;br /&gt;
"TCHD" is actually computed based on a few parameters, coming from statistical sampling. The exact equation is out of scope of this document. Esxtop shows some of those parameters, %ACTV, %ACTVS, %ACTVF, %ACTVN. Here, this document provides simple descriptions without further discussion.&lt;br /&gt;
&lt;br /&gt;
%ACTV reflects the current sample.&lt;br /&gt;
%ACTVS is an EWMA of %ACTV for long term estimate.&lt;br /&gt;
%ACTVF is an EWMA of %ACTV for short term estimate.&lt;br /&gt;
%ACTVN is a prediction of what %ACTVF will be at the next sample.&lt;br /&gt;
&lt;br /&gt;
Since they are very internal to VMKernel memory scheduler, we do not discuss their usage here.&lt;br /&gt;
&lt;br /&gt;
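To illustrate how a slow and a fast EWMA differ, here is a generic Python sketch. The smoothing weights (0.1 and 0.5), the function name, and the sample values are assumptions chosen for illustration. They are not the parameters VMKernel uses, and this is not the equation behind "TCHD".&lt;br /&gt;

```python
def ewma(samples, weight):
    """Exponentially weighted moving average; a larger weight follows new samples faster."""
    average = samples[0]
    for sample in samples[1:]:
        average = weight * sample + (1 - weight) * average
    return average


# Hypothetical %ACTV samples: the VM becomes busy in the last two intervals
actv = [10, 10, 10, 50, 50]
slow = ewma(actv, 0.1)  # long-term estimate, in the spirit of %ACTVS
fast = ewma(actv, 0.5)  # short-term estimate, in the spirit of %ACTVF
print(round(slow, 1), round(fast, 1))
```

With these assumed weights, the fast average moves most of the way toward the new activity level after two samples, while the slow average has barely moved.&lt;br /&gt;
&lt;br /&gt;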
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%ACTVS"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Percentage of active guest physical memory, slow moving average. See above.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%ACTVF"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Percentage of active guest physical memory, fast moving average. See above.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%ACTVN"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Percentage of active guest physical memory in the near future. This is an estimated value. See above.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MCTL?"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Whether the memory balloon driver is installed. &lt;br /&gt;
&lt;br /&gt;
If not, install VMware Tools, which contains the balloon driver.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MCTLSZ" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The amount of guest physical memory reclaimed by balloon driver.&lt;br /&gt;
&lt;br /&gt;
This can be called "balloon size". A large "MCTLSZ" means a lot of this VM's guest physical memory is "stolen" to decrease host memory pressure. This usually is not a problem, because the balloon driver tends to steal guest physical memory whose loss causes few performance problems.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: How do I know the VM is ballooning?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: If "MCTLSZ" is changing, the balloon driver is actively reclaiming or releasing memory, i.e., the VM is ballooning. Please note that the short-term ballooning rate can be estimated from the change in "MCTLSZ", assuming it is either steadily increasing or steadily decreasing. Over a long period, this does not work, because the assumption of a monotonic increase or decrease may not hold.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Does ballooning hurt VM performance?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: If guest working set is smaller than guest physical memory after ballooning, guest applications won't observe any performance degradation. Otherwise, it may cause guest swapping and hurt guest application performance.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Please check what causes ballooning and take appropriate actions to reduce memory pressure. There are two possible reasons: (1) The host does not have enough machine memory for use. (2) Memory used by the VM reaches the "limit" setting of itself or "limit" of the resource pools that contain this VM. In either case, ballooning is necessary and preferred over swapping.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MCTLTGT" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The amount of guest physical memory to be kept in balloon driver. (TGT is short for "target".)&lt;br /&gt;
&lt;br /&gt;
This is an internal counter, which is computed by ESX memory scheduler. Usually, there is no need to worry about this.&lt;br /&gt;
&lt;br /&gt;
Roughly speaking, "MCTLTGT" is computed based on "SZTGT" and current memory usage, so that the VM can balloon appropriate amount of memory. If "MCTLTGT" is greater than "MCTLSZ", VMKernel initiates inflating the balloon immediately, causing more VM memory to be reclaimed. If "MCTLTGT" is less than "MCTLSZ", VMKernel will deflate the balloon when the guest is requesting memory, allowing the VM to map/consume additional memory if it needs it. Please refer to "Resource Management Guide" for details.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is it possible for "MCTLTGT" to be less than "MCTLSZ" for a long time?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: If "MCTLTGT" is less than "MCTLSZ", VMKernel allows the balloon to deflate. But, balloon deflation happens lazily until the VM requests new memory. So, it is possible for "MCTLTGT" to be less than "MCTLSZ" for a long time, when the VM is not requesting new memory.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MCTLMAX" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The maximum amount of guest physical memory reclaimable by balloon driver.&lt;br /&gt;
&lt;br /&gt;
This value can be set via vmx option "sched.mem.maxmemctl". If not set, it is determined by the guest operating system type. "MCTLTGT" will never be larger than "MCTLMAX".&lt;br /&gt;
&lt;br /&gt;
If the VM suffers from ballooning, "sched.mem.maxmemctl" can be set to a smaller value to reduce this possibility. Remember that doing so may result in host swapping during resource contention.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SWCUR" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Current swap usage.&lt;br /&gt;
&lt;br /&gt;
For a VM, it is the current amount of guest physical memory swapped out to the backing store. Note that it is the VMKernel swapping not the guest OS swapping.&lt;br /&gt;
&lt;br /&gt;
It is the sum of swap slots used in the vswp file or system swap, and migration swap. Migration swap is used for a VMotioned VM to hold swapped out memory on the destination host, in case the destination host is under memory pressure.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if "SWCUR" of my VM is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It means part of the VM's guest physical memory is not resident in machine memory, but on disk. If that memory will not be used in the near future, it is not an issue. Otherwise, that memory will be swapped in for the guest's use. In that case, you will see some swap-in activity via "SWR/s", which may hurt the VM's performance.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SWTGT" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The expected swap usage. (TGT is short for "target".)&lt;br /&gt;
&lt;br /&gt;
This is an internal counter, which is computed by ESX memory scheduler. Usually, there is no need to worry about this.&lt;br /&gt;
&lt;br /&gt;
Roughly speaking, "SWTGT" is computed based on "SZTGT" and current memory usage, so that the VM can swap appropriate amount of memory. Again, note that it is the VMKernel swapping not the guest swapping. If "SWTGT" is greater than "SWCUR", VMKernel starts swapping immediately, causing more VM memory to be swapped out. If "SWTGT" is less than "SWCUR", VMKernel will stop swapping. Please refer to "Resource Management Guide" for details.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why is it possible for "SWTGT" to be less than "SWCUR" for a long time?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: Since swapped memory stays swapped until the VM accesses it, it is possible for "SWTGT" to be less than "SWCUR" for a long time.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SWR/s" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Rate at which memory is being swapped in from disk. Note that this statistic refers to VMKernel swapping, not guest swapping.&lt;br /&gt;
&lt;br /&gt;
When a VM is requesting machine memory to back its guest physical memory that was swapped out to disk, VMKernel reads in the page. Note that the swap-in operation is synchronous.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if SWR/s is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It is very bad for VM's performance. Because swap-in is synchronous, the VM needs to wait until the requested pages are read into machine memory. This happens when VMKernel swapped out the VM's memory before and the VM needs them now. Please refer to "SWW/s".&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SWW/s" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Rate at which memory is being swapped out to disk. Note that this statistic refers to VMKernel swapping, not guest swapping.&lt;br /&gt;
&lt;br /&gt;
As discussed in "SWTGT", if "SWTGT" is greater than "SWCUR", VMKernel will swap out memory to disk. It happens usually in two situations. (1) The host does not have enough machine memory for use. (2) Memory used by the VM reaches the "limit" setting of itself or "limit" of the resource pools that contain this VM.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if SWW/s is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: It is very bad for VM performance. Please check the above two reasons and fix your problem accordingly.&lt;/i&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;If this VM is swapping out memory due to resource contention, it usually means VMKernel does not have enough machine memory to meet memory demands from all the VMs. So, it will swap out mapped guest physical memory pages to make room for the recent requests.&lt;/i&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SHRD" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Amount of guest physical memory that is shared.&lt;br /&gt;
&lt;br /&gt;
The VMKernel page sharing module scans for guest physical pages with the same content and backs them with the same machine page. "SHRD" accounts for the total guest physical pages that are shared by the page sharing module.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"ZERO" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Amount of guest physical zero memory that is shared. This is an internal counter.&lt;br /&gt;
&lt;br /&gt;
A zero page is simply the memory page that is all zeros. If a zero guest physical page is detected by VMKernel page sharing module, this page will be backed by the same machine page on each NUMA node. Note that "ZERO" is included in "SHRD". &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SHRDSVD" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Estimated amount of machine memory that is saved due to page sharing.&lt;br /&gt;
&lt;br /&gt;
Because a machine page is shared by multiple guest physical pages, we only charge "1/ref" page as the consumed machine memory for each of the guest physical pages, where "ref" is the number of references. So, the saved machine memory will be "1 - 1/ref" page. "SHRDSVD" estimates the total saved machine memory for the VM.&lt;br /&gt;
&lt;br /&gt;
The consumed machine memory by the VM can be estimated as "GRANT" - "SHRDSVD".&lt;br /&gt;
&lt;br /&gt;
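The two relationships above can be sketched in Python as follows. The function names and sample values are hypothetical, and the per-page figure follows the "1/ref" charging rule described above rather than any documented VMKernel interface.&lt;br /&gt;

```python
def saved_per_guest_page(ref):
    """Fraction of a machine page saved for each guest page that shares it with ref references."""
    charged = 1 / ref  # each guest page is charged 1/ref of the machine page
    return 1 - charged


def consumed_machine_mb(grant, shrdsvd):
    """Estimated machine memory consumed by a VM, excluding overhead memory (MB)."""
    return grant - shrdsvd


# Hypothetical numbers: a machine page shared by 4 guest pages
print(saved_per_guest_page(4))
# Hypothetical VM with GRANT = 2500 MB and SHRDSVD = 600 MB
print(consumed_machine_mb(2500, 600))
```

Both results are estimates, and overhead memory ("OVHD") has to be added separately to get the full footprint of the VM.&lt;br /&gt;
&lt;br /&gt;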
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"COWH" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Amount of guest physical hint pages for page sharing. This is an internal counter.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"OVHDUW" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Amount of overhead memory reserved for the vmx user world of a VM group. This is an internal counter.&lt;br /&gt;
&lt;br /&gt;
"OVHDUW" is part of "OVHDMAX".&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"OVHD" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Amount of overhead memory currently consumed by a VM.&lt;br /&gt;
&lt;br /&gt;
"OVHD" includes the overhead memory consumed by the monitor, the VMkernel and the vmx user world. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"OVHDMAX" (MB)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Amount of reserved overhead memory for the entire VM.&lt;br /&gt;
&lt;br /&gt;
"OVHDMAX" is the overhead memory a VM wants to consume in the future. This amount of reserved overhead memory includes the overhead memory reserved by the monitor, the VMkernel, and the vmx user world. Note that the actual overhead memory consumption is less than "OVHDMAX". "OVHD" &amp;lt; "OVHDMAX". &lt;br /&gt;
&lt;br /&gt;
"OVHDMAX" can be used as a conservative estimate of the total overhead memory.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Section 4 Disk&lt;/h1&gt;
&lt;h2&gt;Section 4.1 Adapter, Device, VM screens&lt;/h2&gt;
The ESX storage stack adds a few layers of code between a virtual machine and bare hardware. All virtual disks in virtual machines are seen as virtual SCSI disks. The ESX storage stack allows these virtual disks to be located on any of the multiple storage options available.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9279-6-4855/scsi.JPG" alt="scsi.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9279-6-4855/scsi.JPG');return false;"/&gt; &lt;br /&gt;
&lt;br /&gt;
For performance analysis purposes, an IO request from an application in a virtual machine traverses through multiple levels of queues, each associated with a resource, in the guest OS, the VMkernel and the physical storage. (Note that physical storage could be an FC- or IP- SAN or disk array.) Each queue has an associated latency, dictated by its size and whether the IO load is low or high, which affects the throughput and latency seen by applications inside VMs.&lt;br /&gt;
&lt;br /&gt;
Esxtop shows the storage statistics in three different screens: adapter screen, device screen, and vm screen. Interactive command &lt;i&gt;'d'&lt;/i&gt; can be used to switch to the adapter screen, &lt;i&gt;'u'&lt;/i&gt; for the device screen, and 'v' for the vm screen. &lt;br /&gt;
&lt;br /&gt;
The main difference in the data seen in these three screens is the level at which it is aggregated, even though these screens have similar counters. By default, data is rolled up to the highest level possible for each screen. (1) On the adapter screen, by default, the statistics are aggregated per storage adapter but they can also be expanded to display data per storage channel, target, path or world using a LUN. See interactive commands, &lt;i&gt;'e', 'E', 'P', 'a', 't', 'l'&lt;/i&gt;, for the expand operations. (2) On the device screen, by default, statistics are aggregated per storage device. Statistics can also be viewed per path, world, or partition. See interactive commands, &lt;i&gt;'e', 'p', 't'&lt;/i&gt;, for the expand operations. (3) On the VM screen, statistics are aggregated on a per-group basis by default. One VM has one corresponding group, so they are equivalent to per-VM statistics. You can use interactive command &lt;i&gt;'V'&lt;/i&gt; to show only statistics related to VMs. Statistics can also be expanded so that a row is displayed for each world or on a per-world-per-device basis. See interactive commands, &lt;i&gt;'e' and 'l'&lt;/i&gt;. &lt;br /&gt;
&lt;br /&gt;
Please refer to esxtop man page for the details of the interactive commands.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 4.2 Disk Statistics&lt;/h2&gt;
Due to the similarities in the counters of the three disk screens, this section discusses the counters without distinguishing the screens. Similar to other esxtop screens, the storage counters are also organized in different sets, each of which contains related counters. The counters can be selected as a set by selecting the appropriate field option in esxtop. If esxtop is used in batch mode, make sure that the esxtop configuration file includes all counters of interest.&lt;br /&gt;
&lt;br /&gt;
Each group of counters in the following subsections corresponds to a particular field option. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Section 4.2.1 I/O Throughput Statistics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;CMDS/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Number of commands issued per second. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;READS/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Number of read commands issued per second. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;WRITES/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Number of write commands issued per second. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;MBREAD/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Megabytes read per second. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;MBWRTN/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Megabytes written per second. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Section 4.2.2 Latency Statistics&lt;/h3&gt;
This group of counters reports latency values measured at three different points in the ESX storage stack. In the context of the figure below, the latency counters in esxtop report the Guest, ESX Kernel and Device latencies. These are under the labels GAVG, KAVG and DAVG, respectively. Note that GAVG is the sum of the DAVG and KAVG counters. &lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9279-1-4856/latency.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9279-1-4856/latency.JPG" class="jive-image"  /&gt; &lt;br /&gt;
&lt;br /&gt;
Note that esxtop shows the latency statistics for different objects, such as adapters, devices, paths, and worlds. They may not perfectly match with each other, since their latencies are measured at the different layers of the ESX storage stack. To do the correlation, you need to be very familiar with the storage layers in ESX Kernel, which is out of our scope.&lt;br /&gt;
&lt;br /&gt;
Latency values are reported for all IOs, read IOs, and write IOs. All values are averages over the measurement interval. &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;All IOs: KAVG/cmd, DAVG/cmd, GAVG/cmd, QAVG/cmd&lt;/li&gt;
&lt;li&gt;Read IOs: KAVG/rd, DAVG/rd, GAVG/rd, QAVG/rd&lt;/li&gt;
&lt;li&gt;Write IOs: KAVG/wr, DAVG/wr, GAVG/wr, QAVG/wr&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;GAVG&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
This is the round-trip latency that the guest sees for all IO requests sent to the virtual storage device. &lt;br /&gt;
&lt;br /&gt;
GAVG should be close to the R metric in the figure. &lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What is the relationship between GAVG, KAVG and DAVG?&lt;/i&gt;&lt;br /&gt;
A: GAVG = KAVG + DAVG&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;KAVG&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
These counters track the latency added by command processing in the ESX Kernel.&lt;br /&gt;
&lt;br /&gt;
The KAVG value should be very small in comparison to the DAVG value and should be close to zero. When there is a lot of queuing in ESX, KAVG can be as high as, or even higher than, DAVG. If this happens, please check the queue statistics, which will be discussed next.&lt;br /&gt;
&lt;br /&gt;
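One way to judge whether KAVG is "very small in comparison to DAVG" is to express it as a share of GAVG. The Python sketch below does that using the GAVG = KAVG + DAVG relationship. The function name and the sample readings are hypothetical, and the values are assumed to be in milliseconds per command.&lt;br /&gt;

```python
def kernel_share_percent(kavg_ms, davg_ms):
    """Percentage of guest-visible latency (GAVG) that is spent in the ESX kernel."""
    gavg_ms = kavg_ms + davg_ms  # GAVG = KAVG + DAVG
    if gavg_ms == 0:
        return 0.0  # no IO latency recorded in this interval
    return 100 * kavg_ms / gavg_ms


# Hypothetical readings in milliseconds per command
print(kernel_share_percent(0.5, 9.5))
```

A share that stays high over many samples points toward queuing in ESX rather than a slow storage backend.&lt;br /&gt;
&lt;br /&gt;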
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;DAVG&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
This is the latency seen at the device driver level. It includes the roundtrip time between the HBA and the storage.&lt;br /&gt;
&lt;br /&gt;
DAVG is a good indicator of performance of the backend storage. If IO latencies are suspected to be causing performance problems, DAVG should be examined. Compare IO latencies with corresponding data from the storage array. If they are close, check the array for misconfiguration or faults. If not, compare DAVG with corresponding data from points in between the array and the ESX Server, e.g., FC switches. If this intermediate data also matches DAVG values, it is likely that the storage is under-configured for the application. Adding disk spindles or changing the RAID level may help in such cases.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;QAVG&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The average queue latency. QAVG is part of KAVG.&lt;br /&gt;
&lt;br /&gt;
Response time is the sum of the time spent in queues in the storage stack and the service time spent by each resource in servicing the request. The largest component of the service time is the time spent in retrieving data from physical storage. If QAVG is high, another line of investigation is to examine the queue depths at each level in the storage stack.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Section 4.2.3 Queue Statistics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;AQLEN&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The storage adapter queue depth. This is the maximum number of ESX Server VMKernel active commands that the adapter driver is configured to support.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;LQLEN&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The LUN queue depth. This is the maximum number of ESX Server VMKernel active commands that the LUN is allowed to have. (Note that, in this document, the terminologies of LUN and Storage device can be used interchangeably.) &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;WQLEN&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The World queue depth. This is the maximum number of ESX Server VMKernel active commands that the World is allowed to have. Note that this is a per LUN maximum for the World.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;ACTV&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of commands in the ESX Server VMKernel that are currently active. This statistic is only applicable to worlds and LUNs. &lt;br /&gt;
&lt;br /&gt;
Please refer to %USD.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;QUED&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of commands in the VMKernel that are currently queued. This statistic is only applicable to worlds and LUNs. &lt;br /&gt;
&lt;br /&gt;
Queued commands are commands waiting for an open slot in the queue. A large number of queued commands may be an indication that the storage system is overloaded. A sustained high value for the QUED counter signals a storage bottleneck, which may be alleviated by increasing the queue depth. Check that LOAD &amp;lt; 1 after increasing the queue depth. This should also be accompanied by improved performance in terms of increased CMDS/s.&lt;br /&gt;
&lt;br /&gt;
Note that there are queues in different storage layers. You might want to check the QUED stats for devices, and worlds.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;%USD&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of queue depth used by ESX Server VMKernel active commands. This statistic is only applicable to worlds and LUNs.&lt;br /&gt;
&lt;br /&gt;
%USD = ACTV / QLEN * 100%&lt;br /&gt;
&lt;br /&gt;
For world stats, WQLEN is used as the denominator. For LUN (aka device) stats, LQLEN is used as the denominator. &lt;br /&gt;
&lt;br /&gt;
%USD is a measure of how many of the available command queue "slots" are in use. Sustained high values indicate the potential for queueing; you may need to adjust the queue depths for system&amp;rsquo;s HBAs if QUED is also found to be consistently &amp;gt; 1 at the same time. Queue sizes can be adjusted in a few places in the IO path and can be used to alleviate performance problems related to latency. For detailed information on this topic please refer to the VMware whitepaper entitled "Scalable Storage Performance".&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;LOAD&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The ratio of the sum of VMKernel active commands and VMKernel queued commands to the queue depth. This statistic is only applicable to worlds and LUNs.&lt;br /&gt;
&lt;br /&gt;
The sum of the active and queued commands gives the total number of outstanding commands issued by that virtual machine. The LOAD counter value is the ratio of this sum to the queue depth. If LOAD &amp;gt; 1, check the value of the QUED counter. &lt;br /&gt;
&lt;br /&gt;
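The %USD and LOAD definitions above can be written out as a short Python sketch. The function names and the sample LUN values are hypothetical. For world statistics the queue depth would be WQLEN, and for LUN statistics it would be LQLEN.&lt;br /&gt;

```python
def pct_used(actv, qlen):
    """%USD = ACTV / QLEN * 100."""
    return actv / qlen * 100


def load(actv, qued, qlen):
    """LOAD = (ACTV + QUED) / QLEN."""
    return (actv + qued) / qlen


# Hypothetical LUN with LQLEN = 32, 32 active commands, and 16 queued commands
print(pct_used(32, 32))   # queue slots in use
print(load(32, 16, 32))   # a value above 1 means commands are waiting
```

In this example every queue slot is in use and LOAD is above 1, which is the situation where the QUED counter should be checked.&lt;br /&gt;
&lt;br /&gt;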
&lt;h3&gt;Section 4.2.4 Error Statistics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;ABRTS/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of commands aborted per second. &lt;br /&gt;
&lt;br /&gt;
Aborts can indicate that the storage system is unable to meet the demands of the guest operating system. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time, e.g. 60 seconds on some Windows OSes. Also, resets issued by a guest OS on its virtual SCSI adapter will be translated to aborts of all the commands outstanding on that virtual SCSI adapter.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;RESETS/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of commands reset per second. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Section 4.2.5 PAE Statistics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;PAECMD/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of PAE commands per second.&lt;br /&gt;
&lt;br /&gt;
It may point to hardware misconfiguration. When the guest allocates a buffer, the vmkernel assigns some machine memory, which might come from a &amp;ldquo;highmem&amp;rdquo; region. If you have a driver that is not PAE-aware, this counter is updated whenever accesses to this memory region result in copies by the vmkernel into a lower memory location before the request is issued to the adapter. If you do not populate the DIMMs with low memory first, you may artificially cause &amp;ldquo;highmem&amp;rdquo; memory accesses.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;PAECP/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of PAE copies per second. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;Section 4.2.6 Split Statistics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;SPLTCMD/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of split commands per second. &lt;br /&gt;
&lt;br /&gt;
Commands can be split when they reach the vmkernel. This might impact perceived latency to the guest. The guest may be issuing commands of large block sizes which have to be broken down by the vmkernel. For ESX 3.0.x, guest requests greater than 128KB are split into 128KB chunks. Since few applications issue operations larger than 128KB, this is unlikely to be an issue. Splitting can also occur when IOs fall across partition boundaries, but these are easily differentiated from splitting caused by the IO size.&lt;br /&gt;
&lt;br /&gt;
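As a rough illustration of the 128KB limit mentioned above, the sketch below counts how many commands one guest request would turn into. It assumes the final chunk may be smaller than 128KB and it ignores splits caused by partition boundaries; the function name is hypothetical and is not an esxtop or vmkernel interface.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
import math

SPLIT_LIMIT_KB = 128  # limit quoted in the text for ESX 3.0.x


def split_count(request_kb, limit_kb=SPLIT_LIMIT_KB):
    """Number of commands issued for one guest request of request_kb kilobytes.

    Assumption: a request is cut into limit_kb chunks and the last chunk
    may be partial. Requests at or below the limit are not split.
    """
    return max(1, math.ceil(request_kb / limit_kb))


for size_kb in (64, 128, 256, 300):
    print(size_kb, "KB request:", split_count(size_kb), "command(s)")
```
&lt;/pre&gt;
&lt;br /&gt;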
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;SPLTCP/s&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of split copies per second. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 4.3 Batch Mode Output&lt;/h2&gt;
Esxtop batch mode output can be loaded in perfmon directly. It uses a csv (comma separated values) format. The instance type can be identified via its name. Because there are quite a number of instances related to disk statistics, let's list a few examples below. You may easily match the format in your own environment.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;LUN (aka device): "\\&amp;lt;host&amp;gt;\Physical Disk(DEV-vmhba0:0:0)\&amp;lt;counter&amp;gt;"&lt;/li&gt;
&lt;li&gt;Partition: "\\&amp;lt;host&amp;gt;\Physical Disk(PN-vmhba0:0:0-1)\&amp;lt;counter&amp;gt;"&lt;/li&gt;
&lt;li&gt;Path: "\\&amp;lt;host&amp;gt;\Physical Disk(PH-vmhba0:C0:T0:L0)\&amp;lt;counter&amp;gt;"&lt;/li&gt;
&lt;li&gt;Per-World-Per-Device: "\\&amp;lt;host&amp;gt;\Physical Disk(WD-vmhba0:0:0-1024)\&amp;lt;counter&amp;gt;"&lt;/li&gt;
&lt;li&gt;Adapter: "\\&amp;lt;host&amp;gt;\Physical Disk(vmhba0)\&amp;lt;counter&amp;gt;"&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
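The naming patterns listed above can be matched programmatically. The following Python sketch splits a disk column name into its parts and guesses the instance type from the prefix. It relies only on the five example formats shown above; the host name "esx01" and the function name are made up, and real output may contain instance formats this sketch does not cover.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
# Instance prefixes taken from the examples listed in the text
PREFIXES = {
    "DEV-": "LUN",
    "PN-": "Partition",
    "PH-": "Path",
    "WD-": "Per-World-Per-Device",
}


def classify(counter_path):
    """Split a disk column name into (host, instance type, instance, counter)."""
    # Column names look like \\host\Physical Disk(instance)\counter
    host, rest = counter_path.lstrip("\\").split("\\", 1)
    category, rest = rest.split("(", 1)
    instance, counter = rest.split(")\\", 1)
    if category != "Physical Disk":
        raise ValueError("not a disk counter: " + counter_path)
    for prefix, kind in PREFIXES.items():
        if instance.startswith(prefix):
            return host, kind, instance[len(prefix):], counter
    # No known prefix: in the examples this is a bare adapter name
    return host, "Adapter", instance, counter


print(classify(r"\\esx01\Physical Disk(WD-vmhba0:0:0-1024)\Queued Commands"))
print(classify(r"\\esx01\Physical Disk(vmhba0)\Commands/sec"))
```
&lt;/pre&gt;
&lt;br /&gt;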
&lt;h1&gt;Section 5 Network&lt;/h1&gt;
&lt;h2&gt;Section 5.1 Port&lt;/h2&gt;
We arrange the network stats per port of a virtual switch. "PORT-ID" identifies the port and "DNAME" shows the virtual switch name. A port can be linked to a physical NIC as an uplink, or can be connected by a virtual NIC. "UPLINK" indicates whether the port is an uplink.&lt;br /&gt;
&lt;br /&gt;
If the port is an uplink, i.e., "UPLINK" is 'Y', "USED-BY" shows the physical NIC name. &lt;br /&gt;
&lt;br /&gt;
If the port is connected by a virtual NIC, i.e., "UPLINK" is 'N', "USED-BY" shows the port client name. (a) If the port is used by a virtual machine, the client name contains a world id and the VM name. The world id identifies the leader world of the VM group. Note that "vswif" is used by COS (on classic ESX). (b) If the port is used by the VMKernel system, there is no world id. The client name can be used to identify the use of the port. Two examples follow.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;"vmk" is a port used by vmkernel. Users can create vmk NICs for their uses, such as VMotion. On ESXi, there will be at least one vmk NIC to communicate with outside of the host.&lt;/li&gt;
&lt;li&gt;"Management" is a management port for a portset. This is internal. Usually no need to worry about it.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
For each non-uplink port, the NIC teaming policy determines which physical NIC is in charge of the port. "TEAM-PNIC" shows the physical NIC name, if valid. Please refer to NIC teaming documentation for details.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Section 5.2 Port Statistics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"SPEED" (Mbps)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The link speed in Megabits per second. This information is only valid for a physical NIC.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"FDUPLX"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
'Y' implies the corresponding link is operating at full duplex. 'N' implies it is not. This information is only valid for a physical NIC.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"UP"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
'Y' implies the corresponding link is up. 'N' implies it is not. This information is only valid for a physical NIC.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"PKTTX/s"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of packets transmitted per second.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"PKTRX/s"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The number of packets received per second.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MbTX/s" (Mbps)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The MegaBits transmitted per second.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"MbRX/s" (Mbps)&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The MegaBits received per second.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: Why does MbRX/s not match PKTRX/s for different workloads?&lt;/i&gt; &lt;br /&gt;
&lt;i&gt;A: This is because the packet size may not be the same. The average packet size can be computed as follows: average_packet_size = MbRX/s / PKTRX/s . A large packet size may improve CPU efficiency of processing the packet. However, it may potentially increase latency.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
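The average packet size from the answer above can be converted to bytes with a small helper. This is a sketch under one stated assumption: one megabit is taken as 1,000,000 bits, which the text does not specify. The function name and sample values are made up.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
def average_packet_bytes(mbits_per_sec, packets_per_sec):
    """Average packet size in bytes from MbRX/s and PKTRX/s (or the TX pair)."""
    if packets_per_sec == 0:
        return 0.0  # no traffic in this sample
    # Assumption: 1 megabit = 1,000,000 bits; 8 bits per byte
    return mbits_per_sec * 1_000_000 / 8 / packets_per_sec


# Hypothetical sample: 120 Mbps carried by 10,000 packets per second
print(average_packet_bytes(120, 10000))
```
&lt;/pre&gt;
&lt;br /&gt;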
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%DRPTX"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of transmit packets dropped.&lt;br /&gt;
&lt;br /&gt;
"%DRPTX" = "dropped Tx packets" / ("success Tx packets" + "dropped Tx packets")&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %DRPTX is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: This usually means the network transmit performance is bad. Please check whether the physical NICs are fully utilizing their capacity. You probably need physical NICs with better performance. Or, you may add more physical NICs and use a good NIC teaming load balancing policy.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"%DRPRX"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
The percentage of receive packets dropped.&lt;br /&gt;
&lt;br /&gt;
"%DRPRX" = "dropped Rx packets" / ("success Rx packets" + "dropped Rx packets")&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Q: What does it mean if %DRPRX is high?&lt;/i&gt;&lt;br /&gt;
&lt;i&gt;A: This usually means the network receive performance is bad. Try to give more CPU resource to the impacted VM, or increase the ring buffer size.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
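The two drop formulas above share the same shape, so one helper can illustrate both. In this Python sketch the result is multiplied by 100 so that it reads as a percentage, which the formulas above leave implicit. The function name and the packet counts are hypothetical.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
def drop_percent(dropped, succeeded):
    """Percentage of packets dropped, for either the Tx or the Rx direction."""
    total = dropped + succeeded
    if total == 0:
        return 0.0  # nothing sent or received in this sample
    return 100.0 * dropped / total


# Hypothetical sample: 5 dropped packets out of 100 attempted
print(drop_percent(dropped=5, succeeded=95))
```
&lt;/pre&gt;
&lt;br /&gt;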
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;"ACTN/s"&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
Number of actions per second. The actions here are VMkernel actions. It is an internal counter. We won't discuss it further here.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Section 6. Interrupt &lt;/h1&gt;
Interrupt screens are under development for our next release.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Section 7. Batch Mode&lt;/h1&gt;
Esxtop batch mode output uses a csv (comma separated values) format. The first line contains the names of the performance counters and their instances. Each of the following lines contains the performance data for those counter instances in one snapshot. &lt;br /&gt;
&lt;br /&gt;
One way to read the batch mode output file is to load it in Windows perfmon. (1) Run perfmon; (2) Type "Ctrl + L" to view log data; (3) Add the file to the "Log files" and click OK; (4) Choose the counters to show the performance data. Each batch mode counter has a category name (listed as a performance object in perfmon) and a counter name (listed in the counter list in perfmon).&lt;br /&gt;
&lt;br /&gt;
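Besides perfmon, the csv layout described above (one header line of counter names, then one line per snapshot) can be read with a short script. The Python sketch below uses only the standard csv module. The sample data, the "Time" column and the host name "esx01" are invented for illustration and may not match the exact header esxtop writes.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
import csv
import io


def read_batch(stream):
    """Return {column name: [values as strings]} from batch mode csv text."""
    reader = csv.reader(stream)
    header = next(reader)              # first line: counter names
    columns = {name: [] for name in header}
    for row in reader:                 # each later line: one snapshot
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# Invented two-snapshot sample in the layout described in the text
name = r"\\esx01\Physical Cpu(0)\% Processor Time"
lines = ["Time," + name, "10:00:00,12.5", "10:00:05,14.0"]
sample = io.StringIO("\n".join(lines) + "\n")

data = read_batch(sample)
print([float(v) for v in data[name]])
```
&lt;/pre&gt;
For a real capture, an open file object can be passed instead of the in-memory sample.&lt;br /&gt;
&lt;br /&gt;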
The counter names in esxtop batch mode are different from the ones in interactive mode listed in the sections above. The tables below describe their relationships. The first column is the interactive mode counter name; the second column is the batch mode counter category; the last column is the batch mode counter name.&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Table 7-1 CPU Batch Mode Counters&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;th&gt; Counter Name      &lt;/th&gt;
&lt;th&gt; Batch Mode Category  &lt;/th&gt;
&lt;th&gt; Batch Mode Counter Name           &lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CPU load average   &lt;/td&gt;
&lt;td&gt;  Physical Cpu Load    &lt;/td&gt;
&lt;td&gt;  Cpu Load (1 Minute Avg)           &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;  Cpu Load (5 Minute Avg)           &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;  Cpu Load (15 Minute Avg)          &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PCPU(%)            &lt;/td&gt;
&lt;td&gt;  Physical Cpu         &lt;/td&gt;
&lt;td&gt;  % Processor Time                  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; LCPU(%)            &lt;/td&gt;
&lt;td&gt;  Logical Cpu          &lt;/td&gt;
&lt;td&gt;  % Processor Time                  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CCPU(%) us         &lt;/td&gt;
&lt;td&gt;  Console Physical Cpu &lt;/td&gt;
&lt;td&gt;  % User Time                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CCPU(%) sy         &lt;/td&gt;
&lt;td&gt;  Console Physical Cpu &lt;/td&gt;
&lt;td&gt;  % System Time                     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CCPU(%) id         &lt;/td&gt;
&lt;td&gt;  Console Physical Cpu &lt;/td&gt;
&lt;td&gt;  % Idle Time                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CCPU(%) wa         &lt;/td&gt;
&lt;td&gt;  Console Physical Cpu &lt;/td&gt;
&lt;td&gt;  % I/O Wait Time                   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CCPU(%) cs/sec     &lt;/td&gt;
&lt;td&gt;  Console Physical Cpu &lt;/td&gt;
&lt;td&gt;  % Context Switches/sec            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %USED              &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Used                            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %SYS               &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % System                          &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %OVRLP             &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Overlap                         &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %RUN               &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Run                             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %RDY               &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Ready                           &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %MLMTD             &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Max Limited                     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %CSTP              &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % CoStop                          &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %WAIT              &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Wait                            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %IDLE              &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Idle                            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %SWPWT             &lt;/td&gt;
&lt;td&gt;  Group Cpu (or Vcpu)  &lt;/td&gt;
&lt;td&gt;  % Swap Wait                       &lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Table 7-2 Memory Batch Mode Counters&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;th&gt; Counter Name      &lt;/th&gt;
&lt;th&gt; Batch Mode Category  &lt;/th&gt;
&lt;th&gt; Batch Mode Counter Name           &lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MEM overcommit avg &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Memory Overcommit (1 Minute Avg)  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;  Memory Overcommit (5 Minute Avg)  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;  Memory Overcommit (15 Minute Avg) &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PMEM total         &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Machine MBytes                    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PMEM cos           &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Console MBytes                    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PMEM vmk           &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Kernel MBytes                     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PMEM other         &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  NonKernel MBytes                  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PMEM free          &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Free MBytes                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; VMKMEM managed     &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Kernel Managed MBytes             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; VMKMEM minfree     &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Kernel MinFree MBytes             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; VMKMEM rsvd        &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Kernel Reserved MBytes            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; VMKMEM ursvd       &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Kernel Unreserved MBytes          &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; VMKMEM state       &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Kernel State (0: high, 1: soft, 2:hard, 3: low) &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; COSMEM free        &lt;/td&gt;
&lt;td&gt;  Console Memory       &lt;/td&gt;
&lt;td&gt;  Free MBytes                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; COSMEM swap_t      &lt;/td&gt;
&lt;td&gt;  Console Memory       &lt;/td&gt;
&lt;td&gt;  Swap Total MBytes                 &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; COSMEM swap_f      &lt;/td&gt;
&lt;td&gt;  Console Memory       &lt;/td&gt;
&lt;td&gt;  Swap Free MBytes                  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; COSMEM r/s         &lt;/td&gt;
&lt;td&gt;  Console Memory       &lt;/td&gt;
&lt;td&gt;  Swap MBytes Read/sec              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; COSMEM w/s         &lt;/td&gt;
&lt;td&gt;  Console Memory       &lt;/td&gt;
&lt;td&gt;  Swap MBytes Write/sec             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; NUMA               &lt;/td&gt;
&lt;td&gt;  Numa Node            &lt;/td&gt;
&lt;td&gt;  Total MBytes                      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;  Free MBytes                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PSHARE shared      &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  PShare Shared MBytes              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PSHARE common      &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  PShare Common MBytes              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PSHARE saving      &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  PShare Savings MBytes             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWAP curr          &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Swap Used MBytes                  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWAP target        &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Swap Target MBytes                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWAP r/s           &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Swap MBytes Read/sec              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWAP w/s           &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Swap MBytes Write/sec             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MEMCTL curr        &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Memctl Current MBytes             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MEMCTL target      &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Memctl Target MBytes              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MEMCTL max         &lt;/td&gt;
&lt;td&gt;  Memory               &lt;/td&gt;
&lt;td&gt;  Memctl Max MBytes                 &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MEMSZ              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Memory Size MBytes                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; GRANT              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Memory Granted Size MBytes        &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SZTGT              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Target Size MBytes                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; TCHD               &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Touched MBytes                    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %ACTV              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  % Active Estimate                 &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %ACTVS             &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  % Active Slow Estimate            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %ACTVF             &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  % Active Fast Estimate            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %ACTVN             &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  % Active Next Estimate            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MCTL?              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Memctl?                           &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MCTLSZ             &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Memctl MBytes                     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MCTLTGT            &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Memctl Target MBytes              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MCTLMAX            &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Memctl Max MBytes                 &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWCUR              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Swapped MBytes                    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWTGT              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Swap Target MBytes                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWR/s              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Swap Read MBytes/sec              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SWW/s              &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Swap Written MBytes/sec           &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SHRD               &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Shared MBytes                     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; ZERO               &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Zero MBytes                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SHRDSVD            &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Shared Saved MBytes               &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; COWH               &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Copy On Write Hint MBytes         &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; OVHDUW             &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Overhead UW MBytes                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; OVHD               &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Overhead MBytes                   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; OVHDMAX            &lt;/td&gt;
&lt;td&gt;  Group Memory         &lt;/td&gt;
&lt;td&gt;  Overhead Max MBytes               &lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Table 7-3 Disk Batch Mode Counters&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;th&gt; Counter Name      &lt;/th&gt;
&lt;th&gt; Batch Mode Category  &lt;/th&gt;
&lt;th&gt; Batch Mode Counter Name           &lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; CMDS/s             &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Commands/sec                      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; READS/s            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Reads/sec                         &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; WRITES/s           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Writes/sec                        &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MBREAD/s           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  MBytes Read/sec                   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MBWRTN/s           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  MBytes Written/sec                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; KAVG/cmd           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Kernel MilliSec/Command   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; DAVG/cmd           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Driver MilliSec/Command   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; GAVG/cmd           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Guest MilliSec/Command    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; QAVG/cmd           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Queue MilliSec/Command    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; KAVG/rd            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Kernel MilliSec/Read      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; DAVG/rd            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Driver MilliSec/Read      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; GAVG/rd            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Guest MilliSec/Read       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; QAVG/rd            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Queue MilliSec/Read       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; KAVG/wr            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Kernel MilliSec/Write     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; DAVG/wr            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Driver MilliSec/Write     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; GAVG/wr            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Guest MilliSec/Write      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; QAVG/wr            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Average Queue MilliSec/Write      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; AQLEN              &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Adapter Q Depth                   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; LQLEN              &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Lun Q Depth                       &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; DQLEN              &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Device Q Depth                    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; WQLEN              &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  World Q Depth                     &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; ACTV               &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Active Commands                   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; QUED               &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Queued Commands                   &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %USD               &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  % Used                            &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; LOAD               &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Load                              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; ABRTS/s            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Aborts/sec                        &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; RESETS/s           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Resets/sec                        &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PAECMD/s           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  PAE Commands/sec                  &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PAECP/s            &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  PAE Copies/sec                    &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SPLTCMD/s          &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Split Commands/sec                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SPLTCP/s           &lt;/td&gt;
&lt;td&gt;  Physical Disk        &lt;/td&gt;
&lt;td&gt;  Split Copies/sec                  &lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Table 7-4 Network Batch Mode Counters&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;th&gt; Counter Name      &lt;/th&gt;
&lt;th&gt; Batch Mode Category  &lt;/th&gt;
&lt;th&gt; Batch Mode Counter Name           &lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; SPEED              &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  Link Speed (Mb/s)                 &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; FDUPLX             &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  Full Duplex?                      &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; UP                 &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  Link Up?                          &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PKTTX/s            &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  Packets Transmitted/sec           &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; PKTRX/s            &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  Packets Received/sec              &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MbTX/s             &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  MBits Transmitted/sec             &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; MbRX/s             &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  MBits Received/sec                &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %DRPTX             &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  % Outbound Packets Dropped        &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; %DRPRX             &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  % Received Packets Dropped        &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt; ACTN/s             &lt;/td&gt;
&lt;td&gt;  Network Port         &lt;/td&gt;
&lt;td&gt;  Actions Posted/sec                &lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">performance</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxi</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <pubDate>Wed, 31 Dec 2008 22:48:19 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-9279</guid>
      <dc:date>2008-12-31T22:48:19Z</dc:date>
      <clearspace:dateToText>9 months, 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>6</clearspace:replyCount>
    </item>
    <item>
      <title>Best Practices for SQL Server</title>
      <link>http://communities.vmware.com/docs/DOC-8964</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
At VMworld 2008 in Las Vegas several of us on our virtual performance team met with a variety of customers to talk about Microsoft SQL Server. We already had a large base of customers running many SQL Server databases on our products, and we wanted to collect information on the challenges posed by virtualizing this critical workload. We were pleased to see that ESX Server handled SQL VMs with excellent performance. But for many customers, the first efforts at virtualizing SQL didn't yield high-performing SQL VMs.  After careful investigation and many, many discussions we've started to put together the puzzle as to where SQL Server performance problems come from.  This page will document these common problems, borrowing slides from our presentations on the subject.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Virtualizing SQL: The Checklist&lt;/h1&gt;
We've talked with dozens of customers in the past months to document the issues that resulted in poor SQL performance. Happily, none of the issues were due to underlying technologies. Here is a list of issues and an explanation of the impacts. These items are roughly listed in the order of decreasing likelihood of occurrence. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 1: Configure Storage Correctly&lt;/h2&gt;
Storage configuration problems are the number one cause of SQL performance issues.  Usually these problems arise because the DBA requests a virtual disk from the VI admin, and the VI admin places the VMDK on a LUN that may or may not meet the DBA's performance needs.  For instance:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;VMs' VMDK files placed on VMFS volumes without enough spindles.&lt;/li&gt;
&lt;li&gt;Many VMDK files placed on a single VMFS volume which could use more spindles.&lt;/li&gt;
&lt;li&gt;Database and log files placed on the same LUN which, you guessed it, could use more spindles.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
This may be obvious to some, but this problem occurs again and again.  The VI administrator should be aware of a few technical items that can help understand and avoid this problem:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Based on the IO demands of the DB files, a certain number of spindles should be guaranteed to those files.  This means that the VMDK must be placed on a VMFS volume sized to account for the SQL Server's demands and all of the other demands on that volume.&lt;/li&gt;
&lt;li&gt;Mixing sequential activity (such as log file updates) and random activity (such as database access) results in random behavior.  This means that the LUN configuration in the pre-virtual physical environment may not be sufficient for the consolidated environment.  This is discussed in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9696"&gt;Storage Performance: VMFS and Protocols&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;When storage isn't meeting the SQL Server's demands, the device latency or kernel latency (queueing time) will increase.  Read up on these counters in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5490"&gt;Storage Performance Analysis and Monitoring&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Item 2: Use Recent Hardware&lt;/h2&gt;
&lt;br /&gt;
Often companies that are dipping their metaphorical toes into virtualization want to run proof-of-concept (POC) experiments to verify that the virtual platform can meet their performance expectations. But it's surprising how many times these experiments are run on older, poorly-performing hardware. Presumably the shiny, new systems were in use for production applications so only the mothballed, cobweb-covered servers from a previous generation were available for the POC. This causes many problems.  Check out this slide from a talk on SQL Server at VMworld Europe 2009:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5626/newer_hardware.png" alt="newer_hardware.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5626/newer_hardware.png');return false;"/&gt;  &lt;br /&gt;
&lt;p /&gt;
The slide points out a couple of things. First, the larger caches and shorter pipelines on newer Intel processors result in a considerable drop in virtualization overhead.  Second, the latency of a VMEXIT, which determines the amount of time it takes to transition from the VM to the VMkernel, has shrunk by a large amount with subsequent generations of hardware.  And don't forget the other additions from Intel and AMD such as hardware-assisted memory management and IO virtualization. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 3: Follow SQL Server Best Practices&lt;/h2&gt;
&lt;br /&gt;
Microsoft has kindly provided a &lt;a class="jive-link-external" href="http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx"&gt;web page of best practices for SQL Storage configuration&lt;/a&gt;. These best practices should still be followed when configuring your virtual SQL deployments!&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 4: Configure VM Identically to Native and Run The Right Test&lt;/h2&gt;
&lt;p /&gt;
For many SQL Server POCs the goal is to measure the VM's performance relative to a native (physical) baseline. If this comparison is to be performed, it's critical that the VM be configured identically to the physical system. Obviously this means that the VM should be run on the same hardware using identically configured LUNs. It's also important to ensure that the VM has the same number of vCPUs and amount of memory as the physical baseline. On the native system, this means restricting the number of pCPUs and amount of memory with NUMPROC and MAXMEM, respectively, in boot.ini.&lt;br /&gt;
&lt;br /&gt;
It also means that the test being applied should be understood.  If a benchmark is chosen that uses a very small database, the content will be cached and the storage system won't be used.  This can skew the results and produce recommendations not consistent with production deployments.  Here is another slide from the same VMworld Europe 2009 presentation detailing some of what we know about the SQL Server benchmarking alternatives:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5628/sql_benchmarks.png" alt="sql_benchmarks.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5628/sql_benchmarks.png');return false;"/&gt; &lt;br /&gt;
&lt;p /&gt;
We at VMware prefer DVD Store.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 5: Use VMware's ESX Server&lt;/h2&gt;
&lt;br /&gt;
VMware's hosted products, VMware Server, VMware Workstation, and even VMware Fusion, are all capable of running SQL Server. But if the database is going to be run in production on enterprise-class hardware, use VMware's enterprise-class hypervisor: ESX Server.  The initiated rarely confuse these products, but rogue members of large companies often run off-the-books proof-of-concept experiments on VMware's hosted products.  When those experiments produce results they don't like, the results get spread throughout the company, which can slow the virtual deployment.&lt;br /&gt;
&lt;p /&gt;
Consider the following data, again from the VMworld Europe 2009 SQL Server presentation:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5629/vmmark_esx_server.png" alt="vmmark_esx_server.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5629/vmmark_esx_server.png');return false;"/&gt;  &lt;br /&gt;
&lt;p /&gt;
This information is getting a bit dated now, as the tests were performed years ago on ESX Server 3.0.  But the point stands: before believing results claiming that "VMware cannot run SQL Server" it's worth investigating the platform used to generate the results. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 6: Understand Memory Management and Configure Correctly&lt;/h2&gt;
Database performance is heavily dependent on the amount of memory available. Almost without exception, providing more memory to SQL Server will improve performance. However, if that memory is coming from a host that is already over-committed or is being provided through workarounds to 32-bit limitations, performance may suffer. Here are a few keys for SQL Server memory management:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;If more than 3 GB is desired, use 64-bit versions of the OS and application.&lt;/li&gt;
&lt;li&gt;If memory is over-committed on the box, set reservations for performance-critical SQL Server VMs to guarantee that those VMs' memory isn't ballooned or swapped out.&lt;/li&gt;
&lt;li&gt;If SQL Server's "lock pages in memory" parameter has been set, set the VM's reservation to the amount of memory in the VM. This setting can interfere with ESX Server's balloon driver. Setting a full reservation will stop the balloon driver from inflating into the VM's memory space.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h2&gt;Item 7: Align Disk Partitions&lt;/h2&gt;
This item is really a special but very important case of item three, follow best practices. Partition alignment can impact storage performance, which can be critical to some SQL Server VMs' performance. See VMware's &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_partition_align.pdf"&gt;paper on partition alignment&lt;/a&gt; for more information on this.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Whitepapers&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/SQL_Server_consolidation.pdf"&gt;SQL Server Workload Consolidation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf"&gt;SQL Server Performance in a VMware Infrastructure 3 Environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/benchmarking_micrsoft_sql_vmware_esx_server_wp.pdf"&gt;Benchmarking Microsoft SQL Server Using VMware ESX Server 3.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.dell.com/downloads/global/solutions/vmware_1955.pdf"&gt;VMware VMotion Performance on the Dell PowerEdge 1955 Blade Server&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">sql</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">windows</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <pubDate>Mon, 01 Dec 2008 20:45:05 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-8964</guid>
      <dc:date>2008-12-01T20:45:05Z</dc:date>
      <clearspace:dateToText>8 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Ready Time</title>
      <link>http://communities.vmware.com/docs/DOC-7390</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
Ready time is as important as it is confusing.  I'm going to collect a few thoughts on ready time here in the hope that some of the confusion around this important part of virtual system performance can be eliminated. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Details&lt;/h1&gt;
Stated simply, ready time is the amount of time a VM wants to run but has not been provided CPU resources on which to execute.  Somewhat confusingly, ready time is reported in different units by esxtop and VirtualCenter.  In esxtop it is reported in an easily-consumed percentage format.  A number of 5% means the VM spent 5% of its last sample period waiting for available CPU resources.  In VirtualCenter ready time is reported as a time measurement.  In VC's real-time data, which produces sample values every 20,000 ms, a number of 1,000 ms is reported for a 5% ready time.&lt;br /&gt;
&lt;br /&gt;
There is so much more to know about ready time that I'm not going to reproduce here.  Read the &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_ready_time.pdf"&gt;whitepaper on the subject&lt;/a&gt;  for more details.  There have been no changes in the details on ready time since ESX 3.0 that make that paper out-of-date. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Interpreting Ready Time Values&lt;/h1&gt;
The most common question we get on ready time is, "what ready time numbers constitute a problem?"  While there is no easy answer to this, we can offer some guidance on the acceptable values.  But before I lay that out, let me say that ready time should &lt;i&gt;not&lt;/i&gt; be the ultimate measurement of system performance.  As always, user experience and latency should be.  There are some situations where user experience is horrible on a system with no load and virtually zero ready time.  This could happen with a mis-configured array, as an example.  And occasionally we see aggressively-consolidated hosts showing very high ready times that are meeting user needs.  There are no absolutes with ready time.&lt;br /&gt;
&lt;br /&gt;
But there are a few general regions into which ready time values can be binned.  Note that these ready time values are per vCPU.  esxtop reports ready time for a VM once it's been summed up across all vCPUs.  That means that 5% ready on each of four vCPUs will be reported as 20% ready at the VM level.  Despite the 20% figure, this is still only the high end of a very light amount of ready time.&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Value, per vCPU&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r == 0%&lt;/td&gt;
&lt;td&gt;This doesn't happen.  The very presence of a hypervisor between the operating system and the hardware means that there is a non-zero ready time on all operations.  But on healthy systems this number is so small that end-users don't know their workload has been virtualized.  See the next section.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0% &amp;lt; r &amp;lt;= 5%&lt;/td&gt;
&lt;td&gt;This is the "normal" region for ready time.  Very small single digit numbers result in a minimal impact to user experience.  If performance problems exist on the system and ready time falls into this region, your problems lie elsewhere.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5% &amp;lt; r &amp;lt;= 10%&lt;/td&gt;
&lt;td&gt;In this region ready time is starting to be worth watching.  Most systems function healthily with ready time in this region but highly sensitive measurements may be suffering.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10% &amp;lt; r&lt;/td&gt;
&lt;td&gt;While some systems continue to meet expectations, double-digit ready time percentages often mean some action is required to address performance issues.  See the last section for guidance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
Again, remember that VirtualCenter performance numbers must be re-calculated to percentages to find the category on the above table.  But since VC reports ready time per vCPU, no special arithmetic is needed to account for the number of vCPUs in the VM (as is needed with esxtop.)&lt;br /&gt;
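&lt;br /&gt;
The conversions described above can be sketched in a few lines of Python. This is a minimal illustration rather than a VMware tool: the function names and band labels are assumptions made for this example, and the 20,000 ms default is the real-time sample period mentioned earlier.&lt;br /&gt;
&lt;pre&gt;
```python
from bisect import bisect_left

# Upper edges of the first two bands in the table above, in percent per vCPU.
BAND_EDGES = [5.0, 10.0]
# Labels are illustrative shorthand for the table's descriptions.
BAND_LABELS = ["normal", "worth watching", "action likely required"]


def vc_ready_percent(ready_ms, window_ms=20000):
    """Convert a VirtualCenter ready time value (ms) to a percentage."""
    return 100.0 * ready_ms / window_ms


def esxtop_ready_per_vcpu(vm_ready_percent, num_vcpus):
    """Average a VM-level esxtop ready percentage over the VM's vCPUs."""
    return vm_ready_percent / num_vcpus


def ready_band(per_vcpu_percent):
    """Return the table's band for a per-vCPU ready percentage.

    Values up to and including 5% map to the first band, values above 5%
    and up to 10% to the second, and anything higher to the third.
    """
    return BAND_LABELS[bisect_left(BAND_EDGES, per_vcpu_percent)]


print(vc_ready_percent(1000))           # 1,000 ms in a 20 s window
print(esxtop_ready_per_vcpu(20.0, 4))   # 20% at the VM level, four vCPUs
print(ready_band(5.0))
```
&lt;/pre&gt;
A VirtualCenter value of 1,000 ms and an esxtop VM-level value of 20% on a four-vCPU VM both work out to 5% per vCPU, which sits at the top of the "normal" region.&lt;br /&gt;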
&lt;br /&gt;
&lt;h1&gt;Causes and Correction&lt;/h1&gt;
There are two general areas that can cause unnecessarily high ready times:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Overloaded hosts.&lt;/li&gt;
&lt;li&gt;Excessive use of SMP.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Host Overloading&lt;/h2&gt;
The most common cause of high ready time is trying to get too much work out of too little hardware.  Consider the following simple case: on a hypothetical system with only one physical CPU, if two 1-way VMs are fully loaded by their users then each wants to have an entire CPU.  Because only one is available, ESX will time share that resource and give each of them only 50% of the CPU.  As a result, each VM will spend 50% of its time waiting for the processor.  This would be reported as 50% ready time.&lt;br /&gt;
&lt;br /&gt;
Often this condition is observable when ready time is high and total host CPU utilization is also very high.  The only fix for this is to back off the load on the system.  VMs should be migrated off or processor resources should be increased. &lt;br /&gt;
&lt;br /&gt;
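The time-sharing arithmetic in the example above can be written down as a rough model. The sketch below assumes every vCPU is fully loaded and that the scheduler divides the physical CPUs evenly; it ignores shares, reservations, and scheduling overhead, and the function name is hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
def expected_ready_percent(num_busy_vcpus, num_pcpus):
    """Estimate per-vCPU ready time when every vCPU wants a full CPU.

    Simplified model: physical CPUs are split evenly among the busy
    vCPUs, and whatever a vCPU does not receive counts as ready time.
    """
    # Fraction of a physical CPU each vCPU receives, capped at one CPU.
    cpu_share = min(1.0, num_pcpus / num_busy_vcpus)
    return 100.0 * (1.0 - cpu_share)


print(expected_ready_percent(2, 1))   # two busy 1-way VMs, one physical CPU
print(expected_ready_percent(4, 4))   # as many physical CPUs as busy vCPUs
```
&lt;/pre&gt;
Under these assumptions the two-VM, one-CPU case yields 50% ready time, matching the figure above, and the estimate falls to zero once there are at least as many physical CPUs as busy vCPUs.&lt;br /&gt;
&lt;br /&gt;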
&lt;h2&gt;Excessive SMP&lt;/h2&gt;
In ESX Server 2.5, SMP guests had to be &lt;i&gt;co-scheduled&lt;/i&gt; to start at the exact same moment.  If a 2-way VM was ready to run but only one physical core was available, the VM would not be scheduled until a second core was freed up.  This would increase its ready time.  In ESX Server 3.0 and later versions, relaxed co-scheduling was introduced, which meant that a subset of a VM's vCPUs could be scheduled ahead of others.  However, guest operating systems still require some degree of co-scheduling, which means that the relaxation isn't absolute.  In short, adding vCPUs still puts some burden on the scheduler to co-schedule them, which can increase ready time.  This is one reason why VMware advises only allocating vCPUs to VMs that are using them.  Read &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt;  for more information on co-scheduling.&lt;br /&gt;
&lt;br /&gt;
This condition is manifested by hosts that have sub-optimal CPU utilization and lots of SMP VMs.  A host may have a dozen 4-way VMs with each showing high ready time but only be at an aggregate 40% CPU utilization.  This is a clear sign that the scheduler is spending a great deal of time managing unneeded vCPUs.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">smp</category>
      <pubDate>Wed, 27 Aug 2008 17:50:36 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-7390</guid>
      <dc:date>2008-08-27T17:50:36Z</dc:date>
      <clearspace:dateToText>9 months, 3 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Large Memory Pages</title>
      <link>http://communities.vmware.com/docs/DOC-6912</link>
      <description>In ESX Server 3.5 VMware introduced support for large memory pages in the guest. Large memory pages, an architectural feature available in x86 microprocessors for decades, can be used to improve performance on workloads that make use of them. With CPU, hypervisor, OS, and application support, throughput can go up and CPU utilization can go down. Since applications such as Oracle databases and Java have been using large pages on Linux and Windows for years, the introduction of this support on ESX Server allows for increased gains in performance over previous virtual installs. VMware is currently the only virtualization vendor to support large pages.&lt;br /&gt;
&lt;br /&gt;
VMware's support for large memory pages is detailed in the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/large_pg_performance.pdf"&gt;Large Page Performance&lt;/a&gt; performance study. That paper includes data on throughput gains in SPECjbb. The results are duplicated here:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-3-3334/specjbb_lp.JPG" alt="specjbb_lp.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-3-3334/specjbb_lp.JPG');return false;"/&gt;&lt;br /&gt;
&lt;p /&gt;
&lt;br /&gt;
At LinuxWorld 2008 VMware presented further data on the value of large memory pages with Oracle databases. Here is a chart showing those gains with VMware binary translation (BT): &lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3335/swingbench_lp_bt.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3335/swingbench_lp_bt.JPG" class="jive-image"  /&gt;&lt;br /&gt;
&lt;p /&gt;
Data was also shared at LinuxWorld on the value of large memory pages with AMD Rapid Virtualization Indexing (RVI; formerly called NPT): &lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-2-3336/swingbench_lp_rvi.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-2-3336/swingbench_lp_rvi.JPG" class="jive-image"  /&gt;&lt;br /&gt;
&lt;p /&gt;
The gains derived from the presence of large pages in these Oracle/Swingbench results are atypical.  Large pages have been documented in numerous locations to provide benefits of 5-20% on most database applications.  The increases shown here (of up to 350%) are due to the specialized configuration, which is less about demonstrating real-world application performance and more about stressing the underlying configuration to uncover strengths and weaknesses.</description>
      <pubDate>Fri, 08 Aug 2008 16:02:31 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-6912</guid>
      <dc:date>2008-08-08T16:02:31Z</dc:date>
      <clearspace:dateToText>1 year, 3 months ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Storage Queues and Performance</title>
      <link>http://communities.vmware.com/docs/DOC-6490</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
VMware recently published a paper titled &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;Scalable Storage Performance&lt;/a&gt;  that delivered a wealth of information on storage with respect to the ESX Server architecture.  This paper contains details about the storage queues that are a mystery to many of VMware's customers and partners.   I wanted to start a wiki article on some aspects of this paper that may be interesting to storage enthusiasts and performance freaks.&lt;br /&gt;
&lt;h1&gt;Two Important Queues&lt;/h1&gt;
Let's use the following figure as a starting point for this discussion.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3142/strorage_queues.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3142/strorage_queues.JPG" class="jive-image"  /&gt; &lt;br /&gt;
&lt;br /&gt;
For the purposes of this paper, I'm going to call the two different queue types the "kernel queue" and the "device driver queue".  The device driver queue is specified in the device itself and has historically been configured through Linux-like module commands in the console operating system.  More on that in "Changing Queue Depth" below.  The kernel queue should be thought of as infinitely long, for all practical purposes.  Any time the device driver queue gets full, commands to the storage will queue up in the kernel.&lt;br /&gt;
&lt;br /&gt;
Note that each LUN gets its own queue.  This means that when you change the queue depth in the device driver, you're changing the queue depths for many queues.  The underlying device (HBA) is going to have a hard limit on the number of active commands it will allow at one time.  This should be considered when setting queue depth.  If your HBA can support only 2,000 active commands but it is addressing 40 LUNs, a specified queue depth of 64 won't allow that many commands to all LUNs, because 64*40 = 2,560, which is more than the 2,000-command maximum.  In practice this is rarely a concern, though, as it is rare for so many LUNs to be addressed simultaneously through a single HBA with so many outstanding commands issued to each.&lt;br /&gt;
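The arithmetic in the HBA example above can be checked with a short Python sketch. The function names are hypothetical, the 2,000-command limit is the illustrative figure from the text rather than the specification of any particular HBA, and the model assumes every LUN queue is full at the same time.&lt;br /&gt;
&lt;pre&gt;
```python
def queue_depth_demand(queue_depth, num_luns):
    """Worst-case number of in-flight commands across all LUN queues."""
    return queue_depth * num_luns


def max_uniform_queue_depth(hba_limit, num_luns):
    """Largest per-LUN depth that can never exceed the HBA's limit."""
    return hba_limit // num_luns


demand = queue_depth_demand(64, 40)
print(demand)                               # 64 * 40 outstanding commands
print(max(0, demand - 2000))                # commands beyond a 2,000 limit
print(max_uniform_queue_depth(2000, 40))    # depth that always fits
```
&lt;/pre&gt;
With these numbers, a queue depth of 50 is the largest uniform value that stays within the limit even if all 40 LUNs are saturated at once.&lt;br /&gt;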
&lt;h2&gt;Device Driver Queue Function&lt;/h2&gt;
The device driver queue is used for a low-level interaction with the storage device.  It controls how many active, or "in flight", commands there can be at any one time.  This is effectively the concurrency of the storage stack.  Set the device queue to 1 and each storage command becomes sequential: each one must complete before the next starts.&lt;br /&gt;
&lt;br /&gt;
But if the device queue is left at its default of 32, as an example, up to 32 commands will be concurrently processed by the storage system.  All 32 will be shipped off to the storage device by the kernel, and new commands are issued as completions arrive.&lt;br /&gt;
&lt;h2&gt;Kernel Queue Function&lt;/h2&gt;
The kernel queue can be thought of as kind of an overflow queue for the device driver queues.  But it's not just an overflow queue.  ESX Server contains all kinds of cool optimizations to get the most out of your storage. And these features apply to commands in the kernel queue only. Here are some examples of features provided to commands queued at the kernel queues:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Multi-pathing for failover and load balancing.&lt;/li&gt;
&lt;li&gt;Prioritization of storage activities based on VM and cluster shares.&lt;/li&gt;
&lt;li&gt;Optimizations to improve efficiency for long sequential operations.&lt;/li&gt;
&lt;/ol&gt;
There are others, as well.&lt;br /&gt;
&lt;h1&gt;Impacts of Queue Depths&lt;/h1&gt;
So, increasing queue depths in the device driver can greatly improve the performance of the storage at the device level. Decreasing the device driver queue will result in increases in usage of the kernel queues.  This decreases the device efficiency, but introduces opportunities for optimizations across multiple VMs and devices.  So, what's the right ratio of these two depths?  We think that the sweet spot lies with a depth 32 device driver queue.  That's why we've set 32 as the default device driver queue length.&lt;br /&gt;
&lt;br /&gt;
But your configuration and workloads may benefit from a change to this default queue depth.  I'll refer you to the aforementioned &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;storage paper&lt;/a&gt;  for information on when you might want to change the driver queue depth.  I'll just point out a couple of broad observations here:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;With fewer, very high IO VMs on a host, larger queues at the device driver will improve performance.&lt;/li&gt;
&lt;li&gt;As the VM count grows and storage performance features--like shares, load balancing, failover, etc.--become more important, the default queue depth is best.&lt;/li&gt;
&lt;li&gt;With too many servers each having device queues that are too large, your storage array could easily be overloaded and see its performance suffer.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Improving Storage Performance&lt;/h1&gt;
Now that we've covered how storage queuing works, you may be wondering how you can monkey around with these queue sizes for optimal performance.  I can tell you as someone who has been involved with many, many performance analysis projects that changing queue size is rarely a fix for an acute storage performance problem.  You should first go through the analysis techniques in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5490"&gt;Storage Performance Analysis and Monitoring&lt;/a&gt;.  That may or may not lead to changing queue depths.&lt;br /&gt;
&lt;br /&gt;
But, in the event that you do end up changing queue depths...&lt;br /&gt;
&lt;h2&gt;Changing Queue Depth&lt;/h2&gt;
We have a &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1267"&gt;helpful knowledge base article&lt;/a&gt;  that describes the process of changing the device driver queue.  Unfortunately, as of today (7/24/08) this document only describes how to change queues through the console operating system.  No information is provided for ESXi.  I've contacted the KB owner and will have that document updated ASAP.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <pubDate>Thu, 24 Jul 2008 00:51:16 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-6490</guid>
      <dc:date>2008-07-24T00:51:16Z</dc:date>
      <clearspace:dateToText>1 year, 4 months ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
    </item>
    <item>
      <title>Guest-based Performance Measurement</title>
      <link>http://communities.vmware.com/docs/DOC-5661</link>
      <description>Because VMware products provide a virtual interface to the hardware, traditional performance instrumentation that is based on measuring hardware resources may not be accurate.  As a result, Perfmon (in Windows) and top (in UNIX variants) will not provide accurate measurements of CPU utilization.  The problems seen as a result of usage of traditional in-guest performance measurements come from three areas:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;They are unaware of work being performed by the virtualization software, so they will not have complete information on the resources it uses.  This includes memory management, scheduling, and other support processes like the service console in ESX.&lt;/li&gt;
&lt;li&gt;The way in which guest OSes account time is different and ineffective in a virtual machine.&lt;/li&gt;
&lt;li&gt;Their visibility into available CPU resources is based on the fraction of the CPU that they have been provided by the virtualization software.&lt;/li&gt;
&lt;/ol&gt;
Items two and three are covered in more detail in &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/2032"&gt;a KB on the subject&lt;/a&gt;.  Performance analysis on virtual deployments should always use host-based tools.  On ESX Server, this means esxtop or VirtualCenter.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <pubDate>Tue, 03 Jun 2008 16:50:47 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5661</guid>
      <dc:date>2008-06-03T16:50:47Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>vCenter Performance Counters</title>
      <link>http://communities.vmware.com/docs/DOC-5600</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
The following table of vCenter (VC) performance counters lists the counters with a description of their purpose.  This page has been updated for vSphere 4, so the counter levels will differ slightly on older versions of VC.&lt;br /&gt;
&lt;br /&gt;
Remember, with the exception of ready time, statistic levels one and two are the only ones needed for 99% of the performance monitoring and analysis out there.  Don't spend many of your own cycles worrying about levels three and four!&lt;br /&gt;
&lt;br /&gt;
For information on enabling VC to display and archive these counters see the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding vCenter Performance Statistics&lt;/a&gt; article.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Understanding vCenter Measurement Windows&lt;/h1&gt;
Before you continue, you should know that all total count metrics reported by VC are reported over the sample window.  When you're looking at live stats, this sample window is 20 seconds.  When you're looking at archive stats, it will depend on the interval duration.  That duration could be five minutes, 30 minutes, two hours, or one day.&lt;br /&gt;
&lt;br /&gt;
This causes a lot of confusion when comparing esxtop results to live VC results to archived VC results.  As an example, ready time might be reported as 10% in esxtop.  In live VC results this amount of ready time would be reported as 2,000 ms (10% of the 20 s window).  In past-day archive results, which use the five-minute interval duration, the same amount would be reported as 30,000 ms (10% of the 300 s interval).  All of these numbers reflect the same amount of ready time.&lt;br /&gt;
&lt;br /&gt;
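The conversion described above can be sketched in a few lines of Python.  This is only an illustration: the function and constant names are made up for this example and are not part of esxtop or any VC API, and the window lengths are the 20-second and five-minute values quoted in the text.&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch only: helper names are illustrative, not part of any VMware API.

def ready_percent_to_ms(ready_percent, window_seconds):
    """Ready time in ms for a given percentage of the sample window."""
    return ready_percent * window_seconds * 1000.0 / 100.0


def ready_ms_to_percent(ready_ms, window_seconds):
    """Ready time as a percentage of the sample window."""
    return 100.0 * ready_ms / (window_seconds * 1000.0)


LIVE_WINDOW_SECONDS = 20          # live stats window quoted in the text
FIVE_MINUTE_WINDOW_SECONDS = 300  # five-minute archive interval

# The same 10% ready time expressed for the two windows.
live_ms = ready_percent_to_ms(10, LIVE_WINDOW_SECONDS)
archive_ms = ready_percent_to_ms(10, FIVE_MINUTE_WINDOW_SECONDS)
```
&lt;/pre&gt;
Here live_ms works out to 2,000 ms and archive_ms to 30,000 ms, matching the example above.  Converting every reading back to a percentage with ready_ms_to_percent before comparing sources is one way to avoid mixing windows.&lt;br /&gt;
&lt;br /&gt;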
&lt;h1&gt;CPU Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;cpu.ready.summation&lt;/td&gt;
&lt;td&gt;Ready time is the time spent waiting for CPU(s) to become available in the past update interval.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.average&lt;/td&gt;
&lt;td&gt;The CPU utilization.  The maximum possible value here is the frequency of the processors times the number of cores.  As an example, a VM using 4000 MHz on a system with four 2 GHz cores is using 50% of the CPU (4000 / (4 * 2000) = 0.5).&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;cpu.usage.average&lt;/td&gt;
&lt;td&gt;The CPU utilization.  This value is reported with 100% representing all processor cores on the system.  As an example, a 2-way VM using 50% of a four-core system is completely using two cores.&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;cpu.reservedCapacity.average&lt;/td&gt;
&lt;td&gt;CPU Reserved Capacity&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;cpu.idle.summation&lt;/td&gt;
&lt;td&gt;CPU Idle&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;cpu.swapwait.summation&lt;/td&gt;
&lt;td&gt;Swap wait time is time that the world spent waiting for memory to be swapped in.  When the VM is waiting for memory, it is not doing work.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.system.summation&lt;/td&gt;
&lt;td&gt;System time is the time spent in VMkernel during the last update interval.  This does not include guest code execution.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.wait.summation&lt;/td&gt;
&lt;td&gt;Wait time is the time spent waiting for hardware or VMkernel thread locks during the last update interval.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.extra.summation&lt;/td&gt;
&lt;td&gt;CPU extra is the time above the statically calculated entitlement. Entitlement is the share of processing time that a VM should get as a result of its vCPU count and assigned shares. &lt;i&gt;You should not use or care about this counter in any of your own analysis.&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.used.summation&lt;/td&gt;
&lt;td&gt;CPU Used&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.guaranteed.latest&lt;/td&gt;
&lt;td&gt;Guaranteed time is reported as the amount of the reservation time that the VM used in the past update interval.  As an example, if 2000 MHz have been reserved for the VM on a four-way, 2 GHz host, that's 25% of the CPU resource.  In a 20 s update interval, there are 80,000 ms available on this four-way system.  That means 20,000 ms of time has been reserved.  If the VM used only half of its reserved cycles, the guaranteed time is 10,000 ms.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usage.none&lt;/td&gt;
&lt;td&gt;CPU Usage (None)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usage.minimum&lt;/td&gt;
&lt;td&gt;CPU Usage (Minimum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usage.maximum&lt;/td&gt;
&lt;td&gt;CPU Usage (Maximum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.none&lt;/td&gt;
&lt;td&gt;CPU Usage in MHz (None)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.minimum&lt;/td&gt;
&lt;td&gt;CPU Usage in MHz (Minimum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.maximum&lt;/td&gt;
&lt;td&gt;CPU Usage in MHz (Maximum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
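The arithmetic behind the cpu.usagemhz.average and cpu.guaranteed.latest examples in the table above can be checked with a short Python sketch.  The function names are invented for this illustration, and the host is assumed to have identical cores, as in the four-way, 2 GHz examples.&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch only: names are illustrative and assume identical cores on the host.

def host_capacity_mhz(core_count, core_mhz):
    """Maximum possible usagemhz value: core frequency times core count."""
    return core_count * core_mhz


def usage_fraction(usage_mhz, core_count, core_mhz):
    """Fraction of the whole host that a usagemhz reading represents."""
    return usage_mhz / host_capacity_mhz(core_count, core_mhz)


def reserved_ms(reservation_mhz, core_count, core_mhz, window_seconds):
    """CPU time (ms) covered by a reservation within one update interval."""
    available_ms = core_count * window_seconds * 1000
    return available_ms * reservation_mhz / host_capacity_mhz(core_count, core_mhz)


# Values from the table: four 2 GHz cores, 20 s update interval.
fraction = usage_fraction(4000, 4, 2000)
reserved = reserved_ms(2000, 4, 2000, 20)
guaranteed = reserved * 0.5  # the VM used half of its reserved cycles
```
&lt;/pre&gt;
With the table's numbers, fraction is 0.5, reserved is 20,000 ms, and guaranteed is 10,000 ms.&lt;br /&gt;
&lt;br /&gt;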
&lt;h1&gt;Memory Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.consumed.average&lt;/td&gt;
&lt;td&gt;The amount of machine memory that is in use by the VM. While a VM may have been configured to use 4 GB of RAM, as an example, it might have only touched half of that. Of the 2 GB touched, half might be saved by memory sharing. That would result in 1 GB of consumed memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.overhead.average&lt;/td&gt;
&lt;td&gt;The memory used by the VMkernel to maintain and execute the VM.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.swapinrate.average&lt;/td&gt;
&lt;td&gt;The swap in rate reports the rate at which a VM's memory is being swapped in from disk.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.swapoutrate.average&lt;/td&gt;
&lt;td&gt;The swap out rate reports the rate at which a VM's memory is being swapped out to disk.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.usage.average&lt;/td&gt;
&lt;td&gt;Memory used, as a percentage of all available machine memory.  Available for host and VM.&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.average&lt;/td&gt;
&lt;td&gt;The amount of memory currently claimed by the balloon driver. This is not a performance problem, per se, but represents the host starting to take memory from less needful VMs for those with large amounts of active memory. If the host is ballooning, check the swap rates (swapin and swapout); swapping activity would be indicative of performance problems.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.granted.average&lt;/td&gt;
&lt;td&gt;The amount of memory that was granted to the VM by the host.  Memory is not granted to the VM until it has been touched once, and granted memory may be swapped out or ballooned away if the VMkernel needs the memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.active.average&lt;/td&gt;
&lt;td&gt;The amount of memory used by the VM over a short, recent window of time.  This is the "true" number of how much memory the VM currently needs.  Additional, unused memory may be swapped out or ballooned with no impact to the guest's performance.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.shared.average&lt;/td&gt;
&lt;td&gt;The average amount of shared memory.  Shared memory represents the entire pool of memory from which sharing savings are possible.  The amount of memory that this has been condensed to is reported in shared common memory.  So, total saving due to memory sharing equals shared memory minus shared common memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.zero.average&lt;/td&gt;
&lt;td&gt;The amount of zero pages in the guest.  Zero pages are not represented in machine memory so this results in 100% savings when mapping from the guest to the machine memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.unreserved.average&lt;/td&gt;
&lt;td&gt;Memory Unreserved (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapused.average&lt;/td&gt;
&lt;td&gt;The amount of swap memory currently in use.  A large amount of swap memory is not a performance problem.  This could be memory that the guest doesn't need.  Check the swap rates (swapin, swapout) to see if the guest is actively in need of more memory than is available.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.average&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.average&lt;/td&gt;
&lt;td&gt;The average amount of shared common memory.  Shared memory represents the entire pool of memory from which sharing savings are possible.  The amount of memory that this has been condensed to is reported in shared common memory.  So, total saving due to memory sharing equals shared memory minus shared common memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.heap.average&lt;/td&gt;
&lt;td&gt;Memory Heap (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.heapfree.average&lt;/td&gt;
&lt;td&gt;Memory Heap Free (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.state.latest&lt;/td&gt;
&lt;td&gt;Memory State&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapped.average&lt;/td&gt;
&lt;td&gt;Memory Swapped (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swaptarget.average&lt;/td&gt;
&lt;td&gt;Memory Swap Target (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapin.average&lt;/td&gt;
&lt;td&gt;The amount of memory that has been swapped in from disk.  A large number here indicates a lack of memory and is a clear sign that performance is suffering as a result.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapout.average&lt;/td&gt;
&lt;td&gt;The amount of memory that has been swapped out to disk.  A large number here indicates a lack of memory and is a clear sign that performance is suffering as a result.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.average&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.sysUsage.average&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.reservedCapacity.average&lt;/td&gt;
&lt;td&gt;Memory Reserved Capacity&lt;/td&gt;
&lt;td&gt;megaBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.usage.none&lt;/td&gt;
&lt;td&gt;Memory Usage (None)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.usage.minimum&lt;/td&gt;
&lt;td&gt;Memory Usage (Minimum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.usage.maximum&lt;/td&gt;
&lt;td&gt;Memory Usage (Maximum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.granted.none&lt;/td&gt;
&lt;td&gt;Memory Granted (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.granted.minimum&lt;/td&gt;
&lt;td&gt;Memory Granted (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.granted.maximum&lt;/td&gt;
&lt;td&gt;Memory Granted (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.active.none&lt;/td&gt;
&lt;td&gt;Memory Active (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.active.minimum&lt;/td&gt;
&lt;td&gt;Memory Active (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.active.maximum&lt;/td&gt;
&lt;td&gt;Memory Active (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.shared.none&lt;/td&gt;
&lt;td&gt;Memory Shared (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.shared.minimum&lt;/td&gt;
&lt;td&gt;Memory Shared (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.shared.maximum&lt;/td&gt;
&lt;td&gt;Memory Shared (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.zero.none&lt;/td&gt;
&lt;td&gt;Memory Zero (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.zero.minimum&lt;/td&gt;
&lt;td&gt;Memory Zero (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.zero.maximum&lt;/td&gt;
&lt;td&gt;Memory Zero (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.unreserved.none&lt;/td&gt;
&lt;td&gt;Memory Unreserved (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.unreserved.minimum&lt;/td&gt;
&lt;td&gt;Memory Unreserved (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.unreserved.maximum&lt;/td&gt;
&lt;td&gt;Memory Unreserved (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapused.none&lt;/td&gt;
&lt;td&gt;Memory Swap Used (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapused.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Used (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapused.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Used (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.none&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.none&lt;/td&gt;
&lt;td&gt;Memory Shared Common (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.minimum&lt;/td&gt;
&lt;td&gt;Memory Shared Common (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.maximum&lt;/td&gt;
&lt;td&gt;Memory Shared Common (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heap.none&lt;/td&gt;
&lt;td&gt;Memory Heap (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heap.minimum&lt;/td&gt;
&lt;td&gt;Memory Heap (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heap.maximum&lt;/td&gt;
&lt;td&gt;Memory Heap (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heapfree.none&lt;/td&gt;
&lt;td&gt;Memory Heap Free (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heapfree.minimum&lt;/td&gt;
&lt;td&gt;Memory Heap Free (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heapfree.maximum&lt;/td&gt;
&lt;td&gt;Memory Heap Free (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapped.none&lt;/td&gt;
&lt;td&gt;Memory Swapped (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapped.minimum&lt;/td&gt;
&lt;td&gt;Memory Swapped (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapped.maximum&lt;/td&gt;
&lt;td&gt;Memory Swapped (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swaptarget.none&lt;/td&gt;
&lt;td&gt;Memory Swap Target (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swaptarget.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Target (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swaptarget.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Target (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapin.none&lt;/td&gt;
&lt;td&gt;Memory Swap In (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapin.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap In (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapin.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap In (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapout.none&lt;/td&gt;
&lt;td&gt;Memory Swap Out (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapout.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Out (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapout.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Out (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.none&lt;/td&gt;
&lt;td&gt;Memory Balloon (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.minimum&lt;/td&gt;
&lt;td&gt;Memory Balloon (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.maximum&lt;/td&gt;
&lt;td&gt;Memory Balloon (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.none&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.minimum&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.maximum&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.overhead.none&lt;/td&gt;
&lt;td&gt;Memory Overhead (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.overhead.minimum&lt;/td&gt;
&lt;td&gt;Memory Overhead (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.overhead.maximum&lt;/td&gt;
&lt;td&gt;Memory Overhead (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.consumed.none&lt;/td&gt;
&lt;td&gt;Memory Consumed (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.consumed.maximum&lt;/td&gt;
&lt;td&gt;Memory Consumed (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.consumed.minimum&lt;/td&gt;
&lt;td&gt;Memory Consumed (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sysUsage.none&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sysUsage.maximum&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sysUsage.minimum&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
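The relationship between mem.shared.average and mem.sharedcommon.average, and the mem.consumed.average example, can be written out as a small Python sketch.  The function names and sample values are hypothetical, and the consumed estimate is a simplification that follows the 4 GB example in the table and ignores everything else.&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch only: names and sample values are hypothetical.
KB_PER_GB = 1024 * 1024  # the counters above are reported in kiloBytes


def sharing_savings_kb(shared_kb, shared_common_kb):
    """Savings from memory sharing: shared minus shared common memory."""
    return shared_kb - shared_common_kb


def consumed_estimate_kb(touched_kb, savings_kb):
    """Simplified consumed memory: touched memory minus sharing savings."""
    return touched_kb - savings_kb


# Mirrors the table's example: 2 GB touched, half of it saved by sharing.
touched_kb = 2 * KB_PER_GB
savings_kb = sharing_savings_kb(2 * KB_PER_GB, 1 * KB_PER_GB)
consumed_kb = consumed_estimate_kb(touched_kb, savings_kb)
```
&lt;/pre&gt;
In this made-up case savings_kb and consumed_kb both come out to 1 GB, expressed in kiloBytes.&lt;br /&gt;
&lt;br /&gt;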
&lt;h1&gt;Disk Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;disk.maxTotalLatency&lt;/td&gt;
&lt;td&gt;The highest reported total latency (device and kernel times) in the sample window.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;disk.usage.average&lt;/td&gt;
&lt;td&gt;Average disk throughput over the sample period.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.read.average&lt;/td&gt;
&lt;td&gt;Average disk throughput due to read operations over the sample period.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.write.average&lt;/td&gt;
&lt;td&gt;Average disk throughput due to write operations over the sample period.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.commands.summation&lt;/td&gt;
&lt;td&gt;Disk Commands Issued&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.commandsAborted.summation&lt;/td&gt;
&lt;td&gt;The number of aborts that have occurred in the last window of time. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time (as defined by the guest OS or application.)&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.busResets.summation&lt;/td&gt;
&lt;td&gt;Disk Bus Resets&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.deviceReadLatency.average&lt;/td&gt;
&lt;td&gt;Device read latency.  This is the time the physical storage path, from the HBA to the platter, takes to service an IO request.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.kernelReadLatency.average&lt;/td&gt;
&lt;td&gt;Kernel read latency.  This is the time the VMkernel takes to service an IO.  This is the time between the guest OS and the device.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.totalReadLatency.average&lt;/td&gt;
&lt;td&gt;Total read latency.  The sum of the device and kernel read latencies.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.queueReadLatency.average&lt;/td&gt;
&lt;td&gt;Queue Read Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.deviceWriteLatency.average&lt;/td&gt;
&lt;td&gt;Device write latency.  This is the time the physical storage path, from the HBA to the platter, takes to service an IO request.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.kernelWriteLatency.average&lt;/td&gt;
&lt;td&gt;Kernel write latency.  This is the time the VMkernel takes to service an IO.  This is the time between the guest OS and the device.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.totalWriteLatency.average&lt;/td&gt;
&lt;td&gt;Total write latency.  The sum of the device and kernel write latencies.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.queueWriteLatency.average&lt;/td&gt;
&lt;td&gt;Queue Write Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.deviceLatency.average&lt;/td&gt;
&lt;td&gt;Physical Device Command Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.kernelLatency.average&lt;/td&gt;
&lt;td&gt;Kernel Disk Command Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.queueLatency.average&lt;/td&gt;
&lt;td&gt;Queue Command Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.numberRead.summation&lt;/td&gt;
&lt;td&gt;The number of IO read operations in the previous sample period.  Note that these operations may be variable sized up to 64 KB.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.numberWrite.summation&lt;/td&gt;
&lt;td&gt;The number of IO write operations in the previous sample period.  Note that these operations may be variable sized up to 64 KB.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.totalLatency.average&lt;/td&gt;
&lt;td&gt;This is the average total latency over the sample window.  Total latency is the sum of kernel and device latency for both read and write commands.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.write.average&lt;/td&gt;
&lt;td&gt;Disk Write Rate&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;disk.usage.none&lt;/td&gt;
&lt;td&gt;Disk Usage (None)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;disk.usage.minimum&lt;/td&gt;
&lt;td&gt;Disk Usage (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;disk.usage.maximum&lt;/td&gt;
&lt;td&gt;Disk Usage (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
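Two pieces of arithmetic from the disk table above are sketched here in Python: total latency as the sum of device and kernel latency, and turning a summation counter such as disk.numberRead.summation into a per-second rate by dividing by the sample window.  The function names and sample values are assumptions made for this illustration only.&lt;br /&gt;
&lt;pre&gt;
```python
# Sketch only: names and sample values are hypothetical.

def total_latency_ms(device_latency_ms, kernel_latency_ms):
    """Total latency is the sum of the device and kernel latencies."""
    return device_latency_ms + kernel_latency_ms


def operations_per_second(operation_count, window_seconds):
    """Per-second rate for a summation counter over its sample window."""
    return operation_count / window_seconds


# Made-up readings for a 20 s live sample window.
read_latency = total_latency_ms(8.0, 0.5)
read_iops = operations_per_second(4000, 20)
```
&lt;/pre&gt;
The same window caveat applies as for ready time: a summation from a five-minute archive interval has to be divided by 300 seconds, not 20.&lt;br /&gt;
&lt;br /&gt;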
&lt;h1&gt;Network Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;net.usage.average&lt;/td&gt;
&lt;td&gt;Network Usage (Average)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.droppedRx.summation&lt;/td&gt;
&lt;td&gt;The number of received packets that were dropped over the sample period.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.droppedTx.summation&lt;/td&gt;
&lt;td&gt;The number of transmitted packets that were dropped over the sample period.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.received.average&lt;/td&gt;
&lt;td&gt;Average network throughput for received traffic.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.transmitted.average&lt;/td&gt;
&lt;td&gt;Average network throughput for transmitted traffic.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;net.packetsRx.summation&lt;/td&gt;
&lt;td&gt;Network Packets Received&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;net.packetsTx.summation&lt;/td&gt;
&lt;td&gt;Network Packets Transmitted&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;net.usage.none&lt;/td&gt;
&lt;td&gt;Network Usage (None)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;net.usage.minimum&lt;/td&gt;
&lt;td&gt;Network Usage (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;net.usage.maximum&lt;/td&gt;
&lt;td&gt;Network Usage (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;h1&gt;Other Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;sys.uptime.latest&lt;/td&gt;
&lt;td&gt;Uptime&lt;/td&gt;
&lt;td&gt;second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;sys.heartbeat.summation&lt;/td&gt;
&lt;td&gt;Heartbeat&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.cpufairness.latest&lt;/td&gt;
&lt;td&gt;CPU Fairness&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.memfairness.latest&lt;/td&gt;
&lt;td&gt;Memory Fairness&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.effectivecpu.average&lt;/td&gt;
&lt;td&gt;Effective CPU Resources&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.effectivemem.average&lt;/td&gt;
&lt;td&gt;Effective Memory Resources&lt;/td&gt;
&lt;td&gt;megaBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.failover.latest&lt;/td&gt;
&lt;td&gt;Current failover level&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.average&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (Average)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.memUsed.average&lt;/td&gt;
&lt;td&gt;Memory Used (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.swapUsed.average&lt;/td&gt;
&lt;td&gt;Memory Swap Used (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.swapIn.average&lt;/td&gt;
&lt;td&gt;Memory Swap In (Average)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.swapOut.average&lt;/td&gt;
&lt;td&gt;Memory Swap Out (Average)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actav1.latest&lt;/td&gt;
&lt;td&gt;CPU Active (1 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actpk1.latest&lt;/td&gt;
&lt;td&gt;CPU Active (1 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runav1.latest&lt;/td&gt;
&lt;td&gt;CPU Running (1 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actav5.latest&lt;/td&gt;
&lt;td&gt;CPU Active (5 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actpk5.latest&lt;/td&gt;
&lt;td&gt;CPU Active (5 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runav5.latest&lt;/td&gt;
&lt;td&gt;CPU Running (5 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actav15.latest&lt;/td&gt;
&lt;td&gt;CPU Active (15 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actpk15.latest&lt;/td&gt;
&lt;td&gt;CPU Active (15 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runav15.latest&lt;/td&gt;
&lt;td&gt;CPU Running (15 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runpk1.latest&lt;/td&gt;
&lt;td&gt;CPU Running (1 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.maxLimited1.latest&lt;/td&gt;
&lt;td&gt;CPU Throttled (1 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runpk5.latest&lt;/td&gt;
&lt;td&gt;CPU Running (5 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.maxLimited5.latest&lt;/td&gt;
&lt;td&gt;CPU Throttled (5 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runpk15.latest&lt;/td&gt;
&lt;td&gt;CPU Running (15 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.maxLimited15.latest&lt;/td&gt;
&lt;td&gt;CPU Throttled (15 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.sampleCount.latest&lt;/td&gt;
&lt;td&gt;Group CPU Sample Count&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.samplePeriod.latest&lt;/td&gt;
&lt;td&gt;Group CPU Sample Period&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.none&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (None)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.maximum&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (Maximum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.minimum&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (Minimum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <pubDate>Fri, 30 May 2008 00:15:21 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5600</guid>
      <dc:date>2008-05-30T00:15:21Z</dc:date>
      <clearspace:dateToText>2 months, 2 days ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
    </item>
    <item>
      <title>Time-based Measurements in Virtual Machines</title>
      <link>http://communities.vmware.com/docs/DOC-5581</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
All benchmarking relies on accurate timekeeping so that work completed can be measured against the passage of time.  Because hosted and hypervisor products virtualize the hardware timer, minute fluctuations in guest timekeeping can occur.  Details are available in many places, including the following:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vmware_timekeeping.pdf"&gt;Timekeeping in Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/WS6_Performance_Tuning_and_Benchmarking.pdf"&gt;Workstation 6.0 Performance Tuning and Benchmarking (page 16)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
During benchmarking these fluctuations in time can cause unexpected results.  Performance measurements can appear inflated if time slows down while work occurs, depressed if time accelerates while work is occurring, or somewhere in between.  This topic will provide some details on this phenomenon.&lt;br /&gt;
&lt;br /&gt;
If the hypervisor (or host operating system for a hosted product) is busy with other tasks, it may stall slightly when delivering timer interrupts to the VM.  This means that the guest timer appears to run slow.  VMware products correct for these deviations by bringing guest time back to its correct value, but these "slow downs" and "catch ups" may occur at different points in a benchmark.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Artificially High Results&lt;/h1&gt;
Consider the case where a great number of IO operations are measured over a long period of time.  If a benchmark runs 10,000 IO operations and the native system takes 10ms to process each one, the benchmark would measure 10ms per operation on a native system.  However, if the benchmark is run on a virtual system and the virtualization software is busy servicing the IO operation instead of updating time, time may not progress properly during the operation.  Although 10ms or more would have passed, perhaps the VM was only informed of the passing of 9ms.  In this case, the operation appears to run faster on the VM than on the native system.&lt;br /&gt;
&lt;br /&gt;
For benchmarks where many operations are measured over a large time window, this isn't a problem.  On our 10,000-operation benchmark, if the timer were started before operation one and stopped after operation 10,000, a fluctuation of 1ms would make no practical difference.  After all, that's only a 1ms inaccuracy on a sequence that takes 100s to run.&lt;br /&gt;
&lt;br /&gt;
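As a rough illustration of the arithmetic in this example, the Python sketch below compares the relative error when a single timer brackets the whole run with the error when one operation is timed on its own.  The constants come from the example above; the function names are made up for this illustration.&lt;br /&gt;

```python
# Numbers from the example above: 10,000 operations, 10ms each, 1ms of guest time lost.
OPERATIONS = 10_000
TRUE_MS_PER_OP = 10.0
LOST_MS = 1.0


def whole_run_error(operations, true_ms_per_op, lost_ms):
    """Relative error when one timer brackets the entire run and slips by lost_ms once."""
    return lost_ms / (operations * true_ms_per_op)


def single_op_error(true_ms_per_op, lost_ms):
    """Relative error when a single operation is timed and its interval is short by lost_ms."""
    return lost_ms / true_ms_per_op


if __name__ == "__main__":
    total_s = OPERATIONS * TRUE_MS_PER_OP / 1000
    print(f"true run length: {total_s:.0f}s")
    print(f"whole-run timer error: {whole_run_error(OPERATIONS, TRUE_MS_PER_OP, LOST_MS):.3%}")
    print(f"single-operation timer error: {single_op_error(TRUE_MS_PER_OP, LOST_MS):.0%}")
```

With these numbers the whole-run error is one part in 100,000, while a single operation timed on its own is off by 10 percent.&lt;br /&gt;
&lt;br /&gt;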
However, if the benchmark measured each operation individually and summed them all to produce a result, each individual 1ms inaccuracy would accumulate over the entire run.  The benchmark would report that the average IO length was 9ms even though observation of wall time would still show the passage of 100s during the 10,000 IO operation run.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">timekeeping</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <pubDate>Thu, 29 May 2008 18:05:22 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5581</guid>
      <dc:date>2008-05-29T18:05:22Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Benchmarking</title>
      <link>http://communities.vmware.com/docs/DOC-5520</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
Many VMware users wish to perform analysis on their own virtual deployments.  This page will collect information on setting up and executing your own tests to analyze performance.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;General Best Practices&lt;/h1&gt;
&lt;i&gt;Always measure performance from a native (non-virtual) system.&lt;/i&gt;  Be aware that time measurements in virtual machines can be subject to minute fluctuations.  Many benchmarks produce results by summing times from a large number of small operations, so these small inaccuracies can compound into a large error.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5581"&gt;Time-based Measurements in Virtual Machines&lt;/a&gt; for more information on this subject.  The only way to guarantee correct measurement is to run the measurement tool on a native system.  This is easy for client-server test architectures but may require clever architecture for in-guest testing.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Always ensure apples-to-apples comparison.&lt;/i&gt;  Make sure that the benchmark or application under test is constrained by the same resources on both systems.  For instance, if the virtual machine is configured with 512MB of RAM and two virtual CPUs, restrict the native system to the same resources if a virtual-to-native comparison is desired.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Collect accurate host-based performance statistics.&lt;/i&gt;  Guest OS performance metrics (such as CPU utilization) are not accurate.  Use VirtualCenter or esxtop to collect accurate performance counters during the test.  See the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; page for more information on analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Application Benchmarking&lt;/h1&gt;
Microsoft Exchange.&lt;br /&gt;
&lt;br /&gt;
Microsoft SQL Server. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Subsystem Benchmarking&lt;/h1&gt;
&lt;h2&gt;Storage&lt;/h2&gt;
Internally at VMware we've used Iometer for a variety of storage analyses.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3961"&gt;Storage System Performance Analysis with Iometer&lt;/a&gt; for more information.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <pubDate>Wed, 28 May 2008 19:12:13 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5520</guid>
      <dc:date>2008-05-28T19:12:13Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Oracle</title>
      <link>http://communities.vmware.com/docs/DOC-5505</link>
      <description>While Oracle makes a lot of software for a bunch of purposes, the only best practice guidance available today is for Oracle databases.  We'll build out pages dedicated to other Oracle products if and when other best practices are available.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Real Time Scheduling for Oracle DBs on Linux&lt;/h1&gt;
&lt;blockquote&gt;&lt;i&gt;The following nugget came from a database expert in performance engineering.  Due to the possibility of this change screwing up your DBs it was only with great reluctance that he even committed this to ink.  You're taking your own job in your hands if you try this with your production DBs.  But, for those of you with a test/dev Oracle DB and an unrepentant need for speed, give this a try.  --drummonds&lt;/i&gt;&lt;/blockquote&gt;
Traditional Unix/Linux timeshare scheduler policies attempt to provide good interactive response times by favoring a process that has just had an I/O request completed over a running process. This can create havoc on a high transaction-rate database server, but can be avoided by manipulating the scheduling policies for the database processes.&lt;br /&gt;
&lt;br /&gt;
The scheduler policies date back to the 1970s when we would be running an editor session simultaneously with nroff (a text formatting program that often ran for minutes to process a document) on a single-processor system. Allowing the long-running process to execute without preemption would mean unacceptably long response times for interactive applications. So, the scheduler decays the priority of a process as it runs and accumulates CPU time. Furthermore, when a timeshare priority process goes to sleep waiting for I/O completion, it sleeps at a stronger priority than any running process with a timeshare priority. This ensures, for example, that a text editor user will get immediate feedback for every character typed on the keyboard even if the system is busy running CPU hungry tasks.&lt;br /&gt;
&lt;br /&gt;
The problem with this scheme comes to the forefront when we run an application such as Oracle on a modern computer system. In an online transaction processing environment, the database management system threads/processes typically run for a short period of time, on the order of milliseconds or tens of milliseconds, then issue a disk I/O or send a network packet. A typical large system may issue many thousands of disk accesses per second. Every time an I/O completes, the scheduler makes the issuing process (or thread) runnable, and at a stronger priority than all running timeshare processes. So, we are guaranteed a preemption unless the system can find an idle CPU. And, the preempted process goes to the end of the run queue for its priority level.&lt;br /&gt;
&lt;br /&gt;
Now, if a preemption occurs, and the running process was holding an important database resource, say, a latch, all other processes that may need that latch, possibly including the preemptor process itself, will go into a spin loop waiting for the latch, and will eventually put themselves to sleep until the process holding the latch runs again and releases the latch. This is very expensive in terms of CPU usage. The spinning and the context switches will drive up the CPU utilization without an increase in the system throughput.&lt;br /&gt;
&lt;br /&gt;
An even worse phenomenon can occur in a large system with a very high I/O rate, where the problem may actually exhibit itself as excess CPU idle due to processes putting themselves to sleep waiting for latches and not waking up when the latch is released. Another possible symptom is the thundering herd problem: once the process holding the latch runs and releases it, a large number of processes become runnable, and chaos and inefficiency follow.&lt;br /&gt;
&lt;br /&gt;
The simplest solution is to maintain a constant priority for the DBMS processes, even when sleeping. This way, we allow the running process to voluntarily give up the CPU, at which time it has supposedly released all latches. Because of the nature of OLTP workloads, processes will voluntarily give up the CPU after a few, or at most tens of, milliseconds. So avoiding involuntary preemptions will fix the problem.&lt;br /&gt;
&lt;br /&gt;
One way of achieving this is using the real-time priority feature. We are not really interested in running the database processes at a strong priority. We only use real-time priorities because of an additional feature that this scheduling policy offers: the priority of a real-time process does not change when the process sleeps on I/O. As a result, when an I/O completion occurs, a process becomes runnable at the same priority as the currently running process, and is put at the end of the run queue. We avoid the involuntary preemption and the associated spinning and sleeping costs.&lt;br /&gt;
&lt;br /&gt;
Here is a sample RHEL4.4 script that accomplishes this. &lt;b&gt;This script will run the Oracle processes at a stronger priority than all timeshare priority processes and could result in starvation of other applications. It should be used only when you are certain that running these processes at a strong priority won't deny resources to other applications.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;# Background daemons (ora_*) run at round-robin priority 82 &lt;br clear="all" /&gt;		 for DAEMON_PID in `ps -u oracle -f|grep -v grep|grep ora_|awk '{print $2}'` &lt;br clear="all" /&gt;		 do &lt;br clear="all" /&gt;		 sudo chrt --rr -p 82 ${DAEMON_PID} &lt;br clear="all" /&gt;		 done &lt;br clear="all" /&gt;		 # The log writer is raised to 83, above the other daemons &lt;br clear="all" /&gt;		 LGWR_PID=`ps -u oracle -f|grep -v grep|grep lgwr|awk '{print $2}'` &lt;br clear="all" /&gt;		 sudo chrt --rr -p 83 ${LGWR_PID} &lt;br clear="all" /&gt;		 # The listener runs at 81 &lt;br clear="all" /&gt;		 LSNR_PID=`ps -u oracle -f|grep -v grep|grep tnslsnr|awk '{print $2}'` &lt;br clear="all" /&gt;		 sudo chrt --rr -p 81 ${LSNR_PID} &lt;br clear="all" /&gt;		 # Everything else owned by oracle (the shadow processes) runs at 81; the awk pattern skips the ps header line &lt;br clear="all" /&gt;		 for SHADOW_PID in `ps -u oracle -f|grep -v grep|grep -v -E 'ora_|tnslsnr'|awk '$2 ~ /^[0-9]+$/ {print $2}'` &lt;br clear="all" /&gt;		 do &lt;br clear="all" /&gt;		 sudo chrt --rr -p 81 ${SHADOW_PID} &lt;br clear="all" /&gt;		 done&lt;/blockquote&gt;
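The priority assignment the script performs can be summarized in a few lines of Python.  This is only a sketch of the mapping, not a replacement for the script: it changes no scheduling policy, the function name is invented for this illustration, and the sample command lines (including the SID ORCL and the listener path) are hypothetical.&lt;br /&gt;

```python
def priority_for(command):
    """Return the SCHED_RR priority the script leaves a process owned by oracle with."""
    if "ora_" in command:
        # Background daemons get 82; the log writer is then raised to 83.
        return 83 if "lgwr" in command else 82
    # The listener (tnslsnr) and the remaining shadow processes get 81.
    return 81


if __name__ == "__main__":
    # Hypothetical command lines for an instance with the made-up SID ORCL.
    sample = [
        "ora_pmon_ORCL",
        "ora_lgwr_ORCL",
        "/u01/app/oracle/bin/tnslsnr LISTENER -inherit",
        "oracleORCL (LOCAL=NO)",
    ]
    for command in sample:
        print(priority_for(command), command)
```

&lt;br /&gt;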
To avoid starving the Oracle daemons, we run them at a stronger priority than the shadow processes, with particular attention paid to the logwriter.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">linux</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">oracle</category>
      <pubDate>Wed, 28 May 2008 00:01:33 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5505</guid>
      <dc:date>2008-05-28T00:01:33Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for IIS</title>
      <link>http://communities.vmware.com/docs/DOC-5504</link>
      <description>&lt;br /&gt;
Microsoft Internet Information Services (IIS) web server is a common target for virtualization.  This page will collect best practices for installation and configuration of IIS VMs for maximum performance.&lt;br /&gt;
&lt;p /&gt;
No IIS-specific information has yet been provided, but the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5502"&gt;Best Practices for Web Servers&lt;/a&gt; page contains hints common to all web servers.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iis</category>
      <pubDate>Tue, 27 May 2008 23:26:59 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5504</guid>
      <dc:date>2008-05-27T23:26:59Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Apache</title>
      <link>http://communities.vmware.com/docs/DOC-5503</link>
      <description>Apache web servers are a common target for virtualization in almost every data center.  This page will collect best practices for setting up and configuring Apache for best performance.&lt;br /&gt;
&lt;br /&gt;
No Apache-specific information has yet been provided.  But tips are available on the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5502"&gt;Best Practices for Web Servers&lt;/a&gt; page.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">apache</category>
      <pubDate>Tue, 27 May 2008 23:24:29 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5503</guid>
      <dc:date>2008-05-27T23:24:29Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Web Servers</title>
      <link>http://communities.vmware.com/docs/DOC-5502</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;br /&gt;
This page is a collection point for ESX Server configuration options that can maximize web server performance regardless of the choice of web server.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;TCP Transmit Coalescing  &lt;/h1&gt;
&lt;br /&gt;
In a recent &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/specweb_perf_final.pdf"&gt;whitepaper published on SPECweb&lt;/a&gt; performance on VMware ESX Server, TCP transmit coalescing was described as a way to improve web server performance.  The general idea behind transmit coalescing is to buffer TCP transmits at the ESX Server for a brief period of time so that multiple packets can be transmitted at one time.  This introduces a very slight increase in latency but can provide a dramatic increase in efficiency.&lt;br /&gt;
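&lt;br /&gt;
The trade-off can be pictured with a toy model.  The Python sketch below is an assumption-laden illustration of time-window batching in general, not a description of how ESX Server implements coalescing: the function name, the fixed window, and the arrival times are all invented for the example.&lt;br /&gt;

```python
def coalesce(arrivals_us, window_us):
    """Group integer packet arrival times (microseconds) into transmit batches.

    A batch opens when its first packet arrives and is sent window_us later,
    so no packet waits longer than window_us.
    """
    batches = []
    for t in sorted(arrivals_us):
        # elapsed in range(window_us) means the open batch has not been sent yet.
        if batches and (t - batches[-1][0]) in range(window_us):
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches


if __name__ == "__main__":
    arrivals = [0, 10, 20, 150, 160, 400]  # hypothetical arrival times
    batches = coalesce(arrivals, 100)
    print(len(arrivals), "packets sent in", len(batches), "transmits")
```

In this model six packets go out in three transmits, at the cost of delaying each packet by at most one window.&lt;br /&gt;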
&lt;p /&gt;
Transmit coalescing can be turned on with the following steps:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Using the VMware Infrastructure Client, choose the ESX Server host on which the virtual machine is deployed.&lt;/li&gt;
&lt;li&gt;Click the &lt;b&gt;Configuration&lt;/b&gt; tab.&lt;/li&gt;
&lt;li&gt;Click &lt;b&gt;Advanced Settings&lt;/b&gt; in the Software panel.&lt;/li&gt;
&lt;li&gt;Click the &lt;b&gt;Net&lt;/b&gt; tab.&lt;/li&gt;
&lt;li&gt;Set the &lt;b&gt;Net.vmxnetThroughputWeight&lt;/b&gt; value to 128, then click OK.&lt;/li&gt;
&lt;li&gt;Reboot the virtual machine.&lt;/li&gt;
&lt;/ol&gt;
Details and performance results are provided in the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/specweb_perf_final.pdf"&gt;SPECweb paper&lt;/a&gt;.&lt;br /&gt;
&lt;h1&gt;Resources&lt;/h1&gt;
&lt;br /&gt;
Apache Best Practices&lt;br /&gt;
&lt;p /&gt;
IIS Best Practices</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">apache</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iis</category>
      <pubDate>Tue, 27 May 2008 23:10:36 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5502</guid>
      <dc:date>2008-05-27T23:10:36Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>VMkernel Scheduler</title>
      <link>http://communities.vmware.com/docs/DOC-5501</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
Details on the ESX Server scheduler are commonly requested when I engage customers and partners.  People want to know more about how the scheduler works, when SMP should be used, and what the deal is with SMP co-scheduling.  This page will answer these questions and others as they arise in the forum or the discussion portion of this page.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Terminology and Architecture&lt;/h1&gt;
In VMware parlance, the monitor is the part of our products that provides a virtual interface to the guest operating systems.  The VMkernel is the part of our products that manages interactions with the devices, handles memory allocation, and schedules access to the CPU resources, among other things.  This is shown in the following figure. &lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3049/multi_mode_monitor.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3049/multi_mode_monitor.JPG" class="jive-image"  /&gt;&lt;br /&gt;
&lt;br /&gt;
This document will provide information on one part of the VMkernel: the scheduler.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Performance Scaling and the Scheduler&lt;/h1&gt;
It is a critical requirement for enterprise deployments that an operating system provide fast and fair access to the underlying resources.  As a critical part of this design, the scheduler has undergone countless engineer-years of development to guarantee that this requirement is met.  We've now released dozens of papers showing linear scaling of workloads as vCPU count is scaled up within a single VM and VM count is scaled up within a single host.  Here are a few such papers that contain supporting data.&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/Oracle_Scaling_in_ESX_Server.pdf"&gt;VM scaling&lt;/a&gt;  as demonstrated by Oracle databases.&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/db2_scalability_wp_vi3.pdf"&gt;VM and vCPU scaling&lt;/a&gt;  under IBM DB2 load.&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf"&gt;VM and vCPU scaling&lt;/a&gt;  with SQL Server running in the VM.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
The scheduler's ability to fairly scale up to and beyond totally committed CPU resources is no accident.  In fact, in a conversation I had with a QA manager I was assured that the VMkernel's scheduler would fairly distribute CPU resources to all VMs at least up to 4x CPU overcommitment.  Of course, on a system with the CPU over-committed by 4x each VM will only run at 1/4 native speed but the scheduler keeps the VMs running at that performance.  Not one at 1/8 speed, one at 1/10 speed, and another at 1/4 speed.&lt;br /&gt;
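&lt;br /&gt;
The arithmetic behind that expectation is simple.  The Python sketch below assumes an idealized case that the text does not spell out: every vCPU is busy, all VMs have equal shares, and no reservations or limits are set.  The function name is invented for this illustration.&lt;br /&gt;

```python
from fractions import Fraction


def fair_share(physical_cores, busy_vcpus):
    """Fraction of a core each busy vCPU receives under ideal, equal-share scheduling."""
    if busy_vcpus == 0:
        raise ValueError("need at least one vCPU")
    return min(Fraction(1), Fraction(physical_cores, busy_vcpus))


if __name__ == "__main__":
    # 4 cores with 16 busy vCPUs is 4x overcommitment.
    print(fair_share(4, 16))
    # Without overcommitment every vCPU can have a full core.
    print(fair_share(8, 4))
```

A fair scheduler keeps every VM close to this figure instead of letting some fall well below it.&lt;br /&gt;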
&lt;br /&gt;
&lt;h1&gt;SMP and the Scheduler&lt;/h1&gt;
As ESX Server supports uniprocessor (UP) and symmetric multiprocessor (SMP) VMs, the fair-and-fast requirement for the scheduler must be upheld in the presence of concurrently executing UP and SMP VMs.  Internal testing of this requirement shows fair scheduling even in the presence of concurrently executing 1-way, 2-way, and 4-way VMs.&lt;br /&gt;
&lt;br /&gt;
In fact, the ability to fairly execute under such environments is a very tricky problem for a scheduler.  We've run analysis on competitors' products and found that the ability to fairly balance differently-sized VMs is something of which ESX Server alone is capable.  Stay tuned in the coming months as we back this claim up with performance data. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Cell Size&lt;/h2&gt;
&lt;br /&gt;
One construct that assists the scheduler in optimally placing VMs on a heavily utilized system is a cell.  A cell is a logical grouping of a subset of CPU cores in the system.  In ESX 3 versions the cell size is equal to four.  Since the cell is statically assigned to physical cores, this means that each four-core processor is in exactly one cell.  When only dual-core processors are present, a cell comprises two sockets.  The most important thing to know about cells is the following:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;&lt;i&gt;A VM cannot span more than one cell.&lt;/i&gt;&lt;/blockquote&gt;
&lt;br /&gt;
This means that four-way VMs run on only one socket at a time in systems with quad-core CPUs.  For this case, the number of options presented to the scheduler is equal to the number of sockets.  In future versions of ESX we plan to increase the cell size to eight.  In some cases (such as systems with hexa-core CPUs) a modification of the cell size can improve performance.  See &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1007361"&gt;KB article 1007361&lt;/a&gt; for more information. &lt;br /&gt;
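&lt;br /&gt;
The counting argument can be sketched in Python.  This is a simplified model, not VMkernel code: it assumes cells are formed from consecutive core numbers, and the function names are invented for the illustration.&lt;br /&gt;

```python
def cells(sockets, cores_per_socket, cell_size=4):
    """Split core indices into consecutive cells of cell_size cores each."""
    cores = list(range(sockets * cores_per_socket))
    return [cores[i:i + cell_size] for i in range(0, len(cores), cell_size)]


def placement_options(vcpus, sockets, cores_per_socket, cell_size=4):
    """Count the cells able to hold every vCPU of a VM, since a VM cannot span cells."""
    return sum(
        1
        for cell in cells(sockets, cores_per_socket, cell_size)
        if vcpus in range(1, len(cell) + 1)
    )


if __name__ == "__main__":
    # Two quad-core sockets: one cell per socket, so a 4-way VM has two options.
    print(placement_options(4, sockets=2, cores_per_socket=4))
    # Four dual-core sockets: each cell covers two sockets.
    print(cells(sockets=4, cores_per_socket=2))
```

In this model a VM with more vCPUs than the cell size has no valid placement at all.&lt;br /&gt;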
&lt;br /&gt;
&lt;h2&gt;UP or SMP?&lt;/h2&gt;
Whether and when to use SMP is a common question from VMware users.  The simple answer is to use SMP only when needed.  Why?  There are three reasons:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;SMP schedulers are less efficient than UP schedulers.  This can be confirmed with a simple experiment using trivial benchmarks like Netperf or Passmark.  On UP systems (either virtual or native) the UP hardware abstraction layer (HAL) will provide marginally better results than the SMP HAL.&lt;/li&gt;
&lt;li&gt;Even when unused, idle vCPUs require resources from the kernel to virtualize.  Memory is needed to maintain data structures and CPU resources are needed to virtualize the idle system.  The amount of work needed to support an idle vCPU varies greatly but is usually in the realm of 1-2% of a single CPU core.&lt;/li&gt;
&lt;li&gt;Timer interrupts must be delivered to every vCPU.  In some guest operating systems, like RHEL5, the number of timer interrupts delivered by the VMkernel can be quite high.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5252"&gt;Red Hat Enterprise Linux&lt;/a&gt; for more information on this issue.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h2&gt;What About Co-scheduling? &lt;/h2&gt;
Back in the days of ESX Server 2.5, SMP VMs had to have their vCPUs co-scheduled at the same instant to begin running.  Because only 2-way VMs were supported at this time, that meant that two CPU cores had to be available simultaneously to launch a 2-way VM.  On a server with a total of only two cores, this meant that the VM could not be launched concurrently with any other process on the server.  This would include the service console, the web interface, or any other process.&lt;br /&gt;
&lt;br /&gt;
This requirement was relaxed in ESX Server 3.0 through a process called relaxed co-scheduling.  Effectively, SMP VMs could have their vCPUs scheduled at slightly different times, and idle vCPUs did not necessarily have to be scheduled concurrently with running vCPUs.  More details on this are available in the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt; page.  &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;NUMA Considerations&lt;/h2&gt;
Support for non-uniform memory access (NUMA) architectures was introduced in ESX Server 2.  This meant that the scheduler became aware that memory was not uniform across each CPU.  Each CPU node had access to its own local memory and a larger pool of remote memory (which was divided up as local memory for the other CPU nodes).  Access to local memory is much faster than access to remote memory, so the scheduler should favor placing processes on the nodes that hold those processes' memory.&lt;br /&gt;
&lt;br /&gt;
Subsequent generations of ESX Server continued to optimize for the use of NUMA memory.  This included placement of vCPUs next to needed memory and startup of VMs at NUMA nodes with resources available for execution.  All of this is transparently handled by the scheduler but it should be noted that the newer your version of ESX Server, the better its NUMA scheduling is.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">numa</category>
      <pubDate>Tue, 27 May 2008 22:56:02 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5501</guid>
      <dc:date>2008-05-27T22:56:02Z</dc:date>
      <clearspace:dateToText>1 year, 4 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>8</clearspace:replyCount>
    </item>
    <item>
      <title>Network Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5500</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
This page is a living, up-to-date version of the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods whitepaper&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Check Utilization&lt;/h1&gt;
esxtop provides network information on the network screen, which is displayed with the 'n' key.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5500-5-3052/esxtop-network-main.JPG" alt="esxtop-network-main.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5500-5-3052/esxtop-network-main.JPG');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
The following properties of this screen are worth particular attention:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Each row represents one of several relevant network items on the server: a physical NIC (vmnicX), a virtual switch interface (vswifX), a VM (contains the VM name), the VMkernel network stack (vmk-tcpip-A.B.C.D), and others.&lt;/li&gt;
&lt;li&gt;The network items are organized by the virtual switch to which they are attached.  The virtual switch name is listed under the DNAME column.&lt;/li&gt;
&lt;li&gt;Network traffic on the hypervisor's iSCSI initiator will show up on the VMkernel network row which will contain the name "vmk-tcpip-A.B.C.D", where A.B.C.D is the VMkernel IP address.&lt;/li&gt;
&lt;li&gt;Network traffic from iSCSI initiators that were configured in the guest will show up on the vNIC, which is displayed under the VM's name on the network panel.&lt;/li&gt;
&lt;li&gt;Total throughput for each item can be observed by summing the transmitted data (MbTX/s) and received data (MbRX/s) for that item.  As the physical hardware becomes saturated, transmitted and received packets will start to be dropped (%DRPTX and %DRPRX, respectively) which, depending on protocol, may result in a retransmission at a later time.&lt;/li&gt;
&lt;/ul&gt;
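As a rough illustration of the last point, the sketch below sums MbTX/s and MbRX/s for each row and flags rows that report dropped packets.  The sample rows and the dictionary layout are made up for illustration; they are not an esxtop export format, and real values would have to be copied from the screen or a captured sample.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical sample rows; the dict keys mirror esxtop column names,
# but this structure is an assumption, not an esxtop export format.
rows = [
    {"name": "vmnic0", "MbTX/s": 410.0, "MbRX/s": 395.5, "%DRPTX": 0.0, "%DRPRX": 0.0},
    {"name": "web-vm", "MbTX/s": 120.25, "MbRX/s": 30.25, "%DRPTX": 1.5, "%DRPRX": 0.0},
]


def total_throughput(row):
    """Total throughput in Mb/s: transmitted plus received."""
    return row["MbTX/s"] + row["MbRX/s"]


def is_dropping(row):
    """True when the row reports any dropped transmit or receive packets."""
    return row["%DRPTX"] > 0 or row["%DRPRX"] > 0


for row in rows:
    status = "dropping packets" if is_dropping(row) else "ok"
    print(f'{row["name"]}: {total_throughput(row):.2f} Mb/s total, {status}')
```
&lt;/pre&gt;
A row whose total approaches the link speed while its drop percentages climb is the saturation case described above.&lt;br /&gt;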
&lt;h1&gt;Evaluate the Data&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Does the physical NIC's reported speed and duplex setting match the expectation of the hardware?  Hardware connectivity issues may result in a NIC autonegotiating to a lower speed or half duplex mode.&lt;/li&gt;
&lt;li&gt;Is there a significant load on the appropriate network items?  For instance, is a network-intensive load in a guest actually generating the network activity on its vNIC that is expected?  Are storage-intensive loads generating traffic on the vNIC or vmkNIC when the hypervisor or guest initiators are used?&lt;/li&gt;
&lt;li&gt;Verify that the network traffic is flowing on the appropriate NICs. A typical ESX host may have network traffic generated by VMs, network traffic from the iSCSI protocol, VMotion-related network traffic, and network activity associated with the service console. It is recommended to have separate NICs handle these different types of traffic.&lt;/li&gt;
&lt;li&gt;During periods of saturation, is the total throughput (MbTX/s summed with MbRX/s) matching expectations?  Either the guest or the other end of the communication link may be throttling the performance.&lt;/li&gt;
&lt;li&gt;Are packets being dropped?  When overworked the hardware will refuse packets which get reported as dropped transmitted (%DRPTX) and received (%DRPRX) packets.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Correct the System&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Make sure that the hardware is configured to run at its maximum capability.  This means verifying that 1 Gb NICs are not autonegotiating down to 100 Mb/s because they are connected to an older switch.  Similarly, ensure that NICs are running in full duplex mode.&lt;/li&gt;
&lt;li&gt;When network throughput seems lower than expected, apply traditional network diagnosis techniques to investigate every link in the connection.  Low throughput at the ESX Server is not necessarily due to server configuration.&lt;/li&gt;
&lt;li&gt;Verify that VMware Tools is installed on the guests and TSO, Jumbo Frames, and 10 Gb Ethernet are enabled, where possible.&lt;/li&gt;
&lt;li&gt;Bond multiple physical NICs to virtual switches with high utilization.&lt;/li&gt;
&lt;li&gt;Give separate virtual switches their own physical NICs and place network-intensive VMs on their own vSwitches.&lt;/li&gt;
&lt;li&gt;If VMs running on the same ESX Server communicate with each other, connect them to a dedicated virtual switch so that all network transfers occur in memory and no packets are shipped over the wire.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;VMFS and RDM Considerations&lt;/h1&gt;
ESX Server supports the mapping of physical LUNs to virtual machines via a method called raw device mapping (RDM).  RDM removes VMFS from the storage stack, and VMFS is often incorrectly believed to be a source of performance problems.  However, removing VMFS reduces the total number of addressable LUNs, eliminates the ability to perform storage migrations (Storage VMotion), and greatly increases the effort required for the otherwise simplified maintenance activities provided by Site Recovery Manager.  The performance benefits derived from the removal of VMFS are negligible.&lt;br /&gt;
&lt;br /&gt;
See the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf"&gt;performance characteristics of VMFS and RDM whitepaper&lt;/a&gt; for more information on this subject.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Resources&lt;/h1&gt;
The top-level performance analysis page: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
VirtualCenter performance counters: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
esxtop performance counters: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <pubDate>Tue, 27 May 2008 21:05:20 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5500</guid>
      <dc:date>2008-05-27T21:05:20Z</dc:date>
      <clearspace:dateToText>1 year, 4 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Storage Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5490</link>
      <description>&lt;br /&gt;
This document is a living, wiki version of the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods whitepaper&lt;/a&gt;.  That document will ultimately be replaced with this one.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
Storage often bounds the performance of enterprise workloads.  More so than CPU or memory performance investigation, traditional means of analysis continue to be sound for storage performance in virtual deployments.  This section will introduce the tools for identifying heavily-used resources and VMs that have high demands of their storage system.  Traditional correction methods will then apply.&lt;br /&gt;
&lt;br /&gt;
iSCSI storage using software initiators is not covered in this section.  When storage is accessed through the hypervisor's iSCSI initiator or an in-guest initiator, the traffic will show up on the VMkernel network or the VM's network stack, respectively.  Check the Network section for more information.&lt;br /&gt;
&lt;h1&gt;Navigating esxtop&lt;/h1&gt;
As before, esxtop is the best place to start when investigating potential performance issues.  To view the disk adapter information in esxtop, press the 'd' key once it is running.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5490-10-3051/esxtop-disk-main.jpg" alt="esxtop-disk-main.jpg" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5490-10-3051/esxtop-disk-main.jpg');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
On ESX Server 3.5, the storage system can also be displayed per VM (using 'v') or per storage device (using 'u'); the same counters are displayed on each view.  Look at the following items: &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;For each of the three storage views:
&lt;ul&gt;
&lt;li&gt;On the adapter view ('d'), each physical HBA is displayed on a row of its own with the appropriate adapter name.  This short name may be checked against the more descriptive data provided through the Virtual Infrastructure Client to identify the hardware type.&lt;/li&gt;
&lt;li&gt;On ESX Server 3.5's VM disk view ('v'), each row represents a group of worlds on the ESX Server.  Each VM will have its own row and rows will be displayed for the console, system, and other less-important (from a storage perspective) worlds.  The groups' IDs (GID) match those on the CPU screen and can be expanded by pressing 'e'.&lt;/li&gt;
&lt;li&gt;On ESX Server 3.5's disk device view ('u'), each device is displayed on its own row.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;As with the other system screens, the disk displays can have groups expanded for more detailed information:
&lt;ul&gt;
&lt;li&gt;The HBAs listed on the adapter display can be expanded with the 'E' key to show worlds that are using those HBAs.  By finding a VM's world ID, the activity due to that world can be seen on the expanded line with the matching world ID (WID) column.&lt;/li&gt;
&lt;li&gt;The worlds for each VM can be displayed by expanding the VM row on the VM disk view with the 'e' key.&lt;/li&gt;
&lt;li&gt;The disk devices on the device display can be expanded to show usage by each world on the host.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Relevant Counters&lt;/h1&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Type&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;VirtualCenter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;esxtop&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Details&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queued Disk Commands&lt;/td&gt;
&lt;td&gt;disk.queueLatency.average&lt;/td&gt;
&lt;td&gt;QUED&lt;/td&gt;
&lt;td&gt;Queued commands are held in the kernel queue, awaiting an open slot in the device driver queue.  A large number of queued commands means a heavily loaded storage system.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for information on queues.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue Usage&lt;/td&gt;
&lt;td&gt;&lt;i&gt;Not available&lt;/i&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;%USD&lt;/td&gt;
&lt;td&gt;This counter tracks the percentage of the device driver queue that is in use.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for info on this queue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command Rate&lt;/td&gt;
&lt;td&gt;disk.commands.summation &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;ACTV&lt;/td&gt;
&lt;td&gt;VirtualCenter reports the number of commands that have been issued in the previous sample period.  esxtop provides a live look at the number of commands that are being processed at any one time.  Consider these counters a snapshot of activity.  But don't consider any number here "too much" until large queues start developing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HBA Load&lt;/td&gt;
&lt;td&gt;&lt;i&gt;Not available&lt;/i&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;LOAD&lt;/td&gt;
&lt;td&gt;In esxtop the LOAD counter tracks how full the device queues are.  Once LOAD exceeds one, commands will start to queue in the kernel.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for information on these queues.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage Device Latency&lt;/td&gt;
&lt;td&gt;disk.deviceReadLatency&lt;br /&gt;
&lt;br /&gt;
			disk.deviceWriteLatency&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;DAVG/cmd&lt;/td&gt;
&lt;td&gt;These counters track the latencies of the physical storage hardware.  This includes everything from the HBA to the platter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Latency&lt;/td&gt;
&lt;td&gt;disk.kernelReadLatency&lt;br /&gt;
&lt;br /&gt;
			disk.kernelWriteLatency&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;KAVG/cmd&lt;/td&gt;
&lt;td&gt;These counters track the latencies due to the kernel's command processing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total Storage Latency&lt;/td&gt;
&lt;td&gt;&lt;i&gt;Not available&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;GAVG/cmd&lt;/td&gt;
&lt;td&gt;This is the latency that the guest sees to the storage.  It is the sum of the DAVG and KAVG stats.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aborts&lt;/td&gt;
&lt;td&gt;disk.commandsAborted.summation&lt;/td&gt;
&lt;td&gt;ABRTS/s&lt;/td&gt;
&lt;td&gt;These counters track SCSI aborts.  Aborts generally occur because the array is taking far too long to respond to commands.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;h1&gt;Evaluate the Data&lt;/h1&gt;
It is important to have a solid understanding of the storage architecture and equipment before attempting to analyze performance data.  Consider the following questions: &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Is the host or any of the guests swapping?  The guest's swap activity must be checked with traditional OS tools and the host can be checked with SWR/s and SWW/s counters detailed in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5430"&gt;Memory Performance Analysis and Monitoring&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Are commands being aborted?  This is a certain sign that the storage hardware is overloaded and unable to handle the requests in a manner in line with the host's expectations.  Corrective action could include hardware upgrades, storage redesign (increasing spindles on the RAID), or guest redesign.&lt;/li&gt;
&lt;li&gt;Is there a large queue?  While less dangerous than aborts, queued commands are similarly a sign that hardware upgrades or storage system redesign is necessary.&lt;/li&gt;
&lt;li&gt;Is the array responding at expected rates?  Storage vendors will provide latency statistics for their hardware that can be checked against the latency statistics in esxtop.  When the latency numbers are high, the hardware could be overworked by too many servers.  As examples, 2-5 ms latencies are usually a sign of a healthy storage system reading data from the array cache, 5-12 ms latencies reflect a healthy storage architecture where data is being randomly read across the disk, and latencies of 15 ms or greater may indicate an over-utilized or misbehaving array.&lt;/li&gt;
&lt;/ul&gt;
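The latency guidelines above can be turned into a quick check.  This is only a sketch: the function names are hypothetical, the thresholds are simply the example figures quoted in the list, the 12-15 ms range is not classified in the text and is reported as borderline here, and applying the ranges to DAVG/cmd (rather than GAVG/cmd) is an assumption.&lt;br /&gt;
&lt;pre&gt;
```python
def guest_latency(davg_ms, kavg_ms):
    """GAVG/cmd is the sum of device (DAVG) and kernel (KAVG) latency."""
    return davg_ms + kavg_ms


def classify_device_latency(davg_ms):
    """Map a DAVG/cmd value in ms onto the example ranges quoted above."""
    if davg_ms >= 15:
        return "possibly over-utilized or misbehaving array"
    if davg_ms > 12:
        return "borderline (range not classified in the text)"
    if davg_ms >= 5:
        return "healthy, random reads from disk"
    if davg_ms >= 2:
        return "healthy, reads served from array cache"
    return "below the quoted ranges"


print(guest_latency(8.0, 0.5))          # 8.5
print(classify_device_latency(8.0))
print(classify_device_latency(20.0))
```
&lt;/pre&gt;
Vendor-supplied latency figures for the specific array should replace these example thresholds wherever they are available.&lt;br /&gt;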
&lt;br /&gt;
&lt;h2&gt;Identifying a Slow Array &lt;/h2&gt;
&lt;p /&gt;
It's worth pausing at this moment to point out that 95% of all storage performance problems are not fixed in ESX.  Believe me, I (Scott) have been called into a dozen performance escalations where poor storage performance was blamed on the hypervisor, and not a single one was caused by ESX.  If you're seeing high latencies to the storage device in VirtualCenter or esxtop, it's worth treating this problem as an array configuration issue.  Check ESX's logs for obvious storage errors, check array stats, and make sure that there are no fabric configuration problems.&lt;br /&gt;
&lt;p /&gt;
At the point of high storage latencies you shouldn't be using complex benchmarks to reproduce and solve this problem.  Go with Iometer and make certain you're doing an apples-to-apples comparison against a physical system (ideally dual-booted from the ESX server under test) to make sure of what your expected, non-virtual results are.  Check &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3961"&gt;Storage System Performance Analysis with Iometer&lt;/a&gt;  for information on using Iometer for problems like this.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Correct the System&lt;/h1&gt;
Corrections for these problems can include the following:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Reduce the guests' and host's need for storage.
&lt;ol&gt;
&lt;li&gt;Some applications such as databases can utilize system memory to cache data and avoid disk access.  Check in the VMs to see if they may benefit from increased caches and provide more memory to the VM if resources permit.  This may reduce the burden on the storage system.&lt;/li&gt;
&lt;li&gt;Eliminate all possible swapping to reduce the burden on the storage system.  First verify that the VMs have the memory they need by checking swap statistics in the guest.  Provide memory if resources permit.  Next, as described in the "Memory" section of this paper, eliminate host swapping.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Configure the HBAs and RAID controllers for optimal use.  It may be worth reading &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for information on how disk queueing works.
&lt;ol&gt;
&lt;li&gt;Increase the number of outstanding disk requests for the VM by adjusting the "Disk.SchedNumReqOutstanding" parameter. For detailed instructions, check the "&lt;a class="jive-link-external" href="https://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf#page=110"&gt;Equalizing Disk Access Between Virtual Machines&lt;/a&gt; " section in the "Fibre Channel SAN Configuration Guide".  This step and the following one must both be applied for either to work.&lt;/li&gt;
&lt;li&gt;Increase the queue depths for HBAs. Check the section "&lt;a class="jive-link-external" href="https://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf#page=112"&gt;Setting Maximum Queue Depth for HBAs&lt;/a&gt; " in the "Fibre Channel SAN Configuration Guide" for detailed instructions.  Note that you have to set two variables to correctly change queue depths.  This step and the previous one must both be applied for either to work.&lt;/li&gt;
&lt;li&gt;Make sure the appropriate caching is enabled for the disk controllers.  You will need to use the vendor-provided tools to verify this.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;If latencies are high, inspect array performance using the vendor's array tools.  When too many servers simultaneously access common elements on an array the disks may have trouble keeping up.  Consider array-side improvements to increase throughput.&lt;/li&gt;
&lt;li&gt;Balance load across the physical resources that are available.
&lt;ol&gt;
&lt;li&gt;Spread heavily used storage across LUNs being accessed by different adapters.  The presence of separate queues for each adapter can yield some efficiency improvements.&lt;/li&gt;
&lt;li&gt;Use multi-pathing or multiple links in case the combined disk I/O is higher than a single HBA capacity.&lt;/li&gt;
&lt;li&gt;Using VMotion, migrate IO-intensive VMs across different ESX Servers, if possible.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Upgrade hardware, if possible.  Storage system performance often bottlenecks storage-intensive applications but for the very highest storage workloads (many tens of thousands of IOs per second) CPU upgrades at the ESX Server will increase the host's ability to handle IO.&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Resources&lt;/h1&gt;
Top-level performance analysis page: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt;  &lt;br /&gt;
&lt;br /&gt;
VirtualCenter performance counters: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
esxtop performance counters:  &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-external" href="https://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf"&gt;Fibre Channel SAN Configuration Guide&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <pubDate>Tue, 27 May 2008 19:02:23 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5490</guid>
      <dc:date>2008-05-27T19:02:23Z</dc:date>
      <clearspace:dateToText>1 year, 3 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Memory Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5430</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
This document is a living, up-to-date version of the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods whitepaper&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
Host memory utilization represents the entirety of memory usage due to the VMs and all tasks required by ESX Server to manage and provide control of the VMs.  ESX Server's monitoring capabilities provide no visibility into improper usage or configuration of memory within the guest.  Continue to use traditional monitoring tools in the guest to identify memory-hungry applications or shortages that lead to in-guest swapping.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Navigating esxtop&lt;/h1&gt;
As before, bring up esxtop to inspect system specifics.  Pressing the 'm' key will display the memory counters.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5430-12-3050/esxtop-mem-main.JPG" alt="esxtop-mem-main.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5430-12-3050/esxtop-mem-main.JPG');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
Once running, the following can be observed from the esxtop report:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The header data contains host data that impacts all VMs running on the host.  The physical memory row (PMEM) contains the total RAM installed on the system, the amount used by the console operating system (COS), the memory used by the kernel (VMK), and other statistics.&lt;/li&gt;
&lt;li&gt;The next few rows contain host-level memory statistics for various ESX subsystems:
&lt;ul&gt;
&lt;li&gt;VMKMEM: shows memory statistics for the ESX Server VMkernel&lt;/li&gt;
&lt;li&gt;COSMEM: displays the memory statistics as reported by the ESX Server service console.&lt;/li&gt;
&lt;li&gt;PSHARE: displays the ESX Server page-sharing statistics.&lt;/li&gt;
&lt;li&gt;SWAP: displays the ESX Server swap usage statistics.&lt;/li&gt;
&lt;li&gt;MEMCTL: displays the memory balloon driver statistics.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Relevant Counters &lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Type&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;VirtualCenter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;esxtop&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Details&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total memory size&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;MEMSZ&lt;/td&gt;
&lt;td&gt;This is the amount of memory that the VM has been sized to.  The VM will never get more than this but most of the time will be using far less than this amount due to sharing, ballooning, and swapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory target&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;SZTGT&lt;/td&gt;
&lt;td&gt;The amount of memory that the kernel would like to provide to the VM.  This number is calculated based on the guest's memory usage.  When memory is over-committed, it may not equal the amount of memory that is actually provided due to ballooning and swapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Granted memory&lt;/td&gt;
&lt;td&gt;mem.granted.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;The amount of memory that has been provided to the VM.  Memory is not granted to the VM until it has been touched once.  In the case of Linux, which does not zero out pages upon boot, a 4G VM will only be granted the small portion (100M or so) needed to run the OS until the OS or applications start to access more.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Touched memory&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;TCHD&lt;/td&gt;
&lt;td&gt;The amount of memory (in MB) that has been "touched" (read from or written to) in the past X minutes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consumed memory&lt;/td&gt;
&lt;td&gt;mem.consumed.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;The amount of machine memory allocated to the VM.  For instance, a Linux VM might have been sized to 4G.  Half of the pages may not yet have been used by the OS.  Perhaps 1G of this remaining 2G can be shared.  That leaves a consumed memory of only 1G.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared memory&lt;/td&gt;
&lt;td&gt;mem.shared.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;Shared memory represents the entire pool of shareable memory.  For instance, if two VMs each have 500M of identical memory, the shared memory is 1G.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared common memory&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;Shared common memory represents the footprint in machine memory as a result of memory sharing.  For instance, if two VMs each have 500M of identical memory, the shared common memory is 500M.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active memory&lt;/td&gt;
&lt;td&gt;mem.active.average &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;%ACTV, %ACTVS, %ACTVF&lt;/td&gt;
&lt;td&gt;The amount of memory (as a percentage of the entire host's memory) that has been used by the VM in the past sample period.  %ACTVS and %ACTVF are slow and fast counters showing long-term and recent averages, respectively.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balloon driver usage&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.average &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;MCTLSZ&lt;/td&gt;
&lt;td&gt;The amount of memory claimed by the balloon driver for use in other VMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swap rate&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;SWW/s&lt;br /&gt;
&lt;br /&gt;
			SWR/s&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;The rates at which memory is swapped out (written) or in (read).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swap Totals&lt;/td&gt;
&lt;td&gt;mem.swapout.average,&lt;br /&gt;
			mem.swapin.average&lt;br /&gt;&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;These are the cumulative amounts of swapping that have occurred since the VM was powered on.  It's important to check whether swapin and swapout are increasing, rather than just whether they are non-zero: non-zero values could be the result of swapping in the past rather than swapping at the present time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NUMA migrations&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;NMIG&lt;/td&gt;
&lt;td&gt;The number of NUMA migrations that have occurred since the VM's creation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NUMA memory&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;NLMEM, NRMEM&lt;/td&gt;
&lt;td&gt;The amount of the VM's memory that is on the local and remote NUMA nodes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overhead&lt;/td&gt;
&lt;td&gt;mem.overhead.average &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;OVHD&lt;/td&gt;
&lt;td&gt;The amount of memory required by the VMkernel to maintain and execute the VM.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
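The sharing and swap-total rows are easier to follow with numbers.  The sketch below reproduces the two-VM sharing example from the table and shows why cumulative swap counters should be compared across samples rather than tested against zero.  The function names and sample values are illustrative assumptions only, and the sharing helpers assume that every VM holds the same set of identical pages.&lt;br /&gt;
&lt;pre&gt;
```python
def shared_memory_mb(identical_mb_per_vm, vm_count):
    """Guest memory that is shareable, summed over all VMs (mem.shared)."""
    return identical_mb_per_vm * vm_count


def shared_common_mb(identical_mb_per_vm):
    """Machine memory backing the shared pages (mem.sharedcommon)."""
    return identical_mb_per_vm


def is_swapping_now(samples):
    """Cumulative counters only show current swapping if they keep growing."""
    return samples[-1] > samples[0]


print(shared_memory_mb(500, 2))   # 1000 MB shared (about 1G)
print(shared_common_mb(500))      # 500 MB of machine memory
print(is_swapping_now([2048, 2048, 2048]))  # False: swapping happened earlier
print(is_swapping_now([2048, 2300, 2900]))  # True: swapping is ongoing
```
&lt;/pre&gt;
The difference between the two sharing figures (500 MB in this example) is the machine memory saved by page sharing.&lt;br /&gt;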
&lt;br /&gt;
&lt;h1&gt;Evaluate the Data&lt;/h1&gt;
Memory analysis on an ESX Server means not just investigation of server-side statistics but also a solid understanding of the application that is running in the VM.  When memory is short on the host, ballooning and swapping may be visible in esxtop, with swapping having a great impact on performance.  When memory is short within the VM the guest will swap. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;How much memory are the VMs actually using?  While they may have been allocated large amounts of memory, it's likely that the OS and applications are only using a small percentage of what the VM was assigned.  Check the active and touched memory counters for accurate numbers on guest memory usage.&lt;/li&gt;
&lt;li&gt;Is memory short in the host?  Swapping (SWW/s and SWR/s) is a certain sign of this problem.  Heavy use of the balloon driver may also suggest this, but ballooning has only a slight impact on guest performance.&lt;/li&gt;
&lt;li&gt;Can memory deficiencies be addressed through VM resizing?  Checking memory usage of critical apps within the VMs can help inform decisions to decrease the amount of RAM provided to those VMs.  Some operating systems will expand to utilize all available memory at little or no value to the application.  Reducing the memory space and correcting over-sized caches frees up memory for other VMs.&lt;/li&gt;
&lt;li&gt;Is the collection of all VMs' active memory (TCHD or %ACTV) sustaining at an amount that exceeds the total available memory?  If so, then either more memory must be added to the host or VMs must be migrated to another DRS cluster.&lt;/li&gt;
&lt;li&gt;Are the guests swapping?  If the VM has been sized with too little memory then the guest OS will swap inside the VM.  This will appear to ESX Server as any other disk activity but should be investigated and solved with traditional OS analysis tools.&lt;/li&gt;
&lt;li&gt;Can NUMA migrations (NMIG) be seen on the system?  NMIG reports total migrations since the VM has been powered on.  If this number continues to climb then the VM is being migrated from node to node which most certainly degrades performance.&lt;/li&gt;
&lt;li&gt;Does the amount of memory located on a remote NUMA node (NRMEM) remain at a non-zero number?  This may be a sign that the VM has been sized to exceed the memory of a single NUMA node.  If the VM is using more memory than fits on a single node, some of its memory is certain to be located on a remote node.  Remote memory access is quite slow relative to local memory access.&lt;/li&gt;
&lt;/ul&gt;
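Two of the questions above reduce to simple arithmetic: whether the VMs' combined active memory exceeds host memory, and how much of a VM's memory sits on a remote NUMA node.  The following Python sketch illustrates both.  The function names and sample figures are hypothetical, and the values are assumed to have been read manually from esxtop (TCHD, NLMEM, NRMEM) in MB.&lt;br /&gt;

```python
def active_memory_exceeds_host(active_mb_per_vm, host_memory_mb):
    """True if the VMs' combined active memory is larger than host memory."""
    return sum(active_mb_per_vm) > host_memory_mb


def remote_fraction(nlmem_mb, nrmem_mb):
    """Fraction of a VM's memory located on remote NUMA nodes."""
    total = nlmem_mb + nrmem_mb
    return nrmem_mb / total if total else 0.0


# Hypothetical readings: three VMs on a host with 16 GB of memory.
print(active_memory_exceeds_host([4096, 8192, 6144], 16384))  # True
# Hypothetical VM with 3 GB local and 1 GB remote memory.
print(remote_fraction(3072, 1024))  # 0.25
```

A single reading proves little; the text above asks whether these conditions are sustained, so the check is best repeated over several samples.&lt;br /&gt;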
&lt;h2&gt;Correct the System&lt;/h2&gt;
The prescriptive advice for memory shortages is fairly simple: use less memory or buy more.  The following recommendations are variations on this theme:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Verify that VMware Tools has been installed on every VM on the system and that the memory balloon driver has not been disabled.  (The balloon driver is on by default and is disabled manually, through text-based advanced configuration, only in extremely rare cases.)  When provided the ability to balloon memory within the guests, ESX Server is able to take memory from VMs that are not using it and make it available to those that do need it.&lt;/li&gt;
&lt;li&gt;Provide more memory to the DRS cluster.  As total resources go up, VirtualCenter will balance VMs across the cluster so VMs that need the memory are able to get it.&lt;/li&gt;
&lt;li&gt;Set memory reservations to provide at least the amount of memory required by the OS and critical applications.  This will allow for sustained, fast access for critical code and provide hints to VirtualCenter for optimal VM positioning across the DRS cluster.&lt;/li&gt;
&lt;li&gt;Make sure the amount of memory used by the VMkernel to maintain the VMs is acceptable.  This value, reported for each VM with the overhead counter (OVHD), is dependent on the memory size of the VM, the number of vCPUs provided to it, and whether or not it is executing a 64-bit OS.  Fewer VMs on the host, fewer aggregate vCPUs, and lower precision OSes (32-bit as opposed to 64-bit) will lower this number.  Reducing any of these in the cluster will free up resources for every VM in the cluster.&lt;/li&gt;
&lt;li&gt;Size VMs on NUMA systems to guarantee that each VM's memory will fit on a single node.  This means either decreasing the memory allocated to a VM or increasing the node memory size.&lt;/li&gt;
&lt;li&gt;Size guests appropriately according to their needs.  For example:
&lt;ol&gt;
&lt;li&gt;Depending on the access pattern of the data, databases may not benefit from the last doubling of cache size.  Experiment with smaller cache sizes and see if performance drops.  If not, decrease the VM's available memory so it can be used by other VMs.&lt;/li&gt;
&lt;li&gt;Check the guest OS's statistics for in-guest swapping.  Provide memory as it is needed and pay attention to esxtop statistics to see if the additional memory provided generates a new bottleneck in the host.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h1&gt;Understanding Page Sharing&lt;/h1&gt;
One cannot fully optimize an ESX Server's memory without understanding the performance implications of page sharing.  VMware's page sharing algorithm was presented at EMC World 2008 as resulting in a 2% increase in CPU load.  But the benefits of page sharing have been demonstrated to provide overcommitment of memory safely to 2X and beyond.&lt;br /&gt;
&lt;br /&gt;
The value of page sharing can be seen in the following counters:&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;esxtop&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;VirtualCenter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRD&lt;/td&gt;
&lt;td&gt;memory.shared&lt;/td&gt;
&lt;td&gt;The amount of memory in the VM that is sharable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRDSVD&lt;/td&gt;
&lt;td&gt;&lt;i&gt;No equivalent.&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;The amount of memory saved due to page sharing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i&gt;No equivalent.&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;memory.sharedcommon&lt;/td&gt;
&lt;td&gt;The size of the memory after redundant pages have been removed.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
Note that missing counters can be calculated using the other two.  Shared memory minus shared common memory equals shared savings.&lt;br /&gt;
&lt;br /&gt;
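The relationship stated above (shared memory minus shared common memory equals shared savings) can be written out as a short Python sketch.  The function names and the sample figures are illustrative assumptions; only the subtraction itself comes from the text.&lt;br /&gt;

```python
def shared_savings(shared_mb, shared_common_mb):
    """Derive the savings figure (SHRDSVD) that VirtualCenter does not report."""
    return shared_mb - shared_common_mb


def shared_common(shared_mb, savings_mb):
    """Derive the shared common figure that esxtop does not report."""
    return shared_mb - savings_mb


# Hypothetical VM: 1200 MB sharable, 300 MB left after redundant pages are removed.
print(shared_savings(1200, 300))  # 900
print(shared_common(1200, 900))   # 300
```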
&lt;h1&gt;References&lt;/h1&gt;
The top-level &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; paper.&lt;br /&gt;
&lt;br /&gt;
The &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt; index.&lt;br /&gt;
&lt;br /&gt;
The &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; page.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <pubDate>Fri, 23 May 2008 23:37:14 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5430</guid>
      <dc:date>2008-05-23T23:37:14Z</dc:date>
      <clearspace:dateToText>1 year, 1 month ago</clearspace:dateToText>
    </item>
    <item>
      <title>CPU Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5420</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
CPU load is generated by the guest and its applications as well as by ESX Server as it provides a virtual interface to the hardware.  While the work performed by the host does result in some increase in load, the great majority of processing is due to the applications in the VM.  A solid understanding of the workload profile regardless of the virtual environment can assist CPU analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Check Utilization&lt;/h1&gt;
Invoke esxtop.  By default, it should show CPU utilization but pressing 'c' will ensure this data is being displayed.  The following figure shows example data produced on a test system.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5420-6-2692/esxtop-cpu-main.JPG" alt="esxtop-cpu-main.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5420-6-2692/esxtop-cpu-main.JPG');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
Observe the following: &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The PCPU(%) line in the header shows utilization for the processor(s) by core and in total.  The comma-delimited data first displayed shows core utilization followed by "used total" which averages utilization of all cores.&lt;/li&gt;
&lt;li&gt;The LCPU(%) line shows the percentage of CPU utilization per logical CPU. The percentages for the logical CPUs belonging to a package add up to 100 percent. This line appears only if hyperthreading is present and enabled.&lt;/li&gt;
&lt;li&gt;The CCPU(%) line shows the percentages of total CPU time as reported by the ESX Server service console. Use of any third party software, such as management agents and backup agents, inside the service console, may result in high CCPU(%) number.&lt;/li&gt;
&lt;li&gt;There is an idle world running whose %USED entry displays the amount of CPU cycles that remain unused.  If the idle world is reported at less than 100% utilization then only a fraction of one physical core remains for additional work.  As this number can max out at many hundreds of percent (100% for each core), small numbers here represent heavily loaded systems.&lt;/li&gt;
&lt;li&gt;Check the utilization (%USED) of the interesting VMs.  The VMs are reported here with the names specified at their time of creation.  Like the idle row, utilization for each VM can exceed 100%.  A VM that was provided two vCPUs, as an example, can max out at 200% CPU utilization.&lt;/li&gt;
&lt;li&gt;Expand the group data for the VM that is most interesting.  This is done by hitting 'e' and then entering the group ID number (GID) for the VM.  The figure below contains a CPU-expanded version for GID "30" in the previous figure.  Once expanded, esxtop provides counter data for every world in the group.  This includes:
&lt;ul&gt;
&lt;li&gt;vmmX:  For each vCPU provided to the VM, a virtual machine monitor (VMM) world is displayed.  This world will perform the majority of the work required to execute and virtualize the guest code (OS, application, and hypervisor).&lt;/li&gt;
&lt;li&gt;vcpu-X: A vcpu-X world is created to assist the VMM world for each vCPU.  Primarily this work revolves around the virtualization of the IO devices.&lt;/li&gt;
&lt;li&gt;mks: Mouse, keyboard, and screen interrupt servicing.&lt;/li&gt;
&lt;li&gt;vmware-vmx:  The VMX worlds assist in maintenance and communications with other worlds and should not represent a material portion of the group utilization.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5420-6-2693/esxtop-cpu-main-expanded.JPG" alt="esxtop-cpu-main-expanded.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5420-6-2693/esxtop-cpu-main-expanded.JPG');return false;"/&gt;&lt;br /&gt;
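&lt;br /&gt;
Because %USED is reported at 100% per core, the idle world and per-VM figures described above can be converted into more intuitive numbers.  The Python sketch below shows that arithmetic.  The function names and sample readings are hypothetical, and the inputs are assumed to have been read manually from the esxtop CPU screen.&lt;br /&gt;

```python
def idle_cores(idle_used_percent):
    """Convert the idle world's %USED into a number of unused physical cores."""
    return idle_used_percent / 100.0


def vm_utilization_fraction(vm_used_percent, vcpu_count):
    """Fraction of a VM's maximum CPU (100% per vCPU) that it is using."""
    return vm_used_percent / (100.0 * vcpu_count)


# Hypothetical host whose idle world reports 350% in %USED.
print(idle_cores(350.0))  # 3.5 cores remain unused
# Hypothetical two-vCPU VM reporting 150% in %USED (maximum 200%).
print(vm_utilization_fraction(150.0, 2))  # 0.75
```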
&lt;h1&gt;Evaluate the Data and Correct the System&lt;/h1&gt;
The general flow for evaluation starts by considering the system's load.  Is the system overloaded with too many VMs?  Is the guest using all of its vCPUs and simply in need of more or faster processors?  Are all guests waiting for IO?  For example:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Check the PCPU(%) line to see if all cores' utilization is near 100%.  In this case the system is saturated.  If multiple VMs are competing for the CPUs, try to reduce the VMs on the system or find other means of decreasing the load on the system.  See "CPU Saturation of Host" below.&lt;/li&gt;
&lt;li&gt;See if the PCPU(%) line shows an unequal load across processor cores with some at saturation and some remaining near idle.  This would indicate applications within a VM utilizing all of the cores provided to them.  Increase that VM's vCPU count, if possible, and verify that the guest is making use of the additional cores. If the application supports horizontal scalability, you may run multiple VMs to use the additional cores.  See "CPU Saturation of VM" below.&lt;/li&gt;
&lt;li&gt;If all CPUs remain underutilized, either the application in the VM is misconfigured or the VM is waiting for IO operations to complete.  See "Low CPU Utilization" below.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
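The three checks above can be sketched as a small Python classifier over the per-core values on the PCPU(%) line.  This is only an illustration: the function name and the 90% and 20% thresholds are assumptions chosen for the example, not values published by VMware.&lt;br /&gt;

```python
def classify_pcpu(core_percents, busy=90.0, idle=20.0):
    """Map per-core PCPU(%) readings to one of the three cases above."""
    if not core_percents:
        raise ValueError("need at least one core reading")
    any_busy = any(p >= busy for p in core_percents)
    all_busy = all(p >= busy for p in core_percents)
    any_idle = any(idle >= p for p in core_percents)
    if all_busy:
        return "CPU Saturation of Host"
    if any_busy and any_idle:
        return "CPU Saturation of VM"
    if not any_busy:
        return "Low CPU Utilization"
    # Some cores are busy but none are near idle: no case clearly applies.
    return "inconclusive"


# Hypothetical four-core readings.
print(classify_pcpu([97, 99, 95, 98]))  # CPU Saturation of Host
print(classify_pcpu([99, 98, 5, 3]))    # CPU Saturation of VM
print(classify_pcpu([10, 12, 8, 15]))   # Low CPU Utilization
```

The returned strings match the section titles that follow, so the result points to the section to read next.&lt;br /&gt;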
&lt;h2&gt;CPU Saturation of Host&lt;/h2&gt;
As stated above, both the PCPU(%) and %USED counters can be used to identify hosts that are using all physical CPUs.  It is possible, however, for the VMs on the system to be utilizing nearly all of the processor cycles without actually requesting more than is available.  This near-saturation case is the sign of a heavily loaded system.&lt;br /&gt;
&lt;br /&gt;
A better sign of over-utilization on a host is ready time (%RDY).  When any world's ready time starts to climb, that world is spending the reported percentage of its time waiting for some CPU to become available for work.  Ready time above 10% is worth investigation and may be a sign of an over-utilized host.  For a more detailed discussion on ready time, see &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-7390"&gt;Ready Time&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
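The 10% guideline for ready time can be applied mechanically to a set of readings.  The Python sketch below flags the worlds worth investigating.  The function name, the world names, and the sample values are hypothetical, and the %RDY figures are assumed to have been read from esxtop.&lt;br /&gt;

```python
READY_THRESHOLD = 10.0  # percent, per the guideline in the text above


def worlds_over_ready_threshold(ready_by_world, threshold=READY_THRESHOLD):
    """Return the names of worlds whose %RDY is above the threshold, sorted."""
    return sorted(name for name, rdy in ready_by_world.items() if rdy > threshold)


# Hypothetical %RDY readings keyed by VM name.
readings = {"vm-a": 2.5, "vm-b": 14.0, "vm-c": 10.0}
print(worlds_over_ready_threshold(readings))  # ['vm-b']
```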
Host saturation is a clear sign that too much work has been loaded onto a single server.  This is usually due to overly aggressive consolidation ratios.  Overcommitting CPU resources in this case will only worsen the performance. Consider the following remedies:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Verify that VMware Tools has been installed on every VM on the system.  In addition to many other benefits, VMware Tools provides a network driver (vmxnet) without which guest networking will be unnecessarily inefficient.&lt;/li&gt;
&lt;li&gt;Verify that all systems in the DRS cluster are carrying load when the server of interest is overloaded.  If they aren't, increase the aggressiveness of the DRS algorithm and check VM reservations against other hosts in the cluster to ensure migrations will happen.  Lastly, increase the number of servers in the DRS cluster so VMs from this server can be migrated to servers with available resources.&lt;/li&gt;
&lt;li&gt;Increase the CPU resources available to the VMs by increasing or improving CPUs or cores on some of the systems in the DRS cluster.&lt;/li&gt;
&lt;li&gt;Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.&lt;/li&gt;
&lt;li&gt;Ensure the newest version of ESX Server is being used.  The newer versions of ESX Server provide better efficiency and CPU-saving features such as TCP segmentation offload (TSO), large memory pages, and jumbo frames.&lt;/li&gt;
&lt;li&gt;Reduce the CPU resource footprint of running VMs.  As examples:
&lt;ol&gt;
&lt;li&gt;Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM.  This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.&lt;/li&gt;
&lt;li&gt;Assist CPU by replacing software I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).&lt;/li&gt;
&lt;li&gt;Reduce vCPU count for guests to only the number required to execute the workload.  For instance, a single-threaded application in a 4-way guest will only benefit from a single vCPU.  But the hypervisor's maintenance of the three idle vCPUs takes CPU cycles that could be used for other work.&lt;/li&gt;
&lt;li&gt;For VMs created using P2V conversion, analyze the VM resources as well as the applications running inside the VM. Stop any unnecessary services that may be running inside the P2V'ed VM. Also reduce the vCPU count and memory size to only what is required to execute the workload.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
The easiest general comment for addressing CPU bottlenecks given correctly-configured VMs is to address processing power at the cluster level.  If VirtualCenter reports fully utilized CPUs for all hosts in the cluster, there is little possibility of avoiding a need to increase cluster resources or decrease VM count.&lt;br /&gt;
&lt;br /&gt;
One last nuance of virtual system tuning, mentioned in item 6c above, is the correct balancing of virtual CPU count.  Few applications fully utilize two or more vCPUs and many VMs are often committed to a special purpose with a single application.  The guest OS and the hypervisor must expend CPU cycles managing multiple vCPUs.  If the applications are not using them, the system efficiency as a whole will improve by reducing vCPU count for VMs.&lt;br /&gt;
&lt;br /&gt;
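One way to judge whether a VM has more vCPUs than it uses is to expand its group in esxtop and count how many vmmX worlds are doing meaningful work.  The Python sketch below illustrates that count.  The function name, the 10% cutoff, and the sample readings are assumptions made for the example.&lt;br /&gt;

```python
def busy_vcpus(vmm_used_percents, busy=10.0):
    """Count the vmmX worlds whose %USED is at or above the busy cutoff."""
    return sum(1 for p in vmm_used_percents if p >= busy)


# Hypothetical 4-way guest running a single-threaded application.
vmm_used = [92.0, 1.5, 0.8, 1.1]
print(busy_vcpus(vmm_used))  # 1 of 4 vCPUs is doing meaningful work
```

A count that stays well below the configured vCPU count over a representative period suggests the VM is a candidate for a smaller vCPU count.&lt;br /&gt;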
&lt;h2&gt;CPU Saturation of VM&lt;/h2&gt;
Like host CPU saturation, VM CPU saturation can be seen when the %USED for a VM is high.  Unlike host CPU saturation, the idle world may report a large amount of free computational resources and the VM's ready time (%RDY) may remain low.  This behavior can be seen when a single VM utilizes all of the processors allocated to it but additional CPUs remain unused on the host.  The VM's utilization of all of its vCPUs can be confirmed by expanding the VM's world on the CPU screen.  Once this has been confirmed, the following options are available:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Verify that VMware Tools has been installed on every VM on the system.  In addition to many other benefits, VMware Tools provides a network driver (vmxnet) without which guest networking will be unnecessarily inefficient.&lt;/li&gt;
&lt;li&gt;If possible, increase the number of vCPUs provided to the VM. As the application in the guest is successfully using all of its vCPUs, it may continue to scale as the vCPU count is increased.  Pay attention to the vmmX world for each vCPU after increasing vCPU count to verify that the VM is making use of its newly provided resources.  As detailed in item 6c in the "CPU Saturation of Host" section, the addition of vCPUs imposes an overhead on the host whether they are being used or not.  So carefully assess the guest's needs to avoid unneeded vCPU count increases.&lt;/li&gt;
&lt;li&gt;If possible, power on multiple VMs running the same application. This depends on how well the application supports a horizontally scalable configuration. An application may perform better when running as multiple single-vCPU VMs rather than as a single SMP VM.&lt;/li&gt;
&lt;li&gt;Utilize faster processors.  As processor performance is continually increasing the option of upgrading processors or migrating the VM to systems with newer processors can provide more total throughput to the VM.&lt;/li&gt;
&lt;li&gt;Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.&lt;/li&gt;
&lt;li&gt;Decrease the work that results from running the VM.  As examples:
&lt;ol&gt;
&lt;li&gt;Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM.  This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.&lt;/li&gt;
&lt;li&gt;Assist CPU by replacing software I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Low CPU Utilization&lt;/h2&gt;
Assuming performance problems have been confirmed, low CPU utilization is usually a sign of inefficiently designed datacenter architecture.  The design could be flawed in an individual VM or in the connectivity between various components.  The &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; paper will walk through investigation of system-level components such as memory and then system-wide components such as network and storage.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <pubDate>Fri, 23 May 2008 19:05:48 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5420</guid>
      <dc:date>2008-05-23T19:05:48Z</dc:date>
      <clearspace:dateToText>1 year, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
    </item>
    <item>
      <title>Ubuntu</title>
      <link>http://communities.vmware.com/docs/DOC-5254</link>
      <description>&lt;br /&gt;
Ubuntu supports paravirtualization!&lt;br /&gt;
&lt;p /&gt;
I'll have more to say about Ubuntu later.  &lt;img class="jive-emoticon" border="0" src="http://communities.vmware.com/images/emoticons/happy.gif" alt=":)" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">ubuntu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">paravirtualization</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">linux</category>
      <pubDate>Fri, 16 May 2008 22:13:21 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5254</guid>
      <dc:date>2008-05-16T22:13:21Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Windows Server 2003</title>
      <link>http://communities.vmware.com/docs/DOC-5253</link>
      <description>Please be patient while we build out this content.  If you have ideas for additions, mail me!&lt;br /&gt;
&lt;br /&gt;
Short list:&lt;br /&gt;
&lt;p /&gt;
&lt;ul&gt;
&lt;li&gt;Always install Service Pack 2.  Microsoft changed the interaction with the APIC to improve efficiency on virtual platforms.&lt;/li&gt;
&lt;li&gt;Read up on &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1730"&gt;idle loop behavior for Service Pack 1&lt;/a&gt; .&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">windows</category>
      <pubDate>Fri, 16 May 2008 22:04:43 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5253</guid>
      <dc:date>2008-05-16T22:04:43Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Red Hat Enterprise Linux</title>
      <link>http://communities.vmware.com/docs/DOC-5252</link>
      <description>Page under construction.&lt;br /&gt;
&lt;p /&gt;
Quick notes:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;RHEL5 always uses a timer rate of 1000 Hz (see &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3580"&gt;Linux Timer Rate&lt;/a&gt;).  This can be decreased in RHEL5.1 for greater efficiency of RHEL SMP guests.&lt;/li&gt;
&lt;li&gt;Red Hat has not enabled VMI in their kernels, so out-of-the-box paravirtualization is not possible. But custom kernels can easily be built to take advantage of this feature. See &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1003644"&gt;VMware KB article #1003644&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">linux</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">redhat</category>
      <pubDate>Fri, 16 May 2008 21:55:13 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5252</guid>
      <dc:date>2008-05-16T21:55:13Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Performance</title>
      <link>http://communities.vmware.com/docs/DOC-5251</link>
      <description>VMware and our partners have published a wide variety of white papers on best practices for installation and configuration of enterprise applications on VI3. We'll collect additional material on this page as we build this content out. Keep checking back!&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Operating Systems&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5253"&gt;Windows Server 2003&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;SUSE Linux Enterprise Server&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5252"&gt;Red Hat Enterprise Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5254"&gt;Ubuntu&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h2&gt;Applications&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5502"&gt;Best Practices for Web Servers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5503"&gt;Best Practices for Apache&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5504"&gt;Best Practices for IIS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-8964"&gt;Best Practices for SQL Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9671"&gt;Best Practices for IBM Lotus Domino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <pubDate>Fri, 16 May 2008 21:35:33 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5251</guid>
      <dc:date>2008-05-16T21:35:33Z</dc:date>
      <clearspace:dateToText>7 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Understanding Performance</title>
      <link>http://communities.vmware.com/docs/DOC-5250</link>
      <description>&lt;br /&gt;
The following documents will explain some of the principles for virtual system performance. Please check back as we grow the number of articles here with time.&lt;br /&gt;
&lt;br /&gt;
ESX and Guest Operating Systems &lt;br /&gt;
&lt;p /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9882"&gt;ESX Monitor Modes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3580"&gt;Linux Timer Rate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p /&gt;
CPU and Scheduling&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5101"&gt;Hyper-Threading on ESX Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-7390"&gt;Ready Time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5501"&gt;VMkernel Scheduler&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Memory&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6912"&gt;Large Memory Pages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Network&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10892"&gt;Advanced Networking Performance Options&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Storage &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9696"&gt;Storage Performance: VMFS and Protocols&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">vmfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">monitor</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">memory</category>
      <pubDate>Fri, 16 May 2008 21:30:03 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5250</guid>
      <dc:date>2008-05-16T21:30:03Z</dc:date>
      <clearspace:dateToText>1 month, 1 week ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
    </item>
    <item>
      <title>esxtop Performance Counters</title>
      <link>http://communities.vmware.com/docs/DOC-5240</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
This article contains a list of some of the performance counters provided by esxtop. This is far from exhaustive, as this list was created to answer the question: "which are the most important esxtop counters?"  Recently VMware has published an exhaustive list of esxtop information on &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9279"&gt;Interpreting esxtop Statistics&lt;/a&gt;.  Check that out for more information.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;CPU Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%RDY&lt;/td&gt;
&lt;td&gt;The percentage of time that the world or group is waiting for a processor to be available to execute its workload.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%USED&lt;/td&gt;
&lt;td&gt;The percentage of CPU that is used by that world or group.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GID&lt;/td&gt;
&lt;td&gt;Group ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NWLD&lt;/td&gt;
&lt;td&gt;The number of worlds in the group. When this number is greater than one, the row can be expanded to get information on each world.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;h2&gt;Memory Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%ACTV&lt;/td&gt;
&lt;td&gt;Instantaneous view of the percentage of memory pages that have been used by the VM in the previous seconds. Unlike TCHD, which counts pages by following working sets, %ACTV is a more frequently updated number that is based on a sample of the entire memory pool.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%ACTVS&lt;/td&gt;
&lt;td&gt;Slow moving average of the %ACTV counter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%ACTVF&lt;/td&gt;
&lt;td&gt;Fast moving average of the %ACTV counter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCTL?&lt;/td&gt;
&lt;td&gt;Set to "Y" when the balloon driver is active in the guest and "N" when not.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCTLSZ&lt;/td&gt;
&lt;td&gt;This counter reports the amount of memory that the balloon driver is currently holding for use by other VMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEMSZ&lt;/td&gt;
&lt;td&gt;The amount of memory (in MB) allocated to the VM at the time of its creation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NHN&lt;/td&gt;
&lt;td&gt;The NUMA home node. This is the node on which the VM is booted. If migrations have occurred since the VM started running, the VM may now be running on another node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NMIG&lt;/td&gt;
&lt;td&gt;The number of NUMA node migrations since the VM was booted. ESX Server's scheduler should avoid NUMA migrations, so if this number continues to climb during normal operations, some tuning of the VMs may be required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NRMEM&lt;/td&gt;
&lt;td&gt;The amount of memory that exists on a remote NUMA node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NLMEM&lt;/td&gt;
&lt;td&gt;The amount of memory that exists on the local NUMA node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;N%L&lt;/td&gt;
&lt;td&gt;The percentage of the VM's memory that exists on the local NUMA node. N%L = NLMEM / (NRMEM + NLMEM), expressed as a percentage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OVHD&lt;/td&gt;
&lt;td&gt;The amount of memory used by the VMkernel to maintain and execute the VM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRD&lt;/td&gt;
&lt;td&gt;The amount of the VM's memory that is shared with other VMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRDSVD&lt;/td&gt;
&lt;td&gt;The amount of memory that was saved due to page sharing.  This number may be less than or equal to SHRD.  As one VM must always claim the single copy of a shared page, one VM with a shared page will not be able to claim savings.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWR/s&lt;/td&gt;
&lt;td&gt;The rate at which memory is being swapped in from disk.  High swap rates indicate a need for more memory in the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWW/s&lt;/td&gt;
&lt;td&gt;The rate at which memory is being swapped out to disk.  High swap rates indicate a need for more memory in the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TCHD&lt;/td&gt;
&lt;td&gt;The amount of memory (in MB) that has been touched (recently used) by the VM. In this case "recently" means within a minute or two.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
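The N%L relationship in the table above can be checked by hand from the NLMEM and NRMEM columns. The following is a minimal sketch, not part of esxtop; the function name and the sample readings are hypothetical, and returning 0 for a VM with no memory on any node is an assumption.&lt;br /&gt;
&lt;pre&gt;
```python
def numa_locality_percent(nlmem_mb, nrmem_mb):
    """Return N%L: the share of a VM's memory on its local NUMA node, in percent."""
    total_mb = nlmem_mb + nrmem_mb
    if total_mb == 0:
        return 0.0  # assumption: report 0 when no memory is resident on any node
    return 100.0 * nlmem_mb / total_mb


# Hypothetical readings: 3072 MB local (NLMEM), 1024 MB remote (NRMEM).
print(numa_locality_percent(3072, 1024))  # 75.0
```
&lt;/pre&gt;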
&lt;br /&gt;
&lt;h2&gt;Storage Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ABRTS/s&lt;/td&gt;
&lt;td&gt;The rate at which disk operations are being aborted. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time (as defined by the guest OS or application.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACTV&lt;/td&gt;
&lt;td&gt;The number of IO operations that are currently active. This represents operations that the host is processing and can serve as a snapshot view of storage activity. When this number hovers near zero, the storage system isn't being used. If it sustains non-zero values, constant interaction with the storage is occurring.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAVG/cmd&lt;/td&gt;
&lt;td&gt;The average amount of time it takes a device (HBA, array, and everything in between) to service a single request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GAVG/cmd&lt;/td&gt;
&lt;td&gt;The total latency seen from the VM when performing an IO operation. GAVG = DAVG+KAVG.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KAVG/cmd&lt;/td&gt;
&lt;td&gt;The average amount of time it takes ESX Server's VMkernel to service a disk operation. Since this number represents time spent by the CPU to manage IO and processors are orders of magnitude faster than disks, it should be much, much less than DAVG.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUED&lt;/td&gt;
&lt;td&gt;The number of IO operations that require processing but have not yet been addressed. Commands are queued and awaiting management by the kernel when the driver's active buffer is full (see ACTV). Occasionally a queue will form and result in a small, non-zero QUED number, but any significant (double-digit) average of queued commands means the storage hardware is unable to keep up with the host's needs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;READS/s&lt;/td&gt;
&lt;td&gt;The number of disk reads per second.  READS/s + WRITES/s = IOPS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WRITES/s&lt;/td&gt;
&lt;td&gt;The number of disk writes per second.  READS/s + WRITES/s = IOPS.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
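The relationships stated in the table above (GAVG = DAVG + KAVG, READS/s + WRITES/s = IOPS, and a double-digit QUED average as a warning sign) can be combined into one small check. This is only a sketch: the function name, the dictionary keys, and the sample values are hypothetical and are not produced by esxtop.&lt;br /&gt;
&lt;pre&gt;
```python
def storage_summary(davg_ms, kavg_ms, reads_per_s, writes_per_s, qued):
    """Derive GAVG and IOPS, and flag a double-digit average queue depth."""
    return {
        "gavg_ms": davg_ms + kavg_ms,        # GAVG = DAVG + KAVG
        "iops": reads_per_s + writes_per_s,  # READS/s + WRITES/s = IOPS
        "queue_backlog": qued // 10 != 0,    # True once QUED reaches double digits
    }


# Hypothetical sample: 12.5 ms device latency, 0.5 ms kernel latency.
print(storage_summary(12.5, 0.5, 300, 150, 14))
```
&lt;/pre&gt;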
&lt;br /&gt;
&lt;h2&gt;Network Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%DRPRX&lt;/td&gt;
&lt;td&gt;The percentage of packets that should have been received but were dropped.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%DRPTX&lt;/td&gt;
&lt;td&gt;The percentage of packets for which transmission was attempted but that were dropped.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MbRX/s&lt;/td&gt;
&lt;td&gt;The megabits per second that are received at the network item.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MbTX/s&lt;/td&gt;
&lt;td&gt;The megabits per second that are transmitted from the network item.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">memory</category>
      <pubDate>Fri, 16 May 2008 18:59:39 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5240</guid>
      <dc:date>2008-05-16T18:59:39Z</dc:date>
      <clearspace:dateToText>10 months, 4 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Understanding VirtualCenter Performance Statistics</title>
      <link>http://communities.vmware.com/docs/DOC-5230</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
VirtualCenter (VC) is the entry point for virtual platform management but is less frequently used for performance analysis than esxtop. On the surface, VC is insufficient for performance analysis. But this is not necessarily the case. The VirtualCenter performance counter collection is reduced by default to minimize the data maintained by VC's database. The performance counters maintained by VC can be modified and detailed analysis can be performed based on those counters. This document will provide details necessary for understanding and enabling VC's performance monitoring capabilities.&lt;br /&gt;
&lt;br /&gt;
Refer to &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; for information on using these counters.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VirtualCenter Statistic Archival&lt;/h1&gt;
Our stats infrastructure has a lot of counters but our documentation has traditionally been quite thin in terms of descriptions. I got so sick of asking what stats are available at what stats level that I decided to start this page. Obviously it needs to be made more readable, but hopefully it is a start.&lt;br /&gt;
&lt;br /&gt;
Remember that stats in VC are generally organized into 2 archival categories: &lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Not archived: these are the "real-time" (past-hour) stats, which are refreshed every 20 seconds, and are displayed for the past hour in the VI client. These stats are not stored in the database.&lt;/li&gt;
&lt;li&gt;Archived stats. These stats are aggregations (rollups) of the real-time stats. They are aggregated at different sampling intervals and stored in the database. We follow the MRTG standard.
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Past day&lt;/b&gt;: past day stats take the real-time stats and roll them up so that there is 1 data point for every 5 minutes. Thus, there are 12 data points per hour and 288 per day.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Past week&lt;/b&gt;: past week stats take the past day stats and roll them up so that there is 1 data point for every 30 minutes. Thus, there are 48 data points per day and 336 per week.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Past month&lt;/b&gt;: past month stats take the past week stats and roll them up so there is 1 data point per 2 hours. Thus, there are 12 data points per day and 360 per month (30-day month).&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Past year&lt;/b&gt;: past year stats take the past month stats and roll them up so there is 1 data point per day. Thus, there are 365 data points per year.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
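&lt;br /&gt;
The data point counts quoted in the list above follow directly from each rollup interval and the length of its window. A short sketch that reproduces the arithmetic, assuming a 30-day month and a 365-day year as in the text (the names used here are illustrative only):&lt;br /&gt;
&lt;pre&gt;
```python
# Rollup interval, in seconds, for each archived category.
ROLLUP_INTERVAL_S = {
    "past day": 5 * 60,
    "past week": 30 * 60,
    "past month": 2 * 60 * 60,
    "past year": 24 * 60 * 60,
}
# Length of each window, in days (30-day month, 365-day year).
WINDOW_DAYS = {"past day": 1, "past week": 7, "past month": 30, "past year": 365}


def data_points(category):
    """Number of rolled-up data points held for one full window."""
    window_s = WINDOW_DAYS[category] * 24 * 60 * 60
    return window_s // ROLLUP_INTERVAL_S[category]


for name in ROLLUP_INTERVAL_S:
    print(name, data_points(name))
```
&lt;/pre&gt;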
&lt;br /&gt;
The basic flow is this: an ESX host stores statistics at 20s granularity for a period of 1 hour. Therefore, using the Host Client one can view the stats for a host/VM for the past-hour, or one can view those stats using the VI client attached to VirtualCenter. ESX will also aggregate the statistics into past-day statistics and store them for up to 1 day. These past-day statistics are sent to VC periodically and then stored in the database. The database is responsible for periodically taking these past-day stats and rolling them up into 30-minute weekly stats, and then doing the same for converting the weekly stats to monthly stats, etc. Because past-day, past-week, and past-month stats are stored in the database, I call them "archived" stats.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VirtualCenter Statistics Level&lt;/h1&gt;
Statistics level is a means of organizing statistics for archiving purposes.  It's worth noting that only stats levels one and two are useful for deployment performance monitoring and analysis.  Levels three and four provide granularity and visibility that is useful only for developers.&lt;br /&gt;
&lt;br /&gt;
The concept of "stats level" applies only to the archived stats: we only store a stat in the database if we are at the appropriate stats level for that particular statistic. Non-archived stats are unaffected by stats level. In other words, every metric listed below is collected at 20s granularity and stored on the ESX host for 1 hour. However, unless VC is set to the stats level appropriate to that statistic, we will not store the data in the database or roll up the stat into a past-day stat on the ESX host. You can specify the stats level independently for each archiving interval. In other words, you might want to store level 4 stats for up to 1 day, but level 3 stats for 1 week.&lt;br /&gt;
&lt;br /&gt;
In practice, we use stats level to vary the level of detail for statistics that are archived. At stats level 1, we have pretty coarse-grained stats, while stats level 4 contains very detailed statistics, and also includes statistics for various instances (e.g., for each NIC of a VM).&lt;br /&gt;
&lt;br /&gt;
There are 3 important calls that I often use for stats (please refer to the SDK documentation for more information):&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;QueryStatsByLevel: this tells you what stats are available at what stats level. This is what I used to generate the tables below.&lt;/li&gt;
&lt;li&gt;QueryAvailableMetrics: this tells you what stats are available for a given entity during a specified time period.&lt;/li&gt;
&lt;li&gt;QueryPerf: this call takes a QuerySpec as an input and collects the stats for the specified entity over the specified time interval.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Let me give a concrete example of stats level. &lt;br /&gt;
&lt;br /&gt;
Suppose I want to know the value of mem.consumed.maximum for a given VM. This is the maximum amount of machine memory allocated to a VM (including overhead memory) over a specified interval. As shown below, this is a "level 4" statistic. This means that if I've set the stats level to 4 for past-day stats and then formulate a QuerySpec that asks for the value of this data 20 minutes ago at "past-day" granularity (i.e., at 5-minute granularity), then I will get a value. If the stats level is 2 for past-day (5-minute granularity) statistics, however, then such a query will not return a value, because it is a level-4 stat and only level-2/level-1 stats are being stored at 5-minute granularity. In contrast, even if the stats level is 1, if I formulate a QuerySpec with 20s (i.e., "real-time" or "past hour") as the interval of collection, I will get this value, because this data is stored for up to one hour at 20s granularity no matter what the stats level.&lt;br /&gt;
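&lt;br /&gt;
The rule in this example can be written down as a small decision function. This is a toy model of the behavior described above, not SDK code: the function, its parameters, and the dictionary of configured levels are all hypothetical, and it makes no calls to QueryPerf or any other API.&lt;br /&gt;
&lt;pre&gt;
```python
REAL_TIME_INTERVAL_S = 20  # real-time stats are kept for one hour at any stats level


def query_returns_value(stat_level, interval_s, configured_levels):
    """Model whether a query for a stat at a given interval would return data.

    stat_level: the level (1 to 4) at which the statistic is archived.
    interval_s: the sampling interval requested in the query, in seconds.
    configured_levels: maps each archival interval (seconds) to its stats level.
    """
    if interval_s == REAL_TIME_INTERVAL_S:
        return True  # not archived, so the stats level does not apply
    if interval_s not in configured_levels:
        return False  # nothing is archived at this interval in this model
    configured = configured_levels[interval_s]
    # Stored only when the configured level is at least the stat's own level.
    return max(stat_level, configured) == configured


# mem.consumed.maximum is a level-4 stat; 300 s is the past-day interval.
print(query_returns_value(4, 300, {300: 4}))  # True
print(query_returns_value(4, 300, {300: 2}))  # False
print(query_returns_value(4, 20, {300: 1}))   # True
```
&lt;/pre&gt;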
&lt;br /&gt;
&lt;h1&gt;Update Interval&lt;/h1&gt;
Understanding the update interval is a key component to understanding the performance statistics.  The Virtual Infrastructure Client (VIC) displays live stats at a 20s update frequency.  Archived stats are archived at their archive frequency.  This is key to understanding the relative amounts of data presented by VC.&lt;br /&gt;
&lt;br /&gt;
For instance, a ready time of 1,000 ms in the VIC's live stats graph translates into 5% ready time (1,000 / 20,000).  The same 5% ready time at a five-minute archival frequency (300,000 ms) would appear as 15,000 ms.&lt;br /&gt;
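&lt;br /&gt;
The conversion is the same at every update interval: divide the reported ready time by the length of the interval. A minimal sketch of that arithmetic (the function name is illustrative):&lt;br /&gt;
&lt;pre&gt;
```python
def ready_percent(ready_ms, interval_s):
    """Convert a ready time in ms into a percentage of the update interval."""
    return 100.0 * ready_ms / (interval_s * 1000)


print(ready_percent(1000, 20))    # 5.0  (live stats, 20 s interval)
print(ready_percent(15000, 300))  # 5.0  (past-day stats, 5-minute interval)
```
&lt;/pre&gt;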
&lt;br /&gt;
&lt;h1&gt;Counter Index&lt;/h1&gt;
For a list of all counters, see the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5600"&gt;vCenter Performance Counters&lt;/a&gt; page.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <pubDate>Thu, 15 May 2008 19:15:58 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5230</guid>
      <dc:date>2008-05-15T19:15:58Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Hyper-Threading on ESX Server</title>
      <link>http://communities.vmware.com/docs/DOC-5101</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
Those of us at VMware who regularly engage with our field or directly with customers are often asked how Hyper-Threading impacts the performance of a system. I've now been asked this question enough times to have my canned "It Depends" response on the tip of my tongue for every conference I present at. I'm going to use this document to elaborate on that point a bit and provide a little more detail.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;What Is Hyper-Threading?&lt;/h1&gt;
&lt;a class="jive-link-external" href="http://www.intel.com/technology/platform-technology/hyper-threading/index.htm"&gt;Hyper-Threading&lt;/a&gt; is a technology included by Intel first in their Netburst line of parts. Hyper-Threaded processors present their individual processing cores to the system as if they are two processing cores. To use Intel's parlance, that means that each &lt;i&gt;physical&lt;/i&gt; core appears in the operating system as two &lt;i&gt;logical&lt;/i&gt; cores. While the OS can distinguish between a system that has two logical cores (i.e. a single physical core with Hyper-Threading enabled) and two physical cores, applications cannot. It is up to the OS's scheduler to choose whether to use logical cores in the same manner as physical cores.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Is It Supported In ESX Server?&lt;/h1&gt;
Hyper-Threading (HT) has been supported in ESX Server since version 2. ESX Server's scheduler is aware of the presence of HT and treats logical cores differently from physical cores. Virtual CPUs (vCPUs) requesting resources are assigned first to physical cores until all physical cores are loaded. If there are additional vCPUs requesting CPU resources they will then be assigned to the additional logical cores. By this method HT has no impact on performance until more vCPUs are concurrently executing than there exist physical cores.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;How Does It Perform on ESX Server?&lt;/h1&gt;
Understanding HT performance on native systems is tricky enough. Try Googling "hyperthreading performance" and you'll discover a world of information on this feature. By faking the presence of another processing core, Hyper-Threading removes some SMP scheduling from the guest operating system which is then handled by the processor's thread scheduler. Since the processor can manage context switches between its threads much faster than the OS can, this means that often heavy parallelism in applications results in improved performance due to Hyper-Threading.&lt;br /&gt;
&lt;br /&gt;
The exact gain due to HT, even on native systems, depends on the workload. The industry has cited numbers that range from 0% to 40% gain when HT is enabled on supported processors. For the most part, HT improves performance nominally. There are a few cases where performance can slow down, but these are exceptions rather than the norm.&lt;br /&gt;
&lt;br /&gt;
The best generalizations we can provide about HT on ESX are:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Until you have more vCPUs requesting processing power than there are physical cores, HT cannot hurt and provides no value.&lt;/li&gt;
&lt;li&gt;Once you have more vCPUs requesting CPU than physical cores on the system, HT usually provides small gains.&lt;/li&gt;
&lt;li&gt;While very early versions of ESX may have handled HT sub-optimally, since ESX Server 2.5.3 robust support of HT in the scheduler means that it should not hurt performance.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h1&gt;For More Information&lt;/h1&gt;
Hyper-Threading configuration options for performance optimization were provided in ESX Server 3 and beyond.  See the &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vi3_301_201_resource_mgmt.pdf#page=123"&gt;Resource Management Guide (page 123)&lt;/a&gt; for more information.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <pubDate>Fri, 09 May 2008 21:31:11 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5101</guid>
      <dc:date>2008-05-09T21:31:11Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Using Perfmon for esxtop-based Performance Analysis</title>
      <link>http://communities.vmware.com/docs/DOC-5100</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
By now everyone knows that guest performance metrics are not reliable in virtual machines (&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5661"&gt;Guest-based Performance Measurement&lt;/a&gt;). Slight fluctuations in very small time measurements and a lack of knowledge of the hypervisor's activities produce misleading numbers. Windows' performance monitoring tool, Perfmon, suffers from the same problems. However, Perfmon remains highly valuable for virtual machine performance analysis.&lt;br /&gt;
&lt;br /&gt;
The &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods document&lt;/a&gt; that was published early in 2008 provided tips on esxtop-based performance analysis. To record esxtop data, &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vi3_301_201_resource_mgmt.pdf#page=159"&gt;page 159 of the resource management guide&lt;/a&gt; describes running esxtop in batch mode. On ESX Server 3.5, however, running esxtop in batch mode with all counters enabled results in an incredibly large CSV file that cannot easily be parsed. Perfmon can help with this process.&lt;br /&gt;
&lt;br /&gt;
esxtop was constructed so that its CSV-formatted batch output file can be readily consumed by Perfmon. This means that Perfmon can be used for two key activities:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Quickly analyzing results.&lt;/li&gt;
&lt;li&gt;Generating smaller CSV files of a subset of the data that can be more easily consumed by other analysis tools (such as Microsoft Excel.)&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h1&gt;Quick Results Analysis&lt;/h1&gt;
esxtop's batch output CSV file can be opened and viewed in Perfmon with the following steps:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Transfer the CSV file to a Windows system.&lt;/li&gt;
&lt;li&gt;Launch Perfmon (Run: "perfmon".)&lt;/li&gt;
&lt;li&gt;Right click on the graph and select "Properties..." from the drop-down menu.&lt;/li&gt;
&lt;li&gt;Select the "Source" tab.&lt;/li&gt;
&lt;li&gt;Select the "Log files:" radio button from the "Data source" section.&lt;/li&gt;
&lt;li&gt;Click the "Add..." button.&lt;/li&gt;
&lt;li&gt;Browse to and select the CSV file created by esxtop and click "OK".&lt;/li&gt;
&lt;li&gt;Click the "Apply" button.&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Optionally:&lt;/i&gt; reduce the range of time over which the data will be displayed by using the sliders under the "Time Range" button.&lt;/li&gt;
&lt;li&gt;Click "OK".&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Once the data has been loaded into Perfmon you may select ESX performance counters and display them using Perfmon's graphing system just like normal Windows performance counters. Refer to the Perfmon documentation for instructions on doing this.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Pruning CSV File Data&lt;/h1&gt;
Many people use Microsoft Excel to analyze performance data, but large esxtop batch files quickly exceed its row and column limits, so a means of removing unneeded data will assist analysis. Once Perfmon has been loaded with the esxtop data, it can be used to generate smaller CSV files that can be easily consumed by Excel. Follow these steps:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Start Perfmon and load the esxtop batch data as described above.&lt;/li&gt;
&lt;li&gt;Have the Perfmon graph display the data of interest (that you'd like to import into Excel.)&lt;/li&gt;
&lt;li&gt;Right click on the graph and select "Save Data As..."&lt;/li&gt;
&lt;li&gt;In the popup box, select "CSV" as type and save the file.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
This will save only the counters that were displayed in the graph.  By iteratively selecting subsets of counters and saving off individual CSV files it becomes possible to quickly build performance graphs in Excel using esxtop batch output data.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">perfmon</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <pubDate>Fri, 09 May 2008 16:48:20 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-5100</guid>
      <dc:date>2008-05-09T16:48:20Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Co-scheduling SMP VMs in VMware ESX Server</title>
      <link>http://communities.vmware.com/docs/DOC-4960</link>
      <description>&lt;h1&gt;Background&lt;/h1&gt;
VMware ESX Server efficiently manages a mix of uniprocessor and multiprocessor VMs, providing a rich set of controls for specifying both absolute and relative VM execution rates.  For general information on cpu scheduling controls and other resource management topics, please see the official VMware &lt;a class="jive-link-external" href="http://www.vmware.com/support/pubs/resource_management/vi_pubs_res_mgmt.html"&gt;Resource Management Guide&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
For a multiprocessor VM (also known as an "SMP VM"), it is important to present the guest OS and applications executing within the VM with the illusion that they are running on a dedicated physical multiprocessor.  ESX Server faithfully implements this illusion by supporting near-synchronous coscheduling of the virtual CPUs within a single multiprocessor VM.&lt;br /&gt;
&lt;br /&gt;
The term "coscheduling" refers to a technique used in concurrent systems for scheduling related processes to run on different processors at the same time.  This approach, alternately referred to as "gang scheduling", had historically been applied to running high-performance parallel applications, such as scientific computations.  VMware ESX Server pioneered a form of coscheduling that is optimized for running SMP VMs efficiently.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Motivation&lt;/h1&gt;
An operating system generally assumes that all of the processors it manages run at approximately the same rate.  This is certainly true in non-virtualized environments, where the OS manages physical processor hardware.  However, in a virtualized environment, the processors managed by a guest OS are actually virtual cpu abstractions scheduled by the hypervisor, which time-slices physical processors across multiple VMs.&lt;br /&gt;
&lt;br /&gt;
At any particular point in time, each virtual cpu (VCPU) may be scheduled, descheduled, preempted, or blocked waiting for some event.  Without coscheduling, the VCPUs associated with an SMP VM would be scheduled independently, breaking the guest's assumptions regarding uniform progress.  We use the term "skew" to refer to the difference in execution rates between two or more VCPUs associated with an SMP VM.&lt;br /&gt;
&lt;br /&gt;
Inter-VCPU skew violates the assumptions of guest software. Non-trivial skew can result in severe performance problems, and may even induce failures when the guest expects inter-VCPU operations to complete quickly.  Let's first consider the performance implications of skew.  Guest OS kernels typically use spin locks for interprocessor synchronization.  If the VCPU currently holding a lock is descheduled, then the other VCPUs in the same VM will waste time busy-waiting until the lock is released.  Similar performance problems can also occur in multi-threaded user-mode applications, which may also synchronize using locks or barriers.  Unequal VCPU progress will also confuse the guest OS cpu scheduler, which attempts to balance load across VCPUs.&lt;br /&gt;
&lt;br /&gt;
An extreme form of this performance problem may also lead to correctness issues.  For example, a guest kernel may perform inter-processor operations, such as TLB shootdowns, that are expected to complete quickly on physical hardware (e.g. several microseconds).  The guest OS may time out if it finds that such operations have not completed after an unreasonably long period of time (e.g. several milliseconds).  Without coscheduling, we have observed such failures in practice on several different guest operating systems, including Windows BSODs and Linux kernel panics.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Strict coscheduling in ESX Server 2.x&lt;/h1&gt;
VMware introduced support for running SMP VMs with the release of ESX Server 2 in 2003.  ESX Server 2.x implemented coscheduling using an approach based on skew detection and enforcement.&lt;br /&gt;
&lt;br /&gt;
The ESX scheduler maintains a fine-grained cumulative skew value for each VCPU within an SMP VM.  A VCPU is considered to be making progress when it is running or idling.  A VCPU's skew increases when it is not making progress while at least one of its sibling VCPUs is making progress.  A VCPU is considered to be "skewed" if its cumulative skew value exceeds a configurable threshold, typically a few milliseconds.&lt;br /&gt;
&lt;br /&gt;
Once any VCPU is skewed, all of its sibling VCPUs within the same SMP VM are forcibly descheduled ("co-stopped") to prevent additional skew.  After a VM has been co-stopped, the next time any VCPU is scheduled, all of its sibling VCPUs must also be scheduled ("co-started").  This approach is called "strict" coscheduling, since all VCPUs must be scheduled simultaneously after skew has been detected.&lt;br /&gt;
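&lt;br /&gt;
The skew accounting and the co-stop rule described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the ESX scheduler: the 3 ms threshold, the 1 ms tick, and every name below are hypothetical, and the model leaves out how skew is reduced once VCPUs are co-started.&lt;br /&gt;
&lt;pre&gt;
```python
SKEW_THRESHOLD_MS = 3  # assumed value; the text says only "a few milliseconds"


def update_skew(skew_ms, progressing, tick_ms=1):
    """Advance per-VCPU cumulative skew by one tick.

    skew_ms: dict of VCPU name to cumulative skew in ms (updated in place).
    progressing: set of VCPUs that are running or idling during this tick.
    """
    for vcpu in skew_ms:
        siblings_progressing = progressing - {vcpu}
        # Skew grows only while this VCPU is stalled and a sibling advances.
        if vcpu not in progressing and siblings_progressing:
            skew_ms[vcpu] += tick_ms
    return skew_ms


def must_co_stop(skew_ms):
    """Strict coscheduling: co-stop the whole VM once any VCPU exceeds the threshold."""
    return any(min(s, SKEW_THRESHOLD_MS) != s for s in skew_ms.values())


skew = {"vcpu0": 0, "vcpu1": 0}
for _ in range(4):
    update_skew(skew, {"vcpu0"})  # vcpu0 runs while vcpu1 is descheduled
print(skew, must_co_stop(skew))
```
&lt;/pre&gt;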
&lt;br /&gt;
In some situations, such as when the physical machine has few cores, and is running a mix of UP and SMP VMs, coscheduling may incur "fragmentation" overhead.  For example, consider an ESX Server with two physical cores running one dual-VCPU VM and one single-VCPU VM.  When the UP VM is running, the scheduler cannot use the remaining physical core to run just one of the SMP VM's two VCPUs.  This effect is typically negligible in systems with larger numbers of cores (or with hyperthreading enabled), due to the increased flexibility available when mapping VCPUs to hardware execution contexts.&lt;br /&gt;
&lt;br /&gt;
Note that a VCPU executing in the guest OS idle loop can be descheduled without affecting coscheduling, since the guest OS can't tell the difference.  In other words, an idle VCPU does not accumulate skew, and is treated as if it were running for coscheduling purposes.  This optimization ensures that idle guest VCPUs don't waste physical processor resources, which can instead be allocated to other VMs.  For example, an ESX Server with two physical cores may be running one VCPU each from two different VMs, if their sibling VCPUs are idling, without incurring any coscheduling overhead.  Similarly, in the fragmentation example above, if one of the SMP VM's VCPUs is idling, then there will be no coscheduling fragmentation, since its sibling VCPU can be scheduled concurrently with the UP VM.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Relaxed coscheduling in ESX Server 3.x&lt;/h1&gt;
The coscheduling algorithm employed by the ESX scheduler was significantly enhanced with the release of ESX Server 3 in 2006.  The basic coscheduling approach is still based on skew detection and enforcement.&lt;br /&gt;
&lt;br /&gt;
However, instead of requiring all VCPUs to be co-started, only those VCPUs that are skewed must be co-started.  This ensures that when any VCPU is scheduled, all other VCPUs that are "behind" will also be scheduled, reducing skew.  This approach is called "relaxed" coscheduling, since only a subset of a VM's VCPUs must be scheduled simultaneously after skew has been detected.&lt;br /&gt;
&lt;br /&gt;
To be more precise, suppose an SMP VM consists of multiple VCPUs, including VCPUs A, B, and C.  Suppose VCPU A is skewed, but VCPUs B and C are not skewed.  Since VCPU A is skewed, VCPU B can be scheduled to run only if VCPU A is also co-started.  This ensures that the skew between A and B will be reduced.  But note that VCPU C need not be co-started to run VCPU B.  As an optimization, the ESX scheduler will still try to co-start VCPU C opportunistically, but will not require this as a precondition for running VCPU B.&lt;br /&gt;
&lt;br /&gt;
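As a minimal sketch of the precondition described above (a hypothetical model, not ESX scheduler code; the function name and arguments are invented for illustration), the set of VCPUs that must be co-started under each policy could be expressed as:&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
def required_costart(vcpus, skewed, to_run, relaxed=True):
    """Return the set of VCPUs that must be scheduled together with to_run.

    Illustrative model only: vcpus is every VCPU of one SMP VM, skewed is
    the subset currently marked as skewed, and to_run is the VCPU the
    scheduler wants to run next.
    """
    if not skewed:
        # No skew has been detected, so the VCPU may run on its own.
        return {to_run}
    if relaxed:
        # Relaxed coscheduling: only the skewed siblings must be co-started.
        return {to_run} | set(skewed)
    # Strict coscheduling: once skew is detected, all siblings are co-started.
    return set(vcpus)


vcpus = ["A", "B", "C"]
print(sorted(required_costart(vcpus, {"A"}, "B", relaxed=True)))   # ['A', 'B']
print(sorted(required_costart(vcpus, {"A"}, "B", relaxed=False)))  # ['A', 'B', 'C']
```
&lt;/pre&gt;
In this model VCPU C is absent from the relaxed set, which matches the example: it may still be co-started opportunistically, but it is not a precondition for running VCPU B.&lt;br /&gt;
&lt;br /&gt;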
Relaxed coscheduling significantly reduces the possibility of coscheduling fragmentation, improving overall processor utilization.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Conclusions&lt;/h1&gt;
ESX Server employs sophisticated CPU scheduling algorithms that enforce rate-based quality-of-service for both uniprocessor and multiprocessor VMs.  For multiprocessor VMs, coscheduling techniques ensure that virtual CPUs make uniform progress, faithfully implementing the illusion that the VM is running on dedicated multiprocessor hardware, ensuring efficient execution of guest software.  Optimizations such as relaxed coscheduling and descheduling idle VCPUs provide a high-performance execution environment that efficiently utilizes physical host resources.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Appendix: ESX Server coscheduling statistics&lt;/h1&gt;
ESX Server 3.x exports statistics related to the coscheduling behavior of multiprocessor VMs.  The "esxtop" utility can be used to examine these statistics on a live ESX system.&lt;br /&gt;
&lt;br /&gt;
The %CSTP column in the CPU statistics panel shows the fraction of time the VCPUs of a VM spent in the "co-stopped" state, waiting to be "co-started". This gives an indication of the coscheduling overhead incurred by the VM.  If this value is low, then any performance problems should be attributed to other issues, and not to the coscheduling of the VM's virtual CPUs.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <pubDate>Fri, 02 May 2008 15:12:56 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-4960</guid>
      <dc:date>2008-05-02T15:12:56Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
    </item>
    <item>
      <title>Storage System Performance Analysis with Iometer</title>
      <link>http://communities.vmware.com/docs/DOC-3961</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
&lt;a class="jive-link-external" href="http://www.iometer.org/"&gt;Iometer&lt;/a&gt;  is an open source tool originally developed by Intel that remains the simplest and best means of generating load on a system for performance analysis.  Because of minute fluctuations in the in-guest timers many in-guest benchmarks are unable to produce accurate results.  We at VMware have used Iometer for years and find its results to be accurate in most situations.  But see the disclaimer below for a word of caution.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-3961-8-2234/iometer_1.JPG" alt="iometer_1.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-3961-8-2234/iometer_1.JPG');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Figure 1.  Iometer with disk targets tab selected.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Figure 1 shows the topology and disk target tab for a particular VM.  As the manager has only one host visible beneath it and only one worker is present on that host, you can tell that this is a single server with only one processor.  Had Iometer been started on a 4-way box, four workers would be visible.  As the UI remains operational, dynamo workers can be connected from other hosts to drive load across multiple VMs that may be on multiple servers.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Configuring the Test &lt;/h1&gt;
For all Iometer tests, under "Disk Targets" always increase the "# of Outstanding I/Os" per target.  When left at the default value of '1', a relatively low load will be placed on the array.  Increasing this number makes the OS queue up multiple requests and saturate the storage.  The ideal number of outstanding I/Os can be determined by running the test multiple times, increasing this number each time.  At some point IOPS will stop increasing.  Generally returns diminish around 16 I/Os per target, and more than 32 I/Os per target will certainly have no value because of the default queue depth in ESX.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt; for more information on queues.  In Figure 1 you can see that "# of Outstanding I/Os" defaults to 1.&lt;br /&gt;
&lt;br /&gt;
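The sweep described above can be summarized with a small helper.  This is only a sketch: the function name, the 5% threshold, and the sample numbers are all assumptions made for illustration, and the numbers are invented rather than measured.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
def saturation_point(results, min_gain=0.05):
    """Return the outstanding-I/O setting after which IOPS stop improving.

    results: list of (outstanding_ios, iops) pairs in increasing order of
    outstanding_ios. min_gain is the smallest relative IOPS improvement
    (5% by default, an arbitrary choice) still counted as an increase.
    """
    pairs = zip(results, results[1:])
    for (depth, iops), (_next_depth, next_iops) in pairs:
        gain = (next_iops - iops) / iops
        if min_gain > gain:
            # The next step up no longer pays off, so stop at this depth.
            return depth
    # IOPS were still climbing at the last setting that was tried.
    return results[-1][0]


# Made-up numbers purely to exercise the function; they are not measurements.
runs = [(1, 1000), (2, 1900), (4, 3500), (8, 5600), (16, 6400), (32, 6450)]
print(saturation_point(runs))  # 16
```
&lt;/pre&gt;
&lt;br /&gt;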
When choosing the system to test in the topology frame, the "Disk Targets" tab will provide options as to the storage target.  The options here include formatted disks (yellow) or unformatted disks (blue).  In the former case Iometer addresses the storage through the OS's file system (FS).  In the latter, direct calls are made to the hardware without using a FS.  Storage specialists are usually more interested in just the hardware so evaluation of unformatted LUNs (blue) is preferable.  There is some cost of virtualizing the OS's interface to the disk through the FS so formatting the disk with the correct FS and testing the yellow target can be helpful.  Figure 1 shows two testable drives: the yellow-iconed C drive on which the OS was installed and the blue-iconed unformatted drive that is preferable for benchmarking. &lt;br /&gt;
&lt;br /&gt;
Always make certain that the "maximum disk size" in the "Disk Targets" tab is larger than the available memory!  For instance, when testing a formatted disk, a maximum size of 200,000 sectors (or 100 MB) could be cached entirely by the guest OS in a VM provided with 1 GB of RAM.  In this case all Iometer calls to storage will be intercepted and served by the guest, host, or storage cache.  Setting the maximum disk size to a number at least four times greater than the memory available in the largest cache will avoid caching.&lt;br /&gt;
&lt;br /&gt;
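The sizing rule above is simple arithmetic, sketched here.  The 512-byte sector size is an assumption (it is consistent with 200,000 sectors being roughly 100 MB), and the function name is invented for illustration.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def min_disk_size_sectors(largest_cache_bytes, factor=4):
    """Smallest 'maximum disk size' in sectors that is factor times the largest cache."""
    needed_bytes = largest_cache_bytes * factor
    # Round up so the target is never smaller than the requested multiple.
    return (needed_bytes + SECTOR_BYTES - 1) // SECTOR_BYTES


one_gib = 1024 ** 3
print(200_000 * SECTOR_BYTES / 1e6)    # 102.4 (MB), small enough to be cached in 1 GB of RAM
print(min_disk_size_sectors(one_gib))  # 8388608 sectors, i.e. 4 GiB
```
&lt;/pre&gt;
&lt;br /&gt;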
Under the "Access Specifications" tab, choose a workload that matches the most interesting profile.  Real workloads that are dominated by database performance often randomly read and write small, fixed-size block IO.  SQL Server on Windows, for instance, uses 16K blocks, 66% read (which implies 34% write), and 100% random (thus 0% sequential).  Exchange 2007 uses a similar profile but with an 8K block size.  Oracle on Linux has flexibility to use the block size set when the file system was created.  Depending on the DB specialist's needs, this can range from 2K to 64K but will again be random with a 2:1 read-to-write ratio.  Note: you can approximate this Linux performance on a Windows guest but do not run Iometer on Linux (see "Iometer On Linux" below).&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Application&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Block Size&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Randomness&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Read/write Ratio&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange 2003&lt;/td&gt;
&lt;td&gt;4K&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;60% read (40% write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange 2007&lt;/td&gt;
&lt;td&gt;8K&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;55% read (45% write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL Server&lt;/td&gt;
&lt;td&gt;16K, 64K&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;66% read (34% write)&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;b&gt;Table 1.  Example Iometer profiles.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Note the number of workers specified under the manager.  This will default to one worker (thread) for each physical or virtual processor on the system.  In the event that Iometer is being used to compare native to virtual performance, make sure that the worker numbers match!  For instance, the worker count will be one on a UP VM but four for the same native measurements if the system is quad-core.  Correct the native worker count by detaching workers.&lt;br /&gt;
&lt;br /&gt;
Before invoking the test, be aware of the potential impacts of data alignment.  VMware has demonstrated substantial differences in performance based on alignment of data on storage arrays in our &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_partition_align.pdf"&gt;partition alignment paper&lt;/a&gt;.  Make sure that partitions and virtual disks have been created by VirtualCenter to guarantee that the partitions and files are properly aligned.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Disclaimer (or: When Not To Trust Iometer)&lt;/h1&gt;
As discussed in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5581"&gt;Time-based Measurements in Virtual Machines&lt;/a&gt;, the hypervisor may introduce some inaccuracy to in-guest time measurement.  The likelihood of measurement error increases as the server's load increases.  Generally servers that are using less than 30% of their available CPU resources can be trusted.  In the event that a large VM on a small server is driving all CPUs to high utilization, Iometer results may suffer from some inaccuracy.  This is rarely the case with Iometer runs.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Analyzing the Results&lt;/h1&gt;
As mentioned above, the results provided by Iometer tend to be trustworthy.  But for unimpeachable results use the analysis techniques provided by VMware.  See the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; page that's already been dedicated to this subject.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">iometer</category>
      <pubDate>Thu, 27 Mar 2008 18:02:53 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-3961</guid>
      <dc:date>2008-03-27T18:02:53Z</dc:date>
      <clearspace:dateToText>1 year, 4 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Performance Monitoring and Analysis</title>
      <link>http://communities.vmware.com/docs/DOC-3930</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
A common question that crosses my desk is, "how do I analyze and correct performance?" There are a variety of techniques for doing this and many sources of information spread around the interweb, so I'm going to collect a few thoughts here.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Monitoring&lt;/h1&gt;
Guest-based performance monitoring is an inaccurate and unhelpful means of evaluating performance in virtual deployments.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5661"&gt;Guest-based Performance Measurement&lt;/a&gt; for more information.  Monitoring and analysis of VMware ESX Server should be performed with esxtop and VirtualCenter. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;esxtop&lt;/h2&gt;
esxtop is the tried-and-true means of collecting every performance stat needed and making it available in a way that is conducive to analysis. The best source of information on launching esxtop can be found in the &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vi3_301_201_resource_mgmt.pdf#page=159"&gt;Resource Management Guide&lt;/a&gt; (page 159). It's worth noting that the directions in that guide came from ESX Server 3.0.1. Since then, the developers have been kind enough to simplify the process of including all performance counters. This is done with the "-a" switch. So,&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;esxtop -a -b &amp;gt; analysis.csv&lt;/blockquote&gt;
&lt;br /&gt;
runs esxtop in batch mode and prints all performance counters.  Let me give you the quick "do-s" and "don't-s" of esxtop:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;With esxtop on ESX Server 3.5 and newer, always include the "-a" option to display all counters.&lt;/li&gt;
&lt;li&gt;With esxtop on versions of ESX Server prior to 3.5 always follow the resource management guide to enable all counters.  The storage latency statistics, for instance, are not displayed by default!&lt;/li&gt;
&lt;li&gt;Always start your VMs before running esxtop in batch mode.  If you start them after starting "esxtop -b", esxtop will only report data for the VMs that were already running when it started.&lt;/li&gt;
&lt;/ol&gt;
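Since batch mode writes comma-separated output, a file such as analysis.csv can be filtered with a short script.  The sketch below assumes only that the first row holds the column names; the helper name, the column names, and the values in the sample are invented stand-ins, and real esxtop output has different and far more numerous columns.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
import csv
import io


def select_columns(csv_file, needle):
    """Yield one dict per sample row, keeping only columns whose name contains needle."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield {name: value for name, value in row.items()
               if needle.lower() in name.lower()}


# Stand-in for open("analysis.csv"); the column names and values are invented.
sample = io.StringIO(
    "Time,vm1 cpu used,vm1 cpu ready,vm1 disk latency\n"
    "00:00:05,41.0,2.5,7.1\n"
)
rows = list(select_columns(sample, "cpu"))
print(rows)  # [{'vm1 cpu used': '41.0', 'vm1 cpu ready': '2.5'}]
```
&lt;/pre&gt;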
&lt;h2&gt;VirtualCenter&lt;/h2&gt;
VirtualCenter doesn't provide all of the performance counters you might need to analyze performance. But, it provides more than you might think! The default setup for VirtualCenter performance counter collection is fairly minimal. But this can be expanded by reconfiguring VC's performance counter collection. This is done as follows:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;From the VI client, Administration-&amp;gt;VirtualCenter Management Server Configuration...&lt;/li&gt;
&lt;li&gt;On the left, click "Statistics".&lt;/li&gt;
&lt;li&gt;Now increase the stats level as you see fit.&lt;/li&gt;
&lt;/ol&gt;
More information on VC performance counters and archival can be found in the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; wiki article.  A few things are worth considering before going crazy with this:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;You probably will never need level four stats. Those are mainly for debugging.&lt;/li&gt;
&lt;li&gt;Use the DB size estimator after you monkey around with these levels. The DBs can get quite large and you want to know this before it happens and your VC system performance suffers.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Analysis&lt;/h1&gt;
Performance analysis techniques were previously detailed in the performance analysis whitepaper.  That has been migrated to this wiki and new material will appear here.&lt;br /&gt;
&lt;br /&gt;
The following pages will provide guidance for identifying and correcting performance problems on ESX-based systems.  We recommend following each of these pages in the order they're presented here.&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Check and correct CPU utilization: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5420"&gt;CPU Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Identify memory bottlenecks and remove: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5430"&gt;Memory Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Characterize storage performance and correct: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5490"&gt;Storage Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Understand and improve the network utilization profile: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5500"&gt;Network Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Within each of these pages are techniques for using counters from VirtualCenter and esxtop.  Information on those counters is provided in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5600"&gt;vCenter Performance Counters&lt;/a&gt;   and &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt;, respectively.&lt;br /&gt;
&lt;br /&gt;
Also, note that, while useless in collecting performance data, Perfmon can help with analysis of large esxtop output files.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5100"&gt;Using Perfmon for esxtop-based Performance Analysis&lt;/a&gt;  for more information.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">troubleshooting</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">virtualcenter</category>
      <pubDate>Wed, 26 Mar 2008 20:12:44 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-3930</guid>
      <dc:date>2008-03-26T20:12:44Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Linux Timer Rate</title>
      <link>http://communities.vmware.com/docs/DOC-3580</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
A hardware timer is used by modern systems for a variety of fine-grained operations at the operating system level. VMware's virtualization platforms virtualize this timer in the kernel. Because the virtual timer provided to the VM is actually software, it is subject to the same resource restrictions as other processes. The busier the system the more the timer execution must contend with other hypervisor activities. There are two implications of this:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;When the system is very busy, the software timer may not execute as regularly and virtual time may fall behind.&lt;/li&gt;
&lt;li&gt;Depending on how frequently the OS wishes to be interrupted by the timer, the hypervisor must do different amounts of work.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
From &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1420"&gt;VMware KB article #1420&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;Linux guest operating systems keep time by counting timer interrupts. Unpatched 2.4 and earlier kernels program the virtual system timer to request clock interrupts at 100Hz (100 interrupts per second). 2.6 kernels, on the other hand, request interrupts at 1000Hz - ten times as often. Some 2.4 kernels modified by distribution vendors to contain 2.6 features also request 1000Hz interrupts, or in some cases, interrupts at other rates, such as 512Hz. &lt;br clear="all" /&gt;	 Furthermore, an SMP-capable Linux kernel requests additional timer interrupts from the virtual local APIC timer. An SMP-capable kernel running on a one-CPU system generates twice as many total timer interrupts as the corresponding UP kernel, while such a kernel running on a two-CPU system requests three times as many. In general, an SMP-capable kernel running on &amp;lt;n&amp;gt; CPUs requests &amp;lt;n+1&amp;gt; times as many interrupts per second as a UP kernel. For example, an unmodified 2.6 Linux kernel running on a two-CPU virtual machine requests a total of 3000 clock interrupts per second. &lt;br clear="all" /&gt;	 When a guest asks for more than 1000 clock interrupts per second, it can be difficult for the virtual machine to keep up, especially if other applications are running on the host at the same time. This can cause the clock in the guest operating system to fall so far behind real time that it is unable to catch up. The overhead of delivering so many virtual clock interrupts can also hurt guest performance and increase host CPU consumption.&lt;/blockquote&gt;
The amount of work required to manage the virtual timer is greatest with &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5252"&gt;Red Hat Enterprise Linux&lt;/a&gt; 5 (RHEL5) SMP systems, which use a clock frequency of 1000 Hz and suffer from a multiplicative amount of work due to SMP support. For instance, the following table of timer interrupts was created on a 1000 Hz RHEL VM:&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;vCPU Count&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Interrupts/sec&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;6,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;72,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
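The arithmetic quoted from the KB article can be checked with a few lines of code.  This is a sketch under stated assumptions: the function name is invented, and the divider argument models a kernel option that scales the requested rate down uniformly.  The last check is only an observation that the three table rows happen to equal 1000 x n x (n + 1); the source does not state that formula.&lt;br /&gt;
&lt;br /&gt;
&lt;pre&gt;
```python
def kb_interrupts_per_sec(hz, n_cpus, smp_kernel=True, divider=1):
    """Timer interrupts per second requested by a guest, per the KB rule.

    The KB article says an SMP-capable kernel on n CPUs requests n + 1 times
    as many interrupts as a UP kernel. divider is an assumed uniform
    reduction of the base rate.
    """
    base = hz // divider
    return base * (n_cpus + 1) if smp_kernel else base


print(kb_interrupts_per_sec(1000, 2))              # 3000, the KB's two-CPU example
print(kb_interrupts_per_sec(1000, 2, divider=10))  # 300, an order of magnitude lower

# Observation only: each table row equals 1000 * n * (n + 1).
table = {2: 6000, 4: 20000, 8: 72000}
print(all(1000 * n * (n + 1) == rate for n, rate in table.items()))  # True
```
&lt;/pre&gt;
&lt;br /&gt;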
So, the amount of work that needs to be done by the hypervisor increases dramatically with the addition of vCPUs.  Conversely, decreasing the timer interrupt rate greatly decreases the work that the VMkernel must do to virtualize the timer.  RHEL 5.1 includes a Linux kernel that allows the timer rate to be reduced.  Adding the parameter "divider=10" to the boot parameters reduces the work required of the VMkernel to virtualize the timer by an order of magnitude.</description>
      <category domain="http://communities.vmware.com/tags?communityID=2629">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">linux</category>
      <category domain="http://communities.vmware.com/tags?communityID=2629">timer</category>
      <pubDate>Fri, 14 Mar 2008 16:30:54 GMT</pubDate>
      <guid>http://communities.vmware.com/docs/DOC-3580</guid>
      <dc:date>2008-03-14T16:30:54Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
      <clearspace:replyCount>6</clearspace:replyCount>
    </item>
  </channel>
</rss>

