<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:clearspace="http://www.jivesoftware.com/xmlns/clearspace/rss" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>VMware Communities : All Content - All Communities</title>
    <link>http://communities.vmware.com/index.jspa</link>
    <description>All Content in VMware Communities</description>
    <language>en</language>
    <pubDate>Wed, 04 Nov 2009 06:49:47 GMT</pubDate>
    <generator>Clearspace 1.10.12 (http://jivesoftware.com/products/clearspace/)</generator>
    <dc:date>2009-11-04T06:49:47Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>Advanced Networking Performance Options</title>
      <link>http://communities.vmware.com/docs/DOC-10892</link>
      <description>Some of the advanced networking options available in vSphere 4.0 are reviewed in this paper. Many of these options control trade-offs between latency, throughput, CPU utilization, and reliability (e.g., dropped packets). It is not possible to optimize all of these at the same time, so option defaults are chosen to be suitable for the vast majority of applications. These options are provided to meet the stricter requirements of other applications. Advanced options often have subtle side effects, or merely move an issue from one area to another. Therefore it is recommended that VMware Support be engaged before changing such options, especially for production machines.&lt;br /&gt;
&lt;br /&gt;
There are over 100 options that can be set under Configuration &amp;rarr; Advanced Settings &amp;rarr; Net. Of these, the ones listed below are most likely to be useful for tuning networking performance. Many of the others are for internal testing or enable unreliable features.&lt;br /&gt;
&lt;br /&gt;
All of the options listed here take integer values. For the &amp;ldquo;Boolean&amp;rdquo; ones only the default value is shown: 0 for &amp;ldquo;false&amp;rdquo;, and 1 for &amp;ldquo;true&amp;rdquo;. Other parameters are shown with their default, minimum, and maximum values.&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Parameter Name&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;(Default, Minimum, Maximum)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MaxPortRxQueueLen&lt;/td&gt;
&lt;td&gt;(80, 1, 500)&lt;/td&gt;
&lt;td&gt;Maximum length of the Rx queue for virtual ports whose clients support queueing. Consider increasing this if Rx packet drops are seen in the port connected to a VM. Relevant only for e1000 vNICs used with Fault Tolerance (FT) and VLANs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MaxNetifTxQueueLen&lt;/td&gt;
&lt;td&gt;(500, 1, 1000)&lt;/td&gt;
&lt;td&gt;Maximum length of the Tx queue for the physical NICs. Increase if Tx packet drops are seen in uplink port to the pNIC.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GuestTxCopyBreak&lt;/td&gt;
&lt;td&gt;(64, 60, 4294967295)&lt;/td&gt;
&lt;td&gt;Packet header transmits smaller than this size (in bytes) will be copied rather than mapped. This has more security and functionality implications than performance implications.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmxnetTxCopySize&lt;/td&gt;
&lt;td&gt;(256, 0, 4294967295)&lt;/td&gt;
&lt;td&gt;Transmits smaller than this in bytes will be copied rather than mapped. Copying costs CPU but puts less pressure on the Tx queue and doesn&amp;rsquo;t require completion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmxnetWinUDPTxFullCopy&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable full copy of Windows vmxnet UDP Tx packets. Might disable to save CPU, especially for jumbo frames, at the cost of risking more packet drops.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NetTxDontClusterSize&lt;/td&gt;
&lt;td&gt;(0, 0, 8192)&lt;/td&gt;
&lt;td&gt;Tx packets smaller than this size (in bytes) are transmitted immediately (coalescing options are overruled for these packets). Used to ensure good latency for small packets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceTxTimeout&lt;/td&gt;
&lt;td&gt;(4000, 1, 4294967295)&lt;/td&gt;
&lt;td&gt;The coalesce timeout in microseconds, or effectively the maximum latency without transmitting. Smaller values can reduce packet latency at the cost of CPU. Going below 1000 is risky.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceDefaultOn&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable dynamic coalescing. Disable to test if issues are related to coalescing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceHandlerPcpu&lt;/td&gt;
&lt;td&gt;(1, 0, 128)&lt;/td&gt;
&lt;td&gt;pCPU that coalesce timeout handler runs on. May be important to set this if VM CPU pinning is used.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceTxQDepthCap&lt;/td&gt;
&lt;td&gt;(40, 0, 80)&lt;/td&gt;
&lt;td&gt;Maximum number of &amp;ldquo;normalized&amp;rdquo; Tx packets to coalesce. Reduce if Tx coalescing appears to be too aggressive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoalesceRxQDepthCap&lt;/td&gt;
&lt;td&gt;(40, 0, 80)&lt;/td&gt;
&lt;td&gt;Maximum number of &amp;ldquo;normalized&amp;rdquo; Rx packets to coalesce. Reduce if Rx coalescing appears to be too aggressive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vmxnetThroughputWeight&lt;/td&gt;
&lt;td&gt;(0, 0, 255)&lt;/td&gt;
&lt;td&gt;How far to favor Tx throughput for vmxnet 2 &amp;#38; 3. &amp;ldquo;0&amp;rdquo; is dynamic, otherwise this is a weight where a lower value favors latency and a higher value favors throughput.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TcpipHeapSize&lt;/td&gt;
&lt;td&gt;(24, 24, 120)&lt;/td&gt;
&lt;td&gt;Initial size of the TCP/IP module heap in megabytes. May need to increase if there are many vmkernel connections (NFS, iSCSI, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TcpipDefLROMaxLength&lt;/td&gt;
&lt;td&gt;(16000, 1, 65535)&lt;/td&gt;
&lt;td&gt;Maximum length for the LRO aggregated packet for vmkernel connections. Increasing this reduces the number of acknowledgments, which improves efficiency but may increase latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1000TxZeroCopy&lt;/td&gt;
&lt;td&gt;(0)&lt;/td&gt;
&lt;td&gt;If disabled, UDP and non-TSO Tx packets are copied for e1000.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1000TxTsoZeroCopy&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;If enabled, TSO Tx packets are not copied for e1000.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1000IntrCoalesce&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable interrupt coalescing for e1000. Disabling can improve latency at the expense of CPU.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MaxPktRxListQueue&lt;/td&gt;
&lt;td&gt;(3500, 0, 200000)&lt;/td&gt;
&lt;td&gt;Maximum number of packets queued in vmkernel. Increasing this can reduce the number of dropped packets but at the cost of increased vmkernel memory and queuing latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vmxnet3RSSHashCache&lt;/td&gt;
&lt;td&gt;(1)&lt;/td&gt;
&lt;td&gt;Enable RSS hash cache for vmxnet3 in Windows guests.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmklnxLROEnabled&lt;/td&gt;
&lt;td&gt;(0)&lt;/td&gt;
&lt;td&gt;Enable large packets for recent Linux guests with vmxnet 2 &amp;#38; 3. Most likely to benefit hosts with a small number of VMs with few sessions each, where each session has a heavy Rx load (more than 1 MB/sec). This is an experimental feature and has not been tested extensively.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VmklnxLROMaxAggr&lt;/td&gt;
&lt;td&gt;(6, 0, 24)&lt;/td&gt;
&lt;td&gt;Maximum aggregation count in number of packets for vmklinux LRO.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
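&lt;br /&gt;
As a rough illustration of the (default, minimum, maximum) convention used in the table, the following Python sketch checks a proposed value against the documented range before anyone applies it. The four entries are copied from the table; the dictionary name and the function name are hypothetical and are not part of any VMware tool.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative only: ranges copied from the table above.
# NET_OPTION_RANGES and check_option are hypothetical names.
NET_OPTION_RANGES = {
    # name: (default, minimum, maximum)
    "MaxPortRxQueueLen": (80, 1, 500),
    "MaxNetifTxQueueLen": (500, 1, 1000),
    "CoalesceTxTimeout": (4000, 1, 4294967295),
    "TcpipHeapSize": (24, 24, 120),
}


def check_option(name, value):
    """Return value if it lies inside the documented range, else raise ValueError."""
    default, minimum, maximum = NET_OPTION_RANGES[name]
    if minimum > value or value > maximum:
        raise ValueError(f"{name}={value} is outside [{minimum}, {maximum}]")
    return value


if __name__ == "__main__":
    # 64 MB is inside the documented TcpipHeapSize range of 24 to 120.
    print(check_option("TcpipHeapSize", 64))
```
&lt;br /&gt;
A range check like this only catches values the host would reject. It says nothing about whether a change is advisable, which is why engaging VMware Support remains the recommendation.&lt;br /&gt;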
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <pubDate>Wed, 14 Oct 2009 16:58:33 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-10892</guid>
      <dc:date>2009-10-14T16:58:33Z</dc:date>
      <clearspace:dateToText>2 weeks, 3 days ago</clearspace:dateToText>
    </item>
    <item>
      <title>Using vscsiStats for Storage Performance Analysis</title>
      <link>http://communities.vmware.com/docs/DOC-10095</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
esxtop is a great tool for performance analysis of all types.  However, with only latency and throughput statistics, esxtop will not provide the full picture of the storage profile.  Furthermore, esxtop only provides latency numbers for Fibre Channel and iSCSI storage.  Latency analysis of NFS traffic is not possible with esxtop.&lt;br /&gt;
&lt;br /&gt;
Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats.  vscsiStats collects and reports counters on storage activity.  Its data is collected at the virtual SCSI device level in the kernel.  This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol.  The following data are reported in histogram form:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;IO size&lt;/li&gt;
&lt;li&gt;Seek distance&lt;/li&gt;
&lt;li&gt;Outstanding IOs&lt;/li&gt;
&lt;li&gt;Latency (in microseconds)&lt;/li&gt;
&lt;li&gt;More!&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Running vscsiStats&lt;/h1&gt;
vscsiStats collection and analysis requires two steps:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Start statistics collection.&lt;/li&gt;
&lt;li&gt;View accrued statistics.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Documentation on command-line parameters is available by running '/usr/lib/vmware/bin/vscsiStats -h'.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Starting and Stopping vscsiStats Collection&lt;/h2&gt;
The tool is started with the following command:&lt;br /&gt;
&lt;pre class="jive-pre"&gt;&lt;code class="jive-code jive-plain"&gt;/usr/lib/vmware/bin/vscsiStats -s -w &amp;lt;world_group_id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
This command starts the process that will accrue statistics.  The world group ID must be set to a running virtual machine.  The running VMs' IDs can be obtained by running '/usr/lib/vmware/bin/vscsiStats -l'.&lt;br /&gt;
&lt;br /&gt;
After about 30 minutes vscsiStats will stop running.  If the analysis is needed for a longer period, the start command above should be repeated within this window.  That will defer the timeout and termination by another 30 minutes.&lt;br /&gt;
&lt;br /&gt;
Since results are accrued and reported out in summary, the histograms will include data since collection was started.  To reset all counters to zero, run '/usr/lib/vmware/bin/vscsiStats -r'.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Viewing Statistics&lt;/h2&gt;
Counters are displayed by using the following command:&lt;br /&gt;
&lt;pre class="jive-pre"&gt;&lt;code class="jive-code jive-plain"&gt;/usr/lib/vmware/bin/vscsiStats -p &amp;lt;histo_type&amp;gt; [-c]
&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
The histogram type is used to specify either all of the statistics or one group of them.  Options include all, ioLength, seekDistance, outstandingIOs, latency, interarrival.&lt;br /&gt;
&lt;br /&gt;
Results can be produced in a more compact comma-delimited list by adding the optional "-c" above.&lt;br /&gt;
&lt;br /&gt;
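If you script your collection runs, a small helper that assembles the command lines can save some typing. The Python sketch below only builds argument lists from the flags described above (-s, -w, -l, -p, -c, -r); it does not run anything, and the function names and the world group ID 12345 are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical helper names; the path and flags are the ones described above.
VSCSISTATS = "/usr/lib/vmware/bin/vscsiStats"
HISTO_TYPES = ("all", "ioLength", "seekDistance",
               "outstandingIOs", "latency", "interarrival")

LIST_CMD = [VSCSISTATS, "-l"]   # list running VMs and their IDs
RESET_CMD = [VSCSISTATS, "-r"]  # reset all counters to zero


def start_cmd(world_group_id):
    """Build the command that starts collection for one running VM."""
    return [VSCSISTATS, "-s", "-w", str(world_group_id)]


def print_cmd(histo_type="all", csv=False):
    """Build the command that prints one histogram group (or all of them)."""
    if histo_type not in HISTO_TYPES:
        raise ValueError(f"unknown histogram type: {histo_type}")
    cmd = [VSCSISTATS, "-p", histo_type]
    if csv:
        cmd.append("-c")  # compact comma-delimited output
    return cmd


if __name__ == "__main__":
    print(" ".join(start_cmd(12345)))  # 12345 is a placeholder ID
    print(" ".join(print_cmd("latency", csv=True)))
```
&lt;br /&gt;
The lists can be handed to subprocess.run on a host where the tool is installed. Remember to reissue the start command if you need more than about 30 minutes of data.&lt;br /&gt;
&lt;br /&gt;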
&lt;h1&gt;Using vscsiStats Results&lt;/h1&gt;
&lt;h2&gt;Use Case 1: Identifying Sequential IO&lt;/h2&gt;
Storage arrays can process sequential IO much faster than random IO.  You can therefore improve the performance of a sequential workload by placing it on a dedicated LUN to allow the array to optimize access.  vscsiStats can help you identify your sequential workloads even if you don't understand anything about the application in the VM.&lt;br /&gt;
&lt;br /&gt;
Take the following graph as an example, which I generated by running '/usr/lib/vmware/bin/vscsiStats -p seekDistance':&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5910/random_write_histo.png" alt="random_write_histo.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5910/random_write_histo.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
This graph shows that most of the commands are being issued a great distance from the previous command.  It looks like all of the commands were 50,000 or more logical blocks away from the previous command.  When I looked at the raw data, I saw that over 99% of the commands were more than 128 blocks away from the previous command.  That's random access if I've ever seen it.  Here's the opposite example:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5908/sequential_write_histo.png" alt="sequential_write_histo.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5908/sequential_write_histo.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
In this case the logical block number (LBN) of each command is most frequently exactly one larger than the previous command.  That's the signature of a heavily sequential workload.  It shouldn't surprise you to learn that both of these profiles were generated by Iometer using random and sequential writes, respectively. &lt;br /&gt;
&lt;br /&gt;
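The "more than 128 blocks away" check I did by eye on the raw data can be sketched in a few lines of Python. This assumes you have already copied the histogram into a dictionary that maps a signed seek distance (in logical blocks) to a command count, which is a simplification of the tool's bucketed output. The counts below are invented for illustration and are not the data behind the graphs.&lt;br /&gt;
&lt;br /&gt;
```python
def fraction_far_seeks(histogram, threshold=128):
    """Fraction of commands whose seek distance exceeds threshold blocks.

    histogram maps a signed seek distance (logical blocks) to a command
    count. This layout is an assumption, not vscsiStats' own format.
    """
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    far = sum(count for distance, count in histogram.items()
              if abs(distance) > threshold)
    return far / total


# Invented sample data for a random-looking and a sequential-looking disk.
random_like = {-50000: 480, -64: 2, 1: 3, 64: 5, 50000: 510}
sequential_like = {-50000: 10, 1: 950, 64: 20, 50000: 20}

if __name__ == "__main__":
    print(fraction_far_seeks(random_like))
    print(fraction_far_seeks(sequential_like))
```
&lt;br /&gt;
A fraction near one suggests random access, and a fraction near zero with most commands at a distance of one suggests a sequential workload.&lt;br /&gt;
&lt;br /&gt;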
&lt;h2&gt;Use Case 2: Optimizing for IO Sizes&lt;/h2&gt;
The IO size is an important characteristic of storage profiles.  A variety of best practices have been provided by storage vendors to enable customers to tune their storage to a particular IO size.  As an example, it may make sense to optimize an array's stripe size to its average IO size.  vscsiStats can provide a histogram of IO sizes to help this process.  The following graph was generated by '/usr/lib/vmware/bin/vscsiStats -p ioLength':&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5911/io_size_4k.png" alt="io_size_4k.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5911/io_size_4k.png');return false;"/&gt; &lt;br /&gt;
&lt;br /&gt;
From these results I can see that about a quarter of the commands came in IOs smaller than 4k.  About half of the commands were exactly 4k in size.  The minute number of remaining IOs were larger than 4k.  This signature is typical of a VMDK formatted to 4k blocks and supporting OS and application execution.  The storage array should be optimized for 4k blocks if this disk's performance is a priority. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Use Case 3: Storage Latency Analysis (Including NFS!)&lt;/h2&gt;
esxtop is a terrific tool for latency-based storage analysis.  Fibre Channel and iSCSI HBAs have device and kernel latencies in esxtop's storage panel.  Software iSCSI initiators will show up as vmhba32 (ESX 3.5 and earlier) and vmhba33 (ESX 4.0 and later.)  But esxtop does not provide latency statistics for NFS stores.&lt;br /&gt;
&lt;br /&gt;
Because vscsiStats collects its results where the guest interacts with the hypervisor, it is unaware of the storage implementation.  Latency statistics can be collected for all storage configurations with this tool.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5917/latency.png" alt="latency.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-10095-6-5917/latency.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
The above graph shows that the server in my office with a single direct-attached SCSI disk is performing as I would expect.  About half of all the operations are completing in under 5 ms.  The other half take 5-15 ms to complete.  A few commands took longer than 15 ms, but the number is so small that it doesn't concern me.  Similar results can be seen with NFS arrays.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;vscsiStats on ESXi&lt;/h1&gt;
vscsiStats can be installed on ESXi hosts after putting the host into tech support mode.  More information on this process is available on &lt;a class="jive-link-external" href="http://vpivot.com/2009/10/21/vscsistats-for-esxi/"&gt;Scott's blog on the subject on vPivot&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Additional Resources&lt;/h1&gt;
My colleagues Ajay Gulati, Chethan Kumar, and Irfan Ahmad presented at VPACT 09 &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10104" title="This paper presents workload characterization study of three top-tier enterprise applications using VMware ESX server hypervisor. We further separate out different components (for example data, index and redo log in a database) of these workloads to understand their behavior in isolation.  We find that most workloads show highly random access patterns. Next, we study the impact of storage consolidation on workloads (both random and sequential) and their burstiness."&gt;Storage Workload Characterization and Consolidation in Virtualized Environments&lt;/a&gt;.  This paper serves as an excellent example of vscsiStats in action.&lt;br /&gt;
&lt;br /&gt;
I learned vscsiStats by reviewing Irfan's VMworld 2007 presentation (&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10084" title="The presentation deck delivered by Irfan Ahmad at VMworld 2007.  This details a powerful storage analysis tool that has been packaged since ESX 3.5."&gt;vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server&lt;/a&gt;) and playing with the tool.  Check out his presentation if you'd like more detail.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vscsistats</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">nfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">san</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">nas</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmfs</category>
      <pubDate>Fri, 29 May 2009 22:28:27 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-10095</guid>
      <dc:date>2009-05-29T22:28:27Z</dc:date>
      <clearspace:dateToText>1 month, 1 day ago</clearspace:dateToText>
      <clearspace:replyCount>9</clearspace:replyCount>
    </item>
    <item>
      <title>Understanding Performance</title>
      <link>http://communities.vmware.com/docs/DOC-5250</link>
      <description>&lt;br /&gt;
The following documents will explain some of the principles for virtual system performance. Please check back as we grow the number of articles here with time.&lt;br /&gt;
&lt;br /&gt;
ESX and Guest Operating Systems &lt;br /&gt;
&lt;p /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9882"&gt;ESX Monitor Modes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3580"&gt;Linux Timer Rate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p /&gt;
CPU and Scheduling&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5101"&gt;Hyper-Threading on ESX Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-7390"&gt;Ready Time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5501"&gt;VMkernel Scheduler&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Memory&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6912"&gt;Large Memory Pages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Network&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10892"&gt;Advanced Networking Performance Options&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Storage &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9696"&gt;Storage Performance: VMFS and Protocols&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">monitor</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <pubDate>Fri, 16 May 2008 21:30:03 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5250</guid>
      <dc:date>2008-05-16T21:30:03Z</dc:date>
      <clearspace:dateToText>1 month, 1 week ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
    </item>
    <item>
      <title>vCenter Performance Counters</title>
      <link>http://communities.vmware.com/docs/DOC-5600</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
The following table of vCenter (VC) performance counters lists the counters with a description of their purpose.  This page has been updated for vSphere 4, so the counter levels will differ slightly on older versions of VC.&lt;br /&gt;
&lt;br /&gt;
Remember, with the exception of ready time, statistic levels one and two are the only ones needed for 99% of the performance monitoring and analysis out there.  Don't spend many of your own cycles worrying about levels three and four!&lt;br /&gt;
&lt;br /&gt;
For information on enabling VC to display and archive these counters see the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding vCenter Performance Statistics&lt;/a&gt; article.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Understanding vCenter Measurement Windows&lt;/h1&gt;
Before you continue, you should know that all total count metrics reported by VC are reported over the sample window.  When you're looking at live stats, this sample window is 20 seconds.  When you're looking at archive stats, it will depend on the interval duration.  That duration could be five minutes, 30 minutes, two hours, or one day.&lt;br /&gt;
&lt;br /&gt;
This causes a lot of confusion when comparing esxtop results to live VC results to archived VC results.  As an example, ready time might be reported as 10% in esxtop.  In live VC results this amount of ready time would be reported as 2000 ms (10% of the 20s window).  In one-day archive results, the same number would be reported as 30,000 ms (10% of the five-minute interval duration).  All of these numbers reflect the same amount of ready time.&lt;br /&gt;
&lt;br /&gt;
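The conversion between the two views is simple arithmetic, and a short Python sketch makes it concrete. The function names are mine and are not part of any VMware API; the window lengths are the 20 second live window and the five minute archive interval mentioned above.&lt;br /&gt;
&lt;br /&gt;
```python
def ready_ms_from_percent(ready_percent, window_seconds):
    """Ready time in milliseconds for a given percentage and sample window."""
    return ready_percent * window_seconds * 1000 / 100


def ready_percent_from_ms(ready_ms, window_seconds):
    """Ready time as a percentage of the sample window."""
    return ready_ms * 100 / (window_seconds * 1000)


if __name__ == "__main__":
    live = ready_ms_from_percent(10, 20)       # live stats, 20 s window
    archive = ready_ms_from_percent(10, 300)   # archive, 5 minute interval
    print(live, archive)
    # Both values convert back to the same percentage.
    print(ready_percent_from_ms(live, 20), ready_percent_from_ms(archive, 300))
```
&lt;br /&gt;
Normalizing to a percentage before comparing numbers from different sources avoids most of the confusion.&lt;br /&gt;
&lt;br /&gt;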
&lt;h1&gt;CPU Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;cpu.ready.summation&lt;/td&gt;
&lt;td&gt;Ready time is the time spent waiting for CPU(s) to become available in the past update interval.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.average&lt;/td&gt;
&lt;td&gt;The CPU utilization.  The maximum possible value here is the frequency of the processors times the number of cores.  As an example, a VM using 4000 MHz  on a system with four 2 GHz processors is using 50% of the CPU (4000 / (4 * 2000) = 0.5)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;cpu.usage.average&lt;/td&gt;
&lt;td&gt;The CPU utilization.  This value is reported with 100% representing all processor cores on the system.  As an example, a 2-way VM using 50% of a four-core system is completely using two cores.&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;cpu.reservedCapacity.average&lt;/td&gt;
&lt;td&gt;CPU Reserved Capacity&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;cpu.idle.summation&lt;/td&gt;
&lt;td&gt;CPU Idle&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;cpu.swapwait.summation&lt;/td&gt;
&lt;td&gt;Swap wait time is time that the world spent waiting for memory to be swapped in.  When the VM is waiting for memory, it is not doing work.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.system.summation&lt;/td&gt;
&lt;td&gt;System time is the time spent in VMkernel during the last update interval.  This does not include guest code execution.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.wait.summation&lt;/td&gt;
&lt;td&gt;Wait time is the time spent waiting for hardware or VMkernel thread locks during the last update interval.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.extra.summation&lt;/td&gt;
&lt;td&gt;CPU extra is the time above the statically calculated entitlement. Entitlement is the share of processing time that a VM should get as a result of its vCPU count and assigned shares. &lt;i&gt;You should not use or care about this counter in any of your own analysis.&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.used.summation&lt;/td&gt;
&lt;td&gt;CPU Used&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;cpu.guaranteed.latest&lt;/td&gt;
&lt;td&gt;Guaranteed time is reported as the amount of the reservation time that the VM used in the past update interval.  As an example, if 2000 MHz have been reserved for the VM on a four-way, 2 GHz host, that's 25% of the CPU resource.  In a 20s update interval, there are 80,000 ms available on this four-way system.  That means 20,000 ms of time has been reserved.  If a VM used only half of its available cycles, the guaranteed time is 10,000 ms.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usage.none&lt;/td&gt;
&lt;td&gt;CPU Usage (None)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usage.minimum&lt;/td&gt;
&lt;td&gt;CPU Usage (Minimum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usage.maximum&lt;/td&gt;
&lt;td&gt;CPU Usage (Maximum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.none&lt;/td&gt;
&lt;td&gt;CPU Usage in MHz (None)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.minimum&lt;/td&gt;
&lt;td&gt;CPU Usage in MHz (Minimum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;cpu.usagemhz.maximum&lt;/td&gt;
&lt;td&gt;CPU Usage in MHz (Maximum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
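&lt;br /&gt;
The two worked examples in the table (cpu.usagemhz.average and cpu.guaranteed.latest) can be checked with a few lines of Python. This is only a sketch of the arithmetic described in those rows; the function and parameter names are illustrative and do not come from the vCenter API.&lt;br /&gt;
&lt;br /&gt;
```python
def cpu_usage_fraction(used_mhz, cores, core_mhz):
    """Share of the host's total CPU capacity represented by used_mhz."""
    return used_mhz / (cores * core_mhz)


def guaranteed_ms(reserved_mhz, cores, core_mhz, interval_s, fraction_used):
    """Guaranteed time following the worked example in the table."""
    available_ms = cores * interval_s * 1000            # e.g. 4 * 20 s
    reserved_ms = available_ms * reserved_mhz / (cores * core_mhz)
    return reserved_ms * fraction_used


if __name__ == "__main__":
    # 4000 MHz used on four 2 GHz processors.
    print(cpu_usage_fraction(4000, 4, 2000))
    # 2000 MHz reserved, 20 s interval, half of the reservation used.
    print(guaranteed_ms(2000, 4, 2000, 20, 0.5))
```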
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;h1&gt;Memory Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.consumed.average&lt;/td&gt;
&lt;td&gt;The amount of machine memory that is in use by the VM. While a VM may have been configured to use 4 GB of RAM, as an example, it might have only touched half of that. Of the 2 GB left, half of that might be saved from memory sharing. That would result in 1 GB of consumed memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.overhead.average&lt;/td&gt;
&lt;td&gt;The memory used by the VMkernel to maintain and execute the VM.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.swapinrate.average&lt;/td&gt;
&lt;td&gt;The swap in rate reports the rate at which a VM's memory is being swapped in from disk.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.swapoutrate.average&lt;/td&gt;
&lt;td&gt;The swap out rate reports the rate at which a VM's memory is being swapped out to disk.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.usage.average&lt;/td&gt;
&lt;td&gt;Memory used as a percentage of all available machine memory.  Available for host and VM.&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.average&lt;/td&gt;
&lt;td&gt;The amount of memory currently claimed by the balloon driver. This is not a performance problem, per se, but represents the host starting to take memory from less needful VMs for those with large amounts of active memory. But if the host is ballooning, check swap rates (swapin and swapout) which would be indicative of performance problems.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.granted.average&lt;/td&gt;
&lt;td&gt;The amount of memory that was granted to the VM by the host.  Memory is not granted to the VM until it is touched one time, and granted memory may be swapped out or ballooned away if the VMkernel needs the memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.active.average&lt;/td&gt;
&lt;td&gt;The amount of memory actively used by the VM over a recent, short window of time.  This is the "true" measure of how much memory the VM currently needs.  Additional, unused memory may be swapped out or ballooned with no impact on the guest's performance.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.shared.average&lt;/td&gt;
&lt;td&gt;The average amount of shared memory.  Shared memory represents the entire pool of memory from which sharing savings are possible.  The amount of memory that this has been condensed to is reported in shared common memory.  So, total saving due to memory sharing equals shared memory minus shared common memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.zero.average&lt;/td&gt;
&lt;td&gt;The amount of zero pages in the guest.  Zero pages are not represented in machine memory so this results in 100% savings when mapping from the guest to the machine memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.unreserved.average&lt;/td&gt;
&lt;td&gt;Memory Unreserved (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapused.average&lt;/td&gt;
&lt;td&gt;The amount of swap memory currently in use.  A large amount of swap memory is not a performance problem.  This could be memory that the guest doesn't need.  Check the swap rates (swapin, swapout) to see if the guest is actively in need of more memory than is available.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.average&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.average&lt;/td&gt;
&lt;td&gt;The average amount of shared common memory.  Shared memory represents the entire pool of memory from which sharing savings are possible.  The amount of memory that this has been condensed to is reported in shared common memory.  So, total saving due to memory sharing equals shared memory minus shared common memory.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.heap.average&lt;/td&gt;
&lt;td&gt;Memory Heap (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.heapfree.average&lt;/td&gt;
&lt;td&gt;Memory Heap Free (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.state.latest&lt;/td&gt;
&lt;td&gt;Memory State&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapped.average&lt;/td&gt;
&lt;td&gt;Memory Swapped (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swaptarget.average&lt;/td&gt;
&lt;td&gt;Memory Swap Target (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapin.average&lt;/td&gt;
&lt;td&gt;The rate at which memory is being swapped in from disk.  A large number here represents a problem with lack of memory and a clear indication that performance is suffering as a result.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.swapout.average&lt;/td&gt;
&lt;td&gt;The rate at which memory is being swapped out to disk.  A large number here represents a problem with lack of memory and a clear indication that performance is suffering as a result.&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.average&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.sysUsage.average&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;mem.reservedCapacity.average&lt;/td&gt;
&lt;td&gt;Memory Reserved Capacity&lt;/td&gt;
&lt;td&gt;megaBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.usage.none&lt;/td&gt;
&lt;td&gt;Memory Usage (None)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.usage.minimum&lt;/td&gt;
&lt;td&gt;Memory Usage (Minimum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.usage.maximum&lt;/td&gt;
&lt;td&gt;Memory Usage (Maximum)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.granted.none&lt;/td&gt;
&lt;td&gt;Memory Granted (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.granted.minimum&lt;/td&gt;
&lt;td&gt;Memory Granted (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.granted.maximum&lt;/td&gt;
&lt;td&gt;Memory Granted (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.active.none&lt;/td&gt;
&lt;td&gt;Memory Active (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.active.minimum&lt;/td&gt;
&lt;td&gt;Memory Active (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.active.maximum&lt;/td&gt;
&lt;td&gt;Memory Active (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.shared.none&lt;/td&gt;
&lt;td&gt;Memory Shared (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.shared.minimum&lt;/td&gt;
&lt;td&gt;Memory Shared (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.shared.maximum&lt;/td&gt;
&lt;td&gt;Memory Shared (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.zero.none&lt;/td&gt;
&lt;td&gt;Memory Zero (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.zero.minimum&lt;/td&gt;
&lt;td&gt;Memory Zero (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.zero.maximum&lt;/td&gt;
&lt;td&gt;Memory Zero (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.unreserved.none&lt;/td&gt;
&lt;td&gt;Memory Unreserved (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.unreserved.minimum&lt;/td&gt;
&lt;td&gt;Memory Unreserved (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.unreserved.maximum&lt;/td&gt;
&lt;td&gt;Memory Unreserved (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapused.none&lt;/td&gt;
&lt;td&gt;Memory Swap Used (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapused.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Used (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapused.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Used (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.none&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapunreserved.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Unreserved (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.none&lt;/td&gt;
&lt;td&gt;Memory Shared Common (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.minimum&lt;/td&gt;
&lt;td&gt;Memory Shared Common (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.maximum&lt;/td&gt;
&lt;td&gt;Memory Shared Common (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heap.none&lt;/td&gt;
&lt;td&gt;Memory Heap (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heap.minimum&lt;/td&gt;
&lt;td&gt;Memory Heap (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heap.maximum&lt;/td&gt;
&lt;td&gt;Memory Heap (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heapfree.none&lt;/td&gt;
&lt;td&gt;Memory Heap Free (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heapfree.minimum&lt;/td&gt;
&lt;td&gt;Memory Heap Free (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.heapfree.maximum&lt;/td&gt;
&lt;td&gt;Memory Heap Free (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapped.none&lt;/td&gt;
&lt;td&gt;Memory Swapped (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapped.minimum&lt;/td&gt;
&lt;td&gt;Memory Swapped (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapped.maximum&lt;/td&gt;
&lt;td&gt;Memory Swapped (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swaptarget.none&lt;/td&gt;
&lt;td&gt;Memory Swap Target (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swaptarget.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Target (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swaptarget.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Target (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapin.none&lt;/td&gt;
&lt;td&gt;Memory Swap In (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapin.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap In (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapin.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap In (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapout.none&lt;/td&gt;
&lt;td&gt;Memory Swap Out (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapout.minimum&lt;/td&gt;
&lt;td&gt;Memory Swap Out (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.swapout.maximum&lt;/td&gt;
&lt;td&gt;Memory Swap Out (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.none&lt;/td&gt;
&lt;td&gt;Memory Balloon (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.minimum&lt;/td&gt;
&lt;td&gt;Memory Balloon (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.maximum&lt;/td&gt;
&lt;td&gt;Memory Balloon (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.none&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.minimum&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.vmmemctltarget.maximum&lt;/td&gt;
&lt;td&gt;Memory Balloon Target (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.overhead.none&lt;/td&gt;
&lt;td&gt;Memory Overhead (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.overhead.minimum&lt;/td&gt;
&lt;td&gt;Memory Overhead (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.overhead.maximum&lt;/td&gt;
&lt;td&gt;Memory Overhead (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.consumed.none&lt;/td&gt;
&lt;td&gt;Memory Consumed (None)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.consumed.maximum&lt;/td&gt;
&lt;td&gt;Memory Consumed (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.consumed.minimum&lt;/td&gt;
&lt;td&gt;Memory Consumed (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sysUsage.none&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sysUsage.maximum&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;mem.sysUsage.minimum&lt;/td&gt;
&lt;td&gt;Memory Used by vmkernel&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
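&lt;br /&gt;
The arithmetic in the descriptions above can be sketched in a few lines of Python. This is only an illustration: the function names and sample values are hypothetical and are not part of the vSphere API; real values would come from the counters listed in the table.&lt;br /&gt;

```python
# Illustrative only: helper names and sample values are assumptions,
# not vSphere API calls. All sizes are in kiloBytes, as in the table.
KB_PER_GB = 1024 * 1024


def sharing_savings_kb(shared_kb, shared_common_kb):
    """Total saving from memory sharing: mem.shared minus mem.sharedcommon."""
    return shared_kb - shared_common_kb


def consumed_kb(touched_kb, saved_by_sharing_kb):
    """Rough consumed memory: touched guest memory minus sharing savings."""
    return touched_kb - saved_by_sharing_kb


def swapping_is_active(swapin_rate_kbps, swapout_rate_kbps):
    """Ballooning or swap usage alone is not a problem; nonzero swap rates are."""
    return swapin_rate_kbps != 0 or swapout_rate_kbps != 0


# Worked example from the mem.consumed.average description:
# a 4 GB VM touches 2 GB, and sharing saves half of that.
touched = 2 * KB_PER_GB
saved = touched // 2
example_consumed = consumed_kb(touched, saved)
print(example_consumed // KB_PER_GB, "GB consumed")
```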
&lt;br /&gt;
&lt;h1&gt;Disk Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;disk.maxTotalLatency.latest&lt;/td&gt;
&lt;td&gt;The highest reported total latency (device and kernel times) in the sample window.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;disk.usage.average&lt;/td&gt;
&lt;td&gt;Average disk throughput over the sample period.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.read.average&lt;/td&gt;
&lt;td&gt;Average disk throughput due to read operations over the sample period.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.write.average&lt;/td&gt;
&lt;td&gt;Average disk throughput due to write operations over the sample period.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.commands.summation&lt;/td&gt;
&lt;td&gt;Disk Commands Issued&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.commandsAborted.summation&lt;/td&gt;
&lt;td&gt;The number of aborts that have occurred in the last window of time. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time (as defined by the guest OS or application).&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.busResets.summation&lt;/td&gt;
&lt;td&gt;Disk Bus Resets&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.deviceReadLatency.average&lt;/td&gt;
&lt;td&gt;Device read latency.  This is the time the physical storage path, from the HBA to the platter, takes to service an IO request.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.kernelReadLatency.average&lt;/td&gt;
&lt;td&gt;Kernel read latency.  This is the time the VMkernel takes to service an IO.  This is the time between the guest OS and the device.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.totalReadLatency.average&lt;/td&gt;
&lt;td&gt;Total read latency.  The sum of the device and kernel read latencies.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.queueReadLatency.average&lt;/td&gt;
&lt;td&gt;Queue Read Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.deviceWriteLatency.average&lt;/td&gt;
&lt;td&gt;Device write latency.  This is the time the physical storage path, from the HBA to the platter, takes to service an IO request.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.kernelWriteLatency.average&lt;/td&gt;
&lt;td&gt;Kernel write latency.  This is the time the VMkernel takes to service an IO.  This is the time between the guest OS and the device.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.totalWriteLatency.average&lt;/td&gt;
&lt;td&gt;Total write latency.  The sum of the device and kernel write latencies.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.queueWriteLatency.average&lt;/td&gt;
&lt;td&gt;Queue Write Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.deviceLatency.average&lt;/td&gt;
&lt;td&gt;Physical Device Command Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.kernelLatency.average&lt;/td&gt;
&lt;td&gt;Kernel Disk Command Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;disk.queueLatency.average&lt;/td&gt;
&lt;td&gt;Queue Command Latency&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.numberRead.summation&lt;/td&gt;
&lt;td&gt;The number of IO read operations in the previous sample period.  Note that these operations may be variable sized up to 64 KB.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.numberWrite.summation&lt;/td&gt;
&lt;td&gt;The number of IO write operations in the previous sample period.  Note that these operations may be variable sized up to 64 KB.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.totalLatency.average&lt;/td&gt;
&lt;td&gt;This is the average total latency over the sample window.  Total latency is the sum of kernel and device latency for both read and write commands.&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;disk.write.average&lt;/td&gt;
&lt;td&gt;Disk Write Rate&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;disk.usage.none&lt;/td&gt;
&lt;td&gt;Disk Usage (None)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;disk.usage.minimum&lt;/td&gt;
&lt;td&gt;Disk Usage (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;disk.usage.maximum&lt;/td&gt;
&lt;td&gt;Disk Usage (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
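&lt;br /&gt;
The relationships described in this table can be sketched as follows. This is a minimal illustration, not an API: the function names are hypothetical, and interval_seconds is an assumed parameter standing for the length of the sample period, on the assumption that throughput counters are per-second averages and command counts are totals for that same period.&lt;br /&gt;

```python
# Illustrative only: function names and the interval parameter are
# assumptions, not part of the vSphere API.


def total_latency_ms(device_ms, kernel_ms):
    """Total latency is the sum of the device and kernel latencies."""
    return device_ms + kernel_ms


def average_io_size_kb(throughput_kbps, operations, interval_seconds):
    """Average size of one IO over a sample period.

    throughput_kbps: e.g. disk.read.average (kiloBytesPerSecond)
    operations: e.g. disk.numberRead.summation for the same period
    interval_seconds: assumed length of the sample period
    """
    if operations == 0:
        return 0.0
    return throughput_kbps * interval_seconds / operations


print(total_latency_ms(8, 2))
print(average_io_size_kb(1600, 1000, 20))
```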
&lt;br /&gt;
&lt;h1&gt;Network Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;net.usage.average&lt;/td&gt;
&lt;td&gt;Network Usage (Average)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.droppedRx.summation&lt;/td&gt;
&lt;td&gt;The number of received packets that were dropped over the sample period.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.droppedTx.summation&lt;/td&gt;
&lt;td&gt;The number of transmitted packets that were dropped over the sample period.&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.received.average&lt;/td&gt;
&lt;td&gt;Average network throughput for received traffic.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;net.transmitted.average&lt;/td&gt;
&lt;td&gt;Average network throughput for transmitted traffic.&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;net.packetsRx.summation&lt;/td&gt;
&lt;td&gt;Network Packets Received&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;net.packetsTx.summation&lt;/td&gt;
&lt;td&gt;Network Packets Transmitted&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;net.usage.none&lt;/td&gt;
&lt;td&gt;Network Usage (None)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;net.usage.minimum&lt;/td&gt;
&lt;td&gt;Network Usage (Minimum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;net.usage.maximum&lt;/td&gt;
&lt;td&gt;Network Usage (Maximum)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
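&lt;br /&gt;
One way to put the dropped-packet counters in context is to express them as a percentage of traffic in the same sample period. The sketch below is only an illustration: the function name is hypothetical, and it assumes that net.packetsRx.summation (or net.packetsTx.summation) counts only the packets that were not dropped, which should be verified before relying on the result.&lt;br /&gt;

```python
# Illustrative only: drop_percent is a hypothetical helper, and it assumes
# the delivered count excludes dropped packets.


def drop_percent(dropped, delivered):
    """Dropped packets as a percentage of all packets seen in the period.

    dropped: e.g. net.droppedRx.summation
    delivered: e.g. net.packetsRx.summation for the same period
    """
    total = dropped + delivered
    if total == 0:
        return 0.0
    return 100.0 * dropped / total


print(drop_percent(5, 995))
```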
&lt;br /&gt;
&lt;h1&gt;Other Statistics&lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Level&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Counter name in API&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Units&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;sys.uptime.latest&lt;/td&gt;
&lt;td&gt;Uptime&lt;/td&gt;
&lt;td&gt;second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;sys.heartbeat.summation&lt;/td&gt;
&lt;td&gt;Heartbeat&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.cpufairness.latest&lt;/td&gt;
&lt;td&gt;CPU Fairness&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.memfairness.latest&lt;/td&gt;
&lt;td&gt;Memory Fairness&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.effectivecpu.average&lt;/td&gt;
&lt;td&gt;Effective CPU Resources&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.effectivemem.average&lt;/td&gt;
&lt;td&gt;Effective Memory Resources&lt;/td&gt;
&lt;td&gt;megaBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;clusterServices.failover.latest&lt;/td&gt;
&lt;td&gt;Current failover level&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.average&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (Average)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.memUsed.average&lt;/td&gt;
&lt;td&gt;Memory Used (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.swapUsed.average&lt;/td&gt;
&lt;td&gt;Memory Swap Used (Average)&lt;/td&gt;
&lt;td&gt;kiloBytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.swapIn.average&lt;/td&gt;
&lt;td&gt;Memory Swap In (Average)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;managementAgent.swapOut.average&lt;/td&gt;
&lt;td&gt;Memory Swap Out (Average)&lt;/td&gt;
&lt;td&gt;kiloBytesPerSecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actav1.latest&lt;/td&gt;
&lt;td&gt;CPU Active (1 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actpk1.latest&lt;/td&gt;
&lt;td&gt;CPU Active (1 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runav1.latest&lt;/td&gt;
&lt;td&gt;CPU Running (1 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actav5.latest&lt;/td&gt;
&lt;td&gt;CPU Active (5 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actpk5.latest&lt;/td&gt;
&lt;td&gt;CPU Active (5 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runav5.latest&lt;/td&gt;
&lt;td&gt;CPU Running (5 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actav15.latest&lt;/td&gt;
&lt;td&gt;CPU Active (15 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.actpk15.latest&lt;/td&gt;
&lt;td&gt;CPU Active (15 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runav15.latest&lt;/td&gt;
&lt;td&gt;CPU Running (15 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runpk1.latest&lt;/td&gt;
&lt;td&gt;CPU Running (1 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.maxLimited1.latest&lt;/td&gt;
&lt;td&gt;CPU Throttled (1 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runpk5.latest&lt;/td&gt;
&lt;td&gt;CPU Running (5 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.maxLimited5.latest&lt;/td&gt;
&lt;td&gt;CPU Throttled (5 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.runpk15.latest&lt;/td&gt;
&lt;td&gt;CPU Running (15 min. peak)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.maxLimited15.latest&lt;/td&gt;
&lt;td&gt;CPU Throttled (15 min. average)&lt;/td&gt;
&lt;td&gt;percent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.sampleCount.latest&lt;/td&gt;
&lt;td&gt;Group CPU Sample Count&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;rescpu.samplePeriod.latest&lt;/td&gt;
&lt;td&gt;Group CPU Sample Period&lt;/td&gt;
&lt;td&gt;millisecond&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.none&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (None)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.maximum&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (Maximum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;sys.resourceCpuUsage.minimum&lt;/td&gt;
&lt;td&gt;Resource CPU Usage (Minimum)&lt;/td&gt;
&lt;td&gt;megaHertz&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <pubDate>Fri, 30 May 2008 00:15:21 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5600</guid>
      <dc:date>2008-05-30T00:15:21Z</dc:date>
      <clearspace:dateToText>1 month, 3 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
    </item>
    <item>
      <title>New Blog Home</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/09/16/new-blog-home</link>
      <description>&lt;br /&gt;
I have moved my blog home to a new location.  Come visit and read at &lt;a class="jive-link-external" href="http://vpivot.com"&gt;vPivot.com&lt;/a&gt;.&lt;br /&gt;
&lt;p /&gt;
Scott</description>
      <pubDate>Wed, 16 Sep 2009 18:19:47 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/09/16/new-blog-home</guid>
      <dc:date>2009-09-16T18:19:47Z</dc:date>
      <clearspace:dateToText>2 months, 6 days ago</clearspace:dateToText>
    </item>
    <item>
      <title>Love Your Balloon Driver</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/09/09/love-your-balloon-driver</link>
      <description>A couple of days ago we finally got out one of my favorite papers from our ongoing vSphere launch activities.  This &lt;a class="jive-link-external" href="http://www.vmware.com/resources/techresources/10062"&gt;paper on ESX memory management&lt;/a&gt;, written by Fei Guo in performance engineering, has three graphs that are absolute gems.  These graphs show balloon driver memory savings next to throughput numbers for three common benchmarks.  The conclusion is inescapable: the balloon driver reclaims memory from over-provisioned VMs with virtually no impact to performance.  This is true on every workload save one: Java.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Example 1: Kernel Compile&lt;/h2&gt;
Linux kernel compilation models a common developer environment involving a large number of code compiles.  This process is CPU and IO intensive but uses very little memory. &lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4976-6949/Picture+1.png" alt="Picture 1.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4976-6949/Picture+1.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
Results of two experiments are shown on this graph: in one memory is reclaimed only through ballooning and in the other memory is reclaimed only through host swapping.  The bars show the amount of memory reclaimed by ESX and the line shows the workload performance.  The steadily falling green line reveals a predictable deterioration of performance due to host swapping.  The red line demonstrates that as the balloon driver inflates, kernel compile performance is unchanged.&lt;br /&gt;
&lt;br /&gt;
Kernel compilation performance remains high with ballooning because this workload needs very little memory and the guest OS can easily take unused pages from the application.  Performance falls with swapping because ESX randomly selects virtual machine pages for swapping, whether those pages are in use by the application or not.  The guest OS is better at selecting pages for reclamation than ESX is. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Example 2: Oracle/Swingbench&lt;/h2&gt;
Oracle's database is best tested against Swingbench, the OLTP load generation tool provided by Oracle.  Database workloads utilize all system resources but show a non-linear dependence on memory.  Memory can be safely reclaimed from OSes running databases until the cache becomes smaller than needed by the workload.  The following figure shows this. &lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4976-6950/Picture+2.png" alt="Picture 2.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4976-6950/Picture+2.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
As before, the virtual machine using only ballooning maintains higher performance under memory pressure than the virtual machine whose memory is being swapped away by the host.  Performance is constant and shows no negative impact due to ballooning until the balloon encroaches on the SGA.  Again, ESX's host swapping randomly selects pages to send to disk which degrades performance even at small swap amounts.&lt;br /&gt;
&lt;br /&gt;
As with kernel compile, the balloon driver safely reclaims memory from over-provisioned VMs with little impact to application performance. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Example 3: Java/SPECjbb&lt;/h2&gt;
Java provides a special challenge in virtual environments due to the JVM's introduction of a third level of memory management.  The balloon driver draws memory from the virtual machine without impacting throughput because the guest OS efficiently claims pages that its processes are not using.  But in the case of Java, the guest OS is unaware of how the JVM is using memory and is forced to select memory pages as arbitrarily and inefficiently as ESX's swap routine.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4976-6951/Picture+3.png" alt="Picture 3.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4976-6951/Picture+3.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
Neither ESX nor the guest OS can efficiently take memory from the JVM without significantly degrading performance.  Memory in Java is managed internally by the JVM, and efforts by either the host or the guest to remove pages will hurt Java application performance equally.  In these environments it is wise to manually set the JVM's heap size and specify memory reservations for the virtual machine in ESX to account for the JVM, OS, and heap. &lt;br /&gt;
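&lt;br /&gt;
As a rough illustration of that sizing advice, here is a minimal sketch of the arithmetic.  Every number in it is a hypothetical placeholder rather than a measurement from the paper, and the function name is made up for this example; the only real settings referenced are the JVM's -Xms and -Xmx heap flags.&lt;br /&gt;
&lt;pre&gt;
```python
# All sizes are hypothetical placeholders, in MB; measure your own workload.
JVM_HEAP_MB = 2048       # fixed heap, for example set with -Xms2048m -Xmx2048m
JVM_NON_HEAP_MB = 512    # assumed allowance for JVM code, threads, and native memory
GUEST_OS_MB = 512        # assumed allowance for the guest OS and other processes


def reservation_mb(heap, non_heap, guest_os):
    """Sum the three components the VM's memory reservation should cover."""
    return heap + non_heap + guest_os


total = reservation_mb(JVM_HEAP_MB, JVM_NON_HEAP_MB, GUEST_OS_MB)
print("Reservation to consider for this VM:", total, "MB")
```
&lt;/pre&gt;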
&lt;br /&gt;
&lt;h2&gt;Conclusions and Scott's Special Recommendation&lt;/h2&gt;
Love your balloon driver.  Your application owners are always asking for more memory than they need.  You can comfortably over-provision memory to a modest degree and rely on ESX and the balloon driver to reclaim what is not in use.  Without the balloon driver, ESX will be forced to use its last resort for managing memory over-commit: host swapping.  And host swapping always decreases performance. &lt;br /&gt;
&lt;br /&gt;
So here is my special recommendation for you: never, ever disable the balloon driver.  Disabling it forces the host to swap that virtual machine's memory, should that resource become scarce.  And where ballooning usually will not hurt performance, swapping always will.  If you &lt;i&gt;must&lt;/i&gt; protect an application from memory reclamation due to memory over-commitment, use reservations.  They make admission control more effective, they self-document the needs of the VM, and they are easily configured.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmkernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">balloon</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">swap</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">java</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">specjbb</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">oracle</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">swingbench</category>
      <pubDate>Wed, 09 Sep 2009 17:26:33 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/09/09/love-your-balloon-driver</guid>
      <dc:date>2009-09-09T17:26:33Z</dc:date>
      <clearspace:dateToText>2 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Four Things You Should Know About ESX 4's Scheduler</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/08/21/four-things-you-should-know-about-esx-4s-scheduler</link>
      <description>I spent a great deal of time answering customers' questions about the scheduler.  Never have so many questions been asked about such an abstruse component for which so little user influence is possible.  But CPU scheduling is central to system performance, so VMware strives to provide as much information on the subject as possible.  In this blog entry, I want to point out a few nuggets of information on the CPU scheduler.  These four bullets answer 95% of the questions I get asked.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Item 1: ESX 4's Scheduler Better Uses Caches Across Sockets&lt;/h1&gt;
On UMA systems with low load levels, virtual machine performance improves when each virtual CPU (vCPU) is placed on its own socket.  This is because providing each vCPU its own socket also gives it that socket's entire cache.  On page 18 of a &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf"&gt;recent paper on the scheduler written by Seongbeom Kim&lt;/a&gt;, a graph highlights the case where vCPU spreading improves performance.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4886-6674/Picture+2.png" alt="Picture 2.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4886-6674/Picture+2.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
The X-axis represents different combinations of VM and vCPU counts.  SPECjbb is memory intensive and shows great gains with increases in CPU cache.  The few cases that show dramatic benefit due to the ESX 4.0 scheduler are benefiting from the distribution of vCPUs across sockets.  Very large gains are possible in this somewhat uncommon case.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Item 2: Overuse of SMP Only Slows Consolidated Environments At Saturation &lt;/h1&gt;
For years customers have asked me how many vCPUs they should give to their VMs.  The best guidance, "as few as possible", seems too vague to satisfy.  It remains the only correct answer, unfortunately.  But &lt;a class="jive-link-external" href="http://blogs.vmware.com/performance/2009/06/measuring-the-cost-of-smp-with-mixed-workloads.html"&gt;a recent experiment performed by Bruce Herndon's team&lt;/a&gt; sheds some light on this VM sizing question.&lt;br /&gt;
&lt;br /&gt;
In this experiment we ran VMmark against VMs that were configured outside of VMmark specifications.  In one case some of the virtual machines were given too few vCPUs and in another they were given too many.  Because VMmark's workload is fixed, changing VM sizes does not alter the amount of work performed by the VMs.  In other words, the system's score does not depend on the VMs' vCPU count.  Until CPU saturation, that is.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4886-6675/Picture+3.png" alt="Picture 3.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4886-6675/Picture+3.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
Notice that the scores are similar between the undersized, right-sized, and over-sized VMs.  Up until tile 10 (60 VMs) they are nearly identical.  There is a slight difference in processor utilization that begins to impact throughput (score) as the system runs out of CPU.  At that point wasted cycles dedicated to unneeded vCPUs negatively impact the system performance.  Two points I will call out from this work:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Sloppy VI admins that provide too many vCPUs need not worry about performance when their servers are under low load.  But performance will suffer when CPU utilization spikes.&lt;/li&gt;
&lt;li&gt;The penalty of over-sizing VMs gets worse as VMs get larger.  Using a 2-way VM is not that bad, but unneeded use of a 4-way VM when one or two processors suffice can cost up to 15% of your system throughput.  I presume that unnecessarily assigning eight vCPUs would be criminal.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Item 3: ESX Has Not Strictly Co-scheduled Since ESX 2.5&lt;/h1&gt;
I have documented ESX's relaxation of co-scheduling previously (&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt;).  But this statement cannot be repeated too frequently: ESX has not strictly co-scheduled virtual machines since version 2.5.   This means that ESX can place vCPUs from SMP VMs individually.  It is not necessary to wait for physical cores to be available for every vCPU before starting the VM.  However, as Item 2 pointed out, this does not give you free license to over-size your VMs.  Be frugal with your SMP VMs and assign vCPUs only when you need them. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Item 4: The Cell Construct Has Been Eliminated in ESX 4.0 &lt;/h1&gt;
In the performance best practices deck that I give at conferences I talk about the benefits of creating small virtual machines over large ones.  In versions of ESX up to ESX 3.5, the scheduler used a construct called a cell that would contain and lock CPU cores.  The vCPUs from a single VM could never span a cell.  With ESX 3.x's cell size of four, this meant that VMs never spanned multiple four-core sockets.  Consider this figure:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4886-6688/Picture+1.png" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4886-6688/Picture+1.png" class="jive-image"  /&gt;  &lt;br /&gt;
&lt;br /&gt;
What this figure shows is that a four-way VM on ESX 3.5 can only be placed in two locations on this hypothetical two-socket configuration.  There are 12 combinations for a two-way VM and eight for a uniprocessor VM.  The scheduler has more opportunities to optimize VM placement when you provide it with smaller VMs.&lt;br /&gt;
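&lt;br /&gt;
Those counts can be checked with a short sketch.  It assumes the hypothetical host in the figure has two quad-core sockets, each acting as one cell of four cores, and that all of a VM's vCPUs must land on distinct cores within a single cell.  The names CORES_PER_CELL, CELLS, and placements are illustrative only and do not correspond to anything in ESX.&lt;br /&gt;
&lt;pre&gt;
```python
from itertools import combinations

CORES_PER_CELL = 4   # ESX 3.x cell size, as described above
CELLS = 2            # assumed: two quad-core sockets, one cell each


def placements(vcpus):
    """Count the ways to place a VM's vCPUs on distinct cores of one cell."""
    per_cell = len(list(combinations(range(CORES_PER_CELL), vcpus)))
    return CELLS * per_cell


for vcpus in (4, 2, 1):
    print(vcpus, "vCPU VM:", placements(vcpus), "possible placements")
```
&lt;/pre&gt;
Under these assumptions the sketch gives 2, 12, and 8 placements for four-way, two-way, and uniprocessor VMs, matching the figure.&lt;br /&gt;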
&lt;br /&gt;
In ESX 4 we have eliminated the cell lock so VMs can span multiple sockets, as Item 1 states.  Continue to think of this placement problem as a challenge to the scheduler that you can alleviate.  By choosing multiple, smaller VMs you free the scheduler to pursue opportunities to optimize performance in consolidated environments.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduler</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmmark</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">cpu</category>
      <pubDate>Wed, 19 Aug 2009 16:59:14 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/08/21/four-things-you-should-know-about-esx-4s-scheduler</guid>
      <dc:date>2009-08-19T16:59:14Z</dc:date>
      <clearspace:dateToText>3 months, 4 days ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>First Success of VMware's Performance Service Offering</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/08/17/first-success-of-vmwares-performance-service-offering</link>
      <description>Just over a week ago I had the privilege of riding along with VMware's Professional Services Organization as they piloted a possible performance offering.  We are considering two possible services: one for performance troubleshooting and another for infrastructure optimization.  During this trip we piloted the troubleshooting service, focusing on the customer's disappointing experience with SQL Server's performance on vSphere.&lt;br /&gt;
&lt;br /&gt;
If you have read my blog entries (&lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/drummonds/2009/03/13/sql-server-performance-problems-not-due-to-vmware"&gt;SQL Server Performance Problems Not Due to VMware&lt;/a&gt;) or &lt;a class="jive-link-external" href="http://www.vmware.com/a/webcasts/details/265"&gt;heard me speak&lt;/a&gt;, you know that SQL performance is a major focus of my work.  SQL Server is the most common source of performance discontent among our customers, yet 100% of the problems I have diagnosed were not due to vSphere.  When this customer described the problem, I knew this SQL Server issue was stereotypical of my many engagements:&lt;br /&gt;
&lt;blockquote&gt;"We virtualized our environment nearly a year ago and quickly determined that virtualization was not right for our SQL Servers.  Performance dropped by 75% and we know this is VMware's fault because we virtualized on much newer hardware on the exact same SAN.  We have since moved the SQL instance back to native."&lt;/blockquote&gt;
Most professionals in the industry stop here, incorrectly bin this problem as a deficiency of virtualization, and move on with their deployments.  But I know that &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf"&gt;vSphere's abilities with SQL Server&lt;/a&gt; are phenomenal, so I expect to make every user happy with their virtual SQL deployment. I start by challenging the assumptions and trust nothing that I have not seen for myself.  Here are my first steps on the hunt for the source of the problem:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Instrument the SQL instance that has been moved back to native to profile its resource utilization.  Do this by running Perfmon to collect stats on the database's memory, CPU, and disk usage.&lt;/li&gt;
&lt;li&gt;Audit the infrastructure and document the SAN configuration.  Primarily I will need RAID group and LUN configuration and an itemized list of VMDKs on each VMFS volume.&lt;/li&gt;
&lt;li&gt;Use esxtop and vscsiStats to measure resource utilization of important VMs under peak production load.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
There are about a dozen other things that I could do here, but my experience in these issues is that I can find 90% of all performance problems with just these three steps.  Let me start by showing you the two RAID groups that were most important to the environment.  I have greatly simplified the process of estimating these groups' performance, but the rough estimate will serve for this example:&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;RAID Group&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Configuration&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Performance Estimate&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;RAID5 using 4 15K disks&lt;/td&gt;
&lt;td&gt;4 x 200 = 800 IOPS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;RAID5 using 7 10K disks&lt;/td&gt;
&lt;td&gt;7 x 150 = 1050 IOPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
We found two SQL instances in their environment that were generating significant IO: one that had been moved back to native and one that remained in a virtual machine.  By using Perfmon for the native instance and vscsiStats for the virtual one, we documented the following demands during a one-hour window:&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;SQL Instance&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Peak&lt;/b&gt; &lt;b&gt;IOPS&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Average&lt;/b&gt; &lt;b&gt;IOPS&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;X (physical)&lt;/td&gt;
&lt;td&gt;1800&lt;/td&gt;
&lt;td&gt;850&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Y (virtual)&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
In the customer's first implementation of the virtual infrastructure, both SQL Servers, X and Y, were placed on RAID group A.  But in the native configuration SQL Server X was placed on RAID group B.  This meant that the aggregate storage capacity of the physical configuration was approximately 1850 IOPS (800 from group A plus 1050 from group B).  In the virtual configuration the two databases shared a single 800 IOPS RAID volume.&lt;br /&gt;
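&lt;br /&gt;
The comparison can be written out as a small sketch.  It uses only the rough per-disk estimates and the measured average demands from the two tables above; the dictionary layout and the function name are illustrative, and the per-disk figures are the same simplification used in the first table.&lt;br /&gt;
&lt;pre&gt;
```python
# Rough per-spindle estimates from the RAID group table above.
raid_groups = {
    "A": {"disks": 4, "iops_per_disk": 200},   # RAID5, 15K disks
    "B": {"disks": 7, "iops_per_disk": 150},   # RAID5, 10K disks
}

# Average IOPS measured for each SQL instance over the one-hour window.
average_demand = {"X": 850, "Y": 400}


def group_iops(name):
    """Estimate a RAID group's capability as disks times IOPS per disk."""
    group = raid_groups[name]
    return group["disks"] * group["iops_per_disk"]


native_supply = group_iops("A") + group_iops("B")   # X on group B, Y on group A
virtual_supply = group_iops("A")                     # X and Y both on group A
total_demand = sum(average_demand.values())

print("Native supply: ", native_supply, "IOPS")
print("Virtual supply:", virtual_supply, "IOPS")
print("Average demand:", total_demand, "IOPS")
print("Shortfall when virtualized:", total_demand - virtual_supply, "IOPS")
```
&lt;/pre&gt;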
&lt;br /&gt;
It does not take a rocket scientist to realize that users are going to complain when a critical SQL Server instance goes from 1050 IOPS to 400.  And this was not news to the VI admin on-site, either.  What we found as we investigated further was that virtual disks requested by the application owners were used in unexpected and undocumented ways and frequently demanded more throughput than originally estimated.  In fact, through vscsiStats analysis (&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-10095"&gt;Using vscsiStats for Storage Performance Analysis&lt;/a&gt;), my contact and I were able to identify an "unused" VMDK with moderate sequential IO that we immediately recognized as log traffic.  Inspection of the application's configuration confirmed this.&lt;br /&gt;
&lt;br /&gt;
Despite the explosion of VMware into the data center we remain the new kid on the block.  As soon as performance suffers the first reaction is to blame the new kid.   But next time you see a performance problem in your production environment, I urge you to look at the issue as a consolidation challenge, and not a virtualization problem.  Follow the best practices you have been using for years and you can correct this problem without needing to call me and my colleagues to town.&lt;br /&gt;
&lt;br /&gt;
Of course, if you want to fly us out to help you correct a specific problem or optimize your design, I promise we will make it worth your while.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">sql</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vscsistats</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <pubDate>Thu, 13 Aug 2009 18:23:22 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/08/17/first-success-of-vmwares-performance-service-offering</guid>
      <dc:date>2009-08-13T18:23:22Z</dc:date>
      <clearspace:dateToText>3 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Performance Debate at Burton Group's Catalyst 2009</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/08/03/performance-debate-at-burton-groups-catalyst-2009</link>
      <description>Last week Chris Wolf moderated a debate on virtual platform performance between myself and Simon Crosby, CTO of Citrix.  A &lt;a class="jive-link-external" href="http://www.catalyst.burtongroup.com/Na09/PlayerVideo011.html"&gt;recording of the debate&lt;/a&gt; was put online shortly after its conclusion.&lt;br /&gt;
&lt;br /&gt;
Simon and I disagreed on a few issues and demonstrated different strategies in the discussion.  My goal in representing the fine efforts of our performance team was to show to the audience VMware's commitment to product performance.  This commitment is demonstrated through a never ending series of benchmark publications and continual product improvement.  In the years since I joined VMware we have quantified ESX's ability to serve web pages (&lt;a class="jive-link-external" href="http://www.vmware.com/company/news/releases/specweb2005.html"&gt;SPECweb&lt;/a&gt;), enable massive numbers of database transactions (&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/Perf_ESX40_Oracle-TPC-C-eval.pdf"&gt;TPC-C&lt;/a&gt;, with disclaimers), and establish industry leadership in consolidated workloads (&lt;a class="jive-link-external" href="http://www.vmware.com/products/vmmark/"&gt;VMmark&lt;/a&gt;).  As we released these and dozens of other numbers, Citrix has remained silent on its own product's performance.&lt;br /&gt;
&lt;br /&gt;
I was pleased that the event's format gave me the opportunity to discuss our accomplishments.  My only regret was that I lacked the time to dispense with the most important of several factual inaccuracies from Simon.  At one point in the discussion Simon claimed that VMmark is not run by anyone except VMware.  In fact, it is closer to the truth to say that VMmark is run by everyone except VMware.  A quick view of &lt;a class="jive-link-external" href="http://www.vmware.com/products/vmmark/results.html"&gt;the VMmark results page&lt;/a&gt; will show results from every major server vendor, with no submissions from VMware.&lt;br /&gt;
&lt;br /&gt;
Thanks to the Burton Group and Chris Wolf for letting me participate.  It was a pleasure.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmmark</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">specweb</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">citrix</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">xenserver</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">oracle</category>
      <pubDate>Mon, 03 Aug 2009 17:33:34 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/08/03/performance-debate-at-burton-groups-catalyst-2009</guid>
      <dc:date>2009-08-03T17:33:34Z</dc:date>
      <clearspace:dateToText>3 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Tweaking vSphere Performance For High-Consolidation Workloads</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/06/25/tweaking-vsphere-performance-for-highconsolidation-workloads</link>
      <description>I was recently copied on an internal thread discussing a performance tweak for VMware vSphere.  The thread discussed gains that can be derived from an adjustment to the CPU scheduler.  In ESX 3.5, ESX's cell construct limited vCPU mobility between different sockets.  ESX 4.0 has no such limitations and its aggressive migrations are non-optimal in some cases. &lt;br /&gt;
&lt;br /&gt;
This thread details the application of this change in ESX 4 and provides some insight into its impact.  This scheduler modification is going to be baked in to the first update to ESX 4.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="jive-quote"&gt;
On 4socket (or more) Dunnington (or any non-NUMA) platform, VMmark score can be further improved by enabling CoschedHandoffLLC:  In console OS, it can be enabled via vsish (available from VMware*debug-tools*.rpm):&lt;br /&gt;
&lt;br /&gt;
vsish -e set /config/Cpu/intOpts/CoschedHandoffLLC 1 &lt;br clear="all" /&gt;I believe that config parameter is also tunable through VC or VI client. (haven't confirmed myself)&lt;br /&gt;
&lt;br /&gt;
The degree of improvement depends on the configurations but in one case, the improvement was about 10 - 20%.&lt;br /&gt;
&lt;br /&gt;
In default setting, VMmark may suffer many inter-package vcpu migrations which causes performance degradation. Setting CoschedHandoffLLC reduces the number of inter-package vcpu migrations and recovers performance loss.&lt;br /&gt;
&lt;br /&gt;
The fix is disabled by default in ESX 4.0 GA but will be enabled by default in ESX 4.0 u1.&lt;br /&gt;
&lt;/div&gt;
&lt;br /&gt;
Try this out and let me know if you see a significant change on any of your workloads.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">intel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduler</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmkernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmmark</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vsphere</category>
      <pubDate>Thu, 25 Jun 2009 10:41:57 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/06/25/tweaking-vsphere-performance-for-highconsolidation-workloads</guid>
      <dc:date>2009-06-25T10:41:57Z</dc:date>
      <clearspace:dateToText>4 months, 4 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Using Perfmon For Accurate, ESX Performance Counters</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/06/18/using-perfmon-for-accurate-esx-performance-counters</link>
      <description>My colleague in product management, Praveen Kannan, has been working to extend Perfmon to show some ESX performance counters.  This capability is automatically installed with VMware Tools on vSphere 4.  But Praveen and I have made a stand-alone version available to those of you that are still on VI3.  &lt;a class="jive-link-external" href="http://ftpsite.vmware.com/download/vmStatsProvider/vmStatsProvider_006_release.exe"&gt;Download it here&lt;/a&gt; to give it a try.&lt;br /&gt;
&lt;br /&gt;
To install, place the file in an appropriately-named directory on any Windows VM on VI3.  Double-click the executable, which will self-extract the files into the same directory.  Run "install.bat" and you're done. &lt;br /&gt;
&lt;br /&gt;
Once you bring up Perfmon you'll see two new performance objects on your computer: "VM Memory" and "VM Processor".  These objects contain counters exposed by ESX that accurately reflect the VM's memory and CPU usage.  Here's Perfmon on my test VM after I've installed the tool.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4616-6039/new_counters.png" alt="new_counters.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-4616-6039/new_counters.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
This makes collection of host stats a breeze.  Windows Management Instrumentation (WMI) programs can now easily get access to reliable host statistics.  And anyone with access to Perfmon can see their VM's resource usage.  Unlike guest-based statistics, the host statistics shown through these counters accurately reflect resource usage in the presence of virtualization overheads and time slicing of VMs. &lt;br /&gt;
&lt;br /&gt;
Disclaimer:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;&lt;i&gt;This is a pre-release "sneak peek" version. Eventually this tool will be available for download on vmware.com and supported by VMware. But today there is no support for this tool and you're using it "as-is".  Use at your own risk and do not contact VMware support for help with this release.&lt;/i&gt;&lt;/blockquote&gt;
That's VMware's official position on this tool.  But feel free to comment here with any ideas about this great new feature.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">windows</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">perfmon</category>
      <pubDate>Thu, 18 Jun 2009 00:22:58 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/06/18/using-perfmon-for-accurate-esx-performance-counters</guid>
      <dc:date>2009-06-18T00:22:58Z</dc:date>
      <clearspace:dateToText>5 months, 1 week ago</clearspace:dateToText>
      <clearspace:replyCount>11</clearspace:replyCount>
    </item>
    <item>
      <title>ESX Monitor Modes</title>
      <link>http://communities.vmware.com/docs/DOC-9882</link>
      <description>VMware has supported Intel and AMD's virtualization assist since 2006.  Long before then we were using an all-software approach that we call binary translation (BT).  With the benefit of years of development and optimization, BT outperformed the early versions of hardware assist.  But as hardware assist evolved the use of these new features became more attractive.&lt;br /&gt;
&lt;br /&gt;
Because our support for hardware assist is rich and BT is heavily optimized, the monitor can benefit from using either technology in different situations.  The following tables detail the defaults in ESX 4.0, which can be changed through VM settings if desired. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Monitor Defaults with Intel Processors &lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;VM Configuration&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Core-i7 (Nehalem)&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;45nm Core2 with VT-x&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;65nm Core2 with VT-x and FlexPriority&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;65nm Core2 with VT-x and No FlexPriority&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;P4 with VT-x&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;EM64T without VT-x&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;No EM64T&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FT enabled&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;64-bit guests&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VMI enabled&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenServer, UnixWare, OS/2&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Linux and 32-bit FreeBSD&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT (*)&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All other 32-bit guests&lt;/td&gt;
&lt;td&gt;VT-x + EPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;VT-x + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
(*) When we use BT on an Intel system with VT-x capability, we dynamically switch to VT-x if the guest enters long mode.&lt;br /&gt;
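&lt;br /&gt;
As a rough sketch only (this is not VMware code), the Intel table and its footnote can be read as a simple lookup.  The names &lt;i&gt;INTEL_CPUS&lt;/i&gt;, &lt;i&gt;INTEL_DEFAULTS&lt;/i&gt;, and &lt;i&gt;default_mode&lt;/i&gt; are hypothetical, only two rows are copied from the table, and because the footnote does not say which MMU mode is used after the switch, the sketch reports only "VT-x" in that case:&lt;br /&gt;
&lt;pre&gt;
```python
# Columns of the Intel table above, left to right.
INTEL_CPUS = [
    "Core-i7 (Nehalem)",
    "45nm Core2 with VT-x",
    "65nm Core2 with VT-x and FlexPriority",
    "65nm Core2 with VT-x and No FlexPriority",
    "P4 with VT-x",
    "EM64T without VT-x",
    "No EM64T",
]

# Two rows copied from the table; "(*)" marks the dynamic-switch footnote.
INTEL_DEFAULTS = {
    "64-bit guests": [
        "VT-x + EPT", "VT-x + SPT", "VT-x + SPT", "VT-x + SPT",
        "VT-x + SPT", "Not runnable", "Not runnable",
    ],
    "32-bit Linux and 32-bit FreeBSD": [
        "VT-x + EPT", "VT-x + SPT", "BT + SPT (*)", "BT + SPT (*)",
        "BT + SPT (*)", "BT + SPT", "BT + SPT",
    ],
}


def default_mode(guest, cpu, guest_in_long_mode=False):
    """Return the default monitor mode for one table cell (hypothetical helper)."""
    mode = INTEL_DEFAULTS[guest][INTEL_CPUS.index(cpu)]
    if mode.endswith("(*)"):
        # Footnote: BT on a VT-x capable system switches to VT-x in long mode.
        # The table does not state the MMU mode after the switch.
        return "VT-x" if guest_in_long_mode else "BT + SPT"
    return mode


if __name__ == "__main__":
    print(default_mode("64-bit guests", "Core-i7 (Nehalem)"))
    print(default_mode("32-bit Linux and 32-bit FreeBSD", "P4 with VT-x"))
```
&lt;/pre&gt;
The remaining rows, and the AMD table, could be added to the same structure in the same way.&lt;br /&gt;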
&lt;br /&gt;
&lt;h2&gt;Monitor Defaults with AMD Processors &lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;VM Configuration&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Barcelona, Phenom, and Newer&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;AMD64 pre-Barcelona&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;No AMD64&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FT enabled&lt;/td&gt;
&lt;td&gt;AMD-V + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;64-bit guests&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;Not runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VMI enabled&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenServer, UnixWare, OS/2&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Linux and 32-bit FreeBSD&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows 2000, Windows NT, DOS, Windows 95, Windows 98, Netware, 32-bit Solaris&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All other 32-bit guests&lt;/td&gt;
&lt;td&gt;AMD-V + RVI&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;td&gt;BT + SPT&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;h2&gt;Legend&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;VT-x: Intel's virtualization hardware assist.&lt;/li&gt;
&lt;li&gt;EPT: &lt;i&gt;Extended Page Tables.&lt;/i&gt;  Intel's on-board, virtualization-aware memory management unit (MMU).&lt;/li&gt;
&lt;li&gt;EM64T: Intel's 64-bit extensions to the x86 architecture.&lt;/li&gt;
&lt;li&gt;SPT: &lt;i&gt;Shadow page tables.&lt;/i&gt;  ESX's software memory management unit (i.e., not EPT or RVI).&lt;/li&gt;
&lt;li&gt;BT: &lt;i&gt;Binary translation.&lt;/i&gt;  ESX's software virtualization capability (i.e., not VT-x or AMD-V).&lt;/li&gt;
&lt;li&gt;AMD-V: AMD's virtualization hardware assist.&lt;/li&gt;
&lt;li&gt;RVI: &lt;i&gt;Rapid Virtualization Indexing.&lt;/i&gt;  AMD's on-board, virtualization-aware memory management unit (MMU).&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">monitor</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <pubDate>Tue, 28 Apr 2009 20:40:05 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-9882</guid>
      <dc:date>2009-04-28T20:40:05Z</dc:date>
      <clearspace:dateToText>5 months, 1 week ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>My Hyper-V Video</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/06/10/my-hyperv-video</link>
      <description>&lt;br /&gt;
There's been no shortage of comments on the Hyper-V video I posted.  I made &lt;a class="jive-link-external" href="http://blogs.vmware.com/vmtn/2009/06/an-apology-from-scott-drummonds.html"&gt;a comment on this action&lt;/a&gt; in a VMTN blog entry.  Read up and comment here or there.&lt;br /&gt;
&lt;p /&gt;
Scott</description>
      <pubDate>Wed, 10 Jun 2009 20:22:47 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/06/10/my-hyperv-video</guid>
      <dc:date>2009-06-10T20:22:47Z</dc:date>
      <clearspace:dateToText>5 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Drink From the Fire Hose</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/06/03/drink-from-the-fire-hose</link>
      <description>A few weeks ago our communities' administrators set up an XML aggregation of all blogs in VMware's performance community.  In addition to the regular postings coming from VROOM! and me, there are several other members of our performance team who contribute new content from time to time.  If you follow the aggregator and its RSS feed then you'll be notified of new performance content as it goes live.&lt;br /&gt;
&lt;br /&gt;
The aggregator can be found at &lt;a class="jive-link-external" href="http://www.vmware.com/vmtn/planet/vmware/performance.xml"&gt;http://www.vmware.com/vmtn/planet/vmware/performance.xml&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Enjoy!</description>
      <pubDate>Wed, 03 Jun 2009 21:04:48 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/06/03/drink-from-the-fire-hose</guid>
      <dc:date>2009-06-03T21:04:48Z</dc:date>
      <clearspace:dateToText>5 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Storage Workload Characterization and Consolidation in Virtualized Environments</title>
      <link>http://communities.vmware.com/docs/DOC-10104</link>
      <description />
      <category domain="http://communities.vmware.com/tags?communityID=1">oracle</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">exchange</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vscsistats</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">san</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">nfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmfs</category>
      <pubDate>Wed, 03 Jun 2009 00:21:48 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-10104</guid>
      <dc:date>2009-06-03T00:21:48Z</dc:date>
      <clearspace:dateToText>5 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Newer Processors and Virtualization Performance</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/06/02/newer-processors-and-virtualization-performance</link>
      <description>Newer processors are much more important to virtualized environments than to physical, un-virtualized ones.  The generational improvements haven't just increased the raw compute power, they've also reduced the overheads associated with virtualization.  This blog entry will describe three key changes that have particularly impacted virtual performance.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Hardware Assist Is Faster&lt;/h2&gt;
In 2008, AMD became the first CPU vendor to produce a hardware memory management unit equipped to support virtualization.  They called this technology Rapid Virtualization Indexing (RVI).  This year Intel did the same with Extended Page Tables (EPT) on its Xeon 5500 line.  Both vendors have been providing the ability to virtualize privileged instructions since 2006, with continually improving results.  Consider the following graph showing the latency of one key instruction from Intel:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-3171-5926/vmexit_latencies.png" alt="vmexit_latencies.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-3171-5926/vmexit_latencies.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
This instruction, VMEXIT, is called each time the guest exits to the kernel.  The graph shows its latency (delay) in completing this instruction, which represents a wait time incurred by the guest.  Clearly Intel has made great strides in reducing VMEXIT's wait time from its Netburst parts (Prescott and Cedar Mill) to its Core architecture (Merom and Penryn) and on to its current generation, Core i7 (Nehalem).  AMD processors have shown commensurate gains with AMD-V. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Pipelines Are Shorter&lt;/h2&gt;
The longest pipelines in the x86 world were in Intel's Netburst processors.  These processors' pipelines had twice as many stages as their counterparts at AMD and twice as many as the generation of Intel CPUs that followed.  The increased pipeline length would have enabled support for 8 GHz silicon, had it arrived.  Instead, silicon switching speeds hit a wall at 4 GHz and Intel and its customers were forced to suffer the drawbacks of long pipelines.&lt;br /&gt;
&lt;br /&gt;
Long pipelines aren't necessarily a problem for desktop environments, where single-threaded applications used to dominate the market.  But in the enterprise, application thread counts were larger.  Furthermore, consolidation in virtual environments drew thread counts even higher.  With more contexts in the processor, the number of pipeline stalls and flushes increased, and performance fell.&lt;br /&gt;
&lt;br /&gt;
Because of the decreased efficiency of consolidated workloads on processors with long pipelines, VMware has often recommended that performance-intensive VMs be run on processors no more than two to three years old.  This excludes Intel's Netburst parts.  VI3 and vSphere will do a fine job of virtualizing your less-demanding applications on any supported processor.  But use newer parts for the applications that hold your highest performance expectations. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Caches Are Larger &lt;/h2&gt;
A cache is highly effective when it fully contains the software's working set.  The addition from the hypervisor of even a small amount of code will change the working set and reduce the cache hit rate.  I've attempted to illustrate this concept with the following simplified view of the relationship between cache hit rates, application working set, and cache sizes:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-3171-5927/cache_hit_rates.png" alt="cache_hit_rates.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-3171-5927/cache_hit_rates.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
This graph is based on a model that greatly simplifies working sets and the hypervisor's impact on them.  Assuming that ESX increases the working set by 256 KB, this graph shows the difference in cache hit rate due to the contributions of the hypervisor.  Notice that with very small caches and very small application working sets, the cache hit rate suffers greatly due to the addition of even 256 KB of virtualization support instructions.  And even up to 2 MB, a 10% decrease in cache hit rate can be seen in some applications.  With a 256 KB contribution by the kernel, cache hit rates do not change significantly with cache sizes of 4 MB and beyond.&lt;br /&gt;
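&lt;br /&gt;
The model behind the graph isn't spelled out here, so the following is only a toy stand-in under a loudly stated assumption: the hit rate is taken to be the share of the working set that fits in the cache, capped at 100%.  The function names are hypothetical, and the 256 KB figure is the hypervisor contribution assumed above:&lt;br /&gt;
&lt;pre&gt;
```python
HYPERVISOR_KB = 256  # working-set growth assumed in the text above


def hit_rate(cache_kb, working_set_kb):
    """Toy model (an assumption): share of the working set that fits in cache."""
    return min(1.0, cache_kb / working_set_kb)


def hit_rate_drop(cache_kb, app_working_set_kb, hypervisor_kb=HYPERVISOR_KB):
    """Native hit rate minus the hit rate with the hypervisor's code added."""
    native = hit_rate(cache_kb, app_working_set_kb)
    virtual = hit_rate(cache_kb, app_working_set_kb + hypervisor_kb)
    return native - virtual


if __name__ == "__main__":
    # Worst case for each cache size: an application that exactly fills it.
    for cache_kb in (512, 1024, 2048, 4096, 8192):
        drop = hit_rate_drop(cache_kb, cache_kb)
        print(f"{cache_kb:5d} KB cache: hit rate falls by {drop:.1%}")
```
&lt;/pre&gt;
Under this toy model, an application that exactly fills a 2 MB cache loses about 11% of its hit rate once 256 KB is added, and about 6% with a 4 MB cache, which is in line with the trend described above.  Real caches and working sets are far less tidy than this.&lt;br /&gt;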
&lt;br /&gt;
In some cases a 10% improvement in cache hit rate can double application throughput.  This means that a doubling of cache size can profoundly affect the performance of virtual applications as compared to native.  Given ESX's small contribution to the working set, you can see why we at VMware recommend that customers run their performance-intensive workloads on CPUs with 4 MB caches or larger.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmkernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">monitor</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">rvi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">ept</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">intel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">amd</category>
      <pubDate>Tue, 02 Jun 2009 20:08:42 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/06/02/newer-processors-and-virtualization-performance</guid>
      <dc:date>2009-06-02T20:08:42Z</dc:date>
      <clearspace:dateToText>5 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server</title>
      <link>http://communities.vmware.com/docs/DOC-10084</link>
      <description />
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">fc</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">san</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">fibre</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">nas</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">nfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmfs</category>
      <pubDate>Mon, 01 Jun 2009 22:53:39 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-10084</guid>
      <dc:date>2009-06-01T22:53:39Z</dc:date>
      <clearspace:dateToText>5 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Final Thoughts on the Hyper-V Video</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/05/15/final-thoughts-on-the-hyperv-video</link>
      <description>It's been about 10 days since I posted the &lt;a class="jive-link-external" href="http://www.youtube.com/watch?v=XlLPmWwzHzM"&gt;YouTube video showing Hyper-V's stability problems&lt;/a&gt; in consolidated environments. I immediately received a lot of questions about the configuration, which I answered to the best of my ability in my "&lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/drummonds/2009/05/15/video-on-hyperv-crashes"&gt;Video on Hyper-V Crashes&lt;/a&gt;" blog entry.  Many respondents were not surprised by stability problems with a first-generation product and some people requested more detail on this issue for further discussion.  But there were too many comments to address them all. &lt;br /&gt;
&lt;br /&gt;
One of the more interesting emails I received pointed out that it is unreasonable to blame Hyper-V for &lt;a class="jive-link-external" href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;#38;articleId=9132389"&gt;the collapse of these very large and very busy websites&lt;/a&gt;. Hyper-V's stability issues would bring down individual VMs or small groups when the parent partition blue screened.  I think that this is a reasonable observation, so it's worth including here.  I can't say that Hyper-V was responsible for the MSDN and TechNet crashes.  That would be for Microsoft to say, when and if they choose to expose the issue behind the outage.&lt;br /&gt;
&lt;br /&gt;
Lastly, all comments come from people who fall into one of two categories: one camp thinks the video captures are bogus and the other believes they're based on a real, reasonable, repeatable workload.  I'm not going to try to move you from one camp to the other.&lt;br /&gt;
&lt;br /&gt;
It is clear that a small, vocal, and surprisingly profane number of you think that I made this whole thing up.  The premise of this group appears to be that Microsoft wouldn't make a product that a customer could crash under normal conditions.  If this is your reasoning then no video, discussion or demonstration is going to change your mind.  I'll let everyone else make their decisions based on Microsoft's track record and his or her experience with Microsoft products.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Update: 5/15/09&lt;/h2&gt;
The team responsible for the research has decided to post details: &lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/bherndon/2009/06/08/setting-the-record-straight-on-the-hyperv-video"&gt;Setting the Record Straight on the Hyper-V Video&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">hyper-v</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmmark</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">microsoft</category>
      <pubDate>Tue, 12 May 2009 22:50:15 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/05/15/final-thoughts-on-the-hyperv-video</guid>
      <dc:date>2009-05-12T22:50:15Z</dc:date>
      <clearspace:dateToText>6 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Video on Hyper-V Crashes</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/05/15/video-on-hyperv-crashes</link>
      <description>Since I posted &lt;a class="jive-link-external" href="http://www.youtube.com/watch?v=XlLPmWwzHzM"&gt;the YouTube video showing Hyper-V blue screens&lt;/a&gt; last Friday I've received a lot of comments, questions, compliments and complaints.  The video and descriptive text have raised more questions than answers, so here are a few details to help fill out the story.&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The workload was not technically VMmark.  There are two reasons for this:
&lt;ul&gt;
&lt;li&gt;VMmark's run rules specify that the VMs must be configured with a single virtual disk.  Because this configuration can't make use of Hyper-V's paravirtualized SCSI driver, which requires a second virtual disk, the run rules were violated to make Hyper-V produce its best results.&lt;/li&gt;
&lt;li&gt;The vendors that provided requirements for VMmark included use of SMP Linux guests.  Hyper-V's lack of support for these configurations means that it is unable to run VMmark according to the rules.  Those rules were ignored by the test team and the ESX and Hyper-V tests were run with uniprocessor Linux guests so that Hyper-V was able to produce some number.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The server ran 15 tiles* when ESX was installed.  So, the hardware is good.&lt;/li&gt;
&lt;li&gt;The server successfully ran 10 tiles* when Hyper-V was installed, although at a much higher CPU utilization and lower throughput than ESX.  The server seems to run Hyper-V correctly.&lt;/li&gt;
&lt;li&gt;The 11-tile* run was tried many, many times.  Hyper-V was unable to run 11 tiles without guest blue screens or the parent partition crashing and bringing down the server.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
(*)  As detailed in the first bullet, these aren't real "tiles".  They have been dumbed down (Linux SMP) and reconfigured (extra virtual disk) to work around Hyper-V limitations. &lt;br /&gt;
&lt;br /&gt;
I'm hoping to convince the people responsible for the test to shed their anonymity and come out with an official paper.  I'll provide those details as soon as I can get them.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Update: 5/15/09&lt;/h2&gt;
The team responsible for the research has posted details of the experiment.  Read more at &lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/bherndon/2009/06/08/setting-the-record-straight-on-the-hyperv-video"&gt;Setting the Record Straight on the Hyper-V Video.&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmmark</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">hyper-v</category>
      <pubDate>Wed, 06 May 2009 22:59:51 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/05/15/video-on-hyperv-crashes</guid>
      <dc:date>2009-05-06T22:59:51Z</dc:date>
      <clearspace:dateToText>6 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Virtualizing Microsoft SQL Server</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/05/11/virtualizing-microsoft-sql-server</link>
      <description>At VMworld Europe 2009 my engineering colleague Chethan Kumar and I presented the results of a six-month investigation into the performance of SQL Server on ESX.  Tomorrow (May 12 at 09:00 PDT) we're going to offer an updated version of this session to the general public.  If you have any interest in virtualized SQL Server deployments, please &lt;a class="jive-link-external" href="http://www.vmware.com/a/webcasts/details/265"&gt;register and attend&lt;/a&gt; the presentation to discover what we learned in our investigation.&lt;br /&gt;
&lt;br /&gt;
I provided some notes on that presentation in a blog entry (&lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/drummonds/2009/03/13/sql-server-performance-problems-not-due-to-vmware"&gt;SQL Server Performance Problems Not Due to VMware&lt;/a&gt;) right after the show.  But the large number of attendees and exceptionally high ratings encouraged me to set up this encore session.  And since Chethan's research on SQL Server performance tuning has continued, we have some updates to the experimental results.&lt;br /&gt;
&lt;br /&gt;
In tomorrow's webinar we will tell the story of our exploration into persistent rumors of SQL Server performance problems.  The search began after VMworld 2008 when I decided to engage every customer with a complaint on SQL Server performance.  At the same time Chethan investigated every possible application, operating system, and hypervisor parameter that could impact SQL performance.  I talked to dozens of customers and Chethan spent hundreds of hours on this work.&lt;br /&gt;
&lt;br /&gt;
This presentation will detail the results of our investigation and leave its attendees with a clear understanding of SQL Server performance on VMware.  Our conclusions are surprisingly simple and certain to help you get the most out of your virtual infrastructure.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">sql</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmworld</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">microsoft</category>
      <pubDate>Mon, 11 May 2009 20:20:29 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/05/11/virtualizing-microsoft-sql-server</guid>
      <dc:date>2009-05-11T20:20:29Z</dc:date>
      <clearspace:dateToText>6 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>The Role of the ESX Monitor</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/04/29/the-role-of-the-esx-monitor</link>
      <description>There's a lot of confusion out there on VMware's support for the CPU vendors' virtualization assist technology.  VMware has always led the industry with its support for hardware assist.  We were the first vendor to support AMD-V and Intel VT-x in 2006, the first to support AMD RVI in 2008, and will be the first to support Intel EPT when vSphere 4 becomes publicly available.  These technologies--which we call hardware assist--provide value to the part of ESX we call the monitor.&lt;br /&gt;
&lt;br /&gt;
As we prepare for vSphere's general availability we're generating a lot of documentation to help people get the most out of the new version of ESX.  One of my colleagues started a document that details the role of the monitor and how it flexibly uses different hardware assist technologies.  I've summarized the default behavior of our monitor in several situations in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9882"&gt;ESX Monitor Modes&lt;/a&gt;.  Of course vSphere's users will be able to override these defaults if they want to experiment with their workloads.&lt;br /&gt;
&lt;br /&gt;
I wanted to include a textual summary of the role of the monitor in virtualization but found myself getting bogged down with the writing.  So, I thought I'd try something new.  Let me know what you think of this short video clip explaining the role of the monitor and how it might leverage hardware assist.&lt;br /&gt;
&lt;br /&gt;
{youtube}&lt;a class="jive-link-external" href="http://www.youtube.com/watch?v=PYqsxIE5P-U"&gt;http://www.youtube.com/watch?v=PYqsxIE5P-U&lt;/a&gt;{youtube}</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">monitor</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vsphere</category>
      <pubDate>Wed, 29 Apr 2009 00:57:40 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/04/29/the-role-of-the-esx-monitor</guid>
      <dc:date>2009-04-29T00:57:40Z</dc:date>
      <clearspace:dateToText>6 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Storage Performance: VMFS and Protocols</title>
      <link>http://communities.vmware.com/docs/DOC-9696</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
VMware's customers are always asking us about the storage stack.  Without exception, the two most common questions about our storage system performance are:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Which storage protocol performs best?&lt;/li&gt;
&lt;li&gt;Does VMFS scale to meet the demands of many servers and VMs?&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
This document covers a few of the points needed to understand these issues.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Storage Protocols&lt;/h1&gt;
VMware published a &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/storage_protocol_perf.pdf"&gt;paper comparing storage protocols&lt;/a&gt; in 2008.  This paper detailed the two key characteristics of ESX's storage stack:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;The hypervisor is easily able to drive the storage connection to link speed.&lt;/li&gt;
&lt;li&gt;Configurations where protocol management happens in the HBA (Fibre Channel and HW iSCSI) are more CPU efficient.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
On the first point, consider the following graph, taken from page three of the paper:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5618/protocol_throughput.png" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5618/protocol_throughput.png" class="jive-image"  /&gt;  &lt;br /&gt;
&lt;br /&gt;
Note that in this case all four test cases drive the storage to link speed.  That's 2 Gb/s with the Fibre Channel HBA and 1 Gb/s with the other three.  In short, if throughput is your goal, make decisions based on link speed.  If you check through the rest of the paper, you'll see that response time is similar for all of the configurations, as well.  But you will see slight differences in throughput in some of the protocols.&lt;br /&gt;
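&lt;br /&gt;
To put those link speeds in throughput terms, here is a back-of-the-envelope sketch.  It assumes decimal units and ignores line encoding and protocol overhead, so the numbers are upper bounds rather than figures from the paper; the function and dictionary names are hypothetical:&lt;br /&gt;
&lt;pre&gt;
```python
def link_ceiling_mb_per_s(link_gbit_per_s):
    """Upper bound on throughput in MB/s, ignoring encoding and protocol overhead."""
    return link_gbit_per_s * 1000 / 8  # 1 Gb/s = 1000 Mb/s; 8 bits per byte


# Link speeds in Gb/s for the four test cases, as stated in the text above.
LINKS = {"Fibre Channel": 2, "HW iSCSI": 1, "SW iSCSI": 1, "NFS": 1}

if __name__ == "__main__":
    for name, gbit in LINKS.items():
        print(f"{name}: at most {link_ceiling_mb_per_s(gbit):.0f} MB/s")
```
&lt;/pre&gt;
Achievable payload throughput will sit somewhat below these ceilings once encoding and protocol headers are accounted for.&lt;br /&gt;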
&lt;br /&gt;
This brings us to the second point from above: less work is done by the CPU when protocol management can be off-loaded to the HBA.  This means that configurations using FC and HW iSCSI HBAs leave additional CPU cycles for the VMs' work.  It can also explain the slight differences in throughput in the other graphs in the paper.  The efficiency results quoted in the paper are here:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5619/protocol_efficiency.png" alt="protocol_efficiency.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5619/protocol_efficiency.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
The increased overheads of running software iSCSI or NFS are due to the VMkernel managing those protocols.  It's worth noting that the proliferation of iSCSI in the enterprise has led VMware to spend considerable effort to improve the efficiency of SW iSCSI.  Expect its efficiency to improve dramatically in the following releases. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VMFS Scalability&lt;/h1&gt;
Many in the industry erroneously believe that VMFS won't scale as storage demands grow.  Often SCSI reservations and disk locking are cited as the technical-sounding but vaguely-supported reason for this claim.  It's worth sampling data from our &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;scalable storage performance paper&lt;/a&gt; to debunk this myth.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5623/vmfs_scalability.png" alt="vmfs_scalability.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-9696-5-5623/vmfs_scalability.png');return false;"/&gt;  &lt;br /&gt;
&lt;br /&gt;
This chart is a favorite in our world-wide tours as we address VMFS scalability.  It was first introduced in a VMFS scalability blog article that went live in February of 2008.  It shows the results of using 64 hosts to generate a variety of traffic on a single VMFS volume.  And it's a wealth of information on VMFS and storage access patterns.  For instance:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The aggregate number of random writes, in cyan in the middle, maintains perfectly flat linear scalability as the host count grows from 1 to 64.&lt;/li&gt;
&lt;li&gt;The aggregate number of random reads is initially limited by the few disks being accessed but ultimately matches the throughput of random writes as many disks come to bear to serve the large number of random reads.&lt;/li&gt;
&lt;li&gt;The sequential write activity, which highlights the strengths of today's arrays, demonstrates the largest total throughput, which drops only slightly as the array manages so many connections.&lt;/li&gt;
&lt;li&gt;But the sequential read activity drops off dramatically as hosts are added.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
This last example showing degradation in aggregate sequential read capability is an artifact of the workload that is very important to database administrators: multiple sequential reads approximate random activity.  Why is this?  As many hosts request more and more sequential data, the array interleaves these requests to maintain response times.  This means that the sequential accesses get "shuffled" which results in a random access pattern.&lt;br /&gt;
&lt;br /&gt;
In short, VMFS has no scalability problems as many hosts drive tremendous amounts of traffic to a single volume.  If the data isn't convincing enough, consider the following: there are no SCSI reservations used during normal data access.  This means that there are no scalability limitations as a result of virtual machine storage access.  A word of caution, though: the file system is locked during administrative operations that change the metadata on the volume.  This means that virtual machine creation or destruction will result in file system locks.  Perform these operations during off-peak hours.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmfs</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">iscsi</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">fc</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">nfs</category>
      <pubDate>Fri, 13 Mar 2009 00:12:37 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-9696</guid>
      <dc:date>2009-03-13T00:12:37Z</dc:date>
      <clearspace:dateToText>7 months, 4 days ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Performance</title>
      <link>http://communities.vmware.com/docs/DOC-5251</link>
      <description>VMware and our partners have published a wide variety of white papers on best practices for installation and configuration of enterprise applications on VI3. We'll collect additional material on this page as we build this content out. Keep checking back!&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Operating Systems&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5253"&gt;Windows Server 2003&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;SUSE Linux Enterprise Server&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5252"&gt;Red Hat Enterprise Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5254"&gt;Ubuntu&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h2&gt;Applications&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5502"&gt;Best Practices for Web Servers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5503"&gt;Best Practices for Apache&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5504"&gt;Best Practices for IIS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-8964"&gt;Best Practices for SQL Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9671"&gt;Best Practices for IBM Lotus Domino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <pubDate>Fri, 16 May 2008 21:35:33 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5251</guid>
      <dc:date>2008-05-16T21:35:33Z</dc:date>
      <clearspace:dateToText>7 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Understanding ESX Memory Management at Partner Exchange 2009</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/04/10/understanding-esx-memory-management-at-partner-exchange-2009</link>
      <description>I recently attended a practice talk for next week's Partner Exchange hosted by &lt;a class="jive-link-profile" href="http://communities.vmware.com/people/kitcolbert"&gt;Kit Colbert&lt;/a&gt;, one of our senior engineers, who is leading a whole bunch of cool efforts around performance.  I wanted to "leak" one slide that he showed us that we'll be touching up for publication.  Some of you who are curious about memory counters and want a different take from &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5430"&gt;Memory Performance Analysis and Monitoring&lt;/a&gt; may find this interesting.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-2895-5729/guest_host_memory.png" alt="guest_host_memory.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-2895-5729/guest_host_memory.png');return false;"/&gt; &lt;br /&gt;
&lt;br /&gt;
Some of this stuff won't make sense outside of Kit's presentation, but let me point out a few things that may help consume the information in this incredible chart:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;One of the key messages from Kit's presentation is that ESX reports memory with respect to the guest (the VM) and the host.  The very top rectangle shows memory stats reported for each VM.  The second rectangle shows the single VM's memory stats reported by each host.&lt;/li&gt;
&lt;li&gt;As can be seen from the above, the consumed memory in the host represents everything in the VM, minus the savings due to page sharing.&lt;/li&gt;
&lt;li&gt;This graph doesn't yet highlight the difference between ballooned memory and swapped memory from the guest perspective.  From the guest's perspective, swapped memory is much more attractive than ballooned memory, as the guest doesn't know that the swapped memory is gone.  But it does see the ballooned memory as pinned.  ESX is clever enough to deflate the balloon driver, if possible, when the guest starts to access swapped memory to avoid the host's swapping of guest memory.&lt;/li&gt;
&lt;li&gt;The final rectangle shows memory of all VMs from the host's perspective.  Don't pay attention to the reserved and unreserved memory; I'm told those are unnecessary distractions that will be removed.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
Kit is going to be in Orlando with me next week to talk about ESX and guest memory management.  He's going to explain the difficult process of recovering unused memory from guests to enable over-commitment.  Be sure to see him if you're in town!</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esx</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxi</category>
      <pubDate>Fri, 10 Apr 2009 00:22:15 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/04/10/understanding-esx-memory-management-at-partner-exchange-2009</guid>
      <dc:date>2009-04-10T00:22:15Z</dc:date>
      <clearspace:dateToText>7 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>SQL Server Performance Problems Not Due to VMware</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/03/13/sql-server-performance-problems-not-due-to-vmware</link>
      <description>Microsoft SQL Server runs at roughly 80% of native on VI3 in most benchmarked environments.  In production environments, and under loads that model those conditions, SQL Server runs at 90-95% of native on ESX 3.5.  I can say this with confidence despite a large amount of the industry's skepticism because I've spent so much time on SQL Server in the past half year.  I'd like to share some of my research on the subject and observations with you.&lt;br /&gt;
&lt;br /&gt;
Two weeks ago my colleague Chethan Kumar and I presented on SQL Server in Cannes, France for VMworld Europe 2009.  This presentation was the culmination of six months of investigation that was started at VMworld 2008 in Las Vegas.  At that event I heard so many concerns about SQL Server performance that I was resolved to identify the problems.  I talked with every customer I could find that claimed that SQL ran at anything less than 70% of native.  So many of these contacts claimed that they had measured SQL at 25% of native or worse, that I knew that something was going wrong.&lt;br /&gt;
&lt;br /&gt;
First, let me show you a slide that Chethan presented at the show in Cannes:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-2720-5630/sql_tuning.png" alt="sql_tuning.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-2720-5630/sql_tuning.png');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
Chethan spent three months investigating SQL Server to find out how much he could improve virtual performance from the "out of the box" experience.  As this figure details, the sum total of performance improvements was 15%.  Here's another break-down of these results:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-2720-5632/sql_tuning_summary.png" alt="sql_tuning_summary.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-2720-5632/sql_tuning_summary.png');return false;"/&gt;   &lt;br /&gt;
&lt;br /&gt;
The only option that we found in ESX to improve virtual performance was static transmit coalescing, which is documented on &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/specweb_perf_final.pdf"&gt;page four of one of our SPECweb papers&lt;/a&gt;.  Large pages and SQL's priority boost, which are best practices provided by Microsoft for SQL Server configuration, provide the largest gains in performance.&lt;br /&gt;
&lt;br /&gt;
The key messages that we communicated to our audience were that a properly running SQL Server should run at 80% of native or better.  In most production cases it can run at a performance indistinguishable from native speed.  And if performance is lagging, there aren't many changes that can be made to ESX that yield any performance gains at all.&lt;br /&gt;
&lt;br /&gt;
This raises the question: "If ESX can't be tuned to double SQL performance, what is causing these reports of terrible SQL Server throughput?"  The great majority of the problems are coming from mis-configured storage.  But a variety of other items such as poor hardware selection or use of the wrong virtualization software contribute to the confusion, as well.  I've been documenting these issues in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-8964"&gt;Best Practices for SQL Server&lt;/a&gt; on this community and will continue to update that document as more problems are discovered.&lt;br /&gt;
&lt;br /&gt;
If you have a SQL Server running un-virtualized in your environment, I'd like you to try virtualizing it again.  Follow our best practices document and pay close attention to your storage configuration during deployment.  I feel confident that once you've set up your environment properly, you're going to like what you see.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">sql</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">microsoft</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmworld</category>
      <pubDate>Fri, 13 Mar 2009 17:47:57 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/03/13/sql-server-performance-problems-not-due-to-vmware</guid>
      <dc:date>2009-03-13T17:47:57Z</dc:date>
      <clearspace:dateToText>8 months, 1 week ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Best Practices for SQL Server</title>
      <link>http://communities.vmware.com/docs/DOC-8964</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
At VMworld 2008 in Las Vegas several of us in our virtual performance team met with a variety of customers to talk about Microsoft SQL Server. We already had a large base of customers running very many SQL Server DBs on our products and we wanted to collect information on the challenges posed in the process of virtualizing this critical workload. We were pleased to see that ESX Server handled SQL VMs with excellent performance. But, for many customers, the first efforts at virtualizing SQL didn't yield a high-performing SQL VM.  After careful investigation and many, many discussions we've started to put together the puzzle as to where SQL Server performance problems come from.  This page will document these common problems, borrowing slides from our presentations on the subject.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Virtualizing SQL: The Checklist&lt;/h1&gt;
We've talked with dozens of customers in the past months to document the issues that resulted in poor SQL performance. Happily, none of the issues were due to underlying technologies. Here is a list of issues and an explanation of the impacts. These items are roughly listed in the order of decreasing likelihood of occurrence. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 1: Configure Storage Correctly&lt;/h2&gt;
Storage configuration problems are the number one cause of SQL performance issues.  Usually these problems arise because the DBA requests a virtual disk from the VI admin, and the VI admin places the VMDK on a LUN that may or may not meet the DBA's performance needs.  For instance:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;VMs' VMDK files placed on VMFS volumes without enough spindles.&lt;/li&gt;
&lt;li&gt;Many VMDK files placed on a single VMFS volume which could use more spindles.&lt;/li&gt;
&lt;li&gt;Database and log files placed on the same LUN which, you guessed it, could use more spindles.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
This may be obvious to some, but this problem occurs again and again.  The VI administrator should be aware of a few technical items that can help understand and avoid this problem:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Based on the IO demands of the DB files, a certain number of spindles should be guaranteed to these files.  This means that the VMDK must be placed on a VMFS volume sized to account for the SQL Server's demands and all of the other demands on that volume.&lt;/li&gt;
&lt;li&gt;Mixing sequential activity (such as log file update) and random activity (such as database access) results in random behavior.  This means that the LUN configuration in the pre-virtual physical environment may not be sufficient for the consolidated environment.  This is discussed some in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9696"&gt;Storage Performance: VMFS and Protocols&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;When storage isn't meeting the SQL Server's demands, the device latency or kernel latency (queueing time) will increase.  Read up on these counters in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5490"&gt;Storage Performance Analysis and Monitoring&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
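&lt;br /&gt;
As a back-of-the-envelope illustration of the first point, the sketch below sums the IO demand of every VMDK sharing a VMFS volume and divides by what one spindle can sustain. The function name, the sample demands, and the 150 IOPS per-spindle figure are all assumptions chosen for illustration. Substitute numbers measured in your own environment, and note that RAID write penalties and array cache are ignored here.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def spindles_needed(workload_iops, iops_per_spindle):
    """Return the minimum whole number of spindles for the summed IO demand.

    workload_iops: IOPS demanded by each VMDK placed on the volume.
    iops_per_spindle: sustained IOPS one disk can deliver (an assumption).
    """
    total_iops = sum(workload_iops)
    return math.ceil(total_iops / iops_per_spindle)


# Hypothetical demands: the SQL Server data VMDK plus two other VMDKs
# that share the same VMFS volume.
demands = [1200, 300, 450]
print(spindles_needed(demands, 150))  # prints 13
```
&lt;br /&gt;
With these sample numbers the volume needs 13 spindles, even though the SQL Server VMDK alone would need only 8. The other demands on the volume are what push the requirement up.&lt;br /&gt;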
&lt;h2&gt;Item 2: Use Recent Hardware&lt;/h2&gt;
&lt;br /&gt;
Often companies that are dipping their metaphorical toes into virtualization want to run proof-of-concept (POC) experiments to verify that the virtual platform can meet their performance expectations. But it's surprising how many times these experiments are run on older, poorly-performing hardware. Presumably the shiny, new systems were in use for production applications so only the mothballed, cobweb-covered servers from a previous generation were available for the POC. This causes many problems.  Check out this slide from a talk on SQL Server at VMworld Europe 2009:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5626/newer_hardware.png" alt="newer_hardware.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5626/newer_hardware.png');return false;"/&gt;  &lt;br /&gt;
&lt;p /&gt;
The slide points out a couple of things. First, the larger caches and shorter pipelines on newer Intel processors result in a considerable drop in performance overheads.  Second, the latency of the VMEXIT transition, which determines the amount of time it takes to switch from the VM to the VMkernel, has shrunk by a large amount with subsequent generations of hardware.  And don't forget the other additions from Intel and AMD such as hardware assisted memory management and IO virtualization.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 3: Follow SQL Server Best Practices&lt;/h2&gt;
&lt;br /&gt;
Microsoft has kindly provided a &lt;a class="jive-link-external" href="http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx"&gt;web page of best practices for SQL Storage configuration&lt;/a&gt;. These best practices should still be followed when configuring your virtual SQL deployments!&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 4: Configure VM Identically to Native and Run The Right Test&lt;/h2&gt;
&lt;p /&gt;
For many SQL Server POCs the goal is to measure the VM's ability to perform relative to the physical platform. If this comparison is to be performed, it's critical that the VM be configured identically to the physical hardware. Obviously this means that the VM should be run on the same hardware using identically configured LUNs. It's also important to ensure that the VM has the same number of vCPUs and amount of memory as the physical baseline. This means restricting the number of pCPUs and amount of memory on the native system with NUMPROC and MAXMEM, respectively, in boot.ini.&lt;br /&gt;
&lt;br /&gt;
It also means that the test being applied should be understood.  If a benchmark is chosen that uses a very small database, the content will be cached and the storage system won't be used.  This can skew the results and produce recommendations not consistent with production deployments.  Here is another slide from the same VMworld Europe 2009 presentation detailing some of what we know about the SQL Server benchmarking alternatives:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5628/sql_benchmarks.png" alt="sql_benchmarks.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5628/sql_benchmarks.png');return false;"/&gt; &lt;br /&gt;
&lt;p /&gt;
We at VMware prefer DVD Store.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 5: Use VMware's ESX Server&lt;/h2&gt;
&lt;br /&gt;
VMware's hosted products, VMware Server, VMware Workstation, and even VMware Fusion, are all capable of running SQL Server. But if the database is going to be run in production on enterprise-class hardware, use VMware's enterprise-class hypervisor: ESX Server.  These products are not often confused by the initiated, but rogue members of large companies often run off-the-books proof-of-concept experiments on VMware's hosted products.  When they produce results they don't like, the results get spread throughout the company, which can slow the virtual deployment.&lt;br /&gt;
&lt;p /&gt;
Consider the following data, again from the VMworld Europe 2009 SQL Server presentation:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5629/vmmark_esx_server.png" alt="vmmark_esx_server.png" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-8964-6-5629/vmmark_esx_server.png');return false;"/&gt;  &lt;br /&gt;
&lt;p /&gt;
This information is getting a bit dated now, as it was performed years ago on ESX Server 3.0.  But the point stands: before believing results claiming that "VMware cannot run SQL Server" it's worth investigating the platform used to generate the results.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Item 6: Understand Memory Management and Configure Correctly&lt;/h2&gt;
Database performance is heavily dependent on the amount of memory available. Almost without exception, providing more memory to SQL Server will improve performance. However, if that memory is coming from a host that is already over-committed or is being provided through workarounds to 32-bit limitations, performance may suffer. Here are a few keys for SQL Server memory management:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;If more than 3 GB is desired, use 64-bit versions of the OS and application.&lt;/li&gt;
&lt;li&gt;If memory is over-committed on the box, set reservations for performance-critical SQL Server VMs to guarantee that those VMs' memory isn't ballooned or swapped out.&lt;/li&gt;
&lt;li&gt;If SQL Server's "lock pages in memory" parameter has been set, set the VM's reservation to the amount of memory in the VM. This setting can adversely interfere with ESX Server's balloon driver. Setting reservations will stop the balloon driver from inflating into the VM's memory space.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h2&gt;Item 7: Align Disk Partitions&lt;/h2&gt;
This item is really a special but very important case of item three, follow best practices. Partition alignment can impact storage performance, which can be critical to some SQL Server VMs' performance. See VMware's &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_partition_align.pdf"&gt;paper on partition alignment&lt;/a&gt; for more information on this.&lt;br /&gt;
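&lt;br /&gt;
To make the idea concrete, here is a minimal sketch of an alignment check: a partition is aligned when its starting byte offset is an exact multiple of the boundary the storage expects. The function name and the 64 KB default boundary are assumptions for illustration; use the boundary recommended for your array.&lt;br /&gt;
&lt;br /&gt;
```python
def is_aligned(start_offset_bytes, boundary_bytes=64 * 1024):
    """Return True when the partition start falls exactly on the boundary."""
    return start_offset_bytes % boundary_bytes == 0


# A partition starting at sector 63 with 512-byte sectors begins at
# byte 32256, which is not a multiple of 64 KB.
print(is_aligned(63 * 512))   # prints False

# A partition starting at sector 128 begins at byte 65536, exactly 64 KB.
print(is_aligned(128 * 512))  # prints True
```
&lt;br /&gt;
The starting offset itself has to come from your partitioning tool; this check only interprets the number.&lt;br /&gt;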
&lt;br /&gt;
&lt;h1&gt;Whitepapers&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/SQL_Server_consolidation.pdf"&gt;SQL Server Workload Consolidation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf"&gt;SQL Server Performance in a VMware Infrastructure 3 Environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/benchmarking_micrsoft_sql_vmware_esx_server_wp.pdf"&gt;Benchmarking Microsoft SQL Server Using VMware ESX Server 3.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.dell.com/downloads/global/solutions/vmware_1955.pdf"&gt;VMware VMotion Performance on the Dell PowerEdge 1955 Blade Server&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">sql</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">windows</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">cpu</category>
      <pubDate>Mon, 01 Dec 2008 20:45:05 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-8964</guid>
      <dc:date>2008-12-01T20:45:05Z</dc:date>
      <clearspace:dateToText>8 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Virtual Desktop Benchmarking</title>
      <link>http://communities.vmware.com/message/1176800</link>
      <description>This is helpful. Do you have an example of the number of users and the size of the network connectivity, even if it's a rough target? At some point there has to be a rough allocation of bandwidth per user. If we do not want to upgrade our networks, what experience can we expect and how many users can we get on that network? Is it 100Kbps, 500Kbps, 1.5Mbps, 100Mbps?</description>
      <pubDate>Thu, 19 Feb 2009 16:03:02 GMT</pubDate>
      <author>wponder</author>
      <guid>http://communities.vmware.com/message/1176800</guid>
      <dc:date>2009-02-19T16:03:02Z</dc:date>
      <clearspace:dateToText>9 months, 5 days ago</clearspace:dateToText>
      <clearspace:replyCount>4</clearspace:replyCount>
    </item>
    <item>
      <title>Building Block Architecture for Superior Performance</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/02/17/building-block-architecture-for-superior-performance</link>
      <description>If any of you have heard me speak at the numerous events I've done in the past two years, you may have heard me detail the areas where virtualization performance can exceed native.  There are scalability limitations in traditional software that make nearly every enterprise application fall short of utilizing the cores that are available to them today.  As the core explosion continues, this under-utilization of processors will worsen.  Here is a graph that we've been showing to illustrate that point:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5369/core_explosion.png" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5369/core_explosion.png" class="jive-image"  /&gt;  &lt;br /&gt;
&lt;br /&gt;
In 2008 I visited VMworld Europe and showed how using multiple virtual machines on a single physical host could circumvent the limitations in today's software.  In that experiment we showed that &lt;a class="jive-link-external" href="http://blogs.vmware.com/performance/2008/02/16000-exchange.html"&gt;16,000 Exchange mailboxes could be fit on a single physical server&lt;/a&gt; when no one had ever put more than 8,000 on a single native instance.  We called this approach designing by "building blocks" and were confident that as the core count continued to increase, we'd continue to expose more applications whose performance could be improved through virtualization.&lt;br /&gt;
&lt;br /&gt;
On Thursday last week SPEC accepted VMware's submission of a SPECweb2005 result.  And last night we posted &lt;a class="jive-link-external" href="http://blogs.vmware.com/performance/2009/02/vmware-sets-performance-record-with-specweb2005-result.html"&gt;an article on VROOM!&lt;/a&gt; detailing the experiment and providing information on the submission.  This submission is an incredible first for us: not only have we shown that we can circumvent limitations in web servers, but we posted a world record performance number in the process.  Of course, if any of you have seen Sreekanth Setty's presentation at VMworld on his ongoing work on SPECweb2005, this result wouldn't surprise you:&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5370/specweb_scaling.png" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/5370/specweb_scaling.png" class="jive-image"  /&gt; &lt;br /&gt;
&lt;br /&gt;
Getting a benchmark standardization body like SPEC to approve these results isn't always easy.  Most of the industry remains stuck in a mode of thinking of performance as a single instance's maximum throughput.  But given the scale-out capabilities of a large number of enterprise applications I'd argue that benchmarking should account for scale-out capabilities on a single box.  VMware's customers follow this practice faithfully in sizing their deployments to match their needs and everyone wants to know the platform's ability to handle this use-case.  SPEC's willingness to accept results showing building blocks on a single host is commendable and progressive.  As more benchmarks approve submissions like these VMware will continue to be able to show record numbers.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">specweb</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">web</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">performance</category>
      <pubDate>Tue, 17 Feb 2009 16:21:46 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/02/17/building-block-architecture-for-superior-performance</guid>
      <dc:date>2009-02-17T16:21:46Z</dc:date>
      <clearspace:dateToText>9 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>SPECweb Record</title>
      <link>http://communities.vmware.com/message/1174394</link>
      <description>&lt;br /&gt;
Last Thursday SPEC accepted VMware's submission on a record-setting SPECweb2005 score.  Over the weekend we posted &lt;a class="jive-link-external" href="http://blogs.vmware.com/performance/2009/02/vmware-sets-performance-record-with-specweb2005-result.html"&gt;an article on VROOM!&lt;/a&gt; with some of the details on this experiment, which had been many months in the making.  I wanted to write some thoughts on the SPEC results (&lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/drummonds/2009/02/17/building-block-architecture-for-superior-performance"&gt;Building Block Architecture for Superior Performance&lt;/a&gt;) and gauge the industry's thoughts.&lt;br /&gt;
&lt;p /&gt;
Is anyone surprised by this performance record?  Do any of you continue to run web servers on native, unvirtualized hardware?  Any ideas where this building block approach might be able to extract more throughput out of a software architecture than the traditional scale-up model?&lt;br /&gt;
&lt;p /&gt;
Scott</description>
      <pubDate>Tue, 17 Feb 2009 16:43:09 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/message/1174394</guid>
      <dc:date>2009-02-17T16:43:09Z</dc:date>
      <clearspace:dateToText>9 months, 1 week ago</clearspace:dateToText>
    </item>
    <item>
      <title>Standardization of Virtual Desktop Benchmarking</title>
      <link>http://communities.vmware.com/blogs/drummonds/2009/02/09/standardization-of-virtual-desktop-benchmarking</link>
      <description>For years now VMware has been providing products that enable a virtual desktop experience.  Historically, this would occur in virtual desktops on our hosted products but in some cases virtualization of Citrix XenApp (formerly presentation server) could provide a large number of desktops off a single virtual machine.  And more recently VMware View offers a means of hosting a large number of desktops on a single server where each is granted its own operating system instance.  As the number of virtual desktops and alternatives for implementing virtual desktops has grown, the need for a benchmark that can compare the performance of these alternatives has arisen.&lt;br /&gt;
&lt;br /&gt;
Desktop benchmarking is not new to the industry, as people have been using PCs for decades.  But standards in virtual desktop benchmarking are non-existent.  Some might argue that traditional tools, common to PCs for years, should be used.  But there are several reasons why this is not true:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Pre-virtual desktop benchmarking is built to completely saturate all memory and CPU resources provided.  Fully saturating CPU on a single multi-way VM, as an example, results in far fewer VMs per host than is common in VDI deployments.  Fewer VMs means less work for the hypervisor's scheduler.&lt;/li&gt;
&lt;li&gt;Existing desktop benchmarks are often throughput-based, as opposed to latency-based.  Because existing tools want to differentiate between powerful processors and large amounts of memory, they're designed to pack more and longer instructions in each run than is common in virtual desktop deployments.  Most desktop deployments won't run massive video renders but the response times of individual button clicks and window appearance are critical.&lt;/li&gt;
&lt;li&gt;No existing benchmarks are aware of the peculiarities of VM-based timing.  VDI benchmarks need to be aware of this by either using host timing or invoking and measuring operations from remote, non-virtual locations.&lt;/li&gt;
&lt;/ol&gt;
At VMworld 2008 VMware presented a VDI workload that had been constructed from a collaborative effort between all VDI teams within VMware with review and qualification by several of our partners.  The first measurements on this workload came from Dell and EqualLogic and we quickly made details on its characteristics available via &lt;a class="jive-link-external" href="http://www.vmware.com/resources/techresources/1085"&gt;white paper&lt;/a&gt;.  Key features of this workload include:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;A diverse set of applications (Word, PowerPoint, Excel, Acrobat, and Internet Explorer) common to business desktop deployments.&lt;/li&gt;
&lt;li&gt;Load generation modeled after the most common VDI deployments.&lt;/li&gt;
&lt;li&gt;Small (less than 500 ms) operation generation and measurement.&lt;/li&gt;
&lt;li&gt;Host-based measurement and an architecture to support remote command invocation in the next release.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
As an attempt at the world's first VDI benchmark, we're very pleased with our efforts.  We found that it met the unique requirements in measuring virtual desktops of all kinds.  And since it was generated with a large group of internal collaborators and multiple partners, it's an excellent beginning at what the industry needs to standardize this process.&lt;br /&gt;
&lt;br /&gt;
But today we realize that it's just a beginning.  I want to encourage everyone to bring your comments to VMware via this blog or the performance forums on what you think the characteristics of an industry standard virtual desktop benchmark should be.  We'll never make one benchmark that meets everyone's needs and I suspect that there are even some common needs that will require significant development resources.  But I expect that with your guidance and assistance in refining this workload we'll accelerate the process of getting this benchmark in a shape that the industry can embrace.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vdi</category>
      <pubDate>Fri, 06 Feb 2009 04:00:17 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2009/02/09/standardization-of-virtual-desktop-benchmarking</guid>
      <dc:date>2009-02-06T04:00:17Z</dc:date>
      <clearspace:dateToText>9 months, 2 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>Ready Time</title>
      <link>http://communities.vmware.com/docs/DOC-7390</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
Ready time is as important as it is confusing.  I'm going to collect a few thoughts on ready time in this collection point with the hopes that some of the confusion around this important part of virtual system performance can be eliminated. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Details&lt;/h1&gt;
Stated simply, ready time is the amount of time a VM wants to run but has not been provided CPU resources on which to execute.  Somewhat confusingly, ready time is reported in two different formats by esxtop and VirtualCenter.  In esxtop it is reported in an easily-consumed percentage format.  A number of 5% means the VM spent 5% of its last sample period waiting for available CPU resources.  In VirtualCenter ready time is reported as a time measurement.  In VC's real-time data, which produces sample values every 20,000 ms, a number of 1,000 ms is reported for a 5% ready time.&lt;br /&gt;
&lt;br /&gt;
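The conversion between the two formats is simple arithmetic.  As a minimal sketch in Python, assuming the 20,000 ms real-time sample period mentioned above (the function names are illustrative and are not part of esxtop or VirtualCenter):&lt;br /&gt;

```python
# Illustrative helpers; these names are assumptions, not a VMware API.
REALTIME_INTERVAL_MS = 20000  # VirtualCenter real-time sample period


def ready_ms_to_percent(ready_ms, interval_ms=REALTIME_INTERVAL_MS):
    """Convert a VirtualCenter ready time (ms) to a percentage of the sample period."""
    return 100.0 * ready_ms / interval_ms


def ready_percent_to_ms(ready_percent, interval_ms=REALTIME_INTERVAL_MS):
    """Convert an esxtop-style ready percentage to milliseconds per sample period."""
    return ready_percent * interval_ms / 100.0


print(ready_ms_to_percent(1000))  # 5.0
print(ready_percent_to_ms(5))     # 1000.0
```

Other VirtualCenter statistics levels use different sample periods, so the interval should be passed explicitly when working with anything other than real-time data.&lt;br /&gt;
&lt;br /&gt;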
There is so much more to know about ready time that I'm not going to reproduce here.  Read the &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_ready_time.pdf"&gt;whitepaper on the subject&lt;/a&gt;  for more details.  There have been no changes in the details on ready time since ESX 3.0 that make that paper out-of-date. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Interpreting Ready Time Values&lt;/h1&gt;
The most common question we get on ready time is, "what ready time numbers constitute a problem?"  While there is no easy answer to this, we can offer some guidance on the acceptable values.  But before I lay that out, let me say that ready time should &lt;i&gt;not&lt;/i&gt; be the ultimate measurement of system performance.  As always, user experience and latency should be.  There are some situations where user experience is horrible on a system with no load and virtually zero ready time.  This could happen with a mis-configured array, as an example.  And occasionally we see aggressively-consolidated hosts showing very high ready times that are meeting user needs.  There are no absolutes with ready time.&lt;br /&gt;
&lt;br /&gt;
But there are a few general regions into which ready time values can be binned.  Note that these ready time values are per vCPU.  esxtop reports ready time for a VM once it's been summed up across all vCPUs.  That means that 5% ready on each of four vCPUs will be reported as 20% ready at the VM level.  This is the high end of a very light amount of ready time.&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Value, per vCPU&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r == 0%&lt;/td&gt;
&lt;td&gt;This doesn't happen.  The very presence of a hypervisor between the operating system and the hardware means that there is a non-zero ready time on all operations.  But on healthy systems this number is so small that end-users don't know their workload has been virtualized.  See the next section.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0% &amp;lt; r &amp;lt;= 5%&lt;/td&gt;
&lt;td&gt;This is the "normal" region for ready time.  Very small single digit numbers result in a minimal impact to user experience.  If performance problems exist on the system and ready time falls into this region, your problems lie elsewhere.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5% &amp;lt; r &amp;lt;= 10%&lt;/td&gt;
&lt;td&gt;In this region ready time is starting to be worth watching.  Most systems function healthily with ready time in this region but highly sensitive measurements may be suffering.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10% &amp;lt; r&lt;/td&gt;
&lt;td&gt;While some systems continue to meet expectations, double-digit ready time percentages often mean some action is required to address performance issues.  See the last section for guidance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
Again, remember that VirtualCenter performance numbers must be re-calculated to percentages to find the category on the above table.  But since VC reports ready time per vCPU, no special arithmetic is needed to account for the number of vCPUs in the VM (as is needed with esxtop.)&lt;br /&gt;
&lt;br /&gt;
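To make the binning concrete, here is a small Python sketch.  The thresholds are taken from the table above; the function names and the returned labels are only illustrative, and dividing the esxtop VM-level value by the vCPU count gives an average per vCPU rather than the value for any one vCPU:&lt;br /&gt;

```python
# Illustrative only: the thresholds come from the table above, and the
# function names are assumptions rather than any VMware tooling.

def per_vcpu_ready(vm_ready_percent, num_vcpus):
    """Divide an esxtop VM-level %RDY (summed over vCPUs) into a per-vCPU average."""
    return vm_ready_percent / num_vcpus


def classify_ready(per_vcpu_percent):
    """Map a per-vCPU ready percentage onto the regions in the table."""
    if per_vcpu_percent > 10:
        return "action likely required"
    if per_vcpu_percent > 5:
        return "worth watching"
    if per_vcpu_percent > 0:
        return "normal"
    return "not expected in practice"


# esxtop shows 20% for a 4-way VM, which is 5% per vCPU.
print(classify_ready(per_vcpu_ready(20, 4)))  # normal
```

&lt;br /&gt;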
&lt;h1&gt;Causes and Correction&lt;/h1&gt;
There are two general areas that can cause unnecessarily high ready times:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Overloaded hosts.&lt;/li&gt;
&lt;li&gt;Excessive use of SMP.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Host Overloading&lt;/h2&gt;
The most common cause of high ready time is trying to get too much work out of too little hardware.  Consider the following simple case: on a hypothetical system with only one physical CPU, if two 1-way VMs are fully loaded by their users then each wants to have an entire CPU.  Because only one is available, ESX will time share that resource and give each of them only 50% of the CPU.  As a result, each VM will spend 50% of its time waiting for the processor.  This would be reported as 50% ready time.&lt;br /&gt;
&lt;br /&gt;
Often this condition is observable when ready time is high and total host CPU utilization is also very high.  The only fix for this is to back off the load on the system.  VMs should be migrated off or processor resources should be increased. &lt;br /&gt;
&lt;br /&gt;
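The single-CPU example above generalizes to a rough back-of-the-envelope estimate.  The Python sketch below is a deliberately simplified model, not a description of how the ESX scheduler works: it assumes every VM is 1-way, fully loaded, and given an equal share of the physical CPUs.  The function name is hypothetical.&lt;br /&gt;

```python
# A deliberately simple model, not how the ESX scheduler is implemented:
# every VM is 1-way, fully loaded, and the CPUs are shared evenly.

def expected_ready_percent(num_busy_vms, num_physical_cpus):
    """Estimate per-VM ready time when fully loaded 1-way VMs share CPUs evenly."""
    if num_physical_cpus >= num_busy_vms:
        return 0.0  # idealized: every VM can have a CPU to itself
    share = num_physical_cpus / num_busy_vms  # fraction of time each VM runs
    return 100.0 * (1.0 - share)


print(expected_ready_percent(2, 1))  # 50.0
```

&lt;br /&gt;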
&lt;h2&gt;Excessive SMP&lt;/h2&gt;
In ESX Server 2.5, SMP guests had to be &lt;i&gt;co-scheduled&lt;/i&gt; to start at the exact same moment.  If a 2-way VM was ready to run but only one physical core was available, the VM would not be scheduled until a second core was freed up.  This would increase its ready time.  In ESX Server 3.0 and later versions, relaxed co-scheduling was introduced which meant that a subset of a VM's vCPUs could be scheduled ahead of others.  However, guest operating systems still require some degree of co-scheduling which means that the relaxation isn't absolute.  In short, adding vCPUs still places a co-scheduling burden on the scheduler, which can increase ready time.  This is one reason why VMware advises allocating vCPUs only to VMs that are using them.  Read &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt;  for more information on co-scheduling.&lt;br /&gt;
&lt;br /&gt;
This condition is manifested by hosts that have sub-optimal CPU utilization and lots of SMP VMs.  A host may have a dozen 4-way VMs with each showing high ready time but only be at an aggregate 40% CPU utilization.  This is a clear sign that the scheduler is spending a great deal of time managing unneeded vCPUs.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">smp</category>
      <pubDate>Wed, 27 Aug 2008 17:50:36 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-7390</guid>
      <dc:date>2008-08-27T17:50:36Z</dc:date>
      <clearspace:dateToText>9 months, 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>New Documentation on esxtop</title>
      <link>http://communities.vmware.com/message/1136356</link>
      <description>&lt;p /&gt;
I attended the esxtop session at VMworld this past year and learned quite a bit but this looks like a very well made document which I have already saved into my folder of useful documentation.  I will definitely have to find time to read through it.  &lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
Thanks for some thorough documentation on esxtop!&lt;br /&gt;
&lt;p /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Kyle&lt;/li&gt;
&lt;/ul&gt;</description>
      <pubDate>Mon, 05 Jan 2009 19:11:56 GMT</pubDate>
      <author>khughes</author>
      <guid>http://communities.vmware.com/message/1136356</guid>
      <dc:date>2009-01-05T19:11:56Z</dc:date>
      <clearspace:dateToText>10 months, 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>4</clearspace:replyCount>
    </item>
    <item>
      <title>esxtop Performance Counters</title>
      <link>http://communities.vmware.com/docs/DOC-5240</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
This article contains a list of some of the performance counters provided by esxtop. This is far from exhaustive, as this list was created to answer the question: "which are the most important esxtop counters?"  Recently VMware has published an exhaustive list of esxtop information on &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-9279"&gt;Interpreting esxtop Statistics&lt;/a&gt;.  Check that out for more information.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;CPU Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%RDY&lt;/td&gt;
&lt;td&gt;The percentage of time that the world or group is waiting for a processor to be available to execute its workload.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%USED&lt;/td&gt;
&lt;td&gt;The percentage of CPU that is used by that world or group.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GID&lt;/td&gt;
&lt;td&gt;Group ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NWLD&lt;/td&gt;
&lt;td&gt;The number of worlds in the group. When this number is greater than one, the row can be expanded to get information on each world.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;h2&gt;Memory Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%ACTV&lt;/td&gt;
&lt;td&gt;Instantaneous view of the percentage of memory pages that have been used by the VM in the previous seconds. Unlike TCHD which counts pages by following working sets, %ACTV is a more frequently updated number that is based on a sample of the entire memory pool.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%ACTVS&lt;/td&gt;
&lt;td&gt;Slow moving average of the %ACTV counter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%ACTVF&lt;/td&gt;
&lt;td&gt;Fast moving average of the %ACTV counter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCTL?&lt;/td&gt;
&lt;td&gt;Set to "Y" when the balloon driver is active in the guest and "N" when not.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCTLSZ&lt;/td&gt;
&lt;td&gt;This counter reports the amount of memory that the balloon driver is currently holding for use by other VMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEMSZ&lt;/td&gt;
&lt;td&gt;The amount of memory (in MB) allocated to the VM at the time of its creation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NHN&lt;/td&gt;
&lt;td&gt;The NUMA home node. This is the node on which the VM is booted. Migrations that have occurred since the VM started running would result in this VM running on another node(s).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NMIG&lt;/td&gt;
&lt;td&gt;The number of NUMA node migrations since the VM was booted. ESX Server's scheduler should avoid NUMA migrations so if this number continues to climb during normal operations some tuning of the VMs may be required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NRMEM&lt;/td&gt;
&lt;td&gt;The amount of memory that exists on a remote NUMA node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NLMEM&lt;/td&gt;
&lt;td&gt;The amount of memory that exists on the local NUMA node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;N%L&lt;/td&gt;
&lt;td&gt;The percentage of the VM's memory that exists on the local NUMA node. N%L = NLMEM / (NRMEM+NLMEM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OVHD&lt;/td&gt;
&lt;td&gt;The amount of memory used by the VMkernel to maintain and execute the VM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRD&lt;/td&gt;
&lt;td&gt;The amount of the VM's memory that is shared with other VMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRDSVD&lt;/td&gt;
&lt;td&gt;The amount of memory that was saved due to page sharing.  This number may be less than or equal to SHRD.  As one VM must always claim the single copy of a shared page, one VM with a shared page will not be able to claim savings.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWR/s&lt;/td&gt;
&lt;td&gt;The rate at which memory is being swapped in from disk.  High swap rates indicate a need for more memory in the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWW/s&lt;/td&gt;
&lt;td&gt;The rate at which memory is being swapped out to disk.  High swap rates indicate a need for more memory in the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TCHD&lt;/td&gt;
&lt;td&gt;The amount of memory (in MB) that has been touched (recently used) by the VM. In this case "recently" means within a minute or two.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
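The N%L relationship in the table can be checked by hand from the NLMEM and NRMEM columns.  A minimal Python sketch, with a hypothetical function name and made-up sample values:&lt;br /&gt;

```python
# Illustrative calculation; the function name is an assumption, and the
# inputs stand for the NLMEM and NRMEM values read from esxtop.

def numa_locality_percent(nlmem_mb, nrmem_mb):
    """Return N%L: the share of a VM's memory on its local NUMA node."""
    total = nlmem_mb + nrmem_mb
    if total == 0:
        return 0.0  # no memory accounted for; avoid dividing by zero
    return 100.0 * nlmem_mb / total


print(numa_locality_percent(3072, 1024))  # 75.0
```

&lt;br /&gt;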
&lt;h2&gt;Storage Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ABRTS/s&lt;/td&gt;
&lt;td&gt;The rate at which disk operations are being aborted. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time (as defined by the guest OS or application.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACTV&lt;/td&gt;
&lt;td&gt;The number of IO operations that are currently active. This represents operations that the host is currently processing and can serve as a snapshot view of storage activity. When this number hovers near zero, the storage system isn't being used. If it sustains non-zero numbers, then constant interaction with the storage is occurring.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAVG/cmd&lt;/td&gt;
&lt;td&gt;The average amount of time it takes a device (HBA, array, and everything in between) to service a single request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GAVG/cmd&lt;/td&gt;
&lt;td&gt;The total latency seen from the VM when performing an IO operation. GAVG = DAVG+KAVG.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KAVG/cmd&lt;/td&gt;
&lt;td&gt;The average amount of time it takes ESX Server's VMkernel to service a disk operation. Since this number represents time spent by the CPU to manage IO and processors are orders of magnitude faster than disks, it should be much, much less than DAVG.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUED&lt;/td&gt;
&lt;td&gt;The number of IO operations that require processing but have not yet been addressed. Commands are queued and awaiting management by the kernel when the driver's active buffer is full (see ACTV). Occasionally a queue will form and result in a small, non-zero QUED number but any significant (double-digit) average of queued commands means the storage hardware is unable to keep up with the host's needs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;READS/s&lt;/td&gt;
&lt;td&gt;The number of disk reads per second.  READS/s + WRITES/s = IOPS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WRITES/s&lt;/td&gt;
&lt;td&gt;The number of disk writes per second.  READS/s + WRITES/s = IOPS.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
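The two relationships stated in the table, GAVG = DAVG + KAVG and READS/s + WRITES/s = IOPS, can be sketched in Python as follows.  The function names are hypothetical and the sample numbers are invented for illustration:&lt;br /&gt;

```python
# Illustrative helpers; the names are assumptions and the numbers are made up.

def guest_latency_ms(davg_ms, kavg_ms):
    """GAVG/cmd as the sum of device (DAVG) and VMkernel (KAVG) latency."""
    return davg_ms + kavg_ms


def iops(reads_per_s, writes_per_s):
    """Total IO operations per second."""
    return reads_per_s + writes_per_s


print(guest_latency_ms(8.0, 0.5))  # 8.5
print(iops(1200, 300))             # 1500
```

&lt;br /&gt;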
&lt;h2&gt;Network Counters&lt;/h2&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Counter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%DRPRX&lt;/td&gt;
&lt;td&gt;The percentage of packets that should have been received but were dropped.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;%DRPTX&lt;/td&gt;
&lt;td&gt;The percentage of packets that were dropped for which transmission was attempted.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MbRX/s&lt;/td&gt;
&lt;td&gt;The megabits per second that are received at the network item.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MbTX/s&lt;/td&gt;
&lt;td&gt;The megabits per second that are transmitted from the network item.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <pubDate>Fri, 16 May 2008 18:59:39 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5240</guid>
      <dc:date>2008-05-16T18:59:39Z</dc:date>
      <clearspace:dateToText>10 months, 3 weeks ago</clearspace:dateToText>
    </item>
    <item>
      <title>DRS/DPM Video Live!</title>
      <link>http://communities.vmware.com/message/1094164</link>
      <description>&lt;br /&gt;
Scott,&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
Thanks.  Please send it to mikepodohert at aim dot com ( work email does not have the space for the file)&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;br /&gt;</description>
      <pubDate>Fri, 07 Nov 2008 22:25:54 GMT</pubDate>
      <author>mikepodoherty</author>
      <guid>http://communities.vmware.com/message/1094164</guid>
      <dc:date>2008-11-07T22:25:54Z</dc:date>
      <clearspace:dateToText>1 year, 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
    </item>
    <item>
      <title>DPM Power/Performance Video</title>
      <link>http://communities.vmware.com/blogs/drummonds/2008/11/06/dpm-powerperformance-video</link>
      <description>Back in September the performance team here at VMware embarked on a project to measure power savings as a result of using VI3's distributed power management (DPM).  This feature, experimentally supported in VI3 will full support planned for the next release, leverages DRS to consolidate idle and lightly-loaded VMs onto as few servers as possible.  Once the workload has been consolidated to the bare minimum hardware required, spare servers are powered down.  The end result is the flexible performance due to automated load balancing and a halving of total power usage.&lt;br /&gt;
&lt;br /&gt;
The experiment that we performed was based on a workload derived from &lt;a class="jive-link-external" href="http://www.vmware.com/products/vmmark/"&gt;VMmark&lt;/a&gt;.  In fact, it was precisely the VMmark workload.  But the execution of the test against a cluster of systems makes the results invalid for comparison against other systems.  VMmark run rules require that the test be run against VMs on a single server.&lt;br /&gt;
&lt;p /&gt;
&lt;br /&gt;
We started the test with 13 tiles worth of VMs (108 VMs in all) on the DRS cluster.  With all of these VMs idle, DPM consolidated them to a single host and turned off three servers.  As the load was applied to the VMs at 9:00 AM and driven through an eight-hour workday, DRS and DPM powered on servers and balanced load, as needed.  When the day ended at 5:00 PM, the load was again consolidated and servers were powered down.  The video we shot includes power meters of the systems under test and screenshots of activity induced by DRS and DPM.&lt;br /&gt;
&lt;p /&gt;
&lt;br /&gt;
Check out the &lt;a class="jive-link-external" href="http://www.youtube.com/watch?v=7CbRS0GGuNc"&gt;video on YouTube&lt;/a&gt; and let me know what you think.  I'm considering recording some of the other amazing things that we're doing with our products and would love your feedback on what you'd like to see.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">dpm</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">vmmark</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">drs</category>
      <pubDate>Thu, 06 Nov 2008 22:18:05 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2008/11/06/dpm-powerperformance-video</guid>
      <dc:date>2008-11-06T22:18:05Z</dc:date>
      <clearspace:dateToText>1 year, 2 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>VMkernel Scheduler</title>
      <link>http://communities.vmware.com/docs/DOC-5501</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
Details on the ESX Server scheduler are commonly requested when I engage customers and partners.  People want to know more about how the scheduler works, when SMP should be used, and what the deal is with SMP co-scheduling.  This page will answer these questions and others as they arise in the forum or the discussion portion of this page.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Terminology and Architecture&lt;/h1&gt;
In VMware parlance, the monitor is the part of our products that provides a virtual interface to the guest operating systems.  The VMkernel is the part of our products that manages interactions with the devices, handles memory allocation, and schedules access to the CPU resources, among other things.  This is shown in the following figure. &lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3049/multi_mode_monitor.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3049/multi_mode_monitor.JPG" class="jive-image"  /&gt;&lt;br /&gt;
&lt;br /&gt;
This document will provide information on one part of the VMkernel: the scheduler.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Performance Scaling and the Scheduler&lt;/h1&gt;
It is a critical requirement for enterprise deployments that an operating system provide fast and fair access to the underlying resources.  As a critical part of this design, the scheduler has undergone countless engineer-years of development to guarantee that this requirement is met.  We've now released dozens of papers showing linear scaling of workloads as vCPU count is scaled up within a single VM and VM count is scaled up within a single host.  Here are a few such papers that contain supporting data.&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/Oracle_Scaling_in_ESX_Server.pdf"&gt;VM scaling&lt;/a&gt;  as demonstrated by Oracle databases.&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/db2_scalability_wp_vi3.pdf"&gt;VM and vCPU scaling&lt;/a&gt;  under IBM DB2 load.&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf"&gt;VM and vCPU scaling&lt;/a&gt;  with SQL Server running in the VM.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
The scheduler's ability to fairly scale up to and beyond totally committed CPU resources is no accident.  In fact, in a conversation I had with a QA manager I was assured that the VMkernel's scheduler would fairly distribute CPU resources to all VMs at least up to 4x CPU overcommitment.  Of course, on a system with the CPU over-committed by 4x each VM will only run at 1/4 native speed but the scheduler keeps the VMs running at that performance.  Not one at 1/8 speed, one at 1/10 speed, and another at 1/4 speed.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;SMP and the Scheduler&lt;/h1&gt;
As ESX Server supports uniprocessor (UP) and symmetric multiprocessor (SMP) VMs, the fair-and-fast requirement for the scheduler must be upheld in the presence of concurrently executing UP and SMP VMs.  Internal testing of this requirement shows fair scheduling even in the presence of concurrently executing 1-way, 2-way, and 4-way VMs.&lt;br /&gt;
&lt;br /&gt;
In fact, the ability to fairly execute under such environments is a very tricky problem for a scheduler.  We've run analysis on competitors' products and found that the ability to fairly balance differently-sized VMs is something of which ESX Server alone is capable.  Stay tuned in the coming months as we back this claim up with performance data. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;Cell Size&lt;/h2&gt;
&lt;br /&gt;
One construct that assists the scheduler in optimally placing VMs on a heavily utilized system is a cell.  A cell is a logical grouping of a subset of CPU cores in the system.  In ESX 3 versions the cell size is equal to four.  Since the cell is statically assigned to physical cores, this means that each four-core processor is in exactly one cell.  When only dual-core processors are present, a cell is comprised of two sockets.  The most important thing to know about cells is the following:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;&lt;i&gt;A VM cannot span more than one cell.&lt;/i&gt;&lt;/blockquote&gt;
&lt;br /&gt;
This means that four-way VMs run on only one socket at a time in systems with quad-core CPUs.  For this case, the number of options presented to the scheduler is equal to the number of sockets.  In future versions of ESX we plan to increase the cell size to eight.  In some cases (such as systems with hexa-core CPUs) a modification of the cell size can improve performance.  See &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1007361"&gt;KB article 1007361&lt;/a&gt; for more information. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;UP or SMP?&lt;/h2&gt;
Whether and when to use SMP is a common question from VMware users.  The simple answer is to use SMP only when needed.  There are three reasons:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;SMP schedulers are less efficient than UP schedulers.  This can be confirmed with a simple experiment using trivial benchmarks like Netperf or Passmark.  On UP systems (either virtual or native) the UP hardware abstraction layer (HAL) will provide marginally better results than the SMP HAL.&lt;/li&gt;
&lt;li&gt;Even when unused, idle vCPUs require kernel resources to virtualize.  Memory is needed to maintain data structures and CPU resources are needed to virtualize the idle system.  The amount of work needed to support an idle vCPU varies greatly but is usually in the realm of 1-2% of a single CPU core.&lt;/li&gt;
&lt;li&gt;The work required to deliver timer interrupts increases quadratically with the number of vCPUs.  With some guest operating systems, like RHEL5, the number of timer interrupts delivered by the VMkernel can be quite high.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5252"&gt;Red Hat Enterprise Linux&lt;/a&gt; for more information on this issue.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h2&gt;What About Co-scheduling? &lt;/h2&gt;
Back in the days of ESX Server 2.5, SMP VMs had to have their vCPUs co-scheduled at the same instant to begin running.  Because only 2-way VMs were supported at this time, that meant that two CPU cores had to be available simultaneously to launch a 2-way VM.  On a server with a total of only two cores, this meant that the VM could not be launched concurrently with any other process on the server.  This would include the service console, the web interface, or any other process.&lt;br /&gt;
&lt;br /&gt;
This requirement was reduced in ESX Server 3.0 through a process called relaxed co-scheduling.  Effectively, SMP VMs can have their vCPUs scheduled at slightly different times, and idle vCPUs do not necessarily have to be scheduled concurrently with running vCPUs.  More details on this are available on the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-4960"&gt;Co-scheduling SMP VMs in VMware ESX Server&lt;/a&gt; page.  &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;NUMA Considerations&lt;/h2&gt;
Support for non-uniform memory access (NUMA) architectures was introduced in ESX Server 2.  This meant that the scheduler became aware that memory was not uniform across each CPU.  Each CPU node had access to its own local memory and a larger pool of remote memory (which serves as local memory for the other CPU nodes).  Access to local memory is much faster than access to remote memory, so the scheduler should favor placing processes on the nodes that hold their memory.&lt;br /&gt;
&lt;br /&gt;
Subsequent generations of ESX Server continued to optimize for the use of NUMA memory.  This included placement of vCPUs next to needed memory and startup of VMs at NUMA nodes with resources available for execution.  All of this is transparently handled by the scheduler but it should be noted that the newer your version of ESX Server, the better its NUMA scheduling is.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">numa</category>
      <pubDate>Tue, 27 May 2008 22:56:02 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5501</guid>
      <dc:date>2008-05-27T22:56:02Z</dc:date>
      <clearspace:dateToText>1 year, 3 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>8</clearspace:replyCount>
    </item>
    <item>
      <title>Memory Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5430</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
This document is a living, up-to-date version of the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods whitepaper&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
Host memory utilization represents the entirety of memory usage due to the VM and all tasks required by ESX Server to manage and provide control of the VMs.  ESX Server's monitoring capabilities provide no visibility into improper usage or configuration of memory within the guest.  Continue to use traditional monitoring tools in the guest to identify memory-hungry applications or shortages that lead to in-guest swapping.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Navigating esxtop&lt;/h1&gt;
As before, bring up esxtop to inspect system specifics.  Hitting the 'm' key will display the memory counters.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5430-12-3050/esxtop-mem-main.JPG" alt="esxtop-mem-main.JPG" width="620" class="jive-image-thumbnail jive-image"/&gt;&lt;br /&gt;
&lt;br /&gt;
Once running, the following can be observed from the esxtop report:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The header data contains host data that impacts all VMs running on the host.  The physical memory row (PMEM) contains the total RAM installed on the system, the amount used by the console operating system (COS), the memory used by the kernel (VMK), and other statistics.&lt;/li&gt;
&lt;li&gt;The next few rows contain host-level memory statistics for various ESX subsystems:
&lt;ul&gt;
&lt;li&gt;VMKMEM: displays the memory statistics for the ESX Server VMkernel.&lt;/li&gt;
&lt;li&gt;COSMEM: displays the memory statistics as reported by the ESX Server service console.&lt;/li&gt;
&lt;li&gt;PSHARE: displays the ESX Server page-sharing statistics.&lt;/li&gt;
&lt;li&gt;SWAP: displays the ESX Server swap usage statistics.&lt;/li&gt;
&lt;li&gt;MEMCTL: displays the memory balloon driver statistics.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Relevant Counters &lt;/h1&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Type&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;VirtualCenter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;esxtop&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Details&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total memory size&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;MEMSZ&lt;/td&gt;
&lt;td&gt;This is the amount of memory that the VM has been sized to.  The VM will never get more than this but most of the time will be using far less than this amount due to sharing, ballooning, and swapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory target&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;SZTGT&lt;/td&gt;
&lt;td&gt;The amount of memory that the kernel would like to provide to the VM.  This number is calculated based on the guest's memory usage.  When memory is over-committed, it may not equal the amount of memory that is actually provided due to ballooning and swapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Granted memory&lt;/td&gt;
&lt;td&gt;mem.granted.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;The amount of memory that has been provided to the VM.  Memory is not granted to the VM until it has been touched once.  In the case of Linux, which does not zero out pages upon boot, a 4G VM will only be granted the small portion (100M or so) needed to run the OS until the OS or applications start to access more.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Touched memory&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;TCHD&lt;/td&gt;
&lt;td&gt;The amount of memory (in MB) that has been "touched" (read from or written to) in the past X minutes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consumed memory&lt;/td&gt;
&lt;td&gt;mem.consumed.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;The amount of machine memory allocated to the VM.  For instance, a Linux VM might have been sized to 4G.  Half of the pages may not yet have been used by the OS.  Perhaps 1G of this remaining 2G can be shared.  That leaves a consumed memory of only 1G.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared memory&lt;/td&gt;
&lt;td&gt;mem.shared.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;Shared memory represents the entire pool of shareable memory.  For instance, if two VMs each have 500M of identical memory, the shared memory is 1G.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared common memory&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.average&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;Shared common memory represents the footprint in machine memory as a result of memory sharing.  For instance, if two VMs each have 500M of identical memory, the shared common memory is 500M.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active memory&lt;/td&gt;
&lt;td&gt;mem.active.average &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;%ACTV, %ACTVS, %ACTVF&lt;/td&gt;
&lt;td&gt;The amount of memory (as a percentage of the entire host's memory) that has been used by the VM in the past sample period.  %ACTVS and %ACTVF are slow and fast moving averages, showing long-term and recent activity respectively.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balloon driver usage&lt;/td&gt;
&lt;td&gt;mem.vmmemctl.average &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;MCTLSZ&lt;/td&gt;
&lt;td&gt;The amount of memory claimed by the balloon driver for use in other VMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swap rate&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;SWW/s&lt;br /&gt;
&lt;br /&gt;
			SWR/s&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;The rates at which memory is swapped out (written) or in (read).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swap Totals&lt;/td&gt;
&lt;td&gt;mem.swapout.average,&lt;br /&gt;
			mem.swapin.average&lt;br /&gt;&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;These are the cumulative amounts of swapping that have occurred since the VM was powered on.  It is important to check whether swapin and swapout are increasing, rather than just whether they are non-zero: non-zero values may reflect swapping in the past rather than at the present time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NUMA migrations&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;NMIG&lt;/td&gt;
&lt;td&gt;The number of NUMA migrations that have occurred since the VM's creation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NUMA memory&lt;/td&gt;
&lt;td&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;NLMEM, NRMEM&lt;/td&gt;
&lt;td&gt;The amount of the VM's memory that is on the local and remote NUMA nodes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overhead&lt;/td&gt;
&lt;td&gt;mem.overhead.average &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;OVHD&lt;/td&gt;
&lt;td&gt;The amount of memory required by the VMkernel to maintain and execute the VM.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;h1&gt;Evaluate the Data&lt;/h1&gt;
Memory analysis on an ESX Server means not just investigation of server-side statistics but also a solid understanding of the application that is running in the VM.  When memory is short on the host, ballooning and swapping may be visible in esxtop, with swapping having a great impact on performance.  When memory is short within the VM the guest will swap. &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;How much memory are the VMs actually using?  While they may have been allocated large amounts of memory, it's likely that the OS and applications are only using a small percentage of what the VM was assigned.  Check the active and touched memory counters for accurate numbers on guest memory usage.&lt;/li&gt;
&lt;li&gt;Is memory short in the host?  Swapping (SWW/s and SWR/s) is a certain sign of this problem.  Heavy use of the balloon driver may also suggest this, but ballooning has only a very slight impact on guest performance.&lt;/li&gt;
&lt;li&gt;Can memory deficiencies be addressed through VM resizing?  Checking memory usage of critical apps within the VMs can help inform decisions to decrease the amount of RAM provided to those VMs.  Some operating systems will expand to utilize all available memory at little or no value to the application.  Reducing the memory space and correcting over-sized caches frees up memory for other VMs.&lt;/li&gt;
&lt;li&gt;Is the collection of all VMs' active memory (TCHD or %ACTV) sustaining at an amount that exceeds the total available memory?  If so, then either more memory must be added to the host or VMs must be migrated to another DRS cluster.&lt;/li&gt;
&lt;li&gt;Are the guests swapping?  If the VM has been sized with too little memory then the guest OS will swap inside the VM.  This will appear to ESX Server as any other disk activity but should be investigated and solved with traditional OS analysis tools.&lt;/li&gt;
&lt;li&gt;Can NUMA migrations (NMIG) be seen on the system?  NMIG reports total migrations since the VM has been powered on.  If this number continues to climb then the VM is being migrated from node to node which most certainly degrades performance.&lt;/li&gt;
&lt;li&gt;Does the amount of memory located on a remote NUMA node (NRMEM) remain at a non-zero number?  This may be a sign that the VM has been sized to exceed the memory of a single NUMA node.  If the VM is using more memory than fits on a single node, some of its memory is certain to be located on a remote node.  Remote memory access is quite slow relative to local memory access.&lt;/li&gt;
&lt;/ul&gt;
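&lt;br /&gt;
The advice to check whether the cumulative swap counters are increasing, rather than merely non-zero, can be sketched as follows.  This is a minimal illustration in Python: the function name swapping_now and the sample readings are hypothetical, and the readings would in practice come from successive observations of mem.swapout.average or mem.swapin.average.&lt;br /&gt;
&lt;pre&gt;
```python
from typing import Sequence


def swapping_now(samples: Sequence[float]) -> bool:
    """True if a cumulative swap counter grew between the first and last sample."""
    if 2 > len(samples):
        raise ValueError("need at least two samples")
    return samples[-1] > samples[0]


# Hypothetical readings of a cumulative swap-out counter, oldest first.
print(swapping_now([512.0, 512.0, 512.0]))   # False: swapped in the past only
print(swapping_now([512.0, 640.0, 900.0]))   # True: still swapping
```
&lt;/pre&gt;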
&lt;h2&gt;Correct the System&lt;/h2&gt;
The prescriptive advice for memory shortages is fairly simple: use less memory or buy more.  The following recommendations are variations on this theme:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Verify that VMware Tools has been installed on every VM on the system and that the memory balloon driver has not been disabled.  (The balloon driver is always on by default and is disabled manually, through text-based advanced configuration, only in extremely rare cases.)  When provided with the ability to balloon memory within the guests, ESX Server is able to take memory from VMs that are not using it and make it available to those that do need it.&lt;/li&gt;
&lt;li&gt;Provide more memory to the DRS cluster.  As total resources go up, VirtualCenter will balance VMs across the cluster so VMs that need the memory are able to get it.&lt;/li&gt;
&lt;li&gt;Set memory reservations to minimally provide the amount of memory required of the OS and critical applications.  This will allow for sustained, fast access for critical code and provide hints to VirtualCenter for optimal VM positioning across the DRS cluster.&lt;/li&gt;
&lt;li&gt;Make sure the amount of memory used by the VMkernel to maintain the VMs is acceptable.  This value, reported for each VM with the overhead counter (OVHD), is dependent on the memory size of the VM, the number of vCPUs provided to it, and whether or not it is executing a 64-bit OS.  Fewer VMs on the host, fewer aggregate vCPUs, and lower precision OSes (32-bit as opposed to 64-bit) will lower this number.  Reducing any of these in the cluster will free up resources for every VM in the cluster.&lt;/li&gt;
&lt;li&gt;Size VMs on NUMA systems to guarantee that each VM's memory will fit on a single node.  This means either decreasing the memory allocated to a VM or increasing the node memory size.&lt;/li&gt;
&lt;li&gt;Size guests appropriately according to their needs.  For example:
&lt;ol&gt;
&lt;li&gt;Depending on the access pattern of the data, databases may not benefit from the last doubling of cache size.  Experiment with smaller cache sizes and see if performance drops.  If not, decrease the VM's available memory so it can be used by other VMs.&lt;/li&gt;
&lt;li&gt;Check the guest OS's statistics for in-guest swapping.  Provide memory as it's needed and pay attention to esxtop statistics to see if the additional memory provided generates a new bottleneck in the host.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
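&lt;br /&gt;
The NUMA sizing recommendation above amounts to a simple comparison, sketched here in Python.  The function name vms_exceeding_node, the VM names, and the sizes are all hypothetical, and the sketch ignores VMkernel overhead memory, which also has to fit on the node.&lt;br /&gt;
&lt;pre&gt;
```python
from typing import Dict, List


def vms_exceeding_node(vm_mem_mb: Dict[str, int], node_mem_mb: int) -> List[str]:
    """Names of VMs whose configured memory is larger than one NUMA node."""
    return sorted(name for name, size in vm_mem_mb.items() if size > node_mem_mb)


# Hypothetical host with 16 GB of memory per NUMA node.
vms = {"db01": 24576, "web01": 4096, "app01": 16384}
print(vms_exceeding_node(vms, node_mem_mb=16384))   # ['db01']
```
&lt;/pre&gt;
Any VM reported by such a check is a candidate for a smaller memory allocation or for a host with larger NUMA nodes.&lt;br /&gt;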
&lt;br /&gt;
&lt;h1&gt;Understanding Page Sharing&lt;/h1&gt;
One cannot fully optimize an ESX Server's memory without understanding the performance implications of page sharing.  VMware's page sharing algorithm was presented at EMC World 2008 as resulting in a 2% increase in CPU load.  But the benefits of page sharing have been demonstrated to provide overcommitment of memory safely to 2X and beyond.&lt;br /&gt;
&lt;br /&gt;
The value of page sharing can be seen in the following counters:&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;esxtop&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;VirtualCenter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Description&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRD&lt;/td&gt;
&lt;td&gt;mem.shared.average&lt;/td&gt;
&lt;td&gt;The amount of memory in the VM that is sharable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHRDSVD&lt;/td&gt;
&lt;td&gt;&lt;i&gt;No equivalent.&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;The amount of memory saved due to page sharing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;i&gt;No equivalent.&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;mem.sharedcommon.average&lt;/td&gt;
&lt;td&gt;The size of the memory after redundant pages have been removed.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
Note that missing counters can be calculated using the other two.  Shared memory minus shared common memory equals shared savings.&lt;br /&gt;
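&lt;br /&gt;
As a worked example of this relationship, the following Python sketch reuses the figures from the counter table above (two VMs that each have 500M of identical memory).  The function name shared_savings_mb is hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
def shared_savings_mb(shared_mb: float, shared_common_mb: float) -> float:
    """Memory saved by page sharing: shared minus shared common."""
    return shared_mb - shared_common_mb


# Two VMs with 500M of identical memory each:
# shared = 1000M, shared common = 500M.
print(shared_savings_mb(1000, 500))   # 500
```
&lt;/pre&gt;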
&lt;br /&gt;
&lt;h1&gt;References&lt;/h1&gt;
The top-level &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; paper.&lt;br /&gt;
&lt;br /&gt;
The &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt; index.&lt;br /&gt;
&lt;br /&gt;
The &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; page.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">memory</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <pubDate>Fri, 23 May 2008 23:37:14 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5430</guid>
      <dc:date>2008-05-23T23:37:14Z</dc:date>
      <clearspace:dateToText>1 year, 1 month ago</clearspace:dateToText>
    </item>
    <item>
      <title>CPU Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5420</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
CPU load is generated by the guest and its applications as well as by ESX Server as it provides a virtual interface to the hardware.  While the work performed by the host does result in some increase in load, the great majority of processing is due to the applications in the VM.  A solid understanding of the workload profile regardless of the virtual environment can assist CPU analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Check Utilization&lt;/h1&gt;
Invoke esxtop.  By default, it should show CPU utilization but pressing 'c' will ensure this data is being displayed.  The following figure shows example data produced on a test system.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5420-6-2692/esxtop-cpu-main.JPG" alt="esxtop-cpu-main.JPG" width="620" class="jive-image-thumbnail jive-image"/&gt;&lt;br /&gt;
&lt;br /&gt;
Observe the following: &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;The PCPU(%) line in the header shows utilization for the processor(s) by core and in total.  The comma-delimited data first displayed shows core utilization followed by "used total" which averages utilization of all cores.&lt;/li&gt;
&lt;li&gt;The LCPU(%) line shows the percentage of CPU utilization per logical CPU. The percentages for the logical CPUs belonging to a package add up to 100 percent. This line appears only if hyperthreading is present and enabled.&lt;/li&gt;
&lt;li&gt;The CCPU(%) line shows the percentages of total CPU time as reported by the ESX Server service console. Use of any third party software, such as management agents and backup agents, inside the service console, may result in high CCPU(%) number.&lt;/li&gt;
&lt;li&gt;There is an idle world running whose %USED entry displays the amount of CPU cycles that remain unused.  If the idle world is reported at less than 100% utilization then less than one physical core remains for additional work.  As this number can reach many hundreds of percent (100% for each core), small numbers here represent heavily loaded systems.&lt;/li&gt;
&lt;li&gt;Check the utilization (%USED) of the interesting VMs.  The VMs are reported here with the names specified at their time of creation.  Like the idle row, utilization for each VM can exceed 100%.  A VM that was provided two vCPUs, as an example, can max out at 200% CPU utilization.&lt;/li&gt;
&lt;li&gt;Expand the group data for the VM that is most interesting.  This is done by hitting 'e' and then entering the group ID number (GID) for the VM.  The figure below contains a CPU-expanded version for GID "30" in the previous figure.  Once expanded, esxtop provides counter data for every world in the group.  This includes:
&lt;ul&gt;
&lt;li&gt;vmmX:  For each vCPU provided to the VM, a virtual machine monitor (VMM) world is displayed.  This world will perform the majority of the work required to execute and virtualize the guest code (OS, application, and hypervisor).&lt;/li&gt;
&lt;li&gt;vcpu-X: A vcpu-X world is created to assist the VMM world for each vCPU.  Primarily this work revolves around the virtualization of the IO devices.&lt;/li&gt;
&lt;li&gt;mks: Mouse, keyboard, and screen interrupt servicing.&lt;/li&gt;
&lt;li&gt;vmware-vmx:  The VMX worlds assist in maintenance and communications with other worlds and should not represent a material portion of the group utilization.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5420-6-2693/esxtop-cpu-main-expanded.JPG" alt="esxtop-cpu-main-expanded.JPG" width="620" class="jive-image-thumbnail jive-image"/&gt;&lt;br /&gt;
&lt;h1&gt;Evaluate the Data and Correct the System&lt;/h1&gt;
The general flow for evaluation starts by considering the system's load.  Is the system overloaded with too many VMs?  Is the guest using all of its vCPUs and simply in need of more or faster processors?  Are all guests waiting for IO?  For example:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Check the PCPU(%) line to see if all cores' utilization is near 100%.  In this case the system is saturated.  If multiple VMs are competing for the CPUs, try to reduce the VMs on the system or find other means of decreasing the load on the system.  See "CPU Saturation of Host" below.&lt;/li&gt;
&lt;li&gt;See if the PCPU(%) line shows an unequal load across processor cores with some at saturation and some remaining near idle.  This would indicate that applications within a VM are utilizing all of the cores provided to them.  Increase that VM's vCPU count, if possible, and verify that the guest is making use of the additional cores. If the application supports horizontal scalability, you may run multiple VMs to use the additional cores.  See "CPU Saturation of VM" below.&lt;/li&gt;
&lt;li&gt;If all CPUs remain underutilized, either the application in the VM is misconfigured or the VM is waiting for IO operations to complete.  See "Low CPU Utilization" below.&lt;/li&gt;
&lt;/ol&gt;
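&lt;br /&gt;
The three checks above can be sketched as a small triage function in Python.  This is only an illustration of the decision flow: the function name triage and the two thresholds (90% for "near 100%" and 20% for "near idle") are assumptions, not values defined by esxtop or by this document.&lt;br /&gt;
&lt;pre&gt;
```python
from typing import List

SATURATED_PCT = 90.0   # assumption: what counts as "near 100%"
IDLE_PCT = 20.0        # assumption: what counts as "near idle"


def triage(pcpu: List[float]) -> str:
    """Map per-core PCPU(%) readings to the section that applies."""
    if not pcpu:
        raise ValueError("need at least one core reading")
    busy = [u for u in pcpu if u >= SATURATED_PCT]
    idle = [u for u in pcpu if IDLE_PCT >= u]
    if len(busy) == len(pcpu):
        return "CPU Saturation of Host"
    if busy and idle:
        return "CPU Saturation of VM"
    if len(idle) == len(pcpu):
        return "Low CPU Utilization"
    return "no clear pattern"


print(triage([98, 97, 99, 96]))   # CPU Saturation of Host
print(triage([99, 5, 98, 3]))     # CPU Saturation of VM
print(triage([5, 3, 8, 2]))       # Low CPU Utilization
```
&lt;/pre&gt;
Readings that fall between the two thresholds return "no clear pattern" and need the more detailed investigation described in the sections below.&lt;br /&gt;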
&lt;br /&gt;
&lt;h2&gt;CPU Saturation of Host&lt;/h2&gt;
As stated above, both the PCPU(%) and %USED counters can be used to identify hosts that are using all physical CPUs.  It is possible, however, for the VMs on the system to utilize nearly all of the processor cycles without actually requesting more than is available.  This near-saturation case is the sign of a heavily loaded system.&lt;br /&gt;
&lt;br /&gt;
A better sign of over-utilization on a host is ready time (%RDY).  When any world's ready time starts to climb, that world is spending the reported percentage of its time waiting for some CPU to become available for work.  Ready time above 10% is worth investigation and may be a sign of an over-utilized host.  For a more detailed discussion on ready time, see &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-7390"&gt;Ready Time&lt;/a&gt;.&lt;br /&gt;
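&lt;br /&gt;
The 10% guideline can be applied to a set of readings as in the following Python sketch.  The function name high_ready_worlds and the sample values are hypothetical; the readings are assumed to have been copied from the %RDY column of the esxtop CPU screen.&lt;br /&gt;
&lt;pre&gt;
```python
from typing import Dict, List

READY_THRESHOLD_PCT = 10.0  # from the text: above 10% is worth investigating


def high_ready_worlds(ready_pct: Dict[str, float],
                      threshold: float = READY_THRESHOLD_PCT) -> List[str]:
    """Names of worlds whose %RDY exceeds the threshold, sorted by name."""
    return sorted(name for name, rdy in ready_pct.items() if rdy > threshold)


# Hypothetical %RDY values read from the esxtop CPU screen.
sample = {"vm-a": 2.5, "vm-b": 14.0, "vm-c": 10.0}
print(high_ready_worlds(sample))   # ['vm-b']
```
&lt;/pre&gt;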
&lt;br /&gt;
Host saturation is a clear sign that too much work has been loaded onto a single server.  This is usually due to overly aggressive consolidation ratios.  Overcommitting CPU resources in this case will only worsen the performance. Consider the following remedies:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Verify that VMware Tools has been installed on every VM on the system.  In addition to many other benefits, VMware Tools provides a network driver (vmxnet) without which guest networking will be unnecessarily inefficient.&lt;/li&gt;
&lt;li&gt;Verify that all systems in the DRS cluster are carrying load when the server of interest is overloaded.  If they aren't, increase the aggressiveness of the DRS algorithm and check VM reservations against other hosts in the cluster to ensure migrations will happen.  Lastly, increase the number of servers in the DRS cluster so VMs from this server can be migrated to servers with available resources.&lt;/li&gt;
&lt;li&gt;Increase the CPU resources available to the VMs by increasing or improving CPUs or cores on some of the systems in the DRS cluster.&lt;/li&gt;
&lt;li&gt;Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.&lt;/li&gt;
&lt;li&gt;Ensure the newest version of ESX Server is being used.  The newer versions of ESX Server provide better efficiency and CPU-saving features such as TCP segmentation offload (TSO), large memory pages, and jumbo frames.&lt;/li&gt;
&lt;li&gt;Reduce the CPU resource footprint of running VMs.  As examples:
&lt;ol&gt;
&lt;li&gt;Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM.  This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.&lt;/li&gt;
&lt;li&gt;Assist CPU by replacing software I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).&lt;/li&gt;
&lt;li&gt;Reduce vCPU count for guests to only the number required to execute the workload.  For instance, a single-threaded application in a 4-way guest will only benefit from a single vCPU.  But the hypervisor's maintenance of the three idle vCPUs takes CPU cycles that could be used for other work.&lt;/li&gt;
&lt;li&gt;For VMs created using P2V conversion, analyze the VM resources as well as the applications running inside the VM. Stop any unnecessary services that may be running inside the P2V'ed VM. Also reduce the vCPU count and memory size to only what is required to execute the workload.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
The simplest general advice for addressing CPU bottlenecks, given correctly configured VMs, is to address processing power at the cluster level.  If VirtualCenter reports fully utilized CPUs for all hosts in the cluster, there is little possibility of avoiding a need to increase cluster resources or decrease VM count.&lt;br /&gt;
&lt;br /&gt;
One last nuance of virtual system tuning, mentioned in item 6c above, is the correct balancing of virtual CPU count.  Few applications fully utilize two or more vCPUs and many VMs are often committed to a special purpose with a single application.  The guest OS and the hypervisor must expend CPU cycles managing multiple vCPUs.  If the applications are not using them, the system efficiency as a whole will improve by reducing vCPU count for VMs.&lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;CPU Saturation of VM&lt;/h2&gt;
Like host CPU saturation, VM CPU saturation can be seen when the %USED for a VM is high.  Unlike host CPU saturation, the idle world may report a large amount of free computational resources and the VM's ready time (%RDY) may remain low.  This behavior can be seen when a single VM utilizes all of the processors allocated to it but additional CPUs remain unused on the host.  The VM's utilization of all of its vCPUs can be confirmed by expanding the VM's world on the CPU screen.  Once this has been confirmed, the following options are available:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Verify that VMware Tools has been installed on every VM on the system.  In addition to many other benefits, VMware Tools provides a network driver (vmxnet) without which guest networking will be unnecessarily inefficient.&lt;/li&gt;
&lt;li&gt;If possible, increase the number of vCPUs provided to the VM. As the application in the guest is successfully using all of its vCPUs, it may continue to scale as the vCPU count is increased.  Pay attention to the vmmX world for each vCPU after increasing vCPU count to verify that the VM is making use of its newly provided resources.  As detailed in item 6c in the "CPU Saturation of Host" section, the addition of vCPUs imposes an overhead on the host whether they are being used or not.  So carefully assess the guest's needs to avoid unneeded vCPU count increases.&lt;/li&gt;
&lt;li&gt;If possible, power on multiple VMs running the same application. This depends on how well the application supports a horizontally scalable configuration. It is possible that an application may perform better when running as multiple single-vCPU VMs rather than as a single SMP VM.&lt;/li&gt;
&lt;li&gt;Utilize faster processors.  As processor performance is continually increasing the option of upgrading processors or migrating the VM to systems with newer processors can provide more total throughput to the VM.&lt;/li&gt;
&lt;li&gt;Set CPU reservations for the VMs that most need the processing power to guarantee that they get the CPU cycles they need.&lt;/li&gt;
&lt;li&gt;Decrease the work that results from running the VM.  As examples:
&lt;ol&gt;
&lt;li&gt;Decrease disk and/or network activity for applications that cache data by increasing the amount of memory provided to the VM.  This may lower IO and reduce ESX Server's responsibility to virtualize the hardware.&lt;/li&gt;
&lt;li&gt;Assist CPU by replacing software I/O with dedicated hardware (such as iSCSI HBAs or TCP segmentation offload NICs).&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Low CPU Utilization&lt;/h2&gt;
Assuming performance problems have been confirmed, low CPU utilization is usually a sign of an inefficiently designed datacenter architecture.  The design could be flawed in an individual VM or in the connectivity between various components.  The &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; page walks through the investigation of system-level components such as memory and then system-wide components such as network and storage.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">cpu</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <pubDate>Fri, 23 May 2008 19:05:48 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5420</guid>
      <dc:date>2008-05-23T19:05:48Z</dc:date>
      <clearspace:dateToText>1 year, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
    </item>
    <item>
      <title>Storage Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5490</link>
      <description>&lt;br /&gt;
This document is a living, wiki version of the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods whitepaper&lt;/a&gt; .  That document will ultimately be replaced with this one.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
Storage often bounds the performance of enterprise workloads.  More so than with CPU or memory investigation, traditional means of analysis remain sound for storage performance in virtual deployments.  This section introduces the tools for identifying heavily used resources and VMs that place high demands on their storage system.  Traditional correction methods then apply.&lt;br /&gt;
&lt;br /&gt;
iSCSI storage using software initiators is not covered in this section.  When storage is accessed through the hypervisor's iSCSI initiator or an in-guest initiator, the traffic shows up on the VMkernel network or the VM's network stack, respectively.  Check the Network section for more information.&lt;br /&gt;
&lt;h1&gt;Navigating esxtop&lt;/h1&gt;
As before, esxtop is the best place to start when investigating potential performance issues.  To view the disk adapter information in esxtop, press the &amp;lsquo;d&amp;rsquo; key once it is running.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5490-10-3051/esxtop-disk-main.jpg" alt="esxtop-disk-main.jpg" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5490-10-3051/esxtop-disk-main.jpg');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
On ESX Server 3.5, the storage system can also be displayed per VM (using &amp;lsquo;v&amp;rsquo;) or per storage device (using &amp;lsquo;u&amp;rsquo;).  The same counters are displayed on each view.  Look at the following items: &lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;For each of the three storage views:
&lt;ul&gt;
&lt;li&gt;On the adapter view (&amp;lsquo;d&amp;rsquo;), each physical HBA is displayed on a row of its own with the appropriate adapter name.  This short name may be checked against the more descriptive data provided through the Virtual Infrastructure Client to identify the hardware type.&lt;/li&gt;
&lt;li&gt;On ESX Server 3.5's VM disk view (&amp;lsquo;v&amp;rsquo;), each row represents a group of worlds on the ESX Server.  Each VM will have its own row and rows will be displayed for the console, system, and other less-important (from a storage perspective) worlds.  The groups' IDs (GID) match those on the CPU screen and can be expanded by pressing &amp;lsquo;e&amp;rsquo;.&lt;/li&gt;
&lt;li&gt;On ESX Server 3.5's disk device view (&amp;lsquo;u&amp;rsquo;), each device is displayed on its own row.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;As with the other system screens, the disk displays can have groups expanded for more detailed information:
&lt;ul&gt;
&lt;li&gt;The HBAs listed on the adapter display can be expanded with the &amp;lsquo;E&amp;rsquo; key to show worlds that are using those HBAs.  Once you know a VM's world ID, the activity due to that world can be seen on the expanded line whose world ID (WID) column matches.&lt;/li&gt;
&lt;li&gt;The worlds for each VM can be displayed by expanding the VM row on the VM disk view with the &amp;lsquo;e&amp;rsquo; key.&lt;/li&gt;
&lt;li&gt;The disk devices on the device display can be expanded to show usage by each world on the host.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Relevant Counters&lt;/h1&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Type&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;VirtualCenter&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;esxtop&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Details&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queued Disk Commands&lt;/td&gt;
&lt;td&gt;disk.queueLatency.average&lt;/td&gt;
&lt;td&gt;QUED&lt;/td&gt;
&lt;td&gt;Queued commands wait in the kernel queue for an open slot in the device driver queue.  A large number of queued commands means a heavily loaded storage system.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for information on queues.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queue Usage&lt;/td&gt;
&lt;td&gt;&lt;i&gt;Not available&lt;/i&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;%USD&lt;/td&gt;
&lt;td&gt;This counter tracks the percentage of the device driver queue that is in use.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for info on this queue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command Rate&lt;/td&gt;
&lt;td&gt;disk.commands.summation &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;ACTV&lt;/td&gt;
&lt;td&gt;VirtualCenter reports the number of commands that have been issued in the previous sample period.  esxtop provides a live look at the number of commands that are being processed at any one time.  Consider these counters a snapshot of activity.  But don't consider any number here "too much" until large queues start developing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HBA Load&lt;/td&gt;
&lt;td&gt;&lt;i&gt;Not available&lt;/i&gt; &lt;br clear="all" /&gt; &lt;/td&gt;
&lt;td&gt;LOAD&lt;/td&gt;
&lt;td&gt;In esxtop the LOAD counter tracks how full the device queues are.  Once LOAD exceeds one, commands will start to queue in the kernel.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for information on these queues.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage Device Latency&lt;/td&gt;
&lt;td&gt;disk.deviceReadLatency&lt;br /&gt;
&lt;br /&gt;
			disk.deviceWriteLatency&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;DAVG/cmd&lt;/td&gt;
&lt;td&gt;These counters track the latencies of the physical storage hardware.  This includes everything from the HBA to the platter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Latency&lt;/td&gt;
&lt;td&gt;disk.kernelReadLatency&lt;br /&gt;
&lt;br /&gt;
			disk.kernelWriteLatency&lt;br /&gt;&lt;/td&gt;
&lt;td&gt;KAVG/cmd&lt;/td&gt;
&lt;td&gt;These counters track the latencies due to the kernel's command processing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total Storage Latency&lt;/td&gt;
&lt;td&gt;&lt;i&gt;Not available&lt;/i&gt;&lt;/td&gt;
&lt;td&gt;GAVG/cmd&lt;/td&gt;
&lt;td&gt;This is the latency that the guest sees to the storage.  It is the sum of the DAVG and KAVG stats.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aborts&lt;/td&gt;
&lt;td&gt;disk.commandsAborted.summation&lt;/td&gt;
&lt;td&gt;ABRTS/s&lt;/td&gt;
&lt;td&gt;These counters track SCSI aborts.  Aborts generally occur because the array is taking far too long to respond to commands.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
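The relationships among a few of these counters can be sketched in Python.  This is only an illustration: the function names are hypothetical, and the LOAD formula (active plus queued commands divided by the device queue depth) is an assumption that is consistent with the description above rather than something stated in the table.&lt;br /&gt;
&lt;pre&gt;
def guest_latency_ms(davg_ms, kavg_ms):
    """GAVG/cmd is the sum of DAVG/cmd and KAVG/cmd (see the table above)."""
    return davg_ms + kavg_ms


def queue_load(active, queued, queue_depth):
    """Assumed model of LOAD: outstanding commands over device queue depth."""
    return (active + queued) / queue_depth


def kernel_is_queueing(load):
    """Commands start to queue in the kernel once LOAD exceeds one."""
    return load > 1.0
&lt;/pre&gt;
Under these assumptions, a device latency of 8.0 ms plus a kernel latency of 0.5 ms gives a guest latency of 8.5 ms, and 32 active plus 16 queued commands against a queue depth of 32 gives a LOAD of 1.5.&lt;br /&gt;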
&lt;p /&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;h1&gt;Evaluate the Data&lt;/h1&gt;
It is important to have a solid understanding of the storage architecture and equipment before attempting to analyze performance data.  Consider the following questions: &lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Is the host or any of the guests swapping?  The guest's swap activity must be checked with traditional OS tools and the host can be checked with SWR/s and SWW/s counters detailed in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5430"&gt;Memory Performance Analysis and Monitoring&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Are commands being aborted?  This is a certain sign that the storage hardware is overloaded and unable to handle the requests in a manner in line with the host's expectations.  Corrective action could include hardware upgrades, storage redesign (increasing spindles on the RAID), or guest redesign.&lt;/li&gt;
&lt;li&gt;Is there a large queue?  While less dangerous than aborts, queued commands are similarly a sign that hardware upgrades or storage system redesign is necessary.&lt;/li&gt;
&lt;li&gt;Is the array responding at expected rates?  Storage vendors will provide latency statistics for their hardware that can be checked against the latency statistics in esxtop.  When the latency numbers are high, the hardware could be overworked by too many servers.  As examples, latencies of 2-5 ms are usually a sign of a healthy storage system serving reads from the array cache, latencies of 5-12 ms reflect a healthy storage architecture where data is being read randomly across the disks, and latencies of 15 ms or greater may indicate an over-utilized or misbehaving array.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
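The example latency bands above can be turned into a rough triage helper.  The sketch below is hypothetical: the function name is invented, the bands are rules of thumb rather than hard limits, and the handling of values the text does not cover (below 2 ms, and between 12 and 15 ms) is an assumption.&lt;br /&gt;
&lt;pre&gt;
def classify_latency(latency_ms):
    """Map an observed storage latency to the rule-of-thumb bands above."""
    if latency_ms >= 15:
        return "possibly over-utilized or misbehaving array"
    if latency_ms > 12:
        # Assumption: the text gives no band for 12-15 ms.
        return "between bands; compare against vendor figures"
    if latency_ms > 5:
        return "healthy; random reads across the disks"
    # Assumption: anything at or below 5 ms is treated as cache-served.
    return "healthy; reads served from array cache"
&lt;/pre&gt;
Always compare the result against the latency figures published by the storage vendor before drawing conclusions.&lt;br /&gt;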
&lt;h2&gt;Identifying a Slow Array &lt;/h2&gt;
&lt;p /&gt;
It's worth pausing at this moment to point out that 95% of all storage performance problems are not fixed in ESX.  Believe me, I (Scott) have been called into a dozen performance escalations where poor storage performance was blamed on the hypervisor and not a single one was being caused by ESX.  If you're seeing high latencies in VirtualCenter or esxtop to the storage device, it's worth treating this problem as an array configuration issue.  Check ESX's logs for obvious storage errors, check array stats, and make sure that there are no fabric configuration problems.&lt;br /&gt;
&lt;p /&gt;
Once you are seeing high storage latencies, you shouldn't be using complex benchmarks to reproduce and solve this problem.  Go with Iometer and make certain you're doing an apples-to-apples comparison against a physical system (ideally dual-booted from the ESX server under test) to establish your expected, non-virtual results.  Check &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3961"&gt;Storage System Performance Analysis with Iometer&lt;/a&gt;  for information on using Iometer for problems like this.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Correct the System&lt;/h1&gt;
Corrections for these problems can include the following:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Reduce the guests' and host's need for storage.
&lt;ol&gt;
&lt;li&gt;Some applications such as databases can utilize system memory to cache data and avoid disk access.  Check in the VMs to see if they may benefit from increased caches and provide more memory to the VM if resources permit.  This may reduce the burden on the storage system.&lt;/li&gt;
&lt;li&gt;Eliminate all possible swapping to reduce the burden on the storage system.  First verify that the VMs have the memory they need by checking swap statistics in the guest.  Provide memory if resources permit.  Next, as described in the "Memory" section of this paper, eliminate host swapping.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Configure the HBAs and RAID controllers for optimal use.  It may be worth reading &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for information on how disk queueing works.
&lt;ol&gt;
&lt;li&gt;Increase the number of outstanding disk requests for the VM by adjusting the "Disk.SchedNumReqOutstanding" parameter. For detailed instructions, check the "&lt;a class="jive-link-external" href="https://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf#page=110"&gt;Equalizing Disk Access Between Virtual Machines&lt;/a&gt; " section in the "Fibre Channel SAN Configuration Guide".  This step and the following one must both be applied for either to work.&lt;/li&gt;
&lt;li&gt;Increase the queue depths for HBAs. Check the section "&lt;a class="jive-link-external" href="https://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf#page=112"&gt;Setting Maximum Queue Depth for HBAs&lt;/a&gt; " in the "Fibre Channel SAN Configuration Guide" for detailed instructions.  Note that you have to set two variables to correctly change queue depths.  This step and the previous one must both be applied for either to work.&lt;/li&gt;
&lt;li&gt;Make sure the appropriate caching is enabled for the disk controllers.  You will need to use the vendor-provided tools to verify this.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;If latencies are high, inspect array performance using the vendor's array tools.  When too many servers simultaneously access common elements on an array the disks may have trouble keeping up.  Consider array-side improvements to increase throughput.&lt;/li&gt;
&lt;li&gt;Balance load across the physical resources that are available.
&lt;ol&gt;
&lt;li&gt;Spread heavily used storage across LUNs being accessed by different adapters.  The presence of separate queues for each adapter can yield some efficiency improvements.&lt;/li&gt;
&lt;li&gt;Use multi-pathing or multiple links in case the combined disk I/O is higher than a single HBA capacity.&lt;/li&gt;
&lt;li&gt;Using VMotion, migrate IO-intensive VMs across different ESX Servers, if possible.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Upgrade hardware, if possible.  Storage system performance often bottlenecks storage-intensive applications but for the very highest storage workloads (many tens of thousands of IOs per second) CPU upgrades at the ESX Server will increase the host's ability to handle IO.&lt;/li&gt;
&lt;/ol&gt;
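One way to picture why the two queue settings in correction 2 must be changed together is to treat the smaller of the two as the effective limit.  This is a simplified mental model and an assumption, not a description of the ESX scheduler; the function name is hypothetical.&lt;br /&gt;
&lt;pre&gt;
def effective_depth(sched_num_req_outstanding, hba_queue_depth):
    """Assumed model: the smaller of the two settings bounds concurrency."""
    return min(sched_num_req_outstanding, hba_queue_depth)
&lt;/pre&gt;
In this model, raising only one of the two values from 32 to 64 leaves the effective depth at 32, which matches the guidance that both steps must be applied.&lt;br /&gt;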
&lt;h1&gt;Resources&lt;/h1&gt;
Top-level performance analysis page: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt;  &lt;br /&gt;
&lt;br /&gt;
VirtualCenter performance counters: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
esxtop performance counters:  &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-external" href="https://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_san_cfg.pdf"&gt;Fibre Channel SAN Configuration Guide&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <pubDate>Tue, 27 May 2008 19:02:23 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5490</guid>
      <dc:date>2008-05-27T19:02:23Z</dc:date>
      <clearspace:dateToText>1 year, 2 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Large Memory Pages</title>
      <link>http://communities.vmware.com/docs/DOC-6912</link>
      <description>In ESX Server 3.5, VMware introduced support for large memory pages in the guest. Large memory pages, an architectural feature available in x86 microprocessors for decades, can be used to improve performance on workloads that make use of them. With CPU, hypervisor, OS, and application support, throughput can go up and CPU utilization can go down. Since applications such as Oracle databases and Java have been using large pages on Linux and Windows for years, the introduction of this support on ESX Server allows for increased gains in performance over previous virtual installs. VMware is currently the only virtualization vendor to support large pages.&lt;br /&gt;
&lt;br /&gt;
VMware's support for large memory pages is detailed in the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/large_pg_performance.pdf"&gt;Large Page Performance&lt;/a&gt; performance study. That paper includes data on throughput gains in SPECjbb. The results are duplicated here:&lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-3-3334/specjbb_lp.JPG" alt="specjbb_lp.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-3-3334/specjbb_lp.JPG');return false;"/&gt;&lt;br /&gt;
&lt;p /&gt;
&lt;br /&gt;
At LinuxWorld 2008 VMware presented further data on the value of large memory pages with Oracle databases. Here is a chart showing those gains with VMware binary translation (BT): &lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3335/swingbench_lp_bt.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3335/swingbench_lp_bt.JPG" class="jive-image"  /&gt;&lt;br /&gt;
&lt;p /&gt;
VMware also shared data at LinuxWorld on the value of large memory pages with AMD Rapid Virtualization Indexing (RVI; formerly called NPT): &lt;br /&gt;
&lt;p /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-2-3336/swingbench_lp_rvi.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-6912-2-3336/swingbench_lp_rvi.JPG" class="jive-image"  /&gt;&lt;br /&gt;
&lt;p /&gt;
The gains derived from the presence of large pages in these Oracle/Swingbench results are atypical.  Large pages have been documented in numerous locations to provide benefits of 5-20% on most database applications.  The increases shown here (of up to 350%) are due to the specialized configuration, which is less about demonstrating real-world application performance and more about stressing the underlying configuration to uncover strengths and weaknesses.</description>
      <pubDate>Fri, 08 Aug 2008 16:02:31 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-6912</guid>
      <dc:date>2008-08-08T16:02:31Z</dc:date>
      <clearspace:dateToText>1 year, 3 months ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Storage System Performance Analysis with Iometer</title>
      <link>http://communities.vmware.com/docs/DOC-3961</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
&lt;a class="jive-link-external" href="http://www.iometer.org/"&gt;Iometer&lt;/a&gt;  is an open source tool originally developed by Intel that remains the simplest and best means of generating load on a system for performance analysis.  Because of minute fluctuations in the in-guest timers many in-guest benchmarks are unable to produce accurate results.  We at VMware have used Iometer for years and find its results to be accurate in most situations.  But see the disclaimer below for a word of caution.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-3961-8-2234/iometer_1.JPG" alt="iometer_1.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-3961-8-2234/iometer_1.JPG');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Figure 1.  Iometer with disk targets tab selected.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Figure 1 shows the topology and disk target tab for a particular VM.  Because the manager has only one host visible beneath it and only one worker is present on that host, you can tell that this is a single server with only one processor.  Had Iometer been started on a 4-way box, four workers would be visible.  While the UI remains operational, dynamo workers can be connected from other hosts to drive load across multiple VMs that may be on multiple servers.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Configuring the Test &lt;/h1&gt;
For all Iometer tests, under "Disk Targets" always increase the "# of Outstanding I/Os" per target.  When left at the default value of &amp;lsquo;1&amp;rsquo;, a relatively low load will be placed on the array.  When this number is increased, the OS will queue up multiple requests and really saturate the storage.  The ideal number of outstanding IOs can be determined by running the test multiple times and increasing this number each time.  At some point IOPS will stop increasing.  Generally returns diminish around 16 IOs/target, and more than 32 IOs/target will certainly have no value because of the default queue depth in ESX.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-6490"&gt;Storage Queues and Performance&lt;/a&gt;  for more information on queues.  In Figure 1 you can see that "# of Outstanding I/Os" defaults to 1.&lt;br /&gt;
&lt;br /&gt;
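The search for the ideal number of outstanding IOs can be automated once the results of several runs are in hand.  The sketch below is hypothetical: the function name and the 5% gain threshold are assumptions, and the input is whatever (outstanding IOs, IOPS) pairs your own runs produce.&lt;br /&gt;
&lt;pre&gt;
def saturation_point(results, min_gain=0.05):
    """Return the last setting that still improved IOPS by min_gain.

    results: (outstanding_ios, iops) pairs sorted by outstanding_ios,
    with every IOPS value greater than zero.
    """
    best = results[0][0]
    for (_, prev_iops), (n, iops) in zip(results, results[1:]):
        if (iops - prev_iops) / prev_iops >= min_gain:
            best = n
        else:
            break  # IOPS have stopped increasing meaningfully
    return best
&lt;/pre&gt;
Feed it the measured IOPS from runs at 1, 2, 4, 8, 16, and 32 outstanding IOs to see where additional queueing stops paying off on your array.&lt;br /&gt;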
When choosing the system to test in the topology frame, the "Disk Targets" tab will provide options as to the storage target.  The options here include formatted disks (yellow) or unformatted disks (blue).  In the former case Iometer addresses the storage through the OS's file system (FS).  In the latter, direct calls are made to the hardware without using a FS.  Storage specialists are usually more interested in just the hardware, so evaluation of unformatted LUNs (blue) is preferable.  There is some cost of virtualizing the OS's interface to the disk through the FS, so formatting the disk with the correct FS and testing the yellow target can be helpful.  Figure 1 shows two testable drives: the yellow-iconed C drive on which the OS was installed and the blue-iconed unformatted drive that is preferable for benchmarking. &lt;br /&gt;
&lt;br /&gt;
Always make certain that the "maximum disk size" in the "Disk Targets" tab is larger than the available memory!  For instance, when testing a formatted disk, a maximum size of 200,000 sectors (about 100 MB) could be cached entirely by the guest OS in a VM provided 1 GB of RAM.  In this case all Iometer calls to storage will be intercepted and served by the guest, host, or storage cache.  Setting the maximum disk size to a number at least four times greater than the memory available in the largest cache will avoid caching.&lt;br /&gt;
&lt;br /&gt;
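The sizing rule above is simple arithmetic, sketched here in Python.  The 512-byte sector size is an assumption (it is what makes 200,000 sectors come out to roughly 100 MB), and the function names are hypothetical.&lt;br /&gt;
&lt;pre&gt;
SECTOR_BYTES = 512  # assumed sector size


def sectors_to_mb(sectors):
    """Size in decimal megabytes of a test region given in sectors."""
    return sectors * SECTOR_BYTES / 1_000_000


def min_test_sectors(largest_cache_bytes, factor=4):
    """Smallest sector count at least `factor` times the largest cache."""
    needed = largest_cache_bytes * factor
    return -(-needed // SECTOR_BYTES)  # ceiling division
&lt;/pre&gt;
With these assumptions, 200,000 sectors is about 102 MB, and a VM whose largest cache is 1 GiB of RAM needs a test region of at least 8,388,608 sectors.&lt;br /&gt;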
Under the "Access Specifications" tab, choose a workload that matches the most interesting profile.  Real workloads that are dominated by database performance often randomly read and write small, fixed-size block IO.  SQL Server on Windows, for instance, uses 16K blocks, 66% read (which implies 34% write), and 100% random (thus 0% sequential).  Exchange 2007 uses a similar profile but with an 8K block size.  Oracle on Linux has the flexibility to use the block size set when the file system was created.  Depending on the DB specialist's needs, this can range from 2K to 64K but will again be random with a 2:1 read-to-write ratio.  Note: you can approximate this Linux performance on a Windows guest but do not run Iometer on Linux (see "Iometer On Linux" below).&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Application&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Block Size&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Randomness&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Read/write Ratio&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange 2003&lt;/td&gt;
&lt;td&gt;4K&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;60% read (40% write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exchange 2007&lt;/td&gt;
&lt;td&gt;8K&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;55% read (45% write)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL Server&lt;/td&gt;
&lt;td&gt;16K, 64K&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;66% read (34% write)&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
&lt;b&gt;Table 1.  Example Iometer profiles.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
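If you script your test plans, the profiles in Table 1 can be kept as data.  The sketch below is one hypothetical way to do that in Python; the class and field names are invented for illustration and are not part of Iometer.&lt;br /&gt;
&lt;pre&gt;
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessSpec:
    application: str
    block_sizes_kb: tuple
    random_pct: int
    read_pct: int

    @property
    def write_pct(self):
        # Whatever is not a read is a write.
        return 100 - self.read_pct

    @property
    def sequential_pct(self):
        # Whatever is not random is sequential.
        return 100 - self.random_pct


# Values copied from Table 1.
PROFILES = (
    AccessSpec("Exchange 2003", (4,), 80, 60),
    AccessSpec("Exchange 2007", (8,), 80, 55),
    AccessSpec("SQL Server", (16, 64), 100, 66),
)
&lt;/pre&gt;
Deriving the write and sequential percentages from the read and random percentages keeps each profile internally consistent.&lt;br /&gt;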
Note the number of workers that has been specified under the manager.  This will default to one worker (thread) for each physical or virtual processor on the system.  In the event that Iometer is being used to compare native to virtual performance, make sure that the worker numbers match!  For instance, the worker count will be one on a UP VM but four for the same native measurements if the system is quad-core.  Correct the native worker count by detaching workers.&lt;br /&gt;
&lt;br /&gt;
Before invoking the test, be aware of the potential impacts of data alignment.  VMware has demonstrated substantial differences in performance based on alignment of data on storage arrays in our &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/esx3_partition_align.pdf"&gt;partition alignment paper&lt;/a&gt;.   Make sure that partitions and virtual disks have been created by VirtualCenter to guarantee that the partitions and files are properly aligned.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Disclaimer (or: When Not To Trust Iometer)&lt;/h1&gt;
As discussed in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5581"&gt;Time-based Measurements in Virtual Machines&lt;/a&gt;, the hypervisor may introduce some inaccuracy to in-guest time measurement.  The likelihood of measurement error increases as the server's load increases.  Generally servers that are using less than 30% of their available CPU resources can be trusted.  In the event that a large VM on a small server is driving all CPUs to high utilization, Iometer results may suffer from some inaccuracy.  This is rarely the case with Iometer runs.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Analyzing the Results&lt;/h1&gt;
As mentioned above, the results provided by Iometer tend to be trustworthy.  But for unimpeachable results use the analysis techniques provided by VMware.  See the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt;  page that is dedicated to this subject.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">iometer</category>
      <pubDate>Thu, 27 Mar 2008 18:02:53 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-3961</guid>
      <dc:date>2008-03-27T18:02:53Z</dc:date>
      <clearspace:dateToText>1 year, 3 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Storage Queues and Performance</title>
      <link>http://communities.vmware.com/docs/DOC-6490</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
VMware recently published a paper titled &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;Scalable Storage Performance&lt;/a&gt;  that delivered a wealth of information on storage with respect to the ESX Server architecture.  This paper contains details about the storage queues that are a mystery to many of VMware's customers and partners.   I wanted to start a wiki article on some aspects of this paper that may be interesting to storage enthusiasts and performance freaks.&lt;br /&gt;
&lt;h1&gt;Two Important Queues&lt;/h1&gt;
Let's use the following figure as a starting point for this discussion.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3142/strorage_queues.JPG" alt="http://communities.vmware.com/servlet/JiveServlet/downloadImage/3142/strorage_queues.JPG" class="jive-image"  /&gt; &lt;br /&gt;
&lt;br /&gt;
For the purposes of this paper, I'm going to call the two different queue types the "kernel queue" and the "device driver queue".  The device driver queue is specified in the device itself and has historically been configured through Linux-like module commands in the console operating system.  More on that in "Changing Queue Depth" below.  The kernel queue should be thought of as infinitely long, for all practical purposes.  Any time the device driver queue gets full, commands to the storage will queue up in the kernel.&lt;br /&gt;
&lt;br /&gt;
Note that each LUN gets its own queue.  This means that when you change the queue depth in the device driver, you're changing the queue depths for many queues.  The underlying device (HBA) has a hard limit on the number of active commands it will allow at one time, and this should be considered when setting queue depth.  If your HBA can support only 2,000 active commands but it is addressing 40 LUNs, a specified queue depth of 64 won't allow that many commands to all LUNs at once, because 64*40 = 2,560, which is more than the 2,000-command maximum.  In practice this is rarely a concern, though, as it is rare for so many LUNs to be addressed simultaneously through a single HBA with so many outstanding commands issued to each.&lt;br /&gt;
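The arithmetic in that example is easy to check for your own configuration.  This is a minimal sketch with hypothetical function names; the HBA command limit is whatever your adapter vendor documents.&lt;br /&gt;
&lt;pre&gt;
def max_outstanding(queue_depth, lun_count):
    """Worst case: every per-LUN queue is full at the same time."""
    return queue_depth * lun_count


def fits_hba_limit(queue_depth, lun_count, hba_max_commands):
    """True if the HBA could hold every LUN queue full at once."""
    return hba_max_commands >= max_outstanding(queue_depth, lun_count)


def largest_safe_depth(lun_count, hba_max_commands):
    """Largest per-LUN depth whose worst case stays within the HBA limit."""
    return hba_max_commands // lun_count
&lt;/pre&gt;
For the example above, a depth of 64 across 40 LUNs gives a worst case of 2,560 commands, which does not fit in 2,000; the largest depth that does fit in the worst case is 50.&lt;br /&gt;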
&lt;h2&gt;Device Driver Queue Function&lt;/h2&gt;
The device driver queue is used for a low-level interaction with the storage device.  It controls how many active, or "in flight", commands there can be at any one time.  This is effectively the concurrency of the storage stack.  Set the device queue to 1 and each storage command becomes sequential: each one must complete before the next starts.&lt;br /&gt;
&lt;br /&gt;
But if the device queue is left at its default of 32, as an example, up to 32 commands will be processed concurrently by the storage system.  All 32 will be shipped off to the storage device by the kernel, and new commands are sent as completions arrive.&lt;br /&gt;
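The split between the two queues can be pictured with a small Python model.  This is a sketch of the behavior described above, not ESX code, and the function name is hypothetical.&lt;br /&gt;
&lt;pre&gt;
def split_commands(outstanding, device_queue_depth=32):
    """Return (in_flight, kernel_queued) for a number of outstanding commands.

    Commands fill the device driver queue first; the remainder waits
    in the kernel queue, which is treated here as unbounded.
    """
    in_flight = min(outstanding, device_queue_depth)
    kernel_queued = outstanding - in_flight
    return in_flight, kernel_queued
&lt;/pre&gt;
With the default depth of 32, 40 outstanding commands leave 32 in flight and 8 waiting in the kernel; with a depth of 1, all but one command wait.&lt;br /&gt;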
&lt;h2&gt;Kernel Queue Function&lt;/h2&gt;
The kernel queue can be thought of as kind of an overflow queue for the device driver queues.  But it's not just an overflow queue.  ESX Server contains all kinds of cool optimizations to get the most out of your storage. And these features apply to commands in the kernel queue only. Here are some examples of features provided to commands queued at the kernel queues:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Multi-pathing for failover and load balancing.&lt;/li&gt;
&lt;li&gt;Prioritization of storage activities based on VM and cluster shares.&lt;/li&gt;
&lt;li&gt;Optimizations to improve efficiency for long sequential operations.&lt;/li&gt;
&lt;/ol&gt;
There are others, as well.&lt;br /&gt;
&lt;h1&gt;Impacts of Queue Depths&lt;/h1&gt;
So, increasing queue depths in the device driver can greatly improve the performance of the storage at the device level. Decreasing the device driver queue will result in increases in usage of the kernel queues.  This decreases the device efficiency, but introduces opportunities for optimizations across multiple VMs and devices.  So, what's the right ratio of these two depths?  We think that the sweet spot lies with a depth 32 device driver queue.  That's why we've set 32 as the default device driver queue length.&lt;br /&gt;
&lt;br /&gt;
But your configuration and workloads may benefit from a change to this default queue depth.  I'll refer you to the aforementioned &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/scalable_storage_performance.pdf"&gt;storage paper&lt;/a&gt;  for information on when you might want to change the driver queue depth.  I'll just point out a couple of broad observations here:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;With fewer, very high IO VMs on a host, larger queues at the device driver will improve performance.&lt;/li&gt;
&lt;li&gt;As the VM count grows and storage performance features--like shares, load balancing, failover, etc.--become more important, the default queue depth is best.&lt;/li&gt;
&lt;li&gt;With too many servers, each with device queues that are too large, your storage array could easily be overloaded and see its performance suffer.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Improving Storage Performance&lt;/h1&gt;
Now that we've covered how storage queuing works, you may be wondering how you can monkey around with these queue sizes for optimal performance.  I can tell you as someone who has been involved with many, many performance analysis projects that changing queue size is rarely a fix to an acute storage performance problem.  You should first go through the analysis techniques in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5490"&gt;Storage Performance Analysis and Monitoring&lt;/a&gt;.  That may or may not lead to changing queue depths.&lt;br /&gt;
&lt;br /&gt;
But, in the event that you do end up changing queue depths...&lt;br /&gt;
&lt;h2&gt;Changing Queue Depth&lt;/h2&gt;
We have a &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1267"&gt;helpful knowledge base article&lt;/a&gt;  that describes the process of changing the device driver queue.  Unfortunately, as of today (7/24/08) this document only describes how to change queues through the console operating system.  No information is provided for ESXi.  I've contacted the KB owner and will have that document updated ASAP.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <pubDate>Thu, 24 Jul 2008 00:51:16 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-6490</guid>
      <dc:date>2008-07-24T00:51:16Z</dc:date>
      <clearspace:dateToText>1 year, 3 months ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
    </item>
    <item>
      <title>Network Performance Analysis and Monitoring</title>
      <link>http://communities.vmware.com/docs/DOC-5500</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
This page is a living, up-to-date version of the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods whitepaper&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Check Utilization&lt;/h1&gt;
esxtop provides network information on the network screen, which is displayed with the 'n' key.&lt;br /&gt;
&lt;br /&gt;
&lt;img src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5500-5-3052/esxtop-network-main.JPG" alt="esxtop-network-main.JPG" width="620" class="jive-image-thumbnail jive-image" onclick="myJiveImage.start(this, 'http://communities.vmware.com/servlet/JiveServlet/downloadImage/102-5500-5-3052/esxtop-network-main.JPG');return false;"/&gt;&lt;br /&gt;
&lt;br /&gt;
The following properties of this screen are worth particular attention:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Each row represents one of several relevant network items on the server: a physical NIC (vmnicX), a virtual switch interface (vswifX), a VM (contains the VM name), the VMkernel network stack (vmk-tcpip-A.B.C.D), and others.&lt;/li&gt;
&lt;li&gt;The network items are organized by the virtual switch to which they are attached.  The virtual switch name is listed under the DNAME column.&lt;/li&gt;
&lt;li&gt;Network traffic on the hypervisor's iSCSI initiator will show up on the VMkernel network row which will contain the name "vmk-tcpip-A.B.C.D", where A.B.C.D is the VMkernel IP address.&lt;/li&gt;
&lt;li&gt;Network traffic from iSCSI initiators configured in the guest will show up on the vNIC, which is displayed under the VM's name on the network panel.&lt;/li&gt;
&lt;li&gt;Total throughput for each item can be observed by summing its transmitted data (MbTX/s) and received data (MbRX/s).  As the physical hardware becomes saturated, transmitted and received packets will start to be dropped (%DRPTX and %DRPRX, respectively), which, depending on protocol, may result in a retransmission at a later time.&lt;/li&gt;
&lt;/ul&gt;
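&lt;br /&gt;
The throughput sum and the drop check described in the last bullet can be sketched in a few lines of Python. The counter names come from the esxtop network screen; the sample values and the function name are hypothetical and only there to make the sketch runnable.&lt;br /&gt;

```python
def summarize_network_row(row):
    """Sum transmit and receive rates and flag dropped packets for one row."""
    total_mbps = row["MbTX/s"] + row["MbRX/s"]
    # Any non-zero drop percentage is worth a closer look.
    dropping = row["%DRPTX"] != 0 or row["%DRPRX"] != 0
    return total_mbps, dropping


# Hypothetical sample values, not taken from a real host.
sample = {"MbTX/s": 412.5, "MbRX/s": 300.0, "%DRPTX": 0.0, "%DRPRX": 1.2}
total, dropping = summarize_network_row(sample)
print(f"total throughput: {total} Mb/s, dropping packets: {dropping}")
```

Comparing the total against the NIC's negotiated speed gives a quick sense of how close the item is to saturation.&lt;br /&gt;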
&lt;h1&gt;Evaluate the Data&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Does the physical NIC's reported speed and duplex setting match the expectation of the hardware?  Hardware connectivity issues may result in a NIC autonegotiating to a lower speed or half duplex mode.&lt;/li&gt;
&lt;li&gt;Is there a significant load on the appropriate network items?  For instance, is a network-intensive load in a guest actually generating the network activity on its vNIC that is expected?  Are storage-intensive loads generating traffic on the vNIC or vmkNIC when the hypervisor or guest initiators are used?&lt;/li&gt;
&lt;li&gt;Verify that the network traffic is flowing on appropriate NICs. A typical ESX host may have network traffic generated by VMs, network traffic from the iSCSI protocol, VMotion-related network traffic, and service console network activity. It is recommended to use separate NICs for these different types of traffic.&lt;/li&gt;
&lt;li&gt;During periods of saturation, is the total throughput (MbTX/s summed with MbRX/s) matching expectations?  Either the guest or the other end of the communication link may be throttling the performance.&lt;/li&gt;
&lt;li&gt;Are packets being dropped?  When overworked, the hardware will refuse packets, which are reported as dropped transmitted (%DRPTX) and dropped received (%DRPRX) packets.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Correct the System&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Make sure that the hardware is configured to run at its maximum capability.  This means verifying that 1 Gb NICs are not autonegotiating down to 100 Mb/s because they are connected to an older switch.  Similarly, ensure that NICs are running in full duplex mode.&lt;/li&gt;
&lt;li&gt;When network throughput seems lower than expected, apply traditional network diagnosis techniques to investigate every link in the connection.  Low throughput at the ESX Server is not necessarily due to server configuration.&lt;/li&gt;
&lt;li&gt;Verify that VMware Tools is installed on the guests and TSO, Jumbo Frames, and 10 Gb Ethernet are enabled, where possible.&lt;/li&gt;
&lt;li&gt;Bond multiple physical NICs to virtual switches with high utilization.&lt;/li&gt;
&lt;li&gt;Provide separate virtual switches their own physical NICs and separate network-intensive VMs on their own vSwitches.&lt;/li&gt;
&lt;li&gt;If VMs running on the same ESX Server communicate with each other, connect them to a dedicated virtual switch so that all network transfers occur in memory and no packets are shipped over the wire.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;VMFS and RDM Considerations&lt;/h1&gt;
ESX Server supports the mapping of physical LUNs to virtual machines via a method called raw device mapping (RDM).  RDM eliminates VMFS from the stack, and VMFS is often incorrectly believed to be a source of performance problems.  Removing VMFS reduces the total number of addressable LUNs, eliminates the ability to perform storage migrations (storage VMotion), and greatly increases the effort required for maintenance activities that Site Recovery Manager would otherwise simplify.  And the performance benefits derived from the removal of VMFS are negligible.&lt;br /&gt;
&lt;br /&gt;
See the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf"&gt;performance characteristics of VMFS and RDM whitepaper&lt;/a&gt; for more information on this subject.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Resources&lt;/h1&gt;
The top-level performance analysis page: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
VirtualCenter performance counters: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; &lt;br /&gt;
&lt;br /&gt;
esxtop performance counters: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">disk</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">storage</category>
      <pubDate>Tue, 27 May 2008 21:05:20 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5500</guid>
      <dc:date>2008-05-27T21:05:20Z</dc:date>
      <clearspace:dateToText>1 year, 4 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Guest-based Performance Measurement</title>
      <link>http://communities.vmware.com/docs/DOC-5661</link>
      <description>Because VMware products provide a virtual interface to the hardware, traditional performance instrumentation that is based on measuring hardware resources may not be accurate.  As a result, Perfmon (in Windows) and top (in UNIX variants) will not provide accurate measurements of CPU utilization.  The problems seen as a result of usage of traditional in-guest performance measurements come from three areas:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Because they are unaware of work being performed by the virtualization software, they will not have complete information on the resources being used by it.  This includes memory management, scheduling, and other support processes like the service console in ESX.&lt;/li&gt;
&lt;li&gt;The way in which guest OSes account time is different and ineffective in a virtual machine.&lt;/li&gt;
&lt;li&gt;Their visibility into available CPU resources is based on the fraction of the CPU that they have been provided by the virtualization software.&lt;/li&gt;
&lt;/ol&gt;
Items two and three are covered in more detail in &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/2032"&gt;a KB on the subject&lt;/a&gt;.  Performance analysis on virtual deployments should always use host-based tools.  On ESX Server, this means esxtop or VirtualCenter.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <pubDate>Tue, 03 Jun 2008 16:50:47 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5661</guid>
      <dc:date>2008-06-03T16:50:47Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Understanding VirtualCenter Performance Statistics</title>
      <link>http://communities.vmware.com/docs/DOC-5230</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
VirtualCenter (VC) is the entry point for virtual platform management but is less frequently used for performance analysis than esxtop. On the surface, VC is insufficient for performance analysis. But this is not necessarily the case. The VirtualCenter performance counter collection is reduced by default to minimize the data maintained by VC's database. The performance counters maintained by VC can be modified and detailed analysis can be performed based on those counters. This document will provide details necessary for understanding and enabling VC's performance monitoring capabilities.&lt;br /&gt;
&lt;br /&gt;
Refer to the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; page for information on using these counters.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VirtualCenter Statistic Archival&lt;/h1&gt;
Our stats infrastructure has a lot of counters but our documentation has traditionally been quite thin in terms of descriptions. I got so sick of asking what stats are available at what stats level that I decided to start this page. Obviously it needs to be made more readable, but hopefully it is a start.&lt;br /&gt;
&lt;br /&gt;
Remember that stats in VC are generally organized into 2 archival categories: &lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Not archived: these are the "real-time" (past-hour) stats, which are refreshed every 20 seconds, and are displayed for the past hour in the VI client. These stats are not stored in the database.&lt;/li&gt;
&lt;li&gt;Archived stats. These stats are aggregations (rollups) of the real-time stats. They are aggregated at different sampling intervals and stored in the database. We follow the MRTG standard.
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Past day&lt;/b&gt;: past day stats take the real-time stats and roll them up so that there is 1 data point for every 5 minutes. Thus, there are 12 data points per hour and 288 per day.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Past week&lt;/b&gt;: past week stats take the past day stats and roll them up so that there is 1 data point for every 30 minutes. Thus, there are 48 data points per day and 336 per week.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Past month&lt;/b&gt;: past month stats take the past week stats and roll them up so there is 1 data point per 2 hours. Thus, there are 12 data points per day and 360 per month (30-day month).&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Past year&lt;/b&gt;: past year stats take the past month stats and roll them up so there is 1 data point per day. Thus, there are 365 data points per year.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
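&lt;br /&gt;
The data point counts in the list above follow directly from dividing each retention window by its sampling interval. Here is a minimal Python sketch of that arithmetic; the dictionary layout and names are my own, and the 30-day month and 365-day year are the same simplifications used in the list.&lt;br /&gt;

```python
from datetime import timedelta

# Sampling interval and retention window for each rollup, taken from the
# list above (a 30-day month and a 365-day year are assumed).
ROLLUPS = {
    "real-time": (timedelta(seconds=20), timedelta(hours=1)),
    "past day": (timedelta(minutes=5), timedelta(days=1)),
    "past week": (timedelta(minutes=30), timedelta(weeks=1)),
    "past month": (timedelta(hours=2), timedelta(days=30)),
    "past year": (timedelta(days=1), timedelta(days=365)),
}


def data_points(interval, window):
    """Number of samples kept when one sample is stored per interval."""
    return window // interval


for name, (interval, window) in ROLLUPS.items():
    print(name, data_points(interval, window))
```

The real-time row is included only for comparison: at one sample every 20 seconds, an hour holds 180 samples.&lt;br /&gt;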
&lt;br /&gt;
The basic flow is this: an ESX host stores statistics at 20s granularity for a period of 1 hour. Therefore, using the Host Client one can view the stats for a host/VM for the past-hour, or one can view those stats using the VI client attached to VirtualCenter. ESX will also aggregate the statistics into past-day statistics and store them for up to 1 day. These past-day statistics are sent to VC periodically and then stored in the database. The database is responsible for periodically taking these past-day stats and rolling them up into 30-minute weekly stats, and then doing the same for converting the weekly stats to monthly stats, etc. Because past-day, past-week, and past-month stats are stored in the database, I call them "archived" stats.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;VirtualCenter Statistics Level&lt;/h1&gt;
Statistics level is a means of organizing statistics for archiving purposes.  It's worth noting that only stats levels one and two are useful for deployment performance monitoring and analysis.  Levels three and four provide granularity and visibility that is useful only for developers.&lt;br /&gt;
&lt;br /&gt;
The concept of "stats level" applies only to the archived stats: we only store a stat in the database if we are at the appropriate stats level for that particular statistic. Non-archived stats are unaffected by stats level. In other words, every metric listed below is collected at 20s granularity and stored on the ESX host for 1 hour. However, unless VC is set to the stats level appropriate to that statistic, we will not store the data in the database or roll up the stat into a past-day stat on the ESX host. You can specify the stats level independently for each of the archiving intervals. In other words, you might want to store level 4 stats for up to 1 day, but level 3 stats for 1 week.&lt;br /&gt;
&lt;br /&gt;
In practice, we use stats level to vary the level of detail for statistics that are archived. At stats level 1, we have pretty coarse-grained stats, while stats level 4 contains very detailed statistics, and also includes statistics for various instances (e.g., for each NIC of a VM).&lt;br /&gt;
&lt;br /&gt;
There are 3 important calls that I often use for stats (please refer to the SDK documentation for more information):&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;QueryStatsByLevel: this tells you what stats are available at what stats level. This is what I used to generate the tables below.&lt;/li&gt;
&lt;li&gt;QueryAvailableMetrics: this tells you what stats are available for a given entity during a specified time period.&lt;/li&gt;
&lt;li&gt;QueryPerf: this call takes a QuerySpec as an input and collects the stats for the specified entity over the specified time interval.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Let me give a concrete example of stats level. &lt;br /&gt;
&lt;br /&gt;
Suppose I want to know the value of mem.consumed.maximum for a given VM. This is the maximum amount of machine memory allocated to a VM (including overhead memory) over a specified interval. As shown below, this is a "level 4" statistic. This means that if I've set the stats level to 4 for past-day stats and then formulate a QuerySpec that asks for the value of this data 20 minutes ago at "past-day" granularity (i.e., at 5-minute granularity), then I will get a value. If the stats level is 2 for past-day (5-minute granularity) statistics, however, then such a query will not return a value, because it is a level-4 stat and only level-2/level-1 stats are being stored at 5-minute granularity. In contrast, even if the stats level is 1, if I formulate a QuerySpec with 20s (i.e., "real-time" or "past hour") as the interval of collection, I will get this value, because this data is stored for up to one hour at 20s granularity no matter what the stats level.&lt;br /&gt;
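&lt;br /&gt;
The three cases in that example boil down to one rule, which the following Python sketch models. This is a toy model of the behavior described here, not the SDK: it does not call QueryPerf, and the function name, constant names, and the dictionary of configured levels are all hypothetical.&lt;br /&gt;

```python
REALTIME_INTERVAL_S = 20  # real-time samples are kept on the host for one hour


def query_returns_value(stat_level, interval_s, configured_levels):
    """Model whether a query at a given interval can return a stat.

    configured_levels maps an archive interval in seconds to the stats
    level configured for that interval in VirtualCenter.
    """
    if interval_s == REALTIME_INTERVAL_S:
        # Real-time stats are collected regardless of the stats level.
        return True
    configured = configured_levels.get(interval_s, 0)
    # An archived stat is stored only if its level does not exceed the setting.
    return stat_level in range(1, configured + 1)


MEM_CONSUMED_MAXIMUM_LEVEL = 4  # level stated in the text above
PAST_DAY_S = 300  # 5-minute granularity

print(query_returns_value(MEM_CONSUMED_MAXIMUM_LEVEL, PAST_DAY_S, {PAST_DAY_S: 4}))
print(query_returns_value(MEM_CONSUMED_MAXIMUM_LEVEL, PAST_DAY_S, {PAST_DAY_S: 2}))
print(query_returns_value(MEM_CONSUMED_MAXIMUM_LEVEL, REALTIME_INTERVAL_S, {PAST_DAY_S: 1}))
```

The three calls correspond, in order, to the past-day query at stats level 4, the past-day query at stats level 2, and the real-time query at stats level 1.&lt;br /&gt;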
&lt;br /&gt;
&lt;h1&gt;Update Interval&lt;/h1&gt;
Understanding the update interval is a key component to understanding the performance statistics.  The Virtual Infrastructure Client (VIC) displays live stats at a 20s update frequency.  Archived stats are archived at their archive frequency.  This is key to understanding the relative amounts of data presented by VC.&lt;br /&gt;
&lt;br /&gt;
For instance, a ready time of 1,000ms in the VIC's live stats graph translates into 5% ready time (1,000 / 20,000).  The same 5% of ready time at the five-minute archival frequency would show up as 15,000 ms (5% of 300,000 ms).&lt;br /&gt;
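&lt;br /&gt;
To make the conversion explicit, here is a short Python sketch that turns a ready time in milliseconds into a percentage of the sampling interval. The function name is my own, and the sketch ignores any aggregation across multiple vCPUs.&lt;br /&gt;

```python
def ready_percent(ready_ms, interval_s):
    """Ready time as a percentage of the sampling interval."""
    return 100.0 * ready_ms / (interval_s * 1000)


print(ready_percent(1000, 20))    # live stats, 20-second interval
print(ready_percent(15000, 300))  # archived stats, five-minute interval
```

Both calls describe the same relative amount of ready time, which is why the raw millisecond values should never be compared across intervals.&lt;br /&gt;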
&lt;br /&gt;
&lt;h1&gt;Counter Index&lt;/h1&gt;
For a list of all counters, see the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5600"&gt;vCenter Performance Counters&lt;/a&gt; page.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <pubDate>Thu, 15 May 2008 19:15:58 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5230</guid>
      <dc:date>2008-05-15T19:15:58Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
      <clearspace:replyCount>2</clearspace:replyCount>
    </item>
    <item>
      <title>Linux Timer Rate</title>
      <link>http://communities.vmware.com/docs/DOC-3580</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
A hardware timer is used by modern systems for a variety of fine-grained operations at the operating system level. VMware's virtualization platforms virtualize this timer in the kernel. Because the virtual timer provided to the VM is actually software, it is subject to the same resource restrictions as other processes. The busier the system the more the timer execution must contend with other hypervisor activities. There are two implications of this:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;When the system is very busy, the software timer may not execute as regularly and virtual time may fall behind.&lt;/li&gt;
&lt;li&gt;Depending on how frequently the OS wishes to be interrupted by the timer, the hypervisor must do a different amount of work.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
From &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1420"&gt;VMware KB article #1420&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;Linux guest operating systems keep time by counting timer interrupts. Unpatched 2.4 and earlier kernels program the virtual system timer to request clock interrupts at 100Hz (100 interrupts per second). 2.6 kernels, on the other hand, request interrupts at 1000Hz - ten times as often. Some 2.4 kernels modified by distribution vendors to contain 2.6 features also request 1000Hz interrupts, or in some cases, interrupts at other rates, such as 512Hz. &lt;br clear="all" /&gt;	 Furthermore, an SMP-capable Linux kernel requests additional timer interrupts from the virtual local APIC timer. An SMP-capable kernel running on a one-CPU system generates twice as many total timer interrupts as the corresponding UP kernel, while such a kernel running on a two-CPU system requests three times as many. In general, an SMP-capable kernel running on &amp;lt;n&amp;gt; CPUs requests &amp;lt;n+1&amp;gt; times as many interrupts per second as a UP kernel. For example, an unmodified 2.6 Linux kernel running on a two-CPU virtual machine requests a total of 3000 clock interrupts per second. &lt;br clear="all" /&gt;	 When a guest asks for more than 1000 clock interrupts per second, it can be difficult for the virtual machine to keep up, especially if other applications are running on the host at the same time. This can cause the clock in the guest operating system to fall so far behind real time that it is unable to catch up. The overhead of delivering so many virtual clock interrupts can also hurt guest performance and increase host CPU consumption.&lt;/blockquote&gt;
The amount of work required to manage the virtual timer is greatest with &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5252"&gt;Red Hat Enterprise Linux&lt;/a&gt; 5 (RHEL5) SMP systems, which use a clock frequency of 1000 Hz and suffer from a multiplicative amount of work due to SMP support. For instance, the following table of timer interrupts was created on a 1000 Hz RHEL VM:&lt;br /&gt;
&lt;br /&gt;
&lt;table class="jive-wiki-table"&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;vCPU Count&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Interrupts/sec&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;6,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;72,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br clear="left" /&gt;
&lt;br /&gt;
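The figures in the table are larger than the rule quoted from the KB article alone would predict. The Python sketch below computes both: the KB rule of (n+1) times the base rate, and n * (n+1) times the base rate, which happens to reproduce the table exactly. That second expression is only an observed fit to the three rows; reading it as "each of the n vCPUs sees the (n+1) multiple" is my assumption, not something the KB states.&lt;br /&gt;

```python
def kb_interrupts_per_sec(vcpus, hz=1000):
    """Rate according to the KB rule quoted above: (n + 1) times the UP rate."""
    return (vcpus + 1) * hz


def table_fit_interrupts_per_sec(vcpus, hz=1000):
    """Rate that reproduces the table: n * (n + 1) * hz (an observed fit only)."""
    return vcpus * (vcpus + 1) * hz


for n in (2, 4, 8):
    print(n, kb_interrupts_per_sec(n), table_fit_interrupts_per_sec(n))
```

Passing hz=100, the rate the KB gives for unpatched 2.4 kernels, shows how much a tenfold lower tick rate reduces either figure.&lt;br /&gt;
&lt;br /&gt;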
So, the amount of work that needs to be done by the hypervisor increases dramatically with the addition of vCPUs.  In addition, decreasing the timer interrupt rate greatly decreases the work that needs to be done by the VMkernel to virtualize the timer.  In RHEL 5.1, a Linux kernel that enables reducing the timer rate was included. By adding the parameter "divider=10" to the boot parameters, the amount of work required of the VMkernel to virtualize the timer goes down by an order of magnitude.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">linux</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">timer</category>
      <pubDate>Fri, 14 Mar 2008 16:30:54 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-3580</guid>
      <dc:date>2008-03-14T16:30:54Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
      <clearspace:replyCount>6</clearspace:replyCount>
    </item>
    <item>
      <title>Performance Monitoring and Analysis</title>
      <link>http://communities.vmware.com/docs/DOC-3930</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
A common question that crosses my desk is, "how do I analyze and correct performance?" There are a variety of techniques for doing this and many sources of information spread around the interweb, so I'm going to collect a few thoughts here.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Monitoring&lt;/h1&gt;
Guest-based performance monitoring is an inaccurate and unhelpful means of evaluating performance in virtual deployments.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5661"&gt;Guest-based Performance Measurement&lt;/a&gt;  for more information.  Monitoring and analysis of VMware ESX Server should be performed with esxtop and VirtualCenter. &lt;br /&gt;
&lt;br /&gt;
&lt;h2&gt;esxtop&lt;/h2&gt;
esxtop is the tried-and-true means of collecting every performance stat needed and making it available in a way that is conducive to analysis. The best source of information on launching esxtop is the &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vi3_301_201_resource_mgmt.pdf#page=159"&gt;Resource Management Guide (page 159)&lt;/a&gt;. It's worth noting that the directions in that guide came from ESX Server 3.0.1. Since then, the developers have been kind enough to simplify the process of including all performance counters. This is done with the "-a" switch. So,&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;esxtop -a -b &amp;gt; analysis.csv&lt;/blockquote&gt;
&lt;br /&gt;
runs esxtop in batch mode and prints all performance counters.  Let me give you the quick "do-s" and "don't-s" of esxtop:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;With esxtop on ESX Server 3.5 and newer, always include the "-a" option to display all counters.&lt;/li&gt;
&lt;li&gt;With esxtop on versions of ESX Server prior to 3.5 always follow the resource management guide to enable all counters.  The storage latency statistics, for instance, are not displayed by default!&lt;/li&gt;
&lt;li&gt;Always start your VMs before running esxtop in batch mode.  If you start them after starting "esxtop -b", then esxtop will only produce VM data based on the VMs that were running at the time of its start.&lt;/li&gt;
&lt;/ol&gt;
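Once you have a batch-mode file such as analysis.csv, a few lines of Python are enough to pull out the columns you care about. This is a minimal sketch that assumes the file is plain CSV with one header row and numeric samples in the selected columns; the function name and the column names in the sample are hypothetical, and real esxtop column names are much longer and depend on the host.&lt;br /&gt;

```python
import csv
import io


def columns_matching(csv_text, needle):
    """Return {column name: list of float samples} for columns containing needle."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            series[header[i]].append(float(row[i]))
    return series


# Tiny hypothetical stand-in for analysis.csv; the column names are made up.
sample = (
    "Time,vm1 % Used,vm1 % Ready\n"
    "00:00:00,40.0,2.0\n"
    "00:00:05,60.0,4.0\n"
)
for name, values in columns_matching(sample, "% Ready").items():
    print(name, sum(values) / len(values))
```

For a real capture, read the file with open("analysis.csv", newline="") and pass its contents to the function. Files captured with "-a" can be very wide, so filtering by a substring of the counter name keeps the result manageable.&lt;br /&gt;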
&lt;h2&gt;VirtualCenter&lt;/h2&gt;
VirtualCenter doesn't provide all of the performance counters you might need to analyze performance. But, it provides more than you might think! The default setup for VirtualCenter performance counter collection is fairly minimal. But this can be expanded by reconfiguring VC's performance counter collection. This is done as follows:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;From the VI client, Administration-&amp;gt;VirtualCenter Management Server Configuration...&lt;/li&gt;
&lt;li&gt;On the left, click "Statistics".&lt;/li&gt;
&lt;li&gt;Now increase the stats level as you see fit.&lt;/li&gt;
&lt;/ol&gt;
More information on VC performance counters and archival can be found in the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5230"&gt;Understanding VirtualCenter Performance Statistics&lt;/a&gt; wiki article.  A few things are worth considering before going crazy with this:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;You probably will never need level four stats. Those are mainly for debugging.&lt;/li&gt;
&lt;li&gt;Use the DB size estimator after you monkey around with these levels. The DBs can get quite large and you want to know this before it happens and your VC system performance suffers.&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
&lt;h1&gt;Analysis&lt;/h1&gt;
Performance analysis techniques were previously detailed in the performance analysis whitepaper.  That has been migrated to this wiki and new material will appear here.&lt;br /&gt;
&lt;br /&gt;
The following pages will provide guidance for identifying and correcting performance problems on ESX-based systems.  We recommend following each of these pages in the order they're presented here.&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Check and correct CPU utilization: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5420"&gt;CPU Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Identify memory bottlenecks and remove: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5430"&gt;Memory Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Characterize storage performance and correct: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5490"&gt;Storage Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Understand and improve the network utilization profile: &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5500"&gt;Network Performance Analysis and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Within each of these pages are techniques for using counters from VirtualCenter and esxtop.  Information on those counters is provided in &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5600"&gt;vCenter Performance Counters&lt;/a&gt;   and &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5240"&gt;esxtop Performance Counters&lt;/a&gt;, respectively.&lt;br /&gt;
&lt;br /&gt;
Also, note that, while useless in collecting performance data, Perfmon can help with analysis of large esxtop output files.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5100"&gt;Using Perfmon for esxtop-based Performance Analysis&lt;/a&gt;  for more information.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">troubleshooting</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">virtualcenter</category>
      <pubDate>Wed, 26 Mar 2008 20:12:44 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-3930</guid>
      <dc:date>2008-03-26T20:12:44Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Windows Server 2003</title>
      <link>http://communities.vmware.com/docs/DOC-5253</link>
      <description>Please be patient while we build out this content.  If you have ideas for additions, mail me!&lt;br /&gt;
&lt;br /&gt;
Short list:&lt;br /&gt;
&lt;p /&gt;
&lt;ul&gt;
&lt;li&gt;Always install Service Pack 2.  Microsoft changed the interaction with the APIC to improve efficiency on virtual platforms.&lt;/li&gt;
&lt;li&gt;Read up on &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1730"&gt;idle loop behavior for Service Pack 1&lt;/a&gt; .&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">windows</category>
      <pubDate>Fri, 16 May 2008 22:04:43 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5253</guid>
      <dc:date>2008-05-16T22:04:43Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Using Perfmon for esxtop-based Performance Analysis</title>
      <link>http://communities.vmware.com/docs/DOC-5100</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
By now everyone knows that guest performance metrics are not reliable in virtual machines (&lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5661"&gt;Guest-based Performance Measurement&lt;/a&gt;). Slight fluctuations in very small time measurements and a lack of knowledge of the hypervisor's activities produce misleading numbers. Windows' performance monitoring tool, Perfmon, suffers from the same problems. However, Perfmon remains highly valuable for virtual machine performance analysis.&lt;br /&gt;
&lt;br /&gt;
The &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_analysis_methods_tn.pdf"&gt;performance analysis methods document&lt;/a&gt; that was published early in 2008 provided tips on esxtop-based performance analysis. To record esxtop data, &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vi3_301_201_resource_mgmt.pdf#page=159"&gt;page 159 of the resource management guide&lt;/a&gt; describes running esxtop in batch mode. On ESX Server 3.5, however, running esxtop in batch mode with all counters enabled results in an incredibly large CSV file that cannot easily be parsed. Perfmon can help with this process.&lt;br /&gt;
&lt;br /&gt;
esxtop was constructed so that its CSV-formatted batch output file can be readily consumed by Perfmon. This means that Perfmon can be used for two key activities:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Quickly analyzing results.&lt;/li&gt;
&lt;li&gt;Generating smaller CSV files of a subset of the data that can be more easily consumed by other analysis tools (such as Microsoft Excel).&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h1&gt;Quick Results Analysis&lt;/h1&gt;
esxtop's batch output CSV file can be opened and viewed in Perfmon with the following steps:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Transfer the CSV file to a Windows system.&lt;/li&gt;
&lt;li&gt;Launch Perfmon (Run: "perfmon").&lt;/li&gt;
&lt;li&gt;Right-click on the graph and select "Properties..." from the drop-down menu.&lt;/li&gt;
&lt;li&gt;Select the "Source" tab.&lt;/li&gt;
&lt;li&gt;Select the "Log files:" radio button from the "Data source" section.&lt;/li&gt;
&lt;li&gt;Click the "Add..." button.&lt;/li&gt;
&lt;li&gt;Browse to and select the CSV file created by esxtop and click "OK".&lt;/li&gt;
&lt;li&gt;Click the "Apply" button.&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Optionally:&lt;/i&gt; reduce the range of time over which the data will be displayed by using the sliders under the "Time Range" button.&lt;/li&gt;
&lt;li&gt;Click "OK".&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
Once the data has been loaded into Perfmon you may select ESX performance counters and display them using Perfmon's graphing system just like normal Windows performance counters. Refer to the Perfmon documentation for instructions on doing this.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Pruning CSV File Data&lt;/h1&gt;
Many people use Microsoft Excel to analyze performance data, but its row and column limits are quickly exceeded by large esxtop batch files, so a means of removing unneeded data will assist analysis. Once Perfmon has been loaded with the esxtop data, it can be used to generate smaller CSV files that can be easily consumed by Excel. Follow these steps:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Start Perfmon and load the esxtop batch data as described above.&lt;/li&gt;
&lt;li&gt;Have the Perfmon graph display the data of interest (that you'd like to import into Excel).&lt;/li&gt;
&lt;li&gt;Right-click the graph and select "Save Data As...".&lt;/li&gt;
&lt;li&gt;In the popup box, select "CSV" as type and save the file.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
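For readers who would rather script the pruning than use Perfmon, the same idea can be sketched in a few lines of Python. This is only an illustration and not part of the procedure above: it assumes the first CSV column is the timestamp, and the column names in the sample data are made up rather than real esxtop counter names.&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch only: keeps the first (timestamp) column plus any column whose header
# contains one of the given substrings. The counter names below are made up.
import csv
import io


def prune_columns(src, dst, wanted):
    """Copy rows from src to dst, keeping column 0 and headers matching wanted."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    keep = [0] + [i for i, name in enumerate(header)
                  if i > 0 and any(w in name for w in wanted)]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])


# Tiny in-memory stand-in for an esxtop batch file.
sample = io.StringIO("time,cpu0 used,disk reads,cpu1 used\n"
                     "t1,10,5,20\n"
                     "t2,30,6,40\n")
out = io.StringIO()
prune_columns(sample, out, ["cpu"])
print(out.getvalue())
```
&lt;br /&gt;
For a real batch file, open the input and output with open(path, newline="") and pass those file objects in place of the StringIO buffers.&lt;br /&gt;
&lt;br /&gt;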
This will save only the counters that were displayed in the graph.  By iteratively selecting subsets of counters and saving off individual CSV files it becomes possible to quickly build performance graphs in Excel using esxtop batch output data.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">esxtop</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">perfmon</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">analysis</category>
      <pubDate>Fri, 09 May 2008 16:48:20 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5100</guid>
      <dc:date>2008-05-09T16:48:20Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Benchmarking</title>
      <link>http://communities.vmware.com/docs/DOC-5520</link>
      <description>&lt;h1&gt;Introduction &lt;/h1&gt;
Many VMware users wish to perform analysis on their own virtual deployments.  This page will collect information on setting up and executing your own tests to analyze performance.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;General Best Practices&lt;/h1&gt;
&lt;i&gt;Always measure performance from a native (non-virtual) system.&lt;/i&gt;  Be aware that time measurements in virtual machines can be subject to minute fluctuations.  Many benchmarks produce results by summing times from a large number of small operations, so these small inaccuracies can accumulate into a large error.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5581"&gt;Time-based Measurements in Virtual Machines&lt;/a&gt; for more information on this subject.  The only way to guarantee correct measurement is to run the measurement tool on a native system.  This is easy for client-server test architectures but may require clever architecture for in-guest testing.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Always ensure apples-to-apples comparison.&lt;/i&gt;  Make sure that the benchmark or application under test is constrained by the same resources on both systems.  For instance, if the virtual machine was configured with 512MB of RAM and two virtual CPUs, restrict the native system to the same resources if a virtual-to-native comparison is desired.&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Collect accurate host-based performance statistics.&lt;/i&gt;  Guest OS performance metrics (such as CPU utilization) are not accurate.  Use VirtualCenter or esxtop to collect accurate performance counters during the test.  See &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3930"&gt;Performance Monitoring and Analysis&lt;/a&gt; for more information on analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Application Benchmarking&lt;/h1&gt;
Microsoft Exchange.&lt;br /&gt;
&lt;br /&gt;
Microsoft SQL Server. &lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Subsystem Benchmarking&lt;/h1&gt;
&lt;h2&gt;Storage&lt;/h2&gt;
Internally at VMware we've used Iometer for a variety of storage analyses.  See the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3961"&gt;Storage System Performance Analysis with Iometer&lt;/a&gt; for more information.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <pubDate>Wed, 28 May 2008 19:12:13 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5520</guid>
      <dc:date>2008-05-28T19:12:13Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Time-based Measurements in Virtual Machines</title>
      <link>http://communities.vmware.com/docs/DOC-5581</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
All benchmarking relies on accurate time keeping so that work completed can be measured with respect to the passage of time.  Because hosted and hypervisor products virtualize the hardware timer, minute fluctuations in guest time keeping can occur.  Details on this are provided in many locations, including the following:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vmware_timekeeping.pdf"&gt;Timekeeping in Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="jive-link-external" href="http://www.vmware.com/pdf/WS6_Performance_Tuning_and_Benchmarking.pdf"&gt;Workstation 6.0 Performance Tuning and Benchmarking (page 16)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;br /&gt;
During benchmarking these fluctuations in time can cause unexpected results.  Performance measurements can appear inflated if time slows down while work occurs, depressed if time accelerates while work is occurring, or somewhere in between.  This topic will provide some details on this phenomenon.&lt;br /&gt;
&lt;br /&gt;
If the hypervisor (or host operating system for a hosted product) is busy with other tasks it may stall slightly when delivering timer interrupts to the VM. This means that the guest timer appears to run slow.  VMware products will correct for these deviations by advancing guest time to its correct value, but these "slow downs" and "catch ups" may occur at different points in a benchmark.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Artificially High Results&lt;/h1&gt;
Consider the case where a great number of IO operations are measured over a long period of time.  If a benchmark runs 10,000 IO operations and the native system takes 10ms to process each operation, the benchmark would measure 10ms per operation on a native system.  However, if the benchmark is run on a virtual system and the virtualization software is busy servicing the IO operation instead of updating time, time may not progress properly during the operation.  Although 10ms or more would have passed, perhaps the VM was only informed of the passing of 9ms.  In this case, the operation appeared to run faster on the VM than on the native system.&lt;br /&gt;
&lt;br /&gt;
For benchmarks where many operations are measured over a large time window, this isn't a problem.  On our 10K operation benchmark, if the timer were started before operation one and stopped after operation 10,000, a fluctuation of 1ms would make no difference.  After all, that's only a 1ms inaccuracy on a sequence that would take 100s to run.&lt;br /&gt;
&lt;br /&gt;
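The arithmetic behind this example can be sketched as follows. The figures (10,000 operations, 10ms of real time and 9ms of guest-observed time per operation) come from the example itself; the function names are illustrative only. The first function covers a single timer around the whole run, the second covers the case where every operation is timed on its own.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative arithmetic only; the numbers come from the example in the text.
OPERATIONS = 10_000
TRUE_MS_PER_OP = 10.0      # wall-clock time per IO on the native system
GUEST_MS_PER_OP = 9.0      # time the guest believes passed per IO


def window_error_pct(total_true_ms, fluctuation_ms):
    """Relative error when one timer brackets the whole run."""
    return 100.0 * fluctuation_ms / total_true_ms


def per_op_error_pct(true_ms, guest_ms):
    """Relative error when every operation is timed on its own."""
    return 100.0 * (true_ms - guest_ms) / true_ms


total_true_ms = OPERATIONS * TRUE_MS_PER_OP               # 100,000 ms = 100 s
print(window_error_pct(total_true_ms, 1.0))               # 0.001
print(per_op_error_pct(TRUE_MS_PER_OP, GUEST_MS_PER_OP))  # 10.0
```
&lt;br /&gt;
A single 1ms fluctuation is a 0.001% error across the 100s window, while losing 1ms on every individually timed operation is a 10% error.&lt;br /&gt;
&lt;br /&gt;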
However, if the benchmark measured each operation individually and summed them all to produce a result, each individual 1ms inaccuracy would accumulate over the entire run.  The benchmark would report that the average IO length was 9ms even though observation of wall time would still show the passage of 100s during the 10,000 IO operation run.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">benchmarking</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">timekeeping</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <pubDate>Thu, 29 May 2008 18:05:22 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5581</guid>
      <dc:date>2008-05-29T18:05:22Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Hyper-Threading on ESX Server</title>
      <link>http://communities.vmware.com/docs/DOC-5101</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
Those of us at VMware who regularly engage with our field or directly with customers are often asked how Hyper-Threading impacts the performance of a system. I've now been asked this question enough times to have my canned "It Depends" response on the tip of my tongue for every conference I present at. I'm going to use this document to elaborate on that point a bit and provide a little more detail.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;What Is Hyper-Threading?&lt;/h1&gt;
&lt;a class="jive-link-external" href="http://www.intel.com/technology/platform-technology/hyper-threading/index.htm"&gt;Hyper-Threading&lt;/a&gt; is a technology first included by Intel in their NetBurst line of parts. Hyper-Threaded processors present their individual processing cores to the system as if they were two processing cores. To use Intel's parlance, that means that each &lt;i&gt;physical&lt;/i&gt; core appears in the operating system as two &lt;i&gt;logical&lt;/i&gt; cores. While the OS can distinguish between a system that has two logical cores (i.e. a single physical core with Hyper-Threading enabled) and two physical cores, applications cannot. It is up to the OS's scheduler to choose whether it wishes to use logical cores in the same manner as physical cores.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Is It Supported In ESX Server?&lt;/h1&gt;
Hyper-Threading (HT) has been supported in ESX Server since version 2. ESX Server's scheduler is aware of the presence of HT and treats logical cores differently from physical cores. Virtual CPUs (vCPUs) requesting resources are assigned first to physical cores until all physical cores are loaded. If there are additional vCPUs requesting CPU resources they will then be assigned to the additional logical cores. By this method HT has no impact on performance until more vCPUs are concurrently executing than there exist physical cores.&lt;br /&gt;
&lt;br /&gt;
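The placement order described above can be illustrated with a toy model. This is not the ESX scheduler: the function names and the (core, thread) representation are assumptions made for the example, which shows only that no physical core is shared until the number of running vCPUs exceeds the number of physical cores.&lt;br /&gt;
&lt;br /&gt;
```python
# Toy model of the placement order described above; not the ESX scheduler.
def place_vcpus(n_vcpus, physical_cores):
    """Return (core, thread) slots in the order the text describes.

    Thread 0 of every physical core is filled before any thread 1 is used.
    """
    slots = [(core, 0) for core in range(physical_cores)]
    slots += [(core, 1) for core in range(physical_cores)]
    return slots[:n_vcpus]


def cores_shared(placement):
    """Count physical cores that host two running vCPUs."""
    cores = [core for core, _ in placement]
    return sum(1 for c in set(cores) if cores.count(c) == 2)


print(cores_shared(place_vcpus(2, 2)))   # 0: HT has no effect yet
print(cores_shared(place_vcpus(3, 2)))   # 1: one core now runs two vCPUs
```
&lt;br /&gt;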
&lt;h1&gt;How Does It Perform on ESX Server?&lt;/h1&gt;
Understanding HT performance on native systems is tricky enough. Try Googling "hyperthreading performance" and you'll discover a world of information on this feature. By faking the presence of another processing core, Hyper-Threading removes some SMP scheduling from the guest operating system which is then handled by the processor's thread scheduler. Since the processor can manage context switches between its threads much faster than the OS can, heavily parallel applications often see improved performance with Hyper-Threading.&lt;br /&gt;
&lt;br /&gt;
The exact gains due to HT, even on native systems, depend on the workload. The industry has cited numbers that range from 0% to 40% gain when HT is enabled on supported processors. For the most part, HT improves performance modestly. There are a few cases where performance can slow down, but these are exceptions rather than the norm.&lt;br /&gt;
&lt;br /&gt;
The best generalizations we can provide about HT on ESX are:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Until you have more vCPUs requesting processing power than there are physical cores, HT cannot hurt and provides no value.&lt;/li&gt;
&lt;li&gt;Once you have more vCPUs requesting CPU than physical cores on the system, HT usually provides small gains.&lt;/li&gt;
&lt;li&gt;While very early versions of ESX may have handled HT sub-optimally, since ESX Server 2.5.3 robust support of HT in the scheduler means that it should not hurt performance.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h1&gt;For More Information&lt;/h1&gt;
Hyper-Threading configuration options for performance optimization were provided in ESX Server 3 and beyond.  See the &lt;a class="jive-link-external" href="http://www.vmware.com/pdf/vi3_301_201_resource_mgmt.pdf#page=123"&gt;Resource Management Guide (page 123)&lt;/a&gt; for more information.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <pubDate>Fri, 09 May 2008 21:31:11 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5101</guid>
      <dc:date>2008-05-09T21:31:11Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Oracle</title>
      <link>http://communities.vmware.com/docs/DOC-5505</link>
      <description>While Oracle makes a lot of software for a bunch of purposes, the only best practice guidance available today is for Oracle databases.  We'll build out pages dedicated to other Oracle products if and when other best practices are available.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Real Time Scheduling for Oracle DBs on Linux&lt;/h1&gt;
&lt;blockquote&gt;&lt;i&gt;The following nugget came from a database expert in performance engineering.  Due to the possibility of this change screwing up your DBs it was only with great reluctance that he even committed this to ink.  You're taking your own job in your hands if you try this with your production DBs.  But, for those of you with a test/dev Oracle DB and an unrepentant need for speed, give this a try.  --drummonds&lt;/i&gt;&lt;/blockquote&gt;
Traditional Unix/Linux timeshare scheduler policies attempt to provide good interactive response times by favoring a process that has just had an I/O request completed over a running process. This can create havoc on a high transaction-rate database server, but can be avoided by manipulating the scheduling policies for the database processes.&lt;br /&gt;
&lt;br /&gt;
The scheduler policies date back to the 1970s when we would be running an editor session simultaneously with nroff (a text formatting program that often ran for minutes to process a document) on a single-processor system. Allowing the long-running process to execute without preemption would mean unacceptably long response times for interactive applications. So, the scheduler decays the priority of a process as it runs and accumulates CPU time. Furthermore, when a timeshare priority process goes to sleep waiting for I/O completion, it sleeps at a stronger priority than any running process with a timeshare priority. This ensures, for example, that a text editor user will get immediate feedback for every character typed on the keyboard even if the system is busy running CPU hungry tasks.&lt;br /&gt;
&lt;br /&gt;
The problem with this scheme comes to the forefront when we run an application such as Oracle on a modern computer system. In an online transaction processing environment, the database management system threads/processes typically run for a short period of time, on the order of milliseconds or tens of milliseconds, then issue a disk I/O or send a network packet. A typical large system may issue many thousands of disk accesses per second. Every time an I/O completes, the scheduler makes the issuing process (or thread) runnable, and at a stronger priority than all running timeshare processes. So, we are guaranteed a preemption unless the system can find an idle CPU. And, the preempted process goes to the end of the run queue for its priority level.&lt;br /&gt;
&lt;br /&gt;
Now, if a preemption occurs, and the running process was holding an important database resource, say, a latch, all other processes that may need that latch, possibly including the preemptor process itself, will go into a spin loop waiting for the latch, and will eventually put themselves to sleep until the process holding the latch runs again and releases the latch. This is very expensive in terms of CPU usage. The spinning and the context switches will drive up the CPU utilization without an increase in the system throughput.&lt;br /&gt;
&lt;br /&gt;
An even worse phenomenon can occur in a large system with a very high I/O rate, where the problem may actually exhibit itself as excess CPU idle due to processes putting themselves to sleep waiting for latches and not waking up when the latch is released. Another possible symptom is the thundering herd problem: once the process holding the latch runs and releases it, a large number of processes become runnable, and chaos and inefficiency follow.&lt;br /&gt;
&lt;br /&gt;
The simplest solution is to maintain a constant priority for the DBMS processes, even when sleeping. This way, we allow the running process to voluntarily give up the CPU, at which time it has supposedly released all latches. Because of the nature of OLTP workloads, processes will voluntarily give up the CPU after a few, or at most tens of, milliseconds. So avoiding involuntary preemptions will fix the problem.&lt;br /&gt;
&lt;br /&gt;
One way of achieving this is using the real-time priority feature. We are not really interested in running the database processes at a strong priority. We only use real-time priorities because of an additional feature that this scheduling policy offers: the priority of a real-time process does not change when the process sleeps on I/O. As a result, when an I/O completion occurs, a process becomes runnable at the same priority as the currently running process, and is put at the end of the run queue. We avoid the involuntary preemption and the associated spinning and sleeping costs.&lt;br /&gt;
&lt;br /&gt;
Here is a sample RHEL4.4 script that accomplishes this. &lt;b&gt;This script will run the Oracle processes at a stronger priority than all timeshare priority processes and could result in starvation of other applications. It should be used only when you are certain that running these processes at a strong priority won't deny resources to other applications.&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;for DAEMON_PID in `ps -u oracle -f|grep -v grep|grep ora_|awk '{print $2}'` &lt;br clear="all" /&gt;		 do &lt;br clear="all" /&gt;		 sudo chrt --rr -p 82 --pid ${DAEMON_PID} &lt;br clear="all" /&gt;		 done &lt;br clear="all" /&gt;		 LGWR_PID=`ps -u oracle -f|grep -v grep|grep lgwr|awk '{print $2}'` &lt;br clear="all" /&gt;		 sudo chrt --rr -p 83 --pid ${LGWR_PID} &lt;br clear="all" /&gt;		 LSNR_PID=`ps -u oracle -f|grep -v grep|grep tnslsnr|awk '{print $2}'` &lt;br clear="all" /&gt;		 sudo chrt --rr -p 81 --pid ${LSNR_PID} &lt;br clear="all" /&gt;		 for SHADOW_PID in `ps -u oracle -f|grep -v grep|grep -v -E 'ora_|tnslsnr'|awk '{print $2}'` &lt;br clear="all" /&gt;		 do &lt;br clear="all" /&gt;		 sudo chrt --rr -p 81 --pid ${SHADOW_PID} &lt;br clear="all" /&gt;		 done&lt;/blockquote&gt;
To avoid starving the Oracle daemons, we run them at a stronger priority than the shadow processes, with particular attention paid to the logwriter.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">linux</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">oracle</category>
      <pubDate>Wed, 28 May 2008 00:01:33 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5505</guid>
      <dc:date>2008-05-28T00:01:33Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Co-scheduling SMP VMs in VMware ESX Server</title>
      <link>http://communities.vmware.com/docs/DOC-4960</link>
      <description>&lt;h1&gt;Background&lt;/h1&gt;
VMware ESX Server efficiently manages a mix of uniprocessor and multiprocessor VMs, providing a rich set of controls for specifying both absolute and relative VM execution rates.  For general information on cpu scheduling controls and other resource management topics, please see the official VMware &lt;a class="jive-link-external" href="http://www.vmware.com/support/pubs/resource_management/vi_pubs_res_mgmt.html"&gt;Resource Management Guide&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
For a multiprocessor VM (also known as an "SMP VM"), it is important to present the guest OS and applications executing within the VM with the illusion that they are running on a dedicated physical multiprocessor.  ESX Server faithfully implements this illusion by supporting near-synchronous coscheduling of the virtual CPUs within a single multiprocessor VM.&lt;br /&gt;
&lt;br /&gt;
The term "coscheduling" refers to a technique used in concurrent systems for scheduling related processes to run on different processors at the same time.  This approach, alternately referred to as "gang scheduling", had historically been applied to running high-performance parallel applications, such as scientific computations.  VMware ESX Server pioneered a form of coscheduling that is optimized for running SMP VMs efficiently.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Motivation&lt;/h1&gt;
An operating system generally assumes that all of the processors it manages run at approximately the same rate.  This is certainly true in non-virtualized environments, where the OS manages physical processor hardware.  However, in a virtualized environment, the processors managed by a guest OS are actually virtual cpu abstractions scheduled by the hypervisor, which time-slices physical processors across multiple VMs.&lt;br /&gt;
&lt;br /&gt;
At any particular point in time, each virtual cpu (VCPU) may be scheduled, descheduled, preempted, or blocked waiting for some event.  Without coscheduling, the VCPUs associated with an SMP VM would be scheduled independently, breaking the guest's assumptions regarding uniform progress.  We use the term "skew" to refer to the difference in execution rates between two or more VCPUs associated with an SMP VM.&lt;br /&gt;
&lt;br /&gt;
Inter-VCPU skew violates the assumptions of guest software. Non-trivial skew can result in severe performance problems, and may even induce failures when the guest expects inter-VCPU operations to complete quickly.  Let's first consider the performance implications of skew.  Guest OS kernels typically use spin locks for interprocessor synchronization.  If the VCPU currently holding a lock is descheduled, then the other VCPUs in the same VM will waste time busy-waiting until the lock is released.  Similar performance problems can also occur in multi-threaded user-mode applications, which may also synchronize using locks or barriers.  Unequal VCPU progress will also confuse the guest OS cpu scheduler, which attempts to balance load across VCPUs.&lt;br /&gt;
&lt;br /&gt;
An extreme form of this performance problem may also lead to correctness issues.  For example, a guest kernel may perform inter-processor operations, such as TLB shootdowns, that are expected to complete quickly on physical hardware (e.g. several microseconds).  The guest OS may time out if it finds that such operations have not completed after an unreasonably long period of time (e.g. several milliseconds).  Without coscheduling, we have observed this behavior in practice for several different guest operating systems, with failures including Windows BSODs and Linux kernel panics.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Strict coscheduling in ESX Server 2.x&lt;/h1&gt;
VMware introduced support for running SMP VMs with the release of ESX Server 2 in 2003.  ESX Server 2.x implemented coscheduling using an approach based on skew detection and enforcement.&lt;br /&gt;
&lt;br /&gt;
The ESX scheduler maintains a fine-grained cumulative skew value for each VCPU within an SMP VM.  A VCPU is considered to be making progress when it is running or idling.  A VCPU's skew increases when it is not making progress while at least one of its sibling VCPUs is making progress.  A VCPU is considered to be "skewed" if its cumulative skew value exceeds a configurable threshold, typically a few milliseconds.&lt;br /&gt;
&lt;br /&gt;
Once any VCPU is skewed, all of its sibling VCPUs within the same SMP VM are forcibly descheduled ("co-stopped") to prevent additional skew.  After a VM has been co-stopped, the next time any VCPU is scheduled, all of its sibling VCPUs must also be scheduled ("co-started").  This approach is called "strict" coscheduling, since all VCPUs must be scheduled simultaneously after skew has been detected.&lt;br /&gt;
&lt;br /&gt;
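The skew accounting and co-stop rule described above can be sketched as a small model. This is a hypothetical illustration, not ESX source code: the class, the function names, the 1ms accounting tick, and the 3ms threshold are all assumptions (the text says only that the threshold is configurable and typically a few milliseconds).&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical model of the skew accounting described above; not ESX source code.
from dataclasses import dataclass

SKEW_THRESHOLD_MS = 3.0   # assumed value; the text only says "a few milliseconds"


@dataclass
class Vcpu:
    name: str
    skew_ms: float = 0.0


def account_tick(vcpus, progressing, tick_ms):
    """Add tick_ms of skew to every VCPU that stalled while a sibling progressed.

    progressing is the set of VCPU names that were running or idling in this tick.
    """
    if not progressing:
        return  # nobody advanced, so nobody fell behind
    for v in vcpus:
        if v.name not in progressing:
            v.skew_ms += tick_ms


def must_co_stop(vcpus):
    """Strict coscheduling: co-stop the whole VM once any VCPU is skewed."""
    return any(v.skew_ms > SKEW_THRESHOLD_MS for v in vcpus)


vm = [Vcpu("vcpu0"), Vcpu("vcpu1")]
for _ in range(4):                       # vcpu1 is descheduled for 4 ms
    account_tick(vm, {"vcpu0"}, 1.0)
print(vm[1].skew_ms, must_co_stop(vm))   # 4.0 True
```
&lt;br /&gt;
Because an idling VCPU counts as making progress, listing it in the progressing set keeps it from accumulating skew in this model.&lt;br /&gt;
&lt;br /&gt;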
In some situations, such as when the physical machine has few cores, and is running a mix of UP and SMP VMs, coscheduling may incur "fragmentation" overhead.  For example, consider an ESX Server with two physical cores running one dual-VCPU VM and one single-VCPU VM.  When the UP VM is running, the scheduler cannot use the remaining physical core to run just one of the SMP VM's two VCPUs.  This effect is typically negligible in systems with larger numbers of cores (or with hyperthreading enabled), due to the increased flexibility available when mapping VCPUs to hardware execution contexts.&lt;br /&gt;
&lt;br /&gt;
Note that a VCPU executing in the guest OS idle loop can be descheduled without affecting coscheduling, since the guest OS can't tell the difference.  In other words, an idle VCPU does not accumulate skew, and is treated as if it were running for coscheduling purposes.  This optimization ensures that idle guest VCPUs don't waste physical processor resources, which can instead be allocated to other VMs.  For example, an ESX Server with two physical cores may be running one VCPU each from two different VMs, if their sibling VCPUs are idling, without incurring any coscheduling overhead.  Similarly, in the fragmentation example above, if one of the SMP VM's VCPUs is idling, then there will be no coscheduling fragmentation, since its sibling VCPU can be scheduled concurrently with the UP VM.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Relaxed coscheduling in ESX Server 3.x&lt;/h1&gt;
The coscheduling algorithm employed by the ESX scheduler was significantly enhanced with the release of ESX Server 3 in 2006.  The basic coscheduling approach is still based on skew detection and enforcement.&lt;br /&gt;
&lt;br /&gt;
However, instead of requiring all VCPUs to be co-started, only those VCPUs that are skewed must be co-started.  This ensures that when any VCPU is scheduled, all other VCPUs that are "behind" will also be scheduled, reducing skew.  This approach is called "relaxed" coscheduling, since only a subset of a VM's VCPUs must be scheduled simultaneously after skew has been detected.&lt;br /&gt;
&lt;br /&gt;
To be more precise, suppose an SMP VM consists of multiple VCPUs, including VCPUs A, B, and C.  Suppose VCPU A is skewed, but VCPUs B and C are not skewed.  Since VCPU A is skewed, VCPU B can be scheduled to run only if VCPU A is also co-started.  This ensures that the skew between A and B will be reduced.  But note that VCPU C need not be co-started to run VCPU B.  As an optimization, the ESX scheduler will still try to co-start VCPU C opportunistically, but will not require this as a precondition for running VCPU B.&lt;br /&gt;
&lt;br /&gt;
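The A, B, C example above can be sketched as a small model.  This is only an illustration of the rule as described here, not the ESX scheduler's implementation; the function names required_costart and strict_costart are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
# Illustrative model only, not the ESX scheduler implementation.
# required_costart and strict_costart are hypothetical names.

def required_costart(vcpu_to_run, skewed):
    """Return the VCPUs that must start together under the relaxed rule.

    vcpu_to_run: the VCPU the scheduler wants to run.
    skewed: set of sibling VCPUs currently marked as skewed.
    """
    # The chosen VCPU runs, and every skewed sibling must be co-started
    # with it so that the skew shrinks. Non-skewed siblings are optional.
    return {vcpu_to_run} | set(skewed)


def strict_costart(all_vcpus):
    """Strict coscheduling: every VCPU of the VM must be co-started."""
    return set(all_vcpus)


vcpus = ["A", "B", "C"]
skewed = {"A"}

relaxed = required_costart("B", skewed)
strict = strict_costart(vcpus)
print(sorted(relaxed))  # ['A', 'B']
print(sorted(strict))   # ['A', 'B', 'C']
```
&lt;/pre&gt;
In this model, running VCPU B needs only two physical processors to be free at the same time instead of three, because VCPU C is not a precondition.&lt;br /&gt;
&lt;br /&gt;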
Relaxed coscheduling significantly reduces the possibility of coscheduling fragmentation, improving overall processor utilization.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Conclusions&lt;/h1&gt;
ESX Server employs sophisticated CPU scheduling algorithms that enforce rate-based quality-of-service for both uniprocessor and multiprocessor VMs.  For multiprocessor VMs, coscheduling techniques ensure that virtual CPUs make uniform progress, faithfully implementing the illusion that the VM is running on dedicated multiprocessor hardware and ensuring efficient execution of guest software.  Optimizations such as relaxed coscheduling and descheduling idle VCPUs provide a high-performance execution environment that efficiently utilizes physical host resources.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;Appendix: ESX Server coscheduling statistics&lt;/h1&gt;
ESX Server 3.x exports statistics related to the coscheduling behavior of multiprocessor VMs.  The "esxtop" utility can be used to examine these statistics on a live ESX system.&lt;br /&gt;
&lt;br /&gt;
The %CSTP column in the CPU statistics panel shows the percentage of time the VCPUs of a VM spent in the "co-stopped" state, waiting to be "co-started".  This gives an indication of the coscheduling overhead incurred by the VM.  If this value is low, then any performance problems should be attributed to other issues, and not to the coscheduling of the VM's virtual CPUs.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">scheduling</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">smp</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">kernel</category>
      <pubDate>Fri, 02 May 2008 15:12:56 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-4960</guid>
      <dc:date>2008-05-02T15:12:56Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
    </item>
    <item>
      <title>Best Practices for Apache</title>
      <link>http://communities.vmware.com/docs/DOC-5503</link>
      <description>Apache web servers are a common target for virtualization in almost every data center.  This page will collect best practices for setting up and configuring Apache for best performance.&lt;br /&gt;
&lt;br /&gt;
No Apache-specific information has yet been provided.  But tips are available on the &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-5502"&gt;Best Practices for Web Servers&lt;/a&gt; page.</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">apache</category>
      <pubDate>Tue, 27 May 2008 23:24:29 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5503</guid>
      <dc:date>2008-05-27T23:24:29Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Best Practices for Web Servers</title>
      <link>http://communities.vmware.com/docs/DOC-5502</link>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;br /&gt;
This page is a collection point for ESX Server configuration options that can maximize web server performance regardless of the choice of web server.&lt;br /&gt;
&lt;br /&gt;
&lt;h1&gt;TCP Transmit Coalescing&lt;/h1&gt;
&lt;br /&gt;
A recent &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/specweb_perf_final.pdf"&gt;whitepaper on SPECweb performance on VMware ESX Server&lt;/a&gt; describes how TCP transmit coalescing can improve web server performance.  The general idea behind transmit coalescing is to buffer TCP transmits at the ESX Server for a brief period of time so that multiple packets can be transmitted at one time.  This introduces a very slight increase in latency but can provide a dramatic increase in efficiency.&lt;br /&gt;
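As a rough illustration of why batching helps, the toy model below groups queued packets so that each group costs one transmit operation.  It is a sketch under stated assumptions and not how ESX Server implements coalescing: the names coalesce and max_batch are hypothetical, and a fixed batch limit stands in for the brief buffering period.&lt;br /&gt;
&lt;pre&gt;
```python
# Toy model only: it is not how ESX Server implements coalescing.
# coalesce and max_batch are hypothetical names.

def coalesce(packets, max_batch):
    """Group queued packets into batches of at most max_batch packets.

    Each batch stands for one transmit operation.
    """
    batches = []
    for start in range(0, len(packets), max_batch):
        batches.append(packets[start:start + max_batch])
    return batches


packets = ["p%d" % i for i in range(10)]

uncoalesced = coalesce(packets, 1)   # one transmit per packet
coalesced = coalesce(packets, 4)     # up to four packets per transmit

print(len(uncoalesced))  # 10
print(len(coalesced))    # 3
```
&lt;/pre&gt;
Fewer transmit operations mean less per-packet overhead, at the cost of the first packet in each batch waiting for the others.&lt;br /&gt;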
&lt;p /&gt;
Transmit coalescing can be turned on with the following steps:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Using the VMware Infrastructure Client, choose the ESX Server host on which the virtual machine is deployed.&lt;/li&gt;
&lt;li&gt;Click the &lt;b&gt;Configuration&lt;/b&gt; tab.&lt;/li&gt;
&lt;li&gt;Click &lt;b&gt;Advanced Settings&lt;/b&gt; in the Software panel.&lt;/li&gt;
&lt;li&gt;Click the &lt;b&gt;Net&lt;/b&gt; tab.&lt;/li&gt;
&lt;li&gt;Set the &lt;b&gt;Net.vmxnetThroughputWeight&lt;/b&gt; value to 128, then click OK.&lt;/li&gt;
&lt;li&gt;Reboot the virtual machine.&lt;/li&gt;
&lt;/ol&gt;
Details and performance results are provided in the &lt;a class="jive-link-external" href="http://www.vmware.com/files/pdf/specweb_perf_final.pdf"&gt;SPECweb paper&lt;/a&gt;.&lt;br /&gt;
&lt;h1&gt;Resources&lt;/h1&gt;
&lt;br /&gt;
Apache Best Practices&lt;br /&gt;
&lt;p /&gt;
IIS Best Practices</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">network</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">apache</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">iis</category>
      <pubDate>Tue, 27 May 2008 23:10:36 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5502</guid>
      <dc:date>2008-05-27T23:10:36Z</dc:date>
      <clearspace:dateToText>1 year, 5 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Red Hat Enterprise Linux</title>
      <link>http://communities.vmware.com/docs/DOC-5252</link>
      <description>Page under construction.&lt;br /&gt;
&lt;p /&gt;
Quick notes:&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;RHEL5 uses a timer rate of 1000 Hz (see &lt;a class="jive-link-wiki" href="http://communities.vmware.com/docs/DOC-3580"&gt;Linux Timer Rate&lt;/a&gt;).  As of RHEL5.1, this rate can be decreased for greater efficiency of RHEL SMP guests.&lt;/li&gt;
&lt;li&gt;Red Hat has not enabled VMI in their kernels, so out-of-the-box paravirtualization is not possible. But custom kernels can easily be built to take advantage of this feature. See &lt;a class="jive-link-external" href="http://kb.vmware.com/kb/1003644"&gt;VMware KB article #1003644&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">linux</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">bestpractice</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">redhat</category>
      <pubDate>Fri, 16 May 2008 21:55:13 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5252</guid>
      <dc:date>2008-05-16T21:55:13Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Ubuntu</title>
      <link>http://communities.vmware.com/docs/DOC-5254</link>
      <description>&lt;br /&gt;
Ubuntu supports paravirtualization!&lt;br /&gt;
&lt;p /&gt;
I'll have more to say about Ubuntu later.  &lt;img class="jive-emoticon" border="0" src="http://communities.vmware.com/images/emoticons/happy.gif" alt=":)" /&gt;</description>
      <category domain="http://communities.vmware.com/tags?communityID=1">ubuntu</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">paravirtualization</category>
      <category domain="http://communities.vmware.com/tags?communityID=1">linux</category>
      <pubDate>Fri, 16 May 2008 22:13:21 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/docs/DOC-5254</guid>
      <dc:date>2008-05-16T22:13:21Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>EMC World 2008</title>
      <link>http://communities.vmware.com/blogs/drummonds/2008/05/19/emc-world-2008</link>
      <description>I'm going to be at EMC World 2008 from May 19 through May 23.  I'll be presenting on VMware's VI3 architecture with respect to performance, tips and techniques for performance monitoring and analysis, and best practices for optimal performance.  My session, titled "VMware ESX Server Performance Analysis", is offered on Monday at 4:30 and Tuesday at 11:30.  I'll also be hosting birds-of-a-feather sessions on the same subject at 2:30 on Tuesday.  Please drop by if you're at the conference!</description>
      <pubDate>Mon, 19 May 2008 15:29:29 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/blogs/drummonds/2008/05/19/emc-world-2008</guid>
      <dc:date>2008-05-19T15:29:29Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
    <item>
      <title>Welcome to the VMware Performance Community Forum!</title>
      <link>http://communities.vmware.com/message/946665</link>
      <description>&lt;br /&gt;
Bring your performance topics to this forum.  Questions and comments from all are welcome.&lt;br /&gt;
&lt;p /&gt;
Scott</description>
      <pubDate>Fri, 16 May 2008 18:58:21 GMT</pubDate>
      <author>drummonds</author>
      <guid>http://communities.vmware.com/message/946665</guid>
      <dc:date>2008-05-16T18:58:21Z</dc:date>
      <clearspace:dateToText>1 year, 6 months ago</clearspace:dateToText>
    </item>
  </channel>
</rss>

