This document is a living wiki version of the performance analysis methods whitepaper. That document will ultimately be replaced with this one.
Storage often bounds the performance of enterprise workloads. Traditional means of analysis remain sound for storage performance in virtual deployments, more so than for CPU or memory performance. This section introduces the tools for identifying heavily used resources and VMs that place high demands on their storage system. Once the bottleneck is identified, traditional correction methods apply.
iSCSI storage using software initiators is not covered in this section. When storage is accessed through the hypervisor's iSCSI initiator or an in-guest initiator, the traffic shows up on the VMkernel network or the VM's network stack, respectively. See the Network section for more information.
As before, esxtop is the best place to start when investigating potential performance issues. To view the disk adapter information in esxtop, press the 'd' key once it is running.
On ESX Server 3.5, the storage system can also be displayed per VM (using 'v') or per storage device (using 'u'). The same counters are displayed in each view. Look at the following items:
|Metric||VirtualCenter Counter||esxtop Counter||Description|
|Queued Disk Commands||disk.queueLatency.average||QUED||Queued commands wait in the kernel queue for an open slot in the device driver queue. A large number of queued commands means a heavily loaded storage system. See Storage Queues and Performance for information on queues.|
|Queue Usage||Not available||%USD||This counter tracks the percentage of the device driver queue that is in use. See Storage Queues and Performance for information on this queue.|
|Command Rate||disk.commands.summation||ACTV||VirtualCenter reports the number of commands issued during the previous sample period. esxtop provides a live look at the number of commands being processed at any one time. Treat these counters as a snapshot of activity, and don't consider any number here "too much" until large queues start developing.|
|HBA Load||Not available||LOAD||In esxtop the LOAD counter tracks how full the device queues are. Once LOAD exceeds one, commands will start to queue in the kernel. See Storage Queues and Performance for information on these queues.|
|Storage Device Latency||disk.deviceReadLatency||DAVG/cmd||These counters track the latencies of the physical storage hardware. This includes everything from the HBA to the platter.|
|Kernel Latency|| ||KAVG/cmd||These counters track the latencies due to the kernel's command processing.|
|Total Storage Latency||Not available||GAVG/cmd||This is the latency that the guest sees to the storage. It is the sum of the DAVG and KAVG stats.|
|Aborts||disk.commandsAborted.summation||ABRTS/s||These counters track SCSI aborts. Aborts generally occur because the array is taking far too long to respond to commands.|
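The relationships described in the table can be sketched in a few lines of Python. This is only an illustration: the DiskSample class, its field names, and the sample values in the usage note are hypothetical and are not part of esxtop or VirtualCenter. The only rules encoded are the ones stated above: GAVG is the sum of DAVG and KAVG, a LOAD above one means commands start to queue in the kernel, and aborts generally point at a slow array.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskSample:
    """One set of esxtop storage readings (hypothetical container)."""
    davg_ms: float       # DAVG/cmd: latency of the physical storage hardware
    kavg_ms: float       # KAVG/cmd: latency due to kernel command processing
    load: float          # LOAD: how full the device queues are
    qued: int            # QUED: commands waiting in the kernel queue
    aborts_per_s: float  # ABRTS/s: SCSI aborts per second


def guest_latency_ms(sample: DiskSample) -> float:
    """GAVG/cmd as described in the table: the sum of DAVG and KAVG."""
    return sample.davg_ms + sample.kavg_ms


def observations(sample: DiskSample) -> List[str]:
    """Restate the table's guidance for one sample; no thresholds are invented."""
    notes = []
    if sample.load > 1.0:
        notes.append("LOAD exceeds one: commands will queue in the kernel")
    if sample.qued > 0:
        notes.append("QUED is nonzero: commands are waiting for a driver queue slot")
    if sample.aborts_per_s > 0:
        notes.append("Aborts seen: the array may be taking too long to respond")
    if sample.davg_ms >= sample.kavg_ms:
        notes.append("Most latency comes from the storage hardware (DAVG)")
    else:
        notes.append("Most latency comes from kernel processing (KAVG)")
    return notes
```

For example, a made-up sample such as DiskSample(davg_ms=12.0, kavg_ms=3.0, load=1.5, qued=4, aborts_per_s=0.0) yields a guest latency of 15.0 ms, with notes about LOAD, QUED, and the device being the dominant contributor.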
It is important to have a solid understanding of the storage architecture and equipment before attempting to analyze performance data. Consider the following questions:
It's worth pausing at this moment to point out that 95% of all storage performance problems are not fixed in ESX. Believe me, I (Scott) have been called into a dozen performance escalations where poor storage performance was blamed on the hypervisor, and not a single one was caused by ESX. If you're seeing high latencies to the storage device in VirtualCenter or esxtop, it's worth treating the problem as an array configuration issue. Check ESX's logs for obvious storage errors, check array stats, and make sure that there are no fabric configuration problems.
When you are seeing high storage latencies, you shouldn't be using complex benchmarks to reproduce and solve the problem. Go with Iometer and make certain you're doing an apples-to-apples comparison against a physical system (ideally dual-booted from the ESX server under test) to establish your expected, non-virtual results. Check Storage System Performance Analysis with Iometer for information on using Iometer for problems like this.
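As a rough sketch of what an apples-to-apples comparison means in practice, the Python below refuses to compare two runs unless their workload settings match, then reports the virtual result relative to the physical baseline. The Workload and Result classes and the compare function are hypothetical helpers for recording numbers by hand. They are not part of Iometer, and the fields chosen are an assumption about which settings matter most.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Workload:
    """Access pattern used for a run (hypothetical record of test settings)."""
    block_size_kb: int
    read_pct: int
    random_pct: int
    outstanding_ios: int


@dataclass(frozen=True)
class Result:
    """Numbers noted from one run against one system."""
    workload: Workload
    iops: float
    avg_latency_ms: float


def compare(physical: Result, virtual: Result) -> Dict[str, float]:
    """Return virtual/physical ratios, but only for identical workloads."""
    if physical.workload != virtual.workload:
        raise ValueError("workloads differ: not an apples-to-apples comparison")
    return {
        "iops_ratio": virtual.iops / physical.iops,
        "latency_ratio": virtual.avg_latency_ms / physical.avg_latency_ms,
    }
```

An iops_ratio close to 1.0 suggests the virtual run matches the physical baseline, which points the investigation back at the array and fabric rather than the hypervisor.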
Corrections for these problems can include the following:
Top-level performance analysis page: Performance Monitoring and Analysis
VirtualCenter performance counters: Understanding VirtualCenter Performance Statistics
esxtop performance counters: esxtop Performance Counters
Fibre Channel SAN Configuration Guide
Storage Queues and Performance