Using vscsiStats for Storage Performance Analysis

    Introduction

    esxtop is a great tool for performance analysis of all types.  However,  with only latency and throughput statistics, esxtop will not provide the  full picture of the storage profile.  Furthermore, esxtop only provides  latency numbers for Fibre Channel and iSCSI storage.  Latency analysis  of NFS traffic is not possible with esxtop.

     

    Since ESX 3.5, VMware has provided a tool specifically for profiling  storage: vscsiStats.  vscsiStats collects and reports counters on  storage activity.  Its data is collected at the virtual SCSI device  level in the kernel.  This means that results are reported per VMDK (or  RDM) irrespective of the underlying storage protocol.  The following  data are reported in histogram form:

     

     

    • IO size
    • Seek distance
    • Outstanding IOs
    • Latency (in microseconds)
    • Interarrival time

     

    Running vscsiStats

    vscsiStats collection and analysis requires two steps:

    1. Start statistics collection.
    2. View accrued statistics.

     

    Documentation on command-line parameters is available by running '/usr/lib/vmware/bin/vscsiStats -h'.

     

     

    Starting and Stopping vscsiStats Collection

    The tool is started with the following command:

    /usr/lib/vmware/bin/vscsiStats -s -w <world_group_id>

     

     

    This command starts the process that will accrue statistics.  The world  group ID must be set to a running virtual machine.  The running VMs' IDs  can be obtained by running '/usr/lib/vmware/bin/vscsiStats -l'.

     

     

    After about 30 minutes vscsiStats will stop running.  If the analysis is needed for a longer period, the start command above should be repeated within this window.  That will defer the timeout and termination by another 30 minutes.
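    For a longer collection, it can help to plan when the start command has to be re-issued.  The following is only a sketch: it assumes that each re-issue resets the timeout to 30 minutes from the moment of the re-issue, and the function name and the five-minute safety margin are my own illustrative choices, not part of vscsiStats.

```python
def renewal_offsets(total_minutes, window_minutes=30, margin_minutes=5):
    """Minute offsets (from the first start) at which to re-issue the start command.

    Assumption: every re-issue moves the timeout to `window_minutes`
    after the re-issue. The margin keeps us safely inside the window.
    """
    if not 0 <= margin_minutes < window_minutes:
        raise ValueError("margin must be non-negative and shorter than the window")

    offsets = []
    expiry = window_minutes  # the first start covers minutes 0..window
    while expiry < total_minutes:
        reissue_at = expiry - margin_minutes
        offsets.append(reissue_at)
        expiry = reissue_at + window_minutes
    return offsets


# A two-hour collection needs four re-issues under these assumptions.
print(renewal_offsets(120))  # [25, 50, 75, 100]
```

    A collection of 30 minutes or less returns an empty list, since the first start already covers it.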

     

     

    Because results accumulate and are reported in summary form, the histograms include all data since collection was started.  To reset all counters to zero, run '/usr/lib/vmware/bin/vscsiStats -r'.

     

     

    Viewing Statistics

    Counters are displayed by using the following command:

    /usr/lib/vmware/bin/vscsiStats -p <histo_type> [-c]

     

     

    The histogram type is used to specify either all of the statistics or  one group of them.  Options include all, ioLength, seekDistance,  outstandingIOs, latency, interarrival.

     

     

    Results can be produced in a more compact comma-delimited list by adding the optional "-c" flag to the command above.
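    If you want to script the workflow, the commands above can be assembled programmatically.  The sketch below only builds argument lists from the flags already mentioned in this article (-l, -s, -w, -p, -c, -r).  The helper names are hypothetical, and whether a Python interpreter is available on your host is an assumption you should check.

```python
VSCSISTATS = "/usr/lib/vmware/bin/vscsiStats"

# Histogram types listed in this article.
HISTO_TYPES = ("all", "ioLength", "seekDistance",
               "outstandingIOs", "latency", "interarrival")


def list_cmd():
    """List running VMs and their world group IDs."""
    return [VSCSISTATS, "-l"]


def start_cmd(world_group_id):
    """Start (or extend) collection for one VM."""
    return [VSCSISTATS, "-s", "-w", str(world_group_id)]


def print_cmd(histo_type="all", comma_delimited=False):
    """Print accrued histograms, optionally in comma-delimited form."""
    if histo_type not in HISTO_TYPES:
        raise ValueError("unknown histogram type: %r" % (histo_type,))
    cmd = [VSCSISTATS, "-p", histo_type]
    if comma_delimited:
        cmd.append("-c")
    return cmd


def reset_cmd():
    """Reset all counters to zero."""
    return [VSCSISTATS, "-r"]


# 1234 is a made-up world group ID, used only for illustration.
print(" ".join(start_cmd(1234)))
print(" ".join(print_cmd("latency", comma_delimited=True)))
```

    On the host, each list could be passed to subprocess.run(cmd, check=True).  Keeping the builders separate from execution makes them easy to check without touching a live system.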

     

     

    Using vscsiStats Results

    Use Case 1: Identifying Sequential IO

    Storage arrays can process sequential IO much faster than random IO.   You can therefore improve the performance of a sequential workload by  placing it on a dedicated LUN to allow the array to optimize access.   vscsiStats can help you identify your sequential workloads even if you  don't understand anything about the application in the VM.

     

    Take the following graph as an example, which I generated by running '/usr/lib/vmware/bin/vscsiStats -p seekDistance':

     

     

    random_write_histo.png



    This graph shows that most of the commands are being issued a great  distance from the previous command.  It looks like all of the commands  were 50,000 or more logical blocks away from the previous command.  When  I looked at the raw data, I saw that over 99% of the commands were more  than 128 blocks away from the previous command.  That's random access  if I've ever seen it.  Here's the opposite example:

    sequential_write_histo.png

    In this case the logical block number (LBN) of each command is most  frequently exactly one larger than the previous command.  That's the  signature of a heavily sequential workload.  It shouldn't surprise you  to learn that both of these profiles were generated by Iometer using  random and sequential writes, respectively.
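    The same judgment can be made from the raw numbers instead of a graph.  The sketch below assumes the histogram has already been read into (bucket limit, count) pairs, where each bucket holds the commands whose seek distance is greater than the previous limit and no greater than its own.  That layout, the 128-block cutoff, the 50% threshold, and the sample counts are all illustrative assumptions, not real vscsiStats output.

```python
def share_in_range(buckets, low, high):
    """Share of commands in buckets whose limit satisfies low < limit <= high.

    Exact only when `low` and `high` are themselves bucket limits.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return 0.0
    inside = sum(count for limit, count in buckets if low < limit <= high)
    return inside / total


def classify(buckets, near=128, threshold=0.5):
    """Label a workload by how many commands land near the previous one."""
    near_share = share_in_range(buckets, -near, near)
    return "mostly sequential" if near_share >= threshold else "mostly random"


# Made-up seek distance histograms: (bucket limit in blocks, command count).
random_like = [(-50000, 480), (-128, 3), (1, 2), (128, 5), (50000, 510)]
sequential_like = [(-50000, 5), (0, 10), (1, 950), (128, 30), (50000, 5)]

print(classify(random_like))                  # mostly random
print(classify(sequential_like))              # mostly sequential
print(share_in_range(sequential_like, 0, 1))  # 0.95, the "exactly one block ahead" share
```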

    Use Case 2: Optimizing for IO Sizes

    The IO size is an important characteristic of storage profiles.  A variety of best practices have been provided by storage vendors to enable customers to tune their storage to a particular IO size.  As an example, it may make sense to match an array's stripe size to the workload's average IO size.  vscsiStats can provide a histogram of IO sizes to help this process.  The following graph was generated by '/usr/lib/vmware/bin/vscsiStats -p ioLength':

    io_size_4k.png

    From these results I can see that about a quarter of the commands were smaller than 4k.  About half of the commands were exactly 4k in size.  The minute number of remaining IOs were larger than 4k.  This signature is typical of a VMDK formatted with 4k blocks and supporting OS and application execution.  The storage array should be optimized for 4k blocks if this disk's performance is a priority.
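    To put numbers on a breakdown like this, the histogram can be split around the size of interest.  This is a minimal sketch assuming (bucket limit in bytes, count) pairs with a dedicated bucket for the exact size.  The function name and sample counts are invented for illustration and do not reproduce the graph above.

```python
def size_shares(buckets, size):
    """Return (below, at, above) shares of commands relative to `size`.

    Assumption: the bucket whose limit equals `size` holds only IOs of
    exactly that size, which is true only if the preceding bucket limit
    is size - 1.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return (0.0, 0.0, 0.0)
    below = sum(count for limit, count in buckets if limit < size)
    at = sum(count for limit, count in buckets if limit == size)
    above = total - below - at
    return (below / total, at / total, above / total)


# Made-up IO length histogram: (bucket limit in bytes, command count).
sample = [(512, 50), (2048, 200), (4095, 0), (4096, 700),
          (8192, 40), (65536, 10)]

below, at, above = size_shares(sample, 4096)
print("below 4k: %.0f%%, exactly 4k: %.0f%%, above 4k: %.0f%%"
      % (below * 100, at * 100, above * 100))

# The busiest bucket is a quick hint at the size worth tuning for.
dominant_limit, dominant_count = max(sample, key=lambda b: b[1])
print(dominant_limit)  # 4096
```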

    Use Case 3: Storage Latency Analysis (Including NFS!)

    esxtop is a terrific tool for latency-based storage analysis.  Fibre Channel and iSCSI HBAs have device and kernel latencies in esxtop's storage panel.  Software iSCSI initiators will show up as vmhba32 (ESX 3.5 and earlier) and vmhba33 (ESX 4.0 and later).  But esxtop does not provide latency statistics for NFS stores.

    Because vscsiStats collects its results where the guest interacts with  the hypervisor, it is unaware of the storage implementation.  Latency  statistics can be collected for all storage configurations with this  tool.

    latency.png

    The above graph shows that the server in my office with a single  direct-attached SCSI disk is performing as I would expect.  About half  of all the operations are completing in under 5 ms.  The other half take  5-15 ms to complete.  A few commands took longer than 15 ms, but the  number is so small that it doesn't concern me.  Similar results can be  seen with NFS arrays.
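    Statements like "half of all operations complete in under 5 ms" can be read off the histogram by accumulating bucket counts.  The sketch below assumes (bucket limit in microseconds, count) pairs, matching the unit listed earlier for latency.  The function name and the sample counts are illustrative and are not taken from the graph above.

```python
def percentile_bucket(buckets, pct):
    """Smallest bucket limit at or below which at least `pct` percent
    of commands completed, or None if the histogram is empty.

    This gives an upper bound for the percentile, limited by bucket width.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    running = 0
    for limit, count in sorted(buckets):
        running += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= pct * total:
            return limit
    return None


# Made-up latency histogram: (bucket limit in microseconds, command count).
sample = [(1000, 100), (5000, 400), (15000, 480),
          (30000, 15), (100000, 5)]

for pct in (50, 95, 99):
    limit_us = percentile_bucket(sample, pct)
    print("p%d <= %.0f ms" % (pct, limit_us / 1000))
# p50 <= 5 ms
# p95 <= 15 ms
# p99 <= 30 ms
```

    Because the calculation only needs the histogram, it applies equally to a VMDK on NFS, where esxtop has no latency numbers to offer.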

    vscsiStats on ESXi

    vscsiStats can be installed on ESXi hosts after putting the host into tech support mode.  More information on this process is available in Scott's blog post on the subject at vPivot.

    Additional Resources

    My colleagues Ajay Gulati, Chethan Kumar, and Irfan Ahmad presented Storage Workload Characterization and Consolidation in Virtualized Environments at VPACT 09.  This paper serves as an excellent example of vscsiStats in action.

    I learned vscsiStats by reviewing Irfan's VMworld 2007 presentation (vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server) and playing with the tool.  Check out his presentation if you'd like more detail.