Disk Latency connecting to an FC SAN

Let's start with the basics. To monitor disk latency in the VI Client, select the ESX host, go to Performance, click Change Chart Options and select the counters below:

Physical Device Read Latency

Queue Read Latency

Kernel Disk Read Latency

Disk Read Latency

Disk Write Latency

Queue Write Latency

Physical Device Write Latency

Kernel Disk Write Latency

The problem with the graphs is that they only update every 20 seconds. If, like me, you prefer the command line, use esxtop to monitor disk latency instead.

To monitor latency via esxtop:

esxtop

u (Selects the disk counters)

To add the disk latency and queue fields, type "f" and then "h", then press any key to return to the esxtop stats screen
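If you would rather capture the numbers to a file than watch them live, esxtop also has a batch mode. Below is a minimal sketch of a wrapper around it. The function name capture_esxtop, the ESXTOP override variable and the output path are my own placeholders, not anything that ships with ESX:

```shell
# capture_esxtop DELAY_SECONDS SAMPLES OUTFILE
# Runs esxtop in batch mode (-b), sampling every DELAY_SECONDS (-d)
# for SAMPLES iterations (-n), and writes the CSV output to OUTFILE.
capture_esxtop() {
    # ESXTOP is a hypothetical override so the helper can be dry-run
    # on a machine that does not have esxtop installed.
    "${ESXTOP:-esxtop}" -b -d "$1" -n "$2" > "$3"
}

# Example, run on the ESX service console:
# one sample every 5 seconds for 5 minutes.
# capture_esxtop 5 60 /tmp/esxtop-disk.csv
```

The resulting CSV can be copied off the host and opened in a spreadsheet or perfmon for a closer look.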

Disk latency and queue counters:

DAVG/cmd - Latency at the HBA layer

KAVG/cmd - Latency at the vmkernel layer

GAVG/cmd - Latency at the guest layer

DQLEN = Maximum number of commands the kernel is allowed to queue (default is 32)

ACTV = Commands currently being executed by the kernel

QUED = Number of commands the kernel has waiting to join the disk queue

%USD = ACTV / DQLEN * 100
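As a quick sanity check of the %USD formula, here is a small sketch that does the arithmetic for you. The function name pct_used is made up for this example; feed it the ACTV value and the queue length you read off the esxtop screen:

```shell
# pct_used ACTV QUEUE_LENGTH
# Prints the percentage of the queue in use: ACTV / queue length * 100.
pct_used() {
    awk -v actv="$1" -v qlen="$2" 'BEGIN {
        # Guard against a zero or negative queue length.
        if (qlen <= 0) { print "n/a"; exit }
        printf "%.0f\n", actv / qlen * 100
    }'
}

pct_used 24 32   # 24 active commands against the default queue length of 32 -> 75
```

A value that sits at or near 100 means the queue is full most of the time.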

DAVG/cmd:

This is the latency recorded directly from the HBAs, measured as the time it takes to read from or write to the storage device.

KAVG/cmd:

This is the latency recorded by the vmkernel: the time it takes the ESX kernel to pass the read or write commands on to the HBA.

GAVG/cmd:

This is the latency affecting the VMs (guests) themselves. It is the total latency, so KAVG/cmd + DAVG/cmd = GAVG/cmd.
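To make the relationship between the three counters concrete, here is a rough sketch that adds KAVG/cmd and DAVG/cmd and shows how much of the total each one accounts for. The function name latency_breakdown and the sample values are made up for illustration:

```shell
# latency_breakdown KAVG DAVG   (values in ms, as read from esxtop)
# Prints GAVG = KAVG + DAVG and the share contributed by each layer.
latency_breakdown() {
    awk -v kavg="$1" -v davg="$2" 'BEGIN {
        gavg = kavg + davg
        if (gavg == 0) { print "GAVG=0.00 ms"; exit }
        printf "GAVG=%.2f ms (kernel %.0f%%, device %.0f%%)\n",
               gavg, kavg / gavg * 100, davg / gavg * 100
    }'
}

latency_breakdown 1.00 19.00   # prints: GAVG=20.00 ms (kernel 5%, device 95%)
```

In this made-up sample nearly all of the latency the guest sees comes from the device side, which is the first case covered below.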

What do you do if you are experiencing disk latency?

High DAVG/cmd but low KAVG/cmd:

If DAVG/cmd is high then the problem is most likely related to the storage device the ESX host is reading from or writing to. Work with your SAN team to investigate further. Below are some checks to make on all your ESX hosts:

If you are having problems with an Active/Active SAN, have you balanced the load? Check out http://communities.vmware.com/thread/136237

Check that the firmware on all your HBAs is at the same level, and check with your vendor for the latest firmware version:

grep -i firmware /proc/scsi/qla2300/*

Check that there aren't any SCSI reservation locks:

cat /var/log/vmkernel

If there was a SCSI reservation lock what caused it? Maybe a 3rd party application? What about VCB? Was it backing up a VM at the time?
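The vmkernel log can be long, so rather than reading all of it you can filter it. This is only a sketch: the function name find_reservation_msgs is mine, and searching for the word "reservation" is an assumption about how the messages are worded on your ESX version, so adjust the pattern to what your log actually contains:

```shell
# find_reservation_msgs [LOGFILE]
# Prints every line of the log that mentions "reservation" (case-insensitive).
# LOGFILE defaults to /var/log/vmkernel.
find_reservation_msgs() {
    log="${1:-/var/log/vmkernel}"
    grep -i 'reservation' "$log"
}

# Example, run on the ESX service console:
# find_reservation_msgs | tail -20
```

The timestamps on the matching lines are what you want to compare against your backup windows and any 3rd party tools.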

Check storageMonitor to see if there are any errors coming from the SAN unit:

/usr/lib/vmware/bin/storageMonitor

I have used this in the past to trace a performance issue to a failed disk being replaced in a SAN unit.

I have encountered a major write latency issue with a Sun StorageTek 2540 SAN unit. This was caused by the SAN unit disabling the write cache to recharge the battery. The management GUI for the SAN reported the write cache being disabled as nothing more than an informational message! I believe Sun has fixed this in a later firmware update.

Low DAVG/cmd but high KAVG/cmd:

This usually points to the ESX kernel taking a long time to process the VMs' I/O commands.

In esxtop, check whether %USD is always at 100% or a high percentage. If it is then check out . This is a great way to work out the correct disk queue length for your environment. Once you know what the queue length should be, make sure you set the advanced setting Disk.SchedNumReqOutstanding as well as the disk queue length.

Make sure the HBA failover policy is set to Fixed for an Active/Active SAN and MRU for an Active/Passive SAN
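The two cases above can be boiled down to a small triage helper. Treat this as a sketch only: the function name classify_latency is made up, and the default limits of 25 ms for DAVG/cmd and 2 ms for KAVG/cmd are rule-of-thumb assumptions of mine, not figures from this document, so pass in your own limits as the third and fourth arguments:

```shell
# classify_latency DAVG KAVG [DAVG_LIMIT] [KAVG_LIMIT]   (all values in ms)
# The default limits (25 ms and 2 ms) are assumptions for illustration only.
classify_latency() {
    awk -v davg="$1" -v kavg="$2" -v dmax="${3:-25}" -v kmax="${4:-2}" 'BEGIN {
        d = (davg > dmax)   # device latency above its limit?
        k = (kavg > kmax)   # kernel latency above its limit?
        if (d && !k)      print "storage: look at the SAN, HBA firmware and reservations"
        else if (!d && k) print "kernel: check %USD and the queue length settings"
        else if (d && k)  print "both: device and kernel latency are high"
        else              print "ok: latency within the given limits"
    }'
}

classify_latency 40 0.5   # high DAVG/cmd, low KAVG/cmd -> storage
classify_latency 3 6      # low DAVG/cmd, high KAVG/cmd -> kernel
```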

Reading Recommendations:

Check the compatibility settings for your SAN

Read

Calculate the correct VMFS size and number of VMs for your environment

esxtop guide http://communities.vmware.com/blogs/virtuallysi/2009/03/19/viops-esxtop-guide

Last update: 07-27-2009 08:11 AM