cmbwml1
Enthusiast

Disk latency reporting error in vSphere 5

On some of our disk performance graphs, the highest latency metric is recording almost 200 million milliseconds, which equates to latency spikes of over 50 hours. The storage isn't reporting any significant latency. The vSphere Client shows these spikes on the VM's datastore and disk performance graphs, but not for the virtual disk. The host shows a 6-second maximum command latency, but nothing else above 50 ms.
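A quick sanity check of the unit conversion behind the "over 50 hours" figure. This is just arithmetic on the value read off the chart; the helper name is made up for illustration.

```python
# Convert the "highest latency" value shown in the chart (milliseconds)
# into seconds and hours to see how implausible it is.
MS_PER_SECOND = 1_000
SECONDS_PER_HOUR = 3_600


def ms_to_hours(latency_ms: float) -> float:
    """Return a latency given in milliseconds as hours (hypothetical helper)."""
    return latency_ms / MS_PER_SECOND / SECONDS_PER_HOUR


reported_ms = 200_000_000  # "almost 200 million milliseconds" from the chart
print(f"{reported_ms / MS_PER_SECOND:,.0f} s = {ms_to_hours(reported_ms):.1f} h")
# prints "200,000 s = 55.6 h"
```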

Could this be a VAAI primitive issue?  Simple performance metric calculation error?

[Attachment: repetative high latency spike (over 1 million seconds) v2.jpg]

Thanks,

Chris

7 Replies
BharatR
Hot Shot

Hi,

The vSphere client shows these latency spikes on the VM's datastore and disk performance graphs

but not for the virtual disk

Rescan all storage adapters so that, if a dead LUN is causing this, it can be removed from the configuration.

This happens when hosts keep trying to connect to a dead LUN or path.
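If you want to script the rescan instead of clicking "Rescan All" on each host, here is a minimal sketch. It assumes you already have a pyVmomi vim.HostSystem object for the host; the vSphere API's HostStorageSystem exposes RescanAllHba() and RescanVmfs(). The function name and the fake objects used to exercise it are hypothetical, and whether a rescan actually clears the spikes depends on a dead LUN really being the cause.

```python
from types import SimpleNamespace


def rescan_host_storage(host) -> None:
    """Rescan all HBAs, then VMFS volumes, on one ESXi host.

    `host` is assumed to be a pyVmomi vim.HostSystem (or any object with
    the same shape); connecting to vCenter is not shown here.
    """
    storage_system = host.configManager.storageSystem
    # Rescan every host bus adapter so devices on dead paths/LUNs drop out.
    storage_system.RescanAllHba()
    # Then rescan for VMFS datastores on the devices that remain.
    storage_system.RescanVmfs()


class FakeStorageSystem:
    """Hypothetical stand-in that records calls instead of talking to a host."""

    def __init__(self):
        self.calls = []

    def RescanAllHba(self):
        self.calls.append("RescanAllHba")

    def RescanVmfs(self):
        self.calls.append("RescanVmfs")


fake_host = SimpleNamespace(
    configManager=SimpleNamespace(storageSystem=FakeStorageSystem())
)
rescan_host_storage(fake_host)
print(fake_host.configManager.storageSystem.calls)  # ['RescanAllHba', 'RescanVmfs']
```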

Here is the vSphere SDK reference guide entry for the highest latency counter:

http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/disk_counters.html

Best regards, BharatR (VCP4 Certification #: 79230). If you find this information useful, please award points for "correct" or "helpful".
PUREJOY
Enthusiast

Not really.

This is a legitimate problem in your system; you may have to isolate the VMs causing the spike.

I would start with the back-end LUN configuration, move on to the ESX configuration such as multipathing (SATP and path selection policy), and finally check whether the VMs are issuing bursts of I/O.

In all cases, I would use esxtop/resxtop to gather disk data (views d, u, and v: disk adapters, disk devices, and VM disks).
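To look at the peaks over a longer window, esxtop can also run in batch mode and write CSV (for example, esxtop -b -d 20 -n 90 > esxtop.csv, i.e. 90 samples 20 seconds apart). Below is a minimal sketch that pulls the highest value out of each latency column of such a file. It assumes the latency counters are the columns whose header ends in "MilliSec/Command"; check that against your own output. The function name and the sample data are hypothetical.

```python
import csv
import io


def max_latency_by_column(csv_text: str, suffix: str = "MilliSec/Command") -> dict[str, float]:
    """Return the highest value seen in each latency column of esxtop batch CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Keep only the columns whose counter name ends with the latency suffix.
    wanted = [i for i, name in enumerate(header) if name.endswith(suffix)]
    maxima: dict[str, float] = {}
    for row in reader:
        for i in wanted:
            cell = row[i].strip() if i < len(row) else ""
            if cell:  # skip blank cells instead of failing on them
                name = header[i]
                maxima[name] = max(maxima.get(name, float("-inf")), float(cell))
    return maxima


# Hypothetical two-sample excerpt; real files have far more columns.
sample = r'''"Time","\\esx01\Physical Disk SCSI Device(naa.1)\Average Guest MilliSec/Command"
"00:00:00","1.20"
"00:00:20","3.45"
'''
print(max_latency_by_column(sample))
```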

Let us know if you see anything fishy.

--Ravi

Architect @ Pure Storage || www.purestorage.com || http://www.purestorage.com/blog/ || http://twitter.com/#!/purestorage ||@ravivenk || VCAP-DCA5, VCP 4, VCP 5
MK22
Contributor

This has been happening to me ever since I upgraded two of our datacenters to vSphere 5. It is impossible to have 55 hours of latency within a 20-second span, so I'd say something has to be wrong with the statistics math on the data in the database. Even if the stats lagged behind during a path change, it would still be impossible. The machine in this screenshot has had a maximum latency of under 1 ms, yet it has been calculated as 55 hours. Have you found anything out? I've done numerous searches, but I think I'm going to have to open a service request.
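Following that reasoning, here is a minimal sketch that flags chart samples whose "highest latency" is longer than the 20-second real-time sampling interval. Treating "longer than the interval" as implausible is MK22's heuristic from this thread, not a vSphere rule; the function name and the sample values are hypothetical.

```python
REALTIME_INTERVAL_S = 20  # vSphere real-time charts sample every 20 seconds


def suspect_samples(latencies_ms, interval_s=REALTIME_INTERVAL_S):
    """Return (index, value_ms) for samples whose latency exceeds the interval.

    Heuristic assumption: a value longer than the sample interval itself is
    treated as a reporting artifact rather than real latency.
    """
    ceiling_ms = interval_s * 1_000
    return [(i, v) for i, v in enumerate(latencies_ms) if v > ceiling_ms]


samples_ms = [0.8, 0.9, 200_000_000, 0.7]  # hypothetical chart values
print(suspect_samples(samples_ms))  # [(2, 200000000)]
```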

[Attachment: 1989530.png]

VCP
MK22
Contributor

SteveBeal
Contributor

Having the same issue. However, when I look at esxtop, everything appears to be fine.

[Attachments: Latency.JPG, Latency2.JPG]

MK22
Contributor

Yeah, that's the workaround they've presented. I guess we'll have to wait until 5.1 to have this fixed...

Resolution

This is a known issue.
To work around this issue, use esxtop to accurately measure disk latency. You can also use a third-party utility in the guest operating system to measure disk latency.
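As a rough stand-in for an in-guest tool, here is a minimal sketch that times synchronous 4 KiB writes (write followed by fsync) from inside the VM. It is not one of the third-party utilities the KB means. It measures end-to-end latency including the guest filesystem, so its numbers are not directly comparable to esxtop's DAVG/KAVG/GAVG columns. The function name and file path are hypothetical.

```python
import os
import tempfile
import time


def write_fsync_latencies_ms(path, count=10, block_size=4096):
    """Time `count` synchronous writes of `block_size` bytes; return each in ms."""
    payload = os.urandom(block_size)
    latencies = []
    # O_BINARY only exists on Windows; it prevents newline translation there.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data down to the (virtual) disk
            latencies.append((time.perf_counter() - start) * 1_000)
    finally:
        os.close(fd)
    return latencies


with tempfile.TemporaryDirectory() as tmp:
    samples = write_fsync_latencies_ms(os.path.join(tmp, "probe.bin"), count=5)
print(f"max {max(samples):.2f} ms, mean {sum(samples) / len(samples):.2f} ms")
```

Run it on the same volume as the affected virtual disk and compare its maximum with what the vSphere chart reports for the same period.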
VCP
AdamKski
Contributor

I'm seeing the same issue in 5.1. Anyone else?
