On some of our disk performance graphs, the highest latency metric is recording almost 200 million milliseconds. That equates to latency spikes of over 50 hours. The storage isn't reporting any significant latency. The vSphere Client shows these latency spikes on the VM's datastore and disk performance graphs, but not on the virtual disk graph. The host is showing a maximum command latency of 6 seconds, but nothing else above 50 ms.
Could this be a VAAI primitive issue? Or a simple calculation error in the performance metric?
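To put the numbers in the question side by side, here is a minimal sketch of the unit conversion. The function name `ms_to_hours` is made up for illustration; the two input values are the ones quoted in the post above.

```python
MS_PER_HOUR = 1000 * 60 * 60  # milliseconds in one hour


def ms_to_hours(latency_ms: float) -> float:
    """Convert a latency given in milliseconds to hours."""
    return latency_ms / MS_PER_HOUR


reported_ms = 200_000_000  # "almost 200 million milliseconds" from the graph
host_max_ms = 6_000        # 6 second maximum command latency seen on the host

print(f"{reported_ms} ms is about {ms_to_hours(reported_ms):.1f} hours")
print(f"graph value is about {reported_ms / host_max_ms:.0f}x the host maximum")
```

The graph value is several orders of magnitude above anything the host itself reports, which is why a calculation error looks like a reasonable suspicion.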
"The vSphere client shows these latency spikes on the VM's datastore and disk performance graphs but not for the virtual disk"
Rescan all storage adapters so that, if a dead LUN is causing this, it can be removed from the configuration.
This happens when hosts are trying to connect to a dead LUN or path.
Here is the KB guide for the highestlatency parameter.
This is a legitimate problem in your system; you may have to isolate the VMs causing the spike.
I would start with the backend LUN configuration, move on to the ESX configuration such as multipathing (SATP and path policy), and finally check whether the VMs are doing some kind of burst I/O.
In all cases, I would use esxtop/resxtop to gather disk data (options u, v, and d).
Let us know if you see something fishy.
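If you capture esxtop in batch mode (for example `esxtop -b -d 2 -n 30 > capture.csv`) instead of watching it interactively, something like the sketch below could pull out the peak value of each latency counter. It assumes the capture is a CSV whose latency column names contain "MilliSec/Command"; the function name, host name, device name, and sample rows are all made up for illustration, so check the header of your own capture before relying on it.

```python
import csv
import io


def max_latency_by_counter(csv_text, marker="MilliSec/Command"):
    """Return the peak value seen in every column whose name contains marker."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Assumption: latency counters can be recognised by this substring.
    cols = [i for i, name in enumerate(header) if marker in name]
    peaks = {header[i]: 0.0 for i in cols}
    for row in reader:
        for i in cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or truncated cells
            peaks[header[i]] = max(peaks[header[i]], value)
    return peaks


# Hypothetical two-sample capture, not real data.
sample = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Disk(naa.example)\Average Guest MilliSec/Command","\\esx01\Physical Disk(naa.example)\Commands/sec"
"01/01/2012 00:00:00","0.85","120.0"
"01/01/2012 00:00:02","6012.40","3.0"
'''

for counter, peak in max_latency_by_counter(sample).items():
    print(f"{counter}: peak {peak} ms")
```

Comparing these peaks with what the vSphere client graphs show should tell you whether the spike exists on the host at all or only in the stored statistics.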
This has been happening to me ever since I upgraded 2 of our datacenters to vSphere 5. It is impossible to have 55 hours of latency in a span of 20 seconds, so I'd say something has to be wrong with the statistics math on the data in the database. Even if the stats lagged behind on a change of path, it is impossible. The machine in this screenshot has had a maximum of < 1 ms of latency, but that has been calculated out to 55 hours. Have you found anything out? I've done numerous searches, but I think I'm going to have to open a service request.
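The reasoning above can be turned into a rough sanity filter over exported samples. This is only a heuristic sketch: the function name, the `tolerance` parameter, and the sample list are made up for illustration, and the 20 second interval is the one mentioned above. Real commands can time out after longer than one interval, so raise `tolerance` if that matters for your data.

```python
def implausible_samples(samples_ms, interval_s=20, tolerance=1.0):
    """Return (index, value) pairs whose latency exceeds the sampling interval.

    Assumption: a latency far larger than the interval it was sampled in
    points to a bookkeeping error rather than real I/O delay.
    """
    limit_ms = interval_s * 1000 * tolerance
    return [(i, v) for i, v in enumerate(samples_ms) if v > limit_ms]


# Hypothetical samples in milliseconds: normal values plus one bogus spike.
samples = [0.4, 0.9, 200_000_000, 0.7]
print(implausible_samples(samples))
```

A real 6 second stall would pass this filter untouched, while a value in the hundreds of millions of milliseconds would be flagged.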
Yeah, that's the workaround they've presented. I guess we'll have to wait until 5.1 to have this fixed...