Are the thresholds listed at Yellow Bricks (http://www.yellow-bricks.com/esxtop/) intended to be spike values or sustained values? If sustained, how long before you would call it something to investigate? For instance, at 10-second intervals, the averages of my DAVG, GAVG, KAVG, and QUED values are all within the specified thresholds. However, I have seen single ten-second spikes of 20 for KAVG, 30 for DAVG, and 40 for GAVG. None of them remained over the threshold for more than one ten-second interval. Is this cause for concern?
Also, when using ESXPLOT, do you look at the Physical Disk Path DAVG, GAVG, KAVG, and QUED numbers, or the Physical Disk Partition DAVG, GAVG, KAVG, and QUED numbers?
My experience when working with VMware escalation engineers on this topic is that they focus on the "Dev" entry in esxplot. This is the LUN, and the value is an average over the interval measured.
The interval averages are not supposed to peak above 25 milliseconds. That being said, if a response is not received within 5000 milliseconds, I/O is halted. We had major issues, but our latency numbers were over 100 milliseconds.
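If it helps to separate a one-interval spike from a sustained problem, one approach is to count consecutive intervals above the threshold. Below is a minimal Python sketch, assuming the per-interval latency averages (for example DAVG in milliseconds) have already been pulled out into a list. The 25 ms default is the figure mentioned above; the function name and the choice of 3 consecutive intervals are illustrative assumptions, not VMware guidance.

```python
def sustained_breaches(samples, threshold_ms=25.0, min_consecutive=3):
    """Return (start_index, length) for each run of consecutive samples
    above threshold_ms that lasts at least min_consecutive intervals.

    min_consecutive=3 is an arbitrary illustrative choice.
    """
    runs = []
    start = None  # index where the current above-threshold run began
    for i, value in enumerate(samples):
        if value > threshold_ms:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_consecutive:
                runs.append((start, i - start))
            start = None
    # Close out a run that is still open at the end of the data.
    if start is not None and len(samples) - start >= min_consecutive:
        runs.append((start, len(samples) - start))
    return runs


if __name__ == "__main__":
    # Hypothetical DAVG averages at 10-second intervals, in milliseconds.
    lone_spike = [5, 8, 30, 6, 7]
    sustained = [5, 30, 35, 40, 28, 6]
    print(sustained_breaches(lone_spike))
    print(sustained_breaches(sustained))
```

With these made-up samples, the single 30 ms interval is not reported, while the four consecutive intervals above 25 ms are reported as one run starting at index 1.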