We recently had an issue with I/O latencies significant enough to cause some guest issues. We were alerted to this by the end users and guest OS engineers. There were no alerts generated from the VI. Increased latencies and, in this case, the temporary suspension of I/O to a datastore are things I'd like to be alerted to. I can't seem to find a native vCenter alarm configuration that triggers alerts for these events. Is anyone monitoring for these? Are you using the vCenter native alarms to generate alerts?
If you observe that this latency is too high for a consistent period of time, this indicates that there is a concern about storage performance and you must check the logs on the storage array for any indication of a failure. If failures are logged on the storage array side, corrective action should be taken. Contact your storage vendor for information regarding checking logs on the array.
Also check if these messages are generated when there were any scheduled tasks, such as backups, replications, etc., as these can also cause intermittent performance hits.
If the message is generated because of an overload condition, attempt to reduce the load on the affected storage device.
If running a LUN replication tool, pause the task from the storage end and attempt a storage vMotion to a different datastore. This should help improve the I/O operations.
For more/related information on diagnosing overload/performance issues, see Using esxtop to identify storage performance issues (1008205).
That sounds nasty and very unpleasant!
For your information, we do use the native vCenter alarms to alert us to any potential storage issues with our VMs or hosts. We also have alarms set on our storage; however, the vCenter ones generally suffice.
Have you looked at the following two alarms: Host storage status and Virtual machine total disk latency?
Both of these are default alarms; however, they may need further configuration to trigger when you want them to. For example, we have modified the Host storage status alarm so that an email is sent. Otherwise, once the storage "comes back", we may not know it ever happened.
For the Virtual machine total disk latency alarm, the defaults are sufficient for sustained issues but not for quick load events. In this case you might want to decrease the time threshold from 5 minutes to something shorter (30 seconds is the minimum) in order to get the alarm to trigger for your situation.
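To illustrate why the time threshold matters, here is a rough Python sketch of a "latency above X for at least Y seconds" check. It is only a model of the idea, not how vCenter actually evaluates alarms; the function name, the 75 ms threshold, the 20-second sample spacing and the sample data are all assumptions made up for the example.

```python
from typing import Iterable, List, Tuple


def sustained_breaches(
    samples: Iterable[Tuple[int, float]],
    threshold_ms: float,
    window_s: int,
) -> List[int]:
    """Return the timestamps (seconds) at which latency has stayed above
    threshold_ms continuously for at least window_s seconds.

    Hypothetical helper for illustration; not a vCenter API.
    """
    triggers: List[int] = []
    breach_start = None  # time the current run above the threshold began
    fired = False        # whether this run has already triggered once
    for t, latency_ms in samples:
        if latency_ms > threshold_ms:
            if breach_start is None:
                breach_start = t
                fired = False
            if not fired and t - breach_start >= window_s:
                triggers.append(t)
                fired = True
        else:
            # Latency recovered, so the run ends and the timer resets.
            breach_start = None
    return triggers


# Made-up data: one sample every 20 s, with an 80-second spike to 120 ms.
samples = [(t, 120.0 if 100 <= t <= 180 else 10.0) for t in range(0, 400, 20)]

print(sustained_breaches(samples, threshold_ms=75.0, window_s=300))  # []
print(sustained_breaches(samples, threshold_ms=75.0, window_s=30))   # [140]
```

With a 5-minute window the 80-second spike never triggers, while a 30-second window catches it 40 seconds in. The trade-off is that a shorter window is also more likely to fire on harmless blips, so it may take some tuning.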
If you have already explored the above two, then apologies for taking up your time.