If you observe that this latency is too high for a consistent period of time, this indicates that there is a concern about storage performance and you must check the logs on the storage array for any indication of a failure. If failures are logged on the storage array side, corrective action should be taken. Contact your storage vendor for information regarding checking logs on the array.
Also check if these messages are generated when there were any scheduled tasks, such as backups, replications, etc., as these can also cause intermittent performance hits.
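One rough way to check whether the messages line up with scheduled tasks is to compare their timestamps against your known task windows. The sketch below is only an illustration: the window names, the times, and the event timestamps are made up, and you would have to pull the real ones from your own logs and job schedules.

```python
from datetime import datetime, time

def in_window(ts, start, end):
    """Return True if the time of day of ts falls inside [start, end)."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    # The window crosses midnight (e.g. 23:00 to 02:00).
    return t >= start or t < end

def correlate(events, windows):
    """Map each event timestamp to the names of the task windows it falls in."""
    return {
        ts: [name for name, (start, end) in windows.items()
             if in_window(ts, start, end)]
        for ts in events
    }

# Hypothetical schedule and message timestamps, for illustration only.
windows = {
    "nightly backup": (time(23, 0), time(2, 0)),
    "replication": (time(12, 0), time(12, 30)),
}
events = [
    datetime(2024, 1, 10, 23, 45),
    datetime(2024, 1, 11, 1, 15),
    datetime(2024, 1, 11, 9, 5),
]

for ts, tasks in correlate(events, windows).items():
    print(ts.isoformat(), tasks or "no scheduled task")
```

Messages that fall outside every window are the ones worth following up with the storage vendor first.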
If the message is generated because of an overload condition, attempt to reduce the load on the affected storage device.
If you are running a LUN replication tool, pause the task from the storage end and attempt a Storage vMotion to a different datastore. This should help improve I/O performance.
For more/related information on diagnosing overload/performance issues, see Using esxtop to identify storage performance issues (1008205).
That sounds nasty and very unpleasant!
For your information, we use the native vCenter alarms to alert us to any potential storage issues with our VMs or hosts. We also have alarms set on our storage; however, the vCenter ones generally suffice.
Have you looked at the following two?
- Host storage status
- Virtual machine total disk latency
Both of these are default alarms; however, they may need further configuration to trigger when you want them to. For example, we have modified the Host storage status alarm so that an email is sent; otherwise, once the storage "comes back", we might never know the event happened.
For the Virtual machine total disk latency alarm, the defaults are sufficient for sustained issues but not for quick load events. In this case you might want to decrease the time threshold from 5 minutes to something shorter (30 seconds is the minimum) so that the alarm triggers in your situation.
If you have already explored the above two, then apologies for taking up your time.
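To show why the duration setting matters, here is a minimal sketch of a "latency above threshold for N seconds" check. It is not how vCenter evaluates alarms internally; the 20-second sample interval, the 50 ms threshold, and the latency traces are all assumptions picked for illustration.

```python
def alarm_triggers(samples_ms, threshold_ms, window_s, interval_s=20):
    """Return True if latency stays above threshold_ms for at least
    window_s seconds of consecutive samples.

    interval_s is the assumed spacing between samples.
    """
    needed = -(-window_s // interval_s)  # ceiling division: samples required
    run = 0
    for latency in samples_ms:
        run = run + 1 if latency > threshold_ms else 0
        if run >= needed:
            return True
    return False

# Hypothetical traces: a one-minute spike and a five-minute sustained problem.
spike = [5, 6, 5, 80, 95, 90, 6, 5]
sustained = [5] + [80] * 15 + [5]

print(alarm_triggers(spike, 50, window_s=300))      # 5-minute window
print(alarm_triggers(spike, 50, window_s=30))       # 30-second window
print(alarm_triggers(sustained, 50, window_s=300))  # 5-minute window
```

With these made-up numbers the one-minute spike never satisfies the 5-minute window but does satisfy the 30-second one, which is the behaviour described above.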