Hypothetical situation, but we face something similar in our production environment on many occasions.
We receive a health alert from a datastore. Investigation shows that health has dropped due to an upturn in anomalies, and further investigation shows the datastore was subject to high IOPS about an hour ago: a very sudden peak that fell away a few minutes later. The datastore contains 20 VMs, any of which could have caused the high I/O. How do we easily find the guilty VM without going through the graphs of all 20 VMs separately?
Thanks in Advance.
If you look at the datastore resource, the root cause analysis hierarchy should give you an indication of which VMs on it were also anomalous. Also, if you have Advanced+, in the Custom UI you can create a quick dashboard that populates a metric selector + metric graph using a multi-selector, giving you a quick stacked graph or top-down list of the virt disk:cmd/s metric for all VMs.
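Outside of vROps, the same top-down ranking idea can be sketched in a few lines of Python: for each VM, take its peak IOPS inside the alert window and sort descending. The VM names and sample values below are made up for illustration; in vROps these numbers would come from the virt disk:cmd/s metric.

```python
# Hypothetical per-VM IOPS samples around the alert window
# (in vROps this data would come from the virt disk:cmd/s metric).
iops_samples = {
    "vm-app01": [120, 135, 110, 140],
    "vm-db01":  [300, 4800, 5100, 250],   # sudden spike, then falls away
    "vm-web01": [80, 95, 90, 85],
}

def rank_by_peak_iops(samples):
    """Return (vm, peak_iops) pairs sorted highest-peak first."""
    peaks = {vm: max(values) for vm, values in samples.items()}
    return sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_by_peak_iops(iops_samples)
print(ranking[0])  # the likely culprit: ('vm-db01', 5100)
```

The top entry of the sorted list is the VM whose burst lines up with the alert, which is exactly what a stacked graph or top-down widget shows visually.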
This isn't directly relevant to your question, and you may already be aware of it.
To avoid such situations, you could leverage Storage I/O Control (SIOC), which allocates IOPS based on per-VM shares. No VM will get more than the IOPS its configured shares entitle it to on each VMDK. SIOC only comes into the picture when there is I/O contention on the datastore.
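To make the shares idea concrete, here's a minimal Python sketch of proportional-share arithmetic under contention: each VM's slice of the datastore's available IOPS is proportional to its shares. This is a simplified model with hypothetical numbers (real SIOC enforces shares by adjusting per-host device queue depths, not by handing out IOPS directly).

```python
def entitlement_under_contention(shares, available_iops):
    """Split available IOPS across VMs in proportion to their shares.

    Simplified model of proportional-share allocation; actual SIOC
    enforcement works via per-host device queue depth throttling.
    """
    total_shares = sum(shares.values())
    return {vm: available_iops * s / total_shares for vm, s in shares.items()}

# Hypothetical share values (e.g. Normal = 1000, High = 2000)
alloc = entitlement_under_contention(
    {"vm-db01": 2000, "vm-app01": 1000, "vm-web01": 1000},
    available_iops=8000,
)
# vm-db01 holds half the shares, so it gets 4000; the others get 2000 each
```

So during contention the noisy VM is capped at its proportional slice rather than starving its neighbours; when there is no contention, SIOC stays out of the way.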
You can create a custom Top-N analysis dashboard that contains all of the VMs on the datastore. It lets you view the high-IOPS VMs on one screen. You can also use latency metrics for the analysis.