Tracking Down Anomalies in 5.7.1?

jbenson81 · 03-14-2014

We're running vCOPs 5.7.1 across several vCenter instances, and for the past two days or so we have been receiving the following alerts on one of the vCenters:

New alert was generated at Fri Mar 14 07:57:23 EDT 2014:
Info:High number of metrics outside their normal bounds: 164 abnormal metrics out of 5832 metrics monitored.

Alert Type : Health
Alert Sub-Type : Anomalies
Alert State : Critical
Resource Kind : VMwareAdapter Instance
Resource Name : NGRVAGVC1S
Alert ID : 1446595

When I log in to the vSphere UI and look at the vCenter in question, I'm having a tough time getting to the source of these anomalies. I may not be looking at the correct tab or view. From the dashboard, I select the Health badge on my vCenter instance, and then the Anomalies badge on the subsequent Operations - Details page. This brings up a list of resource types that have experienced recent anomalies, but when I select a specific resource type and the corresponding anomaly, my request always seems to time out.

Is there a better way of drilling down to the source(s) of this alert in terms of what resources and metrics are experiencing issues?

gradinka · 03-15-2014

Have you seen changes in any of the Health or Workload badges?

Getting an alert on anomalies doesn't necessarily mean something is wrong; it just means that something is behaving differently than before.
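To put the numbers from the alert in perspective, here is a minimal Python sketch that pulls the two counts out of the alert text and works out the share of abnormal metrics. It assumes the message keeps the exact wording quoted in the original post; the function name `anomaly_fraction` and the regular expression are made up for illustration and are not part of vCOPs.

```python
import re

# Assumption: the alert text follows the wording quoted in the original post.
ALERT_PATTERN = re.compile(r"(\d+) abnormal metrics out of (\d+) metrics monitored")


def anomaly_fraction(alert_text):
    """Return (abnormal, total, fraction) parsed from an anomaly alert message."""
    match = ALERT_PATTERN.search(alert_text)
    if match is None:
        raise ValueError("alert text does not match the expected format")
    abnormal = int(match.group(1))
    total = int(match.group(2))
    if total == 0:
        raise ValueError("alert reports zero monitored metrics")
    return abnormal, total, abnormal / total


alert = (
    "Info:High number of metrics outside their normal bounds: "
    "164 abnormal metrics out of 5832 metrics monitored."
)
abnormal, total, fraction = anomaly_fraction(alert)
print(f"{abnormal} of {total} metrics abnormal ({fraction:.1%})")
# prints "164 of 5832 metrics abnormal (2.8%)"
```

So the alert covers roughly 2.8% of the monitored metrics. The figure alone does not say which resources are involved or whether that share is unusual for this vCenter, so you still need to drill down.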

As for drilling down, you can try the "Events" page or the "All Metrics" page.

On the Events page, you can see events for the object itself, its children, or its peers. Use the "E" button to enable those, then the "target, up/down/left-right" buttons to filter. This might help you see what happened.

Make sure to select a suitable time window for the data, e.g. starting a few hours before the anomalies began going up, and see what shows up.

All Metrics is also a very useful screen, but you need a starting point; otherwise it will take some time looking through different graphs.