Solved: Anomalies for a VM "Not Calculated Yet"

headgeek · ‎10-10-2014

First, I read the other thread that asked this same question but the answer there does not apply to my situation.

I am running VCOPs 5.8.1 in a test environment. To test VCOPs behavior I have set up 5 scenarios which exercise the CPU using a freeware tool called HeavyLoad. All scenarios are different, but each repeats its behavior on a cycle of no more than a week. For example, scenario 1 runs the CPU at a constant 25% and has 2 peaks - one at 75% on Tue from 5-11 p.m. and one on Fri from 8-12 a.m. Scenario 5 runs the CPU at 100% for 3 minutes, then stops for 3 minutes, and repeats this continuously. The scenarios started running at the beginning of August, so they have all been running for 2 months. No other applications are on the servers and no users are logging on. It shows the spikes, etc. as anomalies, and that makes sense; however, none of the 5 has established "Normal" for anomalies. Interestingly, when you look at the dynamic thresholds it "knows", for example, that Friday from 8-12 has a higher usage, because the dynamic threshold has increased.
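For anyone wanting to reproduce this kind of test, the two schedules described above can be written down as small functions. This is only a sketch of the target load profiles, not something that drives HeavyLoad: the function names are made up for illustration, and "Fri from 8-12 a.m." is assumed here to mean 08:00 to 12:00.

```python
from datetime import datetime

def scenario1_target_cpu(ts: datetime) -> int:
    """Target CPU % for scenario 1: 25% baseline with two weekly peaks at 75%."""
    weekday, hour = ts.weekday(), ts.hour   # Monday == 0
    if weekday == 1 and 17 <= hour < 23:    # Tuesday 5-11 p.m.
        return 75
    if weekday == 4 and 8 <= hour < 12:     # Friday 08:00-12:00 (assumed reading)
        return 75
    return 25

def scenario5_target_cpu(ts: datetime) -> int:
    """Target CPU % for scenario 5: 100% for 3 minutes, idle for 3, repeating."""
    minute_of_day = ts.hour * 60 + ts.minute
    # 1440 minutes per day is a multiple of 6, so the cycle is continuous across days.
    return 100 if minute_of_day % 6 < 3 else 0
```

Writing the pattern down this way makes it easy to compare what the tool was asked to do with what the dynamic threshold band later shows for the same hour of the week.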

Supposedly VCOPs learns what the "normal" anomalies are and can then tell you when you have a true anomaly. What I want to test is this: once anomalies have established "normal", I want to add an additional spike, and VCOPs should be able to tell that there is a new anomaly. Basically I want to prove that it can properly identify and alert on true anomalies. Anyone have any ideas?

mark_j · ‎10-14-2014

Dynamic thresholds and anomalies are closely related. Dynamic thresholds are learned after 7 days and establish tighter bands over the following 4 or so weeks. When a measurement value falls outside that dynamic threshold, THEN it becomes an anomaly alarm. If a value falls within the dynamic threshold, it is not an anomaly. So your recurring data that has established a dynamic threshold won't be considered an anomaly (the specific time doesn't matter; what matters is whether it's a pattern vC Ops can match an algorithm to in order to expect/predict it). However, if you've got measurement values that occur ad hoc or irregularly, they'll likely pop up as anomaly alarms.

If you want to test anomalies, simply make a metric value fall outside the dynamic threshold (indicated by the grey band on graphs). vC Ops will call it out.
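To make the relationship concrete, here is a toy sketch in Python. It is not the algorithm vC Ops actually uses; it simply assumes a weekly pattern, learns a min/max band per hour-of-week slot from history, and flags any sample that falls outside that band. All names, the 5-point margin, and the synthetic history are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hour_of_week(ts):
    """Map a timestamp to one of 168 weekly slots (Monday 00:00 is slot 0)."""
    return ts.weekday() * 24 + ts.hour

def learn_band(history, margin=5.0):
    """history: iterable of (datetime, value). Returns {slot: (low, high)}."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[hour_of_week(ts)].append(value)
    return {slot: (min(v) - margin, max(v) + margin) for slot, v in buckets.items()}

def is_anomaly(band, ts, value):
    """True when the value falls outside the learned band for that weekly slot."""
    low, high = band[hour_of_week(ts)]
    return value < low or value > high

# Four weeks of synthetic hourly history: 25% baseline, 75% on Friday 08:00-12:00.
start = datetime(2014, 9, 1)  # a Monday
history = []
for h in range(4 * 7 * 24):
    ts = start + timedelta(hours=h)
    peak = ts.weekday() == 4 and 8 <= ts.hour < 12
    history.append((ts, 75.0 if peak else 25.0))

band = learn_band(history)
print(is_anomaly(band, datetime(2014, 10, 3, 9), 75.0))   # recurring Friday peak: False
print(is_anomaly(band, datetime(2014, 10, 3, 18), 50.0))  # ad-hoc evening spike: True
```

The recurring 75% peak is inside its learned band and is not flagged, while a smaller 50% spike at an unexpected hour is, which is the behavior described above.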

If you find this or any other answer useful please mark the answer as correct or helpful.

headgeek · ‎10-14-2014

Thanks for a great, clear response. It makes sense. So basically there will never be a "Normal" for Anomalies?

I actually came up with the same idea as your suggestion. I purposely forced a 30-minute CPU spike to 50%, which exceeds the grey area of the dynamic threshold. The idea was to stay below the preset thresholds but outside of "Normal". A couple of things happened. First, it seemed to temporarily destroy my dynamic thresholds. I did this on Friday at 6:00 p.m., and when I checked on Monday it showed the VM as no longer having "Normal" for the workload. After a few days it came back. Is that what I should expect?

I think I got anomalies, but how do I tell? I am a little confused as to how to interpret what I am looking at. Any suggestions of good articles to help with interpretation and use? This is the graph of CPU Usage:

It shows the value above the threshold, as designed, but what is the yellow block telling me?

When I select Events and Anomalies, I get a graph which looks like the following:

So I am assuming that I am getting anomalies, but what are they? There must be some text somewhere? Also, if I want to alert on something, would I alert on anomalies? I also noticed that my previous spikes, even though they were much larger, did not show as anomalies because they were predictable, so that is good.

The guides are pretty mechanical but aren't much help for interpretation. Some good case studies with detailed analysis would be great if you know of any. Thanks again.

mark_j · ‎10-14-2014

The yellow band on your first screenshot is an anomaly. Select the "show values on point" picker and grab the top left corner of that yellow box (warning=yellow). The color of the box signifies the severity of the alarm. Typically it'll be yellow, with other colors (immediate=orange, critical=red) generated if you're using KPIs in the Custom UI. See pic:

You will get "normal" anomaly levels eventually, though it's just a threshold at which health becomes more significantly impacted by the overall anomaly count. See pic:
