Evaluating vCOPs 5 and have a few questions (this is all using the out-of-the-box vSphere GUI, not custom):
I get that the system learns "normal" over time and if enough metrics drift from "normal" it creates an alert. Well, how long before those new metrics become the "new normal"?
You can suppress those alerts, but cannot cancel them. So, how are people handling cases where the alert reflects what you know to be a new normal for the server? For example, I had a pre-production server with low load. Once it went into production, the load went up and caused an alert. This load level is fine. But how long will that alert stay active because it's deemed not normal? I could suppress it, but I don't know how long to suppress it before the system decides the new level is normal.
Also, other products have an ability to set a monitored object in "maintenance mode" to prevent alerts when you're about to make some changes. I don't see that in vCOPs. Am I missing it?
It does take time for a change in performance to become normal, usually about 3 cycles. Since you describe a major change in workload, I would expect it to take a couple of weeks before the health score climbs back into the 90s range.
Alerts can be cancelled. On the alerts overview page, it's the fifth icon from the right, between the color-row icon and the clock-like suspend icon. Alerts on objects that are settling into a new normal can be suppressed for a period of time, say a week; after that, let the system tell you whether they still persist. Even if it takes 2-3 weeks to regain "normal" health, the alerts will be less likely to recur because the system is learning the new normal.
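To picture why suppressing for a while and then re-checking works, here is a toy sketch in Python. It is not vCOPs's actual analytics, whose internals I don't know; it only assumes a baseline taken as the median of a rolling window, and the function name, `window` and `tolerance` are all made up for illustration.

```python
from collections import deque
from statistics import median

def flagged_samples(series, window=7, tolerance=0.5):
    """Return the indices whose value deviates from a rolling-median baseline.

    Hypothetical toy model: 'window' and 'tolerance' are illustrative
    parameters, not vCOPs settings.
    """
    history = deque(maxlen=window)  # the most recent `window` samples
    flagged = []
    for i, value in enumerate(series):
        # Only judge a sample once a full window of history exists.
        if len(history) == window:
            baseline = median(history)
            if abs(value - baseline) > tolerance * baseline:
                flagged.append(i)
        history.append(value)  # abnormal samples still feed the baseline
    return flagged

# 14 samples of pre-production load, then 14 samples at production load.
load = [10] * 14 + [50] * 14
print(flagged_samples(load))  # [14, 15, 16, 17]
```

In this toy model the higher load is flagged only until it makes up the majority of the window; after that it is the baseline. A longer window means a longer wait, which is the same reason a suppression period needs to cover the learning time.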
vCOPs does have buttons to start and end maintenance mode.
Good luck. Hope you find it as useful as I do. Much better than waiting for the classic failure to begin taking corrective action.
Again, I'm only referring to the vSphere UI, which does not appear to have any function to put a VM into "maintenance mode" to prevent alerts.
Also, "fault" alerts can be canceled, but other types of alerts cannot. A badge changing color will generate an alert that can't be canceled. If you try it with the button you point to, it will say "Please select fault alerts to cancel." It seems the best answer is to suppress the alert long enough for the new "normal" to kick in.