I've created an alert using the expression below. However, we're looking for a way to keep the alert from firing when we intentionally terminate a node, or when autoscaling brings up a new node to replace a problem node.
mcount(5m, ts(my.metric)) = 0 and mcount(1h, ts(my.metric)) != 0
Hi Eric,
The best way to account for terminated nodes is to tag the sources (using our API) as "active" when nodes are provisioned, and to remove that tag when they are terminated. Terminated nodes then drop out of the query, so the query can be even simpler: mcount(5m, ts(my.metric, tag=active)) = 0
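As a rough illustration of the tagging step, here is a minimal Python sketch that builds the add-tag and remove-tag requests. It assumes the API exposes source tags at a path of the form /api/v2/source/<source>/tag/<tag> and accepts a bearer token. The path, the host name, and the helper name build_tag_request are assumptions for illustration, so please check them against the API documentation for your cluster.

```python
import urllib.parse
import urllib.request


def build_tag_request(cluster_url, token, source_id, tag="active", add=True):
    """Build (but do not send) a request that adds or removes a source tag.

    Assumption: the endpoint path below is illustrative, not confirmed
    by this thread. Adjust it to match your cluster's API.
    """
    path = "/api/v2/source/{}/tag/{}".format(
        urllib.parse.quote(source_id, safe=""),
        urllib.parse.quote(tag, safe=""),
    )
    return urllib.request.Request(
        cluster_url.rstrip("/") + path,
        # PUT adds the tag at provisioning, DELETE removes it at termination.
        method="PUT" if add else "DELETE",
        headers={"Authorization": "Bearer " + token},
    )


# Hypothetical host, token, and source name.
provision = build_tag_request("https://example.com", "TOKEN", "web-01")
terminate = build_tag_request("https://example.com", "TOKEN", "web-01", add=False)
```

You would send the first request with urllib.request.urlopen(provision) from your provisioning hook, and the second from your termination hook before the node goes away.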
Please let us know if this helps!
-Vasily