Solved: Comparison of dynamic metric names for alerts

thberry10 · 07-27-2017

I'm trying to create an alert that will trigger whenever the difference between two metrics is greater than a set threshold. The issue is that these metric names are dynamic: they are based on ever-changing version numbers.

Example:

com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.2_24_42.200
vs.
com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.2_24_43.200

I tried using taggify to get these values, but I feel like I need an example of how to use taggify in conjunction with the comparison expression.
taggify(ts('com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.*.*'),metric,version,8,".")

So what I would like to happen is for this alert to check all metrics in com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.*.* for the past hour or so. If there are multiple versions present then check the difference between the different expression values and alert if greater than the threshold. (Furthermore, I would want to check it for all environments and artifacts, not just qa-21 and calculators-web.)

I'm a bit confused about how to obtain the values from the taggify expression and use those values as part of the namespace in the difference comparison expression. I'm assuming this all needs to fit in the 'Condition' field of the alert.

thberry10 · 07-27-2017

Hi Humsheen,

You are on the right track with taggify! Assuming you would like to be alerted on the difference between versions for every environment/artifact/http status code combination, the simplest approach would be to calculate the difference between the highest and lowest values for each of these groups since the only variable left would be the version:

max(ts(...) as data, environment, artifact, status) - min($data, environment, artifact, status) > 0
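To make the grouping concrete, here is a minimal Python sketch of what the max/min difference works out to. It is not Wavefront query syntax, only an illustration of the logic; the `samples` values and the `version_spread` helper are made up for this example.

```python
from collections import defaultdict

# Hypothetical latest values keyed by (environment, artifact, version, status).
# The numbers are invented purely for illustration.
samples = {
    ("qa_21", "calculators_web", "2_24_42", "200"): 120.0,
    ("qa_21", "calculators_web", "2_24_43", "200"): 95.0,
    ("qa_21", "calculators_web", "2_24_42", "500"): 3.0,
    ("qa_21", "calculators_web", "2_24_43", "500"): 3.0,
}


def version_spread(samples):
    """Return max - min per (environment, artifact, status) group.

    The version is deliberately left out of the group key, so it is the
    only thing that can make values inside a group differ.
    """
    groups = defaultdict(list)
    for (environment, artifact, _version, status), value in samples.items():
        groups[(environment, artifact, status)].append(value)
    return {key: max(values) - min(values) for key, values in groups.items()}


spread = version_spread(samples)
# Groups whose versions disagree; the "> 0" mirrors the condition above.
firing = sorted(key for key, delta in spread.items() if delta > 0)
```

In this sketch only the status 200 group would fire, because its two versions report different values. Replacing 0 with your own threshold gives the "difference greater than a set threshold" behaviour from the question.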

To make this query work without having these point tags already defined on your metrics, you can use multiple taggify() functions to extract them:

taggify(taggify(taggify(ts('com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.*'),metric,environment,6),metric,artifact,7),metric,status,9)
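If it helps to see which node each index refers to, the following Python sketch splits the example metric name on "." and picks out the same positions. It assumes, as the indices 6, 7 and 9 in the expression imply, that nodes are counted from zero; `taggify_from_name` is a hypothetical helper for illustration and not part of Wavefront.

```python
def taggify_from_name(metric_name, node_indexes, delimiter="."):
    """Copy the chosen zero-based nodes of a metric name into a tag dict."""
    nodes = metric_name.split(delimiter)
    return {tag: nodes[index] for tag, index in node_indexes.items()}


name = (
    "com.edmunds.ops.traffic.performance.status_code."
    "qa_21.calculators_web.2_24_42.200"
)

# Same tag names and positions as the nested taggify() calls above.
tags = taggify_from_name(name, {"environment": 6, "artifact": 7, "status": 9})

# Node 8 is the version, which is intentionally not extracted here.
version_node = name.split(".")[8]
```

For the example name this yields environment `qa_21`, artifact `calculators_web` and status `200`, with the version sitting at node 8.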

Example alert condition: https://metrics.wavefront.com/u/9JpK0hVt3j (click on "Create Alert" under "delta alert" query line) - please let us know if this is what you're looking for!

Having said that, we strongly advise against storing any kind of data that evolves over time (like versions in your example) in the metric name, and recommend using point tags for this purpose instead, i.e.

com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.200 version=2_24_42

instead of

com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.2_24_42.200

As the number of versions grows, your metrics namespace keeps growing with it, potentially causing query performance issues in the future. Having the version in a point tag will help you avoid that. There are a couple of options available; please shoot us an email at support@wavefront.com and we can help you out!
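As a rough illustration of that renaming, here is a small Python sketch that moves the version node out of the metric name and into a point tag. It assumes the version always sits at zero-based node 8, as in the example names in this thread; `move_version_to_tag` is a hypothetical helper, and how you would apply this in your own reporting pipeline is not covered here.

```python
def move_version_to_tag(metric_name, version_index=8, delimiter="."):
    """Return (metric name without the version node, point tags dict)."""
    nodes = metric_name.split(delimiter)
    # Remove the version node from the name and keep it as a tag value.
    version = nodes.pop(version_index)
    return delimiter.join(nodes), {"version": version}


old_name = (
    "com.edmunds.ops.traffic.performance.status_code."
    "qa_21.calculators_web.2_24_42.200"
)
new_name, point_tags = move_version_to_tag(old_name)
```

With this change every version reports under the same metric name, so new releases add tag values rather than new metric names.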

-Vasily

thberry10 · 07-27-2017

Thanks for the example alert condition! It helps a lot. I was originally trying to do something like this, which is why I was confused about how to use taggify: https://metrics.wavefront.com/u/qXS4JYN36v

The reason we added the version number to the name was that the documentation advises against using tags with high cardinality because of performance issues. This seems to conflict with what you are saying. Over the long run, would having the version number as a tag take less of a performance hit than having it in the name?

"Point tags are meant to be used only when the expected number of possible tag values (cardinality) for a given tag key is < 1000 over the lifetime of that tag key. While Wavefront does not enforce a hard limit on the number of distinct tag values, using point tags to store high-cardinality data like timestamps, login emails, or web session ids will eventually cause performance issues with querying your data."

thberry10 · 07-27-2017

Humsheen,

I see - in your example of taggify() the issue was that these tags only exist within the same query. Here's how you would use a taggify() function with a subsequent filter: https://metrics.wavefront.com/u/svDTR52t2P

As for cardinality issues, it can be very confusing indeed - please bear with me and I'll try my best to explain:

The fastest way to retrieve data in Wavefront is to query by a known metric name + source name. If your metrics have additional dimensions (i.e. are further broken down by point tags), then even if you are querying for only 1 particular combination of point tags, we still have to load all the data points under that metric+source combination. So, when the number of distinct timeseries within a metric+source combination exceeds 1000, there may be a noticeable performance hit. The absolute worst-case hypothetical scenario would be having just 1 metric name and 1 source name for your entire system and using nothing but point tags to structure the data: every query would be forced to scan every data point reported within the specified timeframe, which can be incredibly slow.

As for your use case, as long as the number of distinct versions within a 4-week window stays reasonably low, it would not cause any issues.

On the other hand, if you keep the version in the metric name, querying for data relevant to a particular version by using a full metric name ts(com.edmunds.ops.traffic.performance.status_code.qa_21.calculators_web.2_24_42.200) will always be really fast. However, your queries are far more likely to use a wildcard, such as ts('com.edmunds.ops.traffic.performance.status_code.*'), especially since you mentioned alerts, and these may get slower over time as the metric namespace grows. Queries would still finish within a reasonable time, but you'll be able to notice a performance difference. Hope this makes sense - please let us know if this is helpful!

-Vasily
