Hi justin_rowles! I believe I understand your question and would be glad to help. I see that the alert you asked about has already been updated to the alternative version you mentioned, so I will focus on the period when the original version was in use.
The following link shows that although values are returned at times, there are several gaps of missing data, and these gaps line up with the sections marked 'No Data'. I added align(1m,) to the query to show the per-minute summarized values that the alert evaluates.
For each alert check, our system looks at the last "Minutes to Fire" window (15 minutes in this case) and checks whether there is at least 1 true value and no false values. Because this alert has no explicit condition (e.g. > 24), any non-zero reported value is treated as true and any zero reported value as false. Each time the alert fired, at least 1 non-zero value had been reported in the preceding 15 minutes. Because gaps of missing data count as neither true nor false in our system, this is a situation in which the alert is expected to fire. To handle those gaps, you can wrap the entire expression in default(0,). This replaces the gaps with 0, which would be considered false in this scenario.
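To make the evaluation rule above concrete, here is a minimal Python sketch of the check as described. It assumes the series has already been aligned to one value per minute (as align(1m,) does), represents a missing minute as None, and treats non-zero as true and zero as false because the alert has no explicit condition. The names `should_fire` and `default_fill` are hypothetical illustrations, not part of the product's API.

```python
from typing import List, Optional, Sequence


def should_fire(window: Sequence[Optional[float]]) -> bool:
    """Fire if the window holds at least one true (non-zero) value and no
    false (zero) values. Missing minutes (None) count as neither."""
    present = [v for v in window if v is not None]
    has_true = any(v != 0 for v in present)
    has_false = any(v == 0 for v in present)
    return has_true and not has_false


def default_fill(window: Sequence[Optional[float]], fill: float = 0.0) -> List[float]:
    """Rough stand-in for default(0,): replace every missing minute with `fill`."""
    return [fill if v is None else v for v in window]


# A 15-minute window with a single non-zero reading and 14 missing minutes.
window: List[Optional[float]] = [None] * 14 + [3.0]
print(should_fire(window))                # True: one true value, no false values
print(should_fire(default_fill(window)))  # False: the filled-in zeros count as false
```

The sketch fills every gap in the window, which is a simplification; the point is only that once the gaps become zeros, they count as false values and block the alert from firing.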
I also noticed that the alert switched back and forth between firing and resolved quite often. This appears to be tied to the 'Minutes to Fire' and 'Minutes to Resolved' fields being set to 15 minutes and 2 minutes, respectively. I'd be glad to provide additional context on this topic too if you have questions about it.
Did this explanation help to answer your question?
Thank you for that, it's improved my understanding. I had naively thought that a trigger would fire if the data was 'continuously positive for n minutes', but I can see that this is not valid, as the data is discrete.
It makes sense that 'any positives and no negatives' within a trigger period causes an alert to start, and combined with 'no data is not negative data', that clears up most of my confusion.
I can also see that wrapping the query in default(0, lowpass()) would have fixed the problem by turning 'no data' into 'not a problem'.
However, it doesn't seem right that I get an event at, for example, 11:34:26.
The 'unfire' rule appears to be 'no positives during a period'. This means I can see a new event caused by data that occurred during a previous, now closed, event. This is definitely counter-intuitive.
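The re-fire behaviour described above can be reproduced in a small toy model. This Python sketch assumes one value per minute (None for missing), the fire check from the support reply, and a resolve rule of "no non-zero values in the last 'Minutes to Resolved' window", which is the rule inferred in this thread rather than a documented one. All names are hypothetical, and this is not how the alerting service is actually implemented.

```python
from typing import List, Optional, Sequence, Tuple

FIRE_WINDOW = 15    # 'Minutes to Fire'
RESOLVE_WINDOW = 2  # 'Minutes to Resolved'


def fires(window: Sequence[Optional[float]]) -> bool:
    # At least one non-zero value and no zero values; None is ignored.
    present = [v for v in window if v is not None]
    return any(v != 0 for v in present) and not any(v == 0 for v in present)


def resolves(window: Sequence[Optional[float]]) -> bool:
    # Assumed resolve rule from this thread: no non-zero values in the window.
    return not any(v is not None and v != 0 for v in window)


def simulate(series: List[Optional[float]],
             fire_window: int = FIRE_WINDOW,
             resolve_window: int = RESOLVE_WINDOW) -> List[Tuple[int, str]]:
    """Evaluate once per minute and return (minute, state change) pairs."""
    firing = False
    events: List[Tuple[int, str]] = []
    for t in range(len(series)):
        if not firing:
            if fires(series[max(0, t - fire_window + 1): t + 1]):
                firing = True
                events.append((t, "fired"))
        elif resolves(series[max(0, t - resolve_window + 1): t + 1]):
            firing = False
            events.append((t, "resolved"))
    return events


# One reading at minute 0, then 20 minutes with no data.
events = simulate([5.0] + [None] * 20)
print(events[:4])  # [(0, 'fired'), (2, 'resolved'), (3, 'fired'), (4, 'resolved')]
print(sum(1 for _, s in events if s == "fired"))  # 7
```

In this model a single reading at minute 0 resolves after 2 quiet minutes, then immediately re-fires from that same stale reading, flapping until the reading ages out of the 15-minute fire window.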
Perhaps the fire rule could be 'any positives and no negatives during a trigger period, unless a subsequent un-trigger period of no positives has been seen'.
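The proposed rule can be sketched with a toy per-minute model that makes one change to the fire check: it ignores any minutes at or before the most recent resolution. This is a hypothetical illustration of the suggestion, not an existing alert setting, and the resolve rule ("no non-zero values in the resolve window") is the one inferred in this thread.

```python
from typing import List, Optional, Sequence, Tuple


def fires(window: Sequence[Optional[float]]) -> bool:
    # At least one non-zero value and no zero values; None is ignored.
    present = [v for v in window if v is not None]
    return any(v != 0 for v in present) and not any(v == 0 for v in present)


def resolves(window: Sequence[Optional[float]]) -> bool:
    # Assumed resolve rule: no non-zero values in the window.
    return not any(v is not None and v != 0 for v in window)


def simulate_proposed(series: List[Optional[float]],
                      fire_window: int = 15,
                      resolve_window: int = 2) -> List[Tuple[int, str]]:
    """Like a plain fire/resolve loop, but data seen before the last
    resolution can no longer trigger a new event."""
    firing = False
    last_resolved = -1  # minute of the most recent resolution (-1: never resolved)
    events: List[Tuple[int, str]] = []
    for t in range(len(series)):
        if not firing:
            # Proposed change: the fire window never reaches back past the last resolution.
            start = max(0, t - fire_window + 1, last_resolved + 1)
            if fires(series[start: t + 1]):
                firing = True
                events.append((t, "fired"))
        elif resolves(series[max(0, t - resolve_window + 1): t + 1]):
            firing = False
            last_resolved = t
            events.append((t, "resolved"))
    return events


print(simulate_proposed([5.0] + [None] * 20))
# [(0, 'fired'), (2, 'resolved')]
print(simulate_proposed([5.0] + [None] * 9 + [7.0] + [None] * 10))
# [(0, 'fired'), (2, 'resolved'), (10, 'fired'), (12, 'resolved')]
```

With this change the stale reading at minute 0 produces a single fire/resolve pair, while a genuinely new reading at minute 10 still starts a new event.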