Some of the metrics from my services record ongoing progress towards completing a large, long-term task. While running, each instance of this service reports, among other metrics, a count of items completed. Because it is a long-running, stateful process, it checkpoints its progress as it goes, so that after a restart or a crash it can resume close to the point where it left off.
Naturally, the counter metrics it reports reset when it restarts (or crashes).
I would like a long-term sum of these counts across all instances. So far, I’ve not been able to accomplish this. I have tried applying integral to the sum of the rates of all instances, but the result is clearly not accurate.
Can someone suggest a means to track such a long-term count?
Integral() over the sum() of rate() sounds like the right approach. Could you please share the short URL to your chart so we can take a look and figure out why you are getting an inaccurate result? You can always send it to email@example.com if you prefer not to share it here. Please let us know!
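In the meantime, here is a rough Python sketch of what integrating the summed rates is meant to compute, in case it helps sanity-check the chart against raw data. It is only an illustration under stated assumptions, not how the charting backend works internally: the function names `total_increase` and `long_term_total` are hypothetical, each instance's counter is assumed to start at zero when the process starts, and any drop in the reported value is assumed to be a restart rather than bad data.

```python
from typing import Dict, Iterable


def total_increase(samples: Iterable[float]) -> float:
    """Sum the increases of one instance's counter, tolerating resets.

    Assumption: the counter starts at zero on every process start, so a
    sample lower than the previous one means the instance restarted.
    """
    total = 0.0
    prev = 0.0  # assumed starting value of a fresh counter
    for value in samples:
        if value >= prev:
            # Normal case: count only what was added since the last sample.
            total += value - prev
        else:
            # Reset detected: the counter restarted from zero, so the whole
            # new value is progress made since the restart.
            total += value
        prev = value
    return total


def long_term_total(series_by_instance: Dict[str, Iterable[float]]) -> float:
    """Add up the reset-aware increases across all instances."""
    return sum(total_increase(s) for s in series_by_instance.values())


# Made-up sample data: each list is one instance's reported counter over time.
example = {
    "instance-a": [0, 5, 12, 3, 9],  # restarts after reaching 12
    "instance-b": [2, 4, 1, 1, 6],   # restarts after reaching 4
}
print(long_term_total(example))
```

Two caveats with this kind of accumulation: items completed between the last reported sample and a crash are never counted, and if an instance redoes some items after resuming from its checkpoint, those items may be counted twice.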