VMware Modern Apps Community
admin

Chart API

I need to acquire values for 10 metrics across all available servers over a 1.5-year period, aligned to an hourly level.  Is there a standard approach to acquiring this volume of data?  When testing a query over a 1-month period via the Wavefront API Documentation, I receive response code 504.

18 Replies
admin

Hi Peter,

For aggregating large volumes of data over long periods of time, your best bet is to use much smaller time windows (1 week in the example below) and iterate over them:

import datetime
import json
import time
import urllib
import urllib2

auth_token = "TOKEN"
api_url = "https://<your_domain>.wavefront.com"

TIME_WINDOW_DAYS = 7
start_ts = 1451635200  # 2016-01-01 00:00 -0700
QUERY = "100-ts(\"cpu.cpuidle\")"

output_file = open("query_output.txt", "w")

while start_ts + 60 * 60 * 24 * TIME_WINDOW_DAYS < time.time():
    end_ts = start_ts + 60 * 60 * 24 * TIME_WINDOW_DAYS
    print "Getting data from {} to {}".format(
        datetime.datetime.fromtimestamp(start_ts).strftime('%Y-%m-%d %H:%M %z'),
        datetime.datetime.fromtimestamp(end_ts).strftime('%Y-%m-%d %H:%M %z')
    )
    request_object = {"n": "query", "q": QUERY, "s": start_ts, "e": end_ts,
                      "g": "h", "p": int(24 * TIME_WINDOW_DAYS),
                      "i": False, "autoEvents": False,
                      "summarization": "MEAN", "strict": True}
    req = urllib2.Request("{}/chart/api?{}".format(api_url, urllib.urlencode(request_object)))
    req.add_header('X-AUTH-TOKEN', auth_token)
    response = json.loads(urllib2.urlopen(req).read())
    for ts in response["timeseries"]:
        # Rows for this series only; resetting here avoids re-writing
        # rows from earlier series on every iteration.
        csv = []
        base_row = ts['host'] + "," + ts['label']
        if 'tags' in ts:
            base_row += ",\"{}\"".format(" ".join(
                ["[{}]=[{}]".format(tag, ts['tags'][tag]) for tag in ts['tags']]))
        for d in ts['data']:
            csv.append("{},{},{}".format(str(d[0]), str(d[1]), base_row))
        output_file.write("\n".join(csv))
        output_file.write("\n")
    start_ts = end_ts

output_file.close()
print "Done!"

admin

Vasily,

This script seems very helpful.  I'm now trying to use this .py script with the query below, but my output .txt file only shows a few records (file size <150 bytes).

'align(1h,mean,100-ts("cpu.cpuidle"))'

Any suggestions?

Peter

admin

I see that you are trying to retrieve multiple time series - the original script was an example and worked with a single time series only. I've updated the script in my original response - it should work in your particular use case.

PS: It's also probably worth mentioning that align() in your query is not necessary, as it's applied by the query parameters (g=h, summarization=MEAN) anyway.
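To illustrate the point, here is a small Python 3 sketch (the helper name is mine): hourly bucketing can be requested through the chart parameters alone, so the query itself stays a plain ts() expression with no align():

```python
def hourly_params(query, start_ts, end_ts):
    """Build Chart API parameters that bucket results hourly (sketch).

    With g=h and summarization=MEAN, wrapping the query in
    align(1h, mean, ...) is redundant.
    """
    return {
        "n": "query",
        "q": query,               # plain expression, no align() needed
        "s": start_ts,            # start, epoch seconds
        "e": end_ts,              # end, epoch seconds
        "g": "h",                 # hourly granularity
        "summarization": "MEAN",  # mean within each hourly bucket
        "i": False,
        "strict": True,
    }

params = hourly_params('100-ts("cpu.cpuidle")', 1451635200, 1452240000)
```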

admin

Vasily,

Thank you.  I'm currently testing this solution; however, I'm not sure my machine can handle this data volume.

In the meantime, I have been experimenting with retrieving data aligned to the hour using the cURL command, in conjunction with your JSON-to-CSV script.  Chart data download (eg CSV)?

Something curious is happening with my command below.  For some reason the results are not aligned hourly; instead I'm retrieving a value every other hour.  I see the expected result in the Wavefront app, but when I use the Chart API, I seem to be missing the odd hours.

curl -G "https://<domain>.wavefront.com/chart/api?q=align(1h%2Cmean%2Cts(%22diskdata.data%22))&s=1420070400&e=1422748799&g=h&i=false&includeObsoleteMetrics=false&strict=true" -H 'X-AUTH-TOKEN: <token>' --data-urlencode 'q=ts("diskdata.data")' | python <vv script> > <test.txt>

Any idea how I can ensure that I retrieve values by hour?

Peter

admin

Peter,

It looks like you have two queries in the same request - one in the URL (align(1h%2Cmean%2Cts(%22diskdata.data%22))), and another (non-aligned) one in the form data: ts("diskdata.data")

Could you please try removing the second one and run the command again?

admin

I ran the command below, but am still getting the same results.  Values seem to be returned once every 2 hours instead of the expected 1 hour.

curl -G "https://<domain>.wavefront.com/chart/api?q=align(1h%2Cmean%2Cts(%22diskdata.data%22))&s=1420070400&e=1422748799&g=h&i=false&includeObsoleteMetrics=false&strict=true" -H 'X-AUTH-TOKEN: <token>' | python <vvscript> > <test.txt>

admin

Peter,

Please change e=1422748799 to e=1422748800, and it will work as expected!
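For anyone hitting the same issue: the original end time is one second short of an hour boundary, and snapping both window bounds to whole hours avoids this. A Python 3 sketch (helper names are mine):

```python
HOUR = 3600  # seconds per hour

def floor_to_hour(ts):
    """Round an epoch timestamp down to the previous hour boundary."""
    return ts - (ts % HOUR)

def ceil_to_hour(ts):
    """Round an epoch timestamp up to the next hour boundary."""
    return ts if ts % HOUR == 0 else floor_to_hour(ts) + HOUR

start = floor_to_hour(1420070400)  # already aligned: unchanged
end = ceil_to_hour(1422748799)     # one second short: becomes 1422748800
```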

admin

Just want to confirm, these cURL commands should retrieve the same data set, correct?

curl -G 'https://<domain>.wavefront.com/chart/api?s=1420070400&e=1422748800&g=h&i=false&includeObsoleteMetrics=false&strict=true' -H 'X-AUTH-TOKEN: <token>' --data-urlencode 'q=align(1h,mean,ts("diskdata.data"))'  | python <vvscript> > <test.txt>

curl -G "https://<domain>.wavefront.com/chart/api?q=align(1h%2Cmean%2Cts(%22diskdata.data%22))&s=1420070400&e=1422748800&g=h&i=false&includeObsoleteMetrics=false&strict=true" -H 'X-AUTH-TOKEN: <token>' | python <vvscript> > <test.txt>

admin

Hi Peter,

Yes, these two cURL commands should give you the same result.

The only difference is that, unlike the second example, the first command uses the --data-urlencode option, which performs the URL encoding for you. This way you can copy/paste queries from the Wavefront UI - with spaces, " characters, and so on - without worrying about encoding them yourself.
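What --data-urlencode does can be reproduced in Python 3 with urllib.parse.quote, which percent-encodes the commas and quote characters the same way they appear hand-encoded in the second command:

```python
from urllib.parse import quote

raw_query = 'align(1h,mean,ts("diskdata.data"))'
# Percent-encode everything outside the unreserved character set,
# as --data-urlencode does before the value is sent.
encoded = quote(raw_query, safe="")
```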

Hope this helps,

Salil D

admin

Thank you!

On another note, I'm running into an issue retrieving data via the cURL command.  For some reason, I'm only retrieving a few records.  Any thoughts?

curl -G 'https://<domain>.wavefront.com/chart/api?s=1422748800&e=1425168000&g=h&i=false&includeObsoleteMetrics=false&strict=true' -H 'X-AUTH-TOKEN: <token>' --data-urlencode 'q=align(1h,max,ts("diskdata.data"))' > ~file.txt

admin

Hi Peter,

These few numbers you see in the response are actually internal stats, not data points - that part probably looks similar to this:

{"keys":27,"points":4075,"summaries":4064,"buffer_keys":10222,"compacted_keys":16,"compacted_points":4064,"latency":0,"queries":20458,"s3_keys":13319,"cpu_ns":889877189,"skipped_compacted_keys":0,"cached_compacted_keys":45301,"query_tasks":0}

So your query actually doesn't return any data points (the "timeseries" part of the response is empty), since it covers February 2015, and these metrics didn't start reporting until late November 2015.
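When scripting, a quick way to tell internal stats apart from real data is to check the "timeseries" field; the overall response shape below is an illustrative sketch, with field names following the stats line above:

```python
import json

# Sketch of an "empty" result: the stats block is present even when
# no time series match, which can look like a few records in a file.
sample = json.loads('{"stats": {"keys": 27, "points": 0}, "timeseries": []}')

# The reliable signal is whether "timeseries" actually holds any series.
has_data = bool(sample.get("timeseries"))
```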

Hope this helps!

admin

What's really odd is that I have data retrieved from Wavefront on 04/22/16 that reports from this very metric for the time period of 02/15.  I'm also finding that data from other metrics around this same time (in 2015) is missing.  A few weeks ago I was still able to retrieve data for this time period.  Any chance there was a data purge for data older than 6 months?

admin

We never delete the data, so that's definitely not the case here - we're looking into this issue and we'll keep you posted!

admin

vasily@wavefront.com, any update regarding the missing data?  It would be great if I could start retrieving the historical data from a handful of metrics.

admin

Hi Peter,

I apologize for the delay! We've made some performance improvements related to your use case right before the weekend, and we wanted to make sure that everything works properly.

To address your immediate concern, please change the "includeObsoleteMetrics" parameter value to "true", and it should work as expected. Please note that in some cases the first query with obsolete metrics enabled might take longer than usual as the cache is being populated, but all subsequent queries should be faster.

Now, the reason why you need to enable this option is somewhat tricky, so please bear with me! "Obsolete metrics" in this context is a bit of a misnomer, as what we're really talking about here is "time series", not "metrics".  A "time series" in Wavefront is a unique combination of a metric name, source name, and all of the point tags with their respective values, if any. By default (with the includeObsoleteMetrics option turned off), for any time period, queries will return data only for time series that are considered active now - and the criterion for "active" is "has reported data within the last 4 weeks".
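The identity and "active" rules above can be sketched in a few lines of Python 3 (helper names and the example tag are mine):

```python
FOUR_WEEKS = 4 * 7 * 24 * 3600  # the "active" window, in seconds

def series_key(metric, source, tags=None):
    """A time series is the combination of metric, source, and point tags."""
    return (metric, source, tuple(sorted((tags or {}).items())))

def is_active(last_report_ts, now):
    """Active means: reported data within the last 4 weeks."""
    return now - last_report_ts <= FOUR_WEEKS

# Same metric and source, but a point tag was added later: from the
# query engine's perspective these are two distinct time series.
untagged = series_key("diskdata.data", "host-1")
tagged = series_key("diskdata.data", "host-1", {"datacenter": "dc-1"})
```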

In your particular case, it looks like around the end of November / beginning of December you started adding point tags to your metrics, and this process continued well into 2016. The metric name stayed the same, but a point tag was added, so from our perspective these are different time series - one stopped reporting and the other started reporting right after that. As a result, once 4 weeks had passed since the last untagged time series for your metric (diskdata.data) stopped reporting, your original query stopped returning any data: in February 2015 none of the metrics had point tags, and none of those original time series are considered active anymore.

Based on the above, when extracting historic data from more than 4 weeks ago, we highly recommend enabling obsolete metrics (unless you specifically want to exclude them).
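Concretely, that means adding one parameter to the requests shown earlier (a sketch; the other values mirror the cURL examples above):

```python
# Same query as the earlier cURL example, with obsolete (inactive)
# time series included so historical windows return data.
params = {
    "q": 'align(1h,max,ts("diskdata.data"))',
    "s": 1422748800,
    "e": 1425168000,
    "g": "h",
    "includeObsoleteMetrics": "true",  # include series inactive > 4 weeks
    "strict": "true",
}
```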

We apologize for the inconvenience caused - we understand that these nuances may not be obvious and can be very confusing, so we're going to use this as an opportunity to improve our documentation as well.

Hope this helps - please don't hesitate to reach out to our team or to me directly if you have any questions.

admin

vasily@wavefront.com, thank you for the comprehensive explanation regarding "includeObsoleteMetrics."  This seemed to do the trick for "diskdata.data"!  There are two additional metrics that I need historical data from: "cpu.cpuidle" and "memory.used."  I'm noticing that even with "includeObsoleteMetrics," I'm not retrieving data for "memory.used" prior to Sept. 2015.  Was the performance improvement related to all metrics?

admin

Yes, the performance improvement should be across the board.  It looks like your query for memory.used is timing out; we're looking into it.

admin

vasily@wavefront.com, thank you!  Sorry, I forgot to mention that I'm also using memory.total.  This metric seems to be timing out as well.
