Kicking the Tires: Bluemix Insights for Weather
April 14, 2016
This post comes from a Jupyter notebook I wrote to help a colleague learn how to access a Bluemix service from Python. Along the way, I learned about pandas.io.json.json_normalize
and how great it is at turning nested JSON structures into flatter DataFrames. (It deserves a short post of its own.)
In this notebook, we're going to poke at the Bluemix Insights for Weather service from Python. We'll look at what kinds of queries we can make and do a few basic things with the data. We'll keep a running commentary that can serve as an introductory tutorial for developers who want to go off and build more sophisticated apps and analyses using the service.
Get some handy libs¶
Let's start by getting some handy Python libraries for making HTTP requests and looking at the data. You can install these with typical Python package management tools like pip install <name> or conda install <name>.
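For example, from a notebook cell (the leading ! hands the command off to the shell), something like the following should work, though your environment may differ:
# install the third-party packages used in this notebook
!pip install requests geocoder pandas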
# comes with Python
import json
# third-party
import requests
from requests.auth import HTTPBasicAuth
import geocoder
import pandas as pd
Get credentials¶
Next, we need to provision an Insights for Weather service instance on Bluemix and get the access credentials. We can follow the instructions about Adding Insights for Weather to your application to do so.
To keep our credentials out of this notebook, we can copy them from the Bluemix UI and put them in a weather_creds.json file alongside this notebook. The credentials JSON should look something like the following.
{
    "credentials": {
        "username": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
        "password": "yyyyyyyyyy",
        "host": "twcservice.mybluemix.net",
        "port": 443,
        "url": "https://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:yyyyyyyyyy@twcservice.mybluemix.net"
    }
}
Now we can load that file into memory without showing its contents here.
with open('weather_creds.json') as f:
    creds = json.load(f)
Try a request¶
Now we're ready to try a request against the REST API. We'll do this using the requests Python package.
Note: At the time of this writing, the paths documented in the REST API section of the docs are incorrect. Remove the `/geocode` section of the path and it should work. The other samples and API references in the docs are correct.
First we need to build a basic auth object with our service username and password.
auth = HTTPBasicAuth(creds['credentials']['username'],
                     creds['credentials']['password'])
Then we can build the base URL of the service.
url = 'https://{}/api/weather/v2'.format(creds['credentials']['host'])
url
The API documentation says we need to pass latitude and longitude coordinates with our queries. Instead of hardcoding one here, we'll use the Google geocoder to get the lat/lng for my hometown.
g = geocoder.google('Durham, NC')
g
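Since the geocoder result depends on an external API call, it's worth a quick check that the lookup actually succeeded before trusting the coordinates. A minimal sanity check might look like this:
# make sure the geocoder call succeeded before using g.lat / g.lng
assert g.ok, 'geocoding failed; check network access or hardcode a lat/lng instead'
g.latlng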
With these in hand, we'll build up a dictionary which requests will convert into URL query args for us. Besides the geocode, we'll include other parameters like the desired unit of measurement and language of the response. (These are all noted in the API docs.)
params = {
    'geocode': '{:.2f},{:.2f}'.format(g.lat, g.lng),
    'units': 'e',
    'language': 'en-US'
}
Let's make the request to one of the documented resources to get the 10-day forecast for Durham, NC. We pass the query parameters and our basic auth information.
resp = requests.get(url+'/forecast/daily/10day', params=params, auth=auth)
We can easily sanity check that the response status is OK. If anything went wrong with our request, the next line of code will raise an exception.
resp.raise_for_status()
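In a real application we'd probably want to handle failures more gracefully than letting the exception bubble up. A minimal sketch of catching it:
# example of catching a failed request instead of letting the exception propagate
try:
    resp.raise_for_status()
except requests.exceptions.HTTPError as err:
    print('Request failed:', err)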
Look at the response¶
If we made it this far, our call was successful. Let's look at the results. We parse the JSON body of the response into a Python dictionary.
body = resp.json()
Let's see what keys are in the body without printing the whole thing.
body.keys()
And let's take a closer look at the forecasts. Instead of printing the raw, potentially nested Python dictionary, we'll ask pandas to take a crack at it and give us a nicer table view of the data in our notebook.
df = pd.io.json.json_normalize(body['forecasts'])
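json_normalize flattens nested dictionaries into dot-separated column names, which is why we'll see columns like day.alt_daypart_name shortly. A tiny standalone example (with made-up data) shows the idea:
# toy example: nested keys become dot-separated column names
toy = [{'day': {'alt_daypart_name': 'Tomorrow'}, 'narrative': 'Sunny.'}]
pd.io.json.json_normalize(toy).columns.tolist()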
How many columns do we have?
len(df.columns)
Quite a few. Let's make sure pandas shows them all.
pd.options.display.max_columns = 125
And now we can emit the entire thing as a nice HTML table for inspection.
df
Of course, if we weren't in a notebook, we probably wouldn't want to do this. Rather, we'd want to filter the DataFrame or the original JSON down to whatever values we needed in our application. Let's do a bit of that now.
Look at some specific columns¶
Now that we have an idea of all the available columns, let's dive into a few.
One of the columns appears to be a human readable forecast. Before we show it, let's make sure pandas doesn't ellipsize the text.
pd.options.display.max_colwidth = 125
Now we can look at the narrative alongside the day it describes.
df[['day.alt_daypart_name', 'narrative']]
There are a few mentions of the word golf in the big table columns. Let's find those columns in particular.
df.columns[df.columns.str.contains('golf')]
Let's look at those alongside the day names.
df[['day.alt_daypart_name'] + df.columns[df.columns.str.contains('golf')].tolist()]
The daytime golf category and index are interesting. Night golf is ... well ... unexplained. 🌛
How about temperatures? Let's get a summary of the values for the next ten days.
df[['max_temp', 'min_temp']].describe()
Try another endpoint¶
So far we've poked at the 10-day forecast resource. Let's try another just to see how similar / different it is. Here we'll fetch historical observations for the same location.
resp = requests.get(url+'/observations/timeseries/24hour', auth=auth, params=params)
resp.raise_for_status()
body = resp.json()
body.keys()
This time, the key of interest is observations.
obs = pd.io.json.json_normalize(body['observations'])
Fewer columns this time. Let's poke at blunt_phrase and valid_time_gmt.
obs.columns
obs.valid_time_gmt.head()
We can make the times more human readable. They're seconds since the epoch expressed in UTC.
obs['time_utc'] = pd.to_datetime(obs.valid_time_gmt, unit='s', utc=True)
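If local time is more convenient than UTC, the timezone-aware column can be converted; for example, to US Eastern time (an illustrative choice for Durham, NC):
# convert the UTC timestamps to US Eastern time (illustrative timezone choice)
obs['time_local'] = obs['time_utc'].dt.tz_convert('America/New_York')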
Now we can check the summary of the available observations within the last 24 hours. We'll reverse them so that they're sorted newest to oldest.
obs[['blunt_phrase', 'time_utc']].iloc[::-1]
🌼 I guess it's seasonal right now. 🌼
Go further¶
We'll stop here. What we did in this notebook is a prelude to what's possible. Here are some ideas for further experimentation:
- Write a simple Python function (or functions) that wrap the few lines of request logic needed to query any of the API endpoints (see the sketch after this list).
- Funnel observations into a persistent store to collect them over time. Use that historical data to try to build a predictive model (e.g., using scikit-learn).
- Combine weather data with data from other sources in domain specific notebooks or applications.
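As a starting point for the first idea above, here's a minimal sketch of such a wrapper. The function name and defaults are just illustrative:
def query_weather(resource, creds, **extra_params):
    """Query an Insights for Weather resource path and return the parsed JSON body."""
    url = 'https://{}/api/weather/v2/{}'.format(creds['credentials']['host'], resource)
    auth = HTTPBasicAuth(creds['credentials']['username'],
                         creds['credentials']['password'])
    params = {'units': 'e', 'language': 'en-US'}
    params.update(extra_params)
    resp = requests.get(url, params=params, auth=auth)
    resp.raise_for_status()
    return resp.json()

# e.g., the 10-day forecast we fetched earlier (approximate Durham, NC geocode)
# body = query_weather('forecast/daily/10day', creds, geocode='35.99,-78.90')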