Mauna Loa Example 2: Ice core data

Wed 13 June 2018

This GP example shows how to:

  • Fit fully Bayesian GPs with NUTS
  • Model inputs which are themselves uncertain (uncertainty in 'x')
  • Define custom mean and covariance functions
  • Build semiparametric Gaussian process models

Example: Mauna Loa CO2 continued

Earth System Research Laboratory

Ice Core Data

The first data set we'll look at is CO2 measurements from ice core data. This data goes back to the year 13 AD. The data after the year 1958 is an average of ice core measurements and more accurate data taken from Mauna Loa.

This data is less accurate than the Mauna Loa atmospheric CO2 measurements. The dates of the ice core measurements have some uncertainty. They may be accurate at a yearly level, since the ice layers on itself every year, but the date isn't likely to be reliable as to the season when the measurement was taken. Also, the CO2 level observed may be some sort of average of the overall yearly level. As we saw in the previous example, there is a strong seasonal component in CO2 levels that won't be observable in this data set.

In PyMC3, we can easily include both errors in \(y\) and errors in \(x\). To demonstrate this, we remove the latter part of the data (which is averaged with Mauna Loa readings) so we have only the ice core measurements. We fit the Gaussian process model using the No-U-Turn MCMC sampler.

import pymc3 as pm
import pandas as pd
import numpy as np
import theano
import theano.tensor as tt

import matplotlib.pyplot as plt
%matplotlib inline
icedata = "/home/bill/Dropbox/projects/maunaloa/merged_ice_core_yearly.csv"
ice = pd.read_csv(icedata, comment='"', header=None)
ice.columns = ["year", "CO2"]
ice["CO2"] = ice["CO2"].astype(np.float)

#### DATA AFTER 1958 is an average of ice core and mauna loa data, so remove it
ice = ice[ice["year"] <= 1958]
print("Number of data points:", len(ice))
Number of data points: 111
fig = plt.figure(figsize=(9,4))
ax = plt.gca()

ax.plot(ice.year.values, ice.CO2.values, '.k');
ax.set_xlabel("Year")
ax.set_ylabel("CO2 (ppm)");

png

The industrial revolution era occurred around the years 1760 to 1840. This transition is clearly visible in the graph, where CO2 levels rise dramatically after holding fairly steady at around 280 ppm for over a thousand years.

Uncertainty in 'x'

To model uncertainty in \(x\), or time, we place a prior distribution over each of the observation dates. So that the prior is standardized, we use a PyMC3 random variable to model the error in each recorded date, i.e. the difference between the date given in the data set and the true measurement date. We assume these errors are normally distributed with mean zero and a standard deviation of one year.
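Concretely, writing \(t_i\) for the recorded date of observation \(i\) and \(\Delta_i\) for its error, the true measurement time is modeled as \(t_i^{\mathrm{true}} = t_i - \Delta_i\) with \(\Delta_i \sim \mathcal{N}(0, 1\ \text{year})\). Since the inputs are rescaled to centuries below, this corresponds to a standard deviation of 0.01 in the model code.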

For just the ice core data, the uncertainty in \(x\) is not very important. Later when we include the more accurate Mauna Loa measurements, allowing for this uncertainty will enable us to back-cast seasonal variation.

t = ice.year.values
y = ice.CO2.values

# normalize the CO2 readings prior to fitting the model 
y_mu, y_sd = np.mean(y), np.std(y)
y_n = (y - y_mu) / y_sd

# scale t to have units of centuries
t_n = t / 100

We use an informative prior on the lengthscale that places most of the mass between a few and 20 centuries.

fig = plt.figure(figsize=(8,5))
ax = plt.gca()
ax.hist(pm.Gamma.dist(alpha=2, beta=0.25).random(size=10000), 100)
ax.set_xlabel("Time (centuries)")
ax.set_title("Lengthscale prior");

png

with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=2, beta=0.25)
    cov = η**2 * pm.gp.cov.Matern32(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    # x location uncertainty (sd = 0.01 is a standard deviation of one year)
    t_diff = pm.Normal("t_diff", mu=0.0, sd=0.01, shape=len(t))
    t_uncert = t_n - t_diff

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    y_ = gp.marginal_likelihood("y", X=t_uncert[:,None], y=y_n, noise=σ)

Next we sample with the NUTS MCMC algorithm. We run two chains but set the number of cores to one, since the linear algebra libraries Theano uses internally are already multithreaded.

with model:
    tr = pm.sample(500, chains=2, cores=1)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [σ, t_diff, ℓ, η]
100%|██████████| 1000/1000 [01:08<00:00, 14.66it/s]
100%|██████████| 1000/1000 [01:05<00:00, 15.32it/s]
The acceptance probability does not match the target. It is 0.8871619176036385, but should be close to 0.8. Try to increase the number of tuning steps.
pm.traceplot(tr, varnames=["t_diff"]);

png

In the traceplot for t_diff, we can see that the posteriors for the individual dates haven't moved much from the recorded values, but the uncertainty in their locations is accounted for in the sampling.
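As a rough numeric check (a sketch using the trace from above), we can look at the posterior means of the date offsets, converted from centuries back to years:

# Posterior mean of each date offset, converted back to years; these stay
# small if the inferred dates remain close to the recorded ones.
offsets_years = 100 * tr["t_diff"].mean(axis=0)
print(offsets_years.min(), offsets_years.max())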

The posterior distributions for the other unknown hyperparameters are shown below.

pm.traceplot(tr, varnames=["η", "ℓ", "σ"]);

png

Predictions

tnew = np.linspace(-100, 2150, 2000)/100
with model:
    fnew = gp.conditional("fnew", Xnew=tnew[:,None])

with model:
    ppc = pm.sample_ppc(tr, samples=500, vars=[fnew])
100%|██████████| 500/500 [02:30<00:00,  3.32it/s]
samples = y_sd * ppc["fnew"] + y_mu

fig = plt.figure(figsize=(12,5))
ax = plt.gca()
pm.gp.util.plot_gp_dist(ax, samples, tnew, plot_samples=True, palette="Blues")
ax.plot(t/100, y, "k.");
ax.set_xticks(np.arange(0,23));
ax.set_xlim([-1, 22]);

png

Two features are apparent in this plot. One is the little ice age, whose effects on CO2 occur from around 1600 to 1800. The other is the industrial revolution, when people began releasing large amounts of CO2 into the atmosphere.

Semiparametric Gaussian process

The forecast past the latest data point in 1958 rises, then flattens, then dips back downwards. Should we trust this forecast? We know it hasn't been borne out (see the previous notebook), as CO2 levels have continued to rise.

We didn't specify a mean function in our model, so we've assumed that our GP has a mean of zero. This means that as we forecast into the future, the function will eventually return to zero. Is this reasonable in this case? There have been no global events that suggest that atmospheric CO2 will not continue on its current trend.
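To see why, recall the predictive mean of a zero-mean GP at a new input \(x_*\): \(\mathrm{E}[f(x_*) \mid \mathbf{y}] = K_{*x}(K_{xx} + \sigma^2 I)^{-1}\mathbf{y}\), where \(K_{*x}\) holds the covariances between \(x_*\) and the observed inputs. With a stationary covariance like the Matérn, those covariances decay as \(x_*\) moves away from the data, so the forecast is pulled back toward the prior mean, which here is zero.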

A linear model for changepoints

We adopt the formulation used by Facebook's Prophet time series model. This is a piecewise linear function, where the endpoints of each segment are constrained to connect to one another. Some example functions are plotted below.
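Written out, the trend at time \(t\) is

\[ g(t) = \left(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\right) t + \left(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\right)\,, \qquad \gamma_j = -s_j \delta_j\,, \]

where \(k\) is the base growth rate, \(m\) is an offset, \(s_j\) are the changepoint locations, \(\delta_j\) is the change in slope at changepoint \(j\), and \(a_j(t) = 1\) if \(t \geq s_j\) and \(0\) otherwise. The offsets \(\gamma_j\) are what make adjacent segments join at the changepoints.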

def dm_changepoints(t, changepoints_t):
    A = np.zeros((len(t), len(changepoints_t)))
    for i, t_i in enumerate(changepoints_t):
        A[t >= t_i, i] = 1 
    return A

For later use, we reprogram this function using symbolic Theano variables. The code is a bit inscrutable, but it returns the same thing as dm_changepoints while avoiding the use of a loop.

def dm_changepoints_theano(X, changepoints_t):
    return (0.5 * (1.0 + tt.sgn(tt.tile(X, (1,len(changepoints_t))) - changepoints_t)))
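As a quick sanity check, the short sketch below compares the two constructions on made-up example inputs and changepoints (t_check and cp_check are illustrative values, not part of the analysis). The one corner case is an input that falls exactly on a changepoint: the numpy version returns 1 there, while the sign-based version returns 0.5.

# Sanity check (sketch): the numpy and Theano versions should agree on
# inputs that don't land exactly on a changepoint.
t_check = np.linspace(0.0, 20.0, 11)   # example times, in centuries
cp_check = np.array([5.0, 12.5])       # example changepoint locations
A_np = dm_changepoints(t_check, cp_check)
A_tt = dm_changepoints_theano(t_check[:, None], cp_check).eval()
print(np.allclose(A_np, A_tt))         # should print True for these values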

Looking at the graph, some possible changepoint locations are 1600, 1800, and perhaps 1900. These bookend the little ice age, the start of the industrial revolution, and the start of more modern industrial practices.

changepoints_t = np.array([16, 18, 19])

A = dm_changepoints(t_n, changepoints_t)

There are several parameters, which we will estimate. Some test values and a plot of the resulting function are shown below.

# base growth rate, or initial slope
k = 0.0

# offset
m = 0.1

# slope parameters
delta = np.array([0.05, -0.1, 0.3])
x = (k + np.dot(A, delta))*t_n + (m + np.dot(A, -changepoints_t * delta))
plt.plot(t, x);

png

Custom mean function

We could encode this mean function directly, but if we wrap it inside of a Mean object, then it is easier to use other Gaussian process functionality, like the .conditional and .predict methods. Look here for more information on custom mean and covariance functions. We only need to define the __init__ and __call__ methods.

class PiecewiseLinear(pm.gp.mean.Mean):
    def __init__(self, changepoints, k, m, delta):
        self.changepoints = changepoints
        self.k = k
        self.m = m
        self.delta = delta

    def __call__(self, X):
        # X are the x locations, or time points
        A = dm_changepoints_theano(X, self.changepoints)
        return (self.k + tt.dot(A, self.delta))*X.flatten() + (self.m + tt.dot(A, -self.changepoints * self.delta))

It is inefficient to recreate A every time the mean function is evaluated, but we'll need to, since the number of inputs changes when making predictions.
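As a rough check (a sketch reusing the test values of k, m, and delta from above), we can evaluate the custom mean at the observed times and compare it to the numpy computation of x; the two should agree unless an observed year falls exactly on a changepoint, where the sign-based design matrix differs slightly.

# Rough check (sketch): the custom mean should reproduce the piecewise
# linear test function `x` computed above with numpy.
mean_test = PiecewiseLinear(changepoints_t, k, m, delta)
x_theano = mean_test(t_n[:, None]).eval()
print(np.allclose(x, x_theano))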

Next is the updated model with the changepoint mean function. Like Prophet, we use a heavy-tailed Laplace prior on the slope parameters.

with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=2, beta=0.25)
    cov = η**2 * pm.gp.cov.Matern32(1, ℓ)

    # piecewise linear mean function
    k = pm.Normal("k", mu=0, sd=2)
    m = pm.Normal("m", mu=0, sd=2)
    delta = pm.Laplace("delta", mu=0, b=1, shape=len(changepoints_t))
    mean = PiecewiseLinear(changepoints_t, k, m, delta)

    # include mean function in GP constructor
    gp = pm.gp.Marginal(cov_func=cov, mean_func=mean)

    # x location uncertainty (sd = 0.01 is a standard deviation of one year)
    t_diff = pm.Normal("t_diff", mu=0.0, sd=0.01, shape=len(t))
    t_uncert = t_n - t_diff

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    y_ = gp.marginal_likelihood("y", X=t_uncert[:,None], y=y_n, noise=σ)
with model:
    tr = pm.sample(500, chains=2, cores=1, nuts_kwargs={"target_accept":0.95})
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [σ, t_diff, delta, m, k, ℓ, η]
100%|██████████| 1000/1000 [03:25<00:00,  4.86it/s]
100%|██████████| 1000/1000 [03:10<00:00,  5.24it/s]
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
pm.traceplot(tr, varnames=["η", "ℓ", "σ", "k", "m", "delta"]);

png

Predictions

tnew = np.linspace(-100, 2200, 2000)/100

with model:
    fnew = gp.conditional("fnew", Xnew=tnew[:,None])

with model:
    ppc = pm.sample_ppc(tr, samples=500, vars=[fnew])
100%|██████████| 500/500 [02:30<00:00,  3.31it/s]
samples = y_sd * ppc["fnew"] + y_mu

fig = plt.figure(figsize=(12,5))
ax = plt.gca()
pm.gp.util.plot_gp_dist(ax, samples, tnew, plot_samples=True, palette="Blues")
ax.plot(t/100, y, "k.");
ax.set_xticks(np.arange(0,23));
ax.set_xlim([-1, 22]);

png

These results look better. But where should the changepoints be placed? Instead of using a changepoint in the mean function, we can also specify this behavior with a covariance function. One benefit of the latter formulation is that the changepoint can be a more realistic smooth transition, instead of a discrete breakpoint. In the next section, we'll look at how to do this, and then bring these pieces together to combine the ice core data with the Mauna Loa data set. We'll be able to produce realistic forecasts, and backcast the seasonality.