Mauna Loa Example 2: Incorporating atmospheric measurements

Mon 09 July 2018

This GP example shows how to:

  • Fit fully Bayesian GPs with NUTS
  • Model inputs which are themselves uncertain (uncertainty in 'x')
  • Define custom mean and covariance functions
  • Forecast and backcast

Example: Mauna Loa CO\(_2\) part 3

Now we look at how to build a changepoint covariance function, using the covariance function primitives available in PyMC3. Finally, we'll put it all together to build an accurate GP model of CO2 levels that combines ice core data with modern atmospheric measurements.

Re-loading the ice core data

import pymc3 as pm
import pandas as pd
import numpy as np
import theano
import theano.tensor as tt

import matplotlib.pyplot as plt
%matplotlib inline
ice = pd.read_csv(pm.get_data("merged_ice_core_yearly.csv"), header=26)
ice.columns = ["year", "CO2"]
ice["CO2"] = ice["CO2"].astype(np.float)

#### DATA AFTER 1958 is an average of ice core and mauna loa data, so remove it
ice = ice[ice["year"] <= 1958]
print("Number of data points:", len(ice))
Number of data points: 111
fig = plt.figure(figsize=(9,4))
ax = plt.gca()

ax.plot(ice.year.values, ice.CO2.values, '.k');
ax.set_xlabel("Year")
ax.set_ylabel("CO2 (ppm)");

[Figure: ice core CO2 (ppm) vs. year]

A custom changepoint covariance function

More complex covariance functions can be constructed by composing base covariance functions in several ways. For instance, two of the most commonly used operations are

  • The sum of two covariance functions is a covariance function
  • The product of two covariance functions is a covariance function

We can also construct a covariance function by scaling a base covariance (\(k_b\)) with an arbitrary function,

$$ k(x, x') = s(x)\, k_{\mathrm{b}}(x, x')\, s(x') \,. $$

The scaling function can be parameterized by known parameters, or unknown parameters can be inferred.
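
For example (a brief sketch, not part of the original notebook), covariance objects in PyMC3 compose directly with + and *, and ScaledCov, used below, implements the scaling construction above:

# Covariance functions compose by addition and multiplication in PyMC3
k1 = pm.gp.cov.ExpQuad(1, ls=10)     # smooth, long lengthscale
k2 = pm.gp.cov.Matern32(1, ls=2)     # rougher, short lengthscale

k_sum = k1 + k2     # sum of two covariance functions
k_prod = k1 * k2    # product of two covariance functions

X = np.linspace(0, 100, 50)[:, None]
print(k_sum(X).eval().shape, k_prod(X).eval().shape)   # (50, 50) (50, 50)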

Heaviside step function

To specifically construct a covariance function that describes a changepoint, we could propose a scaling function \(s(x)\) that specifies the region where the base covariance is active. The simplest option is the step function,

$$ s(x;\, x_0) = \begin{cases} 0 & x \leq x_0 \\ 1 & x_0 < x \end{cases} $$

which is parameterized by the changepoint \(x_0\). With this choice, the covariance function \(s(x;\, x_0)\, k_b(x, x')\, s(x';\, x_0)\) is only active in the region \(x_0 < x\), where \(s\) is one.

PyMC3 provides the ScaledCov covariance function. As arguments, it takes a base covariance, a scaling function, and a tuple of arguments for the scaling function. To construct this in PyMC3, we first define the scaling function:

def step_function(x, x0, greater=True):
    if greater:
        # s = 1 for x > x0
        return 0.5 * (tt.sgn(x - x0) + 1.0)
    else:
        # s = 1 for x < x0
        return 0.5 * (tt.sgn(x0 - x) + 1.0)
step_function(np.linspace(0, 10, 10), x0=5, greater=True).eval()
array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])
step_function(np.linspace(0, 10, 10), x0=5, greater=False).eval()
array([1., 1., 1., 1., 1., 0., 0., 0., 0., 0.])

Then we can define the following covariance function, which we evaluate over \(x \in (0, 100)\). The base covariance has a lengthscale of 10, and \(x_0 = 40\). Since we use the step function with greater=False, it is "active" for \(x < 40\).

cov = pm.gp.cov.ExpQuad(1, 10)
sc_cov = pm.gp.cov.ScaledCov(1, cov, step_function, (40, False))
x = np.linspace(0, 100, 100)
K = sc_cov(x[:,None]).eval()
m=plt.imshow(K, cmap="magma"); plt.colorbar(m);

[Figure: heatmap of the step-scaled ExpQuad covariance matrix]

But this isn't a changepoint covariance function yet. We can add two of these together. For \(x > 40\), let's use a base covariance that is a Matern32 with a lengthscale of 5, scaled by 0.25.

cov1 = pm.gp.cov.ExpQuad(1, 10)
sc_cov1 = pm.gp.cov.ScaledCov(1, cov1, step_function, (40, False))

cov2 = 0.25 * pm.gp.cov.Matern32(1, 5)
sc_cov2 = pm.gp.cov.ScaledCov(1, cov2, step_function, (40, True))

sc_cov = sc_cov1 + sc_cov2

# plot over 0 < x < 100
x = np.linspace(0, 100, 100)
K = sc_cov(x[:,None]).eval()
m=plt.imshow(K, cmap="magma"); plt.colorbar(m);

[Figure: heatmap of the changepoint covariance matrix]

What do samples from the Gaussian process prior with this covariance look like?

prior_samples = np.random.multivariate_normal(np.zeros(100), K, 3).T
plt.plot(x, prior_samples);
plt.axvline(x=40, color="k", alpha=0.5);

[Figure: three GP prior samples from the changepoint covariance, changepoint at x = 40]

Before \(x = 40\), the function is smooth and slowly changing. After \(x = 40\), the samples are less smooth and change quickly.

A gradual change with a sigmoid function

Instead of a sharp cutoff, it is usually more realistic to have a smooth transition. For this we can use the logistic function, shown below:

# a is the slope, b is the location

a = -0.2
b = 40
plt.plot(x, pm.math.invlogit(a*(x - b)).eval(), label="scaling left cov");

a = 0.2
b = 40
plt.plot(x, pm.math.invlogit(a*(x - b)).eval(), label="scaling right cov");
plt.legend();

[Figure: logistic scaling functions for the left and right covariances]

def logistic(x, a, x0):
    # a is the slope, x0 is the location
    return pm.math.invlogit(a*(x - x0))

The same covariance function as before, but with a gradual changepoint, is shown below.

cov1 = pm.gp.cov.ExpQuad(1, 10)
sc_cov1 = pm.gp.cov.ScaledCov(1, cov1, logistic, (-0.1, 40))

cov2 = 0.25 * pm.gp.cov.Matern32(1, 5)
sc_cov2 = pm.gp.cov.ScaledCov(1, cov2, logistic, (0.1, 40))

sc_cov = sc_cov1 + sc_cov2

# plot over 0 < x < 100
x = np.linspace(0, 100, 100)
K = sc_cov(x[:,None]).eval()
m=plt.imshow(K, cmap="magma"); plt.colorbar(m);

[Figure: heatmap of the gradual (logistic) changepoint covariance matrix]

Below, you can see that the transition of the prior functions from one region to the next is more gradual,

prior_samples = np.random.multivariate_normal(np.zeros(100), K, 3).T
plt.plot(x, prior_samples);
plt.axvline(x=40, color="k", alpha=0.5);

[Figure: GP prior samples from the gradual changepoint covariance]

Let's try this model out instead of the semiparametric changepoint version.

Changepoint covariance model

The features of this model are (the combined covariance is written out after the list):

  • One covariance for short term variation across all time points
  • The parameter x0 is the location of the industrial revolution. It is given a prior that has most of its support between years 1760 and 1840, centered at 1800.
  • We can easily use the parameter as the shift parameter in the 2nd degree Polynomial (quadratic) covariance, and as the location of the changepoint in the changepoint covariance.
  • A changepoint covariance that is ExpQuad prior to the industrial revolution, and ExpQuad + Polynomial(degree=2) afterwards.
  • We use the same scaling and lengthscale parameters for each of the two base covariances in the changepoint covariance.
  • Still modeling uncertainty in x as before
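
Written out (in our own shorthand, not notation from the original: \(k_{\mathrm{EQ}}\) for ExpQuad, \(k_{\mathrm{Poly}}\) for the quadratic Polynomial, \(k_{\mathrm{M52}}\) for Matern52), the total covariance constructed below is

$$ k(x, x') = \eta_s^2\, k_{\mathrm{M52}}(x, x') + s_{\ell}(x)\, \eta^2 k_{\mathrm{EQ}}(x, x')\, s_{\ell}(x') + s_{r}(x) \left[ \eta^2 k_{\mathrm{EQ}}(x, x') + \eta_q^2\, k_{\mathrm{Poly}}(x, x') \right] s_{r}(x') \,, $$

where \(s_{\ell}(x) = \mathrm{logistic}(-a(x - x_0))\) scales the pre-industrial-revolution covariance, \(s_{r}(x) = 1 - s_{\ell}(x)\) scales the post-industrial-revolution covariance, and white noise with standard deviation \(\sigma\) is added on top.
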
with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=2, beta=0.1)

    # changepoint occurs near the year 1800, sometime between 1760, 1840
    x0 = pm.Normal("x0", mu=18, sd=0.1)
    # the change happens gradually
    a = pm.HalfNormal("a", sd=2)
    # a constant offset for the quadratic polynomial covariance
    c = pm.HalfNormal("c", sd=3)
    # quadratic polynomial scale
    ηq = pm.HalfNormal("ηq", sd=5)

    cov1 = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
    cov2 = η**2 * pm.gp.cov.ExpQuad(1, ℓ) + ηq**2 * pm.gp.cov.Polynomial(1, x0, 2, c)

    # construct changepoint cov
    sc_cov1 = pm.gp.cov.ScaledCov(1, cov1, logistic, (-a, x0))
    sc_cov2 = pm.gp.cov.ScaledCov(1, cov2, logistic, ( a, x0))
    cov_c = sc_cov1 + sc_cov2

    # short term variation
    ηs = pm.HalfNormal("ηs", sd=5)
    ℓs = pm.Gamma("ℓs", alpha=2, beta=1)
    cov_s = ηs**2 * pm.gp.cov.Matern52(1, ℓs)

    gp = pm.gp.Marginal(cov_func=cov_s + cov_c)

    # x location uncertainty (sd = 0.01 is a standard deviation of one year)
    t_diff = pm.Normal("t_diff", mu=0.0, sd=0.01, shape=len(t))
    t_uncert = t_n - t_diff

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    y_ = gp.marginal_likelihood("y", X=t_uncert[:,None], y=y_n, noise=σ)
with model:
    tr = pm.sample(500, chains=2, cores=1, nuts_kwargs={"target_accept": 0.95})
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [σ, t_diff, ℓs, ηs, ηq, c, a, x0, ℓ, η]
100%|██████████| 1000/1000 [02:17<00:00,  7.29it/s]
100%|██████████| 1000/1000 [02:02<00:00,  8.18it/s]
The acceptance probability does not match the target. It is 0.8988327928717826, but should be close to 0.8. Try to increase the number of tuning steps.
pm.traceplot(tr, varnames=["η", "ηs", "ℓ", "ℓs", "c", "a", "x0", "σ"]);

[Figure: traceplot of the posteriors for η, ηs, ℓ, ℓs, c, a, x0, σ]

Predictions

tnew = np.linspace(-100, 2300, 2200)/100

with model:
    fnew = gp.conditional("fnew", Xnew=tnew[:,None])

with model:
    ppc = pm.sample_ppc(tr, samples=100, vars=[fnew])
100%|██████████| 100/100 [17:02<00:00, 10.23s/it]
samples = y_sd * ppc["fnew"] + y_mu

fig = plt.figure(figsize=(12,5))
ax = plt.gca()
pm.gp.util.plot_gp_dist(ax, samples, tnew, plot_samples=True, palette="Blues")
ax.plot(t/100, y, "k.");
ax.set_xticks(np.arange(0,23));
ax.set_xlim([-1, 23]);
ax.set_ylim([250, 450]);

[Figure: posterior predictive distribution of the fit and forecast, with the observed data]

The predictions for this model look much more realistic. The combination of a 2nd degree polynomial with an ExpQuad looks like a good model to forecast with: it allows the amount of CO2 to increase in a not-exactly-linear fashion. We can see from the predictions that

  • The amount of CO2 could increase at a faster rate
  • It could continue to increase more or less linearly
  • It is even possible for the CO2 level to start to decrease

Incorporating Atmospheric CO2 measurements

Next, we incorporate the CO2 measurements from the Mauna Loa observatory. These are monthly measurements of atmospheric CO2. Unlike the ice core data, there is essentially no uncertainty in when these measurements were taken. When we model both data sets together, the value of including the uncertainty in the ice core measurement times will become more apparent.

First let's load in the data, and then plot it alongside the ice core data.

from datetime import datetime as dt
import time

def toYearFraction(date):
    date = pd.to_datetime(date)
    def sinceEpoch(date): # returns seconds since epoch
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction
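
For example (a quick check, not in the original notebook), mid-March maps to roughly 20% of the way through the year, which matches the first row of the air data table shown further below:

# spot check of toYearFraction: 1958-03-15 should map to roughly 1958.2
print(round(toYearFraction("1958-03-15"), 3))   # ~1958.2
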
airdata = pd.read_csv(pm.get_data("monthly_in_situ_co2_mlo.csv"), header=56)

# - replace -99.99 with NaN
airdata.replace(to_replace=-99.99, value=np.nan, inplace=True)

# fix column names
cols = ["year", "month", "--", "--", "CO2", "seasonaly_adjusted", "fit",
        "seasonally_adjusted_fit", "CO2_filled", "seasonally_adjusted_filled"]
airdata.columns = cols
cols.remove("--"); cols.remove("--")
airdata = airdata[cols]

# drop rows with nan
airdata.dropna(inplace=True)

# fix time index
airdata["day"] = 15
airdata.index = pd.to_datetime(airdata[["year", "month", "day"]])
airdata["year"] = [toYearFraction(date) for date in airdata.index.values]
cols.remove("month")
airdata = airdata[cols]

air = airdata[["year", "CO2"]]
air.head(5)
                   year     CO2
1958-03-15  1958.200000  315.69
1958-04-15  1958.284932  317.46
1958-05-15  1958.367009  317.50
1958-07-15  1958.534132  315.86
1958-08-15  1958.619064  314.93

As in the first notebook, we reserve the data from 2004 onwards as the test set.

sep_idx = air.index.searchsorted(pd.to_datetime("2003-12-15"))
air_test = air.iloc[sep_idx:, :]
air = air.iloc[:sep_idx+1, :]
plt.plot(air.year.values, air.CO2.values, ".b", label="atmospheric CO2");
plt.plot(ice.year.values, ice.CO2.values, ".", color="c", label="ice core CO2");
plt.legend();

[Figure: atmospheric CO2 and ice core CO2 measurements vs. year]

If we zoom in on the late 1950s, we can see that the atmospheric data has a seasonal component, while the ice core data does not.

plt.plot(air.year.values, air.CO2.values, ".b", label="atmospheric CO2");
plt.plot(ice.year.values, ice.CO2.values, ".", color="c", label="ice core CO2");
plt.xlim([1949, 1965]);
plt.ylim([305, 325]);
plt.legend();

[Figure: zoom on 1949-1965, showing the seasonal cycle in the atmospheric data but not in the ice core data]

Since the ice core measurement times aren't known precisely, it won't be possible to backcast the seasonal component unless we model the uncertainty in x.

To model both data sets together, we combine the model we've built up for the ice core data with elements from the previous notebook on the Mauna Loa data. From the previous notebook we will additionally include (see the brief sketch after this list):

  • The Periodic, seasonal component
  • The RatQuad covariance for short range, annual scale variations
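
Here is a rough illustration (not from the original notebook, with made-up parameter values) of why the seasonal component is a Periodic kernel multiplied by an ExpQuad: the product gives an annual cycle whose shape can drift slowly over the decades.

# Illustrative only: prior draws from a Periodic * ExpQuad seasonal covariance.
# The period is one year (0.01 centuries); the ExpQuad lengthscale is made up
# so that some drift of the cycle shape is visible over 20 years.
x_season = np.linspace(19.58, 19.78, 400)             # years 1958-1978, in centuries
cov_season = (pm.gp.cov.Periodic(1, period=0.01, ls=1.0)
              * pm.gp.cov.ExpQuad(1, ls=0.3))
K_season = cov_season(x_season[:, None]).eval()
draws = np.random.multivariate_normal(np.zeros(len(x_season)),
                                      K_season + 1e-6 * np.eye(len(x_season)), 2).T
plt.plot(100 * x_season, draws);
plt.xlabel("year");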

Also, since we are using two different data sets, there should be two different y-direction uncertainties: one for the ice core data and one for the atmospheric data. To accomplish this, we make a custom WhiteNoise covariance function that has two σ parameters.

All custom covariance functions need to have the same three methods defined: __init__, diag, and full. full returns the full covariance matrix, given either X, or X and a different Xs. diag returns only the diagonal, and __init__ saves the input parameters.

class CustomWhiteNoise(pm.gp.cov.Covariance):
    """ Custom White Noise covariance
    - sigma1 is applied to the first n1 points in the data
    - sigma2 is applied to the next n2 points in the data

    The total number of data points n = n1 + n2
    """
    def __init__(self, sigma1, sigma2, n1, n2):
        super(CustomWhiteNoise, self).__init__(1, None)
        self.sigma1 = sigma1
        self.sigma2 = sigma2
        self.n1 = n1
        self.n2 = n2

    def diag(self, X):
        d1 = tt.alloc(tt.square(self.sigma1), self.n1)
        d2 = tt.alloc(tt.square(self.sigma2), self.n2)
        return tt.concatenate((d1, d2), 0)

    def full(self, X, Xs=None):
        if Xs is None:
            return tt.diag(self.diag(X))
        else:
            return tt.alloc(0.0, X.shape[0], Xs.shape[0])
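
As a quick sanity check (with made-up values, not part of the model), the diagonal picks up σ1² for the first n1 points and σ2² for the remaining n2, while the covariance between distinct inputs is zero:

# illustrative check of CustomWhiteNoise with made-up sigmas
check_cov = CustomWhiteNoise(1.0, 2.0, 3, 2)
Xcheck = np.linspace(0, 1, 5)[:, None]
print(check_cov.diag(Xcheck).eval())               # [1. 1. 1. 4. 4.]
print(check_cov.full(Xcheck, Xcheck[:2]).eval())   # 5x2 block of zeros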

Next we need to organize and combine the two data sets. Remember that the unit on the x-axis is centuries, not years.

# form dataset, stack t and co2 measurements
t = np.concatenate((ice.year.values, air.year.values), 0)
y = np.concatenate((ice.CO2.values, air.CO2.values), 0)

y_mu, y_sd = np.mean(ice.CO2.values[0:50]), np.std(y)
y_n = (y - y_mu) / y_sd
t_n = t * 0.01

The specification of the model is below. Since the data set is larger, we find the MAP estimate instead of doing MCMC. We also choose our priors for the hyperparameters more carefully. For the changepoint covariance, we model the post-industrial-revolution data with an ExpQuad covariance that has the same longer lengthscale as before the industrial revolution; the idea is that whatever process was at work before is still at work after. But to this we add the product of a Polynomial(degree=2) and a Matern52, with the lengthscale of the Matern52 fixed to two centuries. Since it has only been about two centuries since the industrial revolution, this forces the polynomial trend to decay on that time scale, which in turn lets the forecast uncertainty grow at longer horizons.
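
To see what this trend component does on its own (an illustrative sketch with made-up scale values, not part of the original notebook), we can draw prior samples from just the Polynomial(degree=2) * Matern52 product with the lengthscale fixed at two centuries:

# Illustrative only: prior draws from the Polynomial * Matern52 trend term.
# The quadratic growth is damped by the Matern52, so correlations (and with them
# the model's confidence when forecasting) die off over roughly two centuries.
x_trend = np.linspace(18, 23, 200)      # years 1800-2300, in centuries
cov_trend = pm.gp.cov.Polynomial(1, c=18.0, d=2, offset=1.0) * pm.gp.cov.Matern52(1, 2.0)
K_trend = cov_trend(x_trend[:, None]).eval()
draws = np.random.multivariate_normal(np.zeros(len(x_trend)),
                                      K_trend + 1e-6 * np.eye(len(x_trend)), 3).T
plt.plot(100 * x_trend, draws);
plt.xlabel("year");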

with pm.Model() as model:
    ηc = pm.Gamma("ηc", alpha=3, beta=2)
    ℓc = pm.Gamma("ℓc", alpha=10, beta=1)

    # changepoint occurs near the year 1800, sometime between 1760, 1840
    x0 = pm.Normal("x0", mu=18, sd=0.1)
    # the change happens gradually
    a = pm.Gamma("a", alpha=3, beta=1)
    # constant offset
    c = pm.HalfNormal("c", sd=2)

    # quadratic polynomial scale
    ηq = pm.HalfNormal("ηq", sd=1)
    ℓq = 2.0  # fixed ~2 century lengthscale, since we only have about 2 centuries of post-industrial-revolution data

    cov1 = ηc**2 * pm.gp.cov.ExpQuad(1, ℓc)
    cov2 = ηc**2 * pm.gp.cov.ExpQuad(1, ℓc) + \
           ηq**2 * pm.gp.cov.Polynomial(1, x0, 2, c) * pm.gp.cov.Matern52(1, ℓq)  # ~2 century impact

    # construct changepoint cov
    sc_cov1 = pm.gp.cov.ScaledCov(1, cov1, logistic, (-a, x0))
    sc_cov2 = pm.gp.cov.ScaledCov(1, cov2, logistic, ( a, x0))
    gp_c = pm.gp.Marginal(cov_func=sc_cov1 + sc_cov2)

    # short term variation
    ηs = pm.HalfNormal("ηs", sd=3)
    ℓs = pm.Gamma("ℓs", alpha=5, beta=100)
    α = pm.Gamma("α", alpha=4, beta=1)
    cov_s = ηs**2 * pm.gp.cov.RatQuad(1, α, ℓs)
    gp_s = pm.gp.Marginal(cov_func=cov_s)

    # medium term variation
    ηm = pm.HalfNormal("ηm", sd=5)
    ℓm = pm.Gamma("ℓm", alpha=2, beta=3)
    cov_m = ηm**2 * pm.gp.cov.ExpQuad(1, ℓm)
    gp_m = pm.gp.Marginal(cov_func=cov_m)

    ## periodic
    ηp = pm.HalfNormal("ηp", sd=2)
    ℓp_decay = pm.Gamma("ℓp_decay", alpha=40, beta=0.1)
    ℓp_smooth = pm.Normal("ℓp_smooth", mu=1.0, sd=0.05)
    period = 1 * 0.01  # we know the period is annual
    cov_p = ηp**2 * pm.gp.cov.Periodic(1, period, ℓp_smooth) \
                  * pm.gp.cov.ExpQuad(1, ℓp_decay)
    gp_p = pm.gp.Marginal(cov_func=cov_p) 

    gp = gp_c + gp_m + gp_s + gp_p

    # - x location uncertainty (sd = 0.01 is a standard deviation of one year)
    # - only the first 111 points are the ice core data
    # - sd = 0.002 says the point may be the given data +- about half a year
    t_mu = t_n[:111]
    t_diff = pm.Normal("t_diff", mu=0.0, sd=0.002, shape=len(t_mu))
    t_uncert = t_mu - t_diff
    t_combined = tt.concatenate((t_uncert, t_n[111:]), 0)

    # Noise covariance, using boundary avoiding priors for MAP estimation
    σ1 = pm.Gamma("σ1", alpha=3, beta=50)
    σ2 = pm.Gamma("σ2", alpha=3, beta=50)
    η_noise = pm.HalfNormal("η_noise", sd=1)
    ℓ_noise = pm.Gamma("ℓ_noise", alpha=2, beta=200)
    cov_noise = η_noise**2 * pm.gp.cov.Matern32(1, ℓ_noise) +\
                CustomWhiteNoise(σ1, σ2, 111, 545)


    y_ = gp.marginal_likelihood("y", X=t_combined[:,None], y=y_n, noise=cov_noise)
with model:
    mp = pm.find_MAP(method="BFGS")
logp = 2,447.7, ||grad|| = 2.2812: 100%|██████████| 398/398 [03:24<00:00,  1.95it/s]

Let's look at the results. First we'll look at the predictions out to the year 2040.

tnew = np.linspace(1700, 2040, 3000) * 0.01
mu, var = gp.predict(tnew[:,None], point=mp, diag=True)
# rescale
mu_s = y_sd * mu + y_mu
sd = np.sqrt(var)*y_sd
plt.figure(figsize=(12,5))

plt.plot(t, y, 'k.', label="observed data");
plt.plot(air_test.year.values, air_test.CO2.values, ".", color="orange", label="test set data");

plt.plot(100*tnew, mu_s, alpha=0.5);
plt.fill_between(100*tnew, mu_s - 2*sd, mu_s + 2*sd, alpha=0.5, label="2σ prediction interval");
plt.axhline(y=400, color="k", alpha=0.7, linestyle=":")
plt.ylabel("CO2 [ppm]")
plt.xlabel("year");
plt.title("Fit and forecast")
plt.legend();
plt.xlim([1700,2040]);
plt.ylim([260, 460]);

[Figure: fit and forecast, 1700-2040, with the 2σ prediction interval]

Let's zoom in for a closer look at the uncertainty intervals around the time when the CO2 level first crosses 400 ppm.

plt.figure(figsize=(12,5))

plt.plot(t, y, 'k.');
plt.plot(100*(t_n[:111] - mp["t_diff"]), y[:111], ".y");
plt.plot(air_test.year.values, air_test.CO2.values, ".", color="orange");

plt.plot(100*tnew, mu_s, alpha=0.5);
plt.fill_between(100*tnew, mu_s - 2*sd, mu_s + 2*sd, alpha=0.5);
plt.axhline(y=400, color="k", alpha=0.7, linestyle=":")
plt.ylabel("CO2 [ppm]")
plt.xlabel("year")
plt.title("Prediction at the test data")
plt.xlim([2005,2025]);
plt.ylim([370, 430]);

[Figure: prediction at the test data, 2005-2025]

If you compare this to the first Mauna Loa example notebook, the predictions are much better. The date when the CO2 level first hits 400 ppm is predicted more accurately. This improvement in the bias is due to including the Polynomial * Matern52 term.

We can also look at what the model says about CO2 levels back in time. Since we allowed the x measurements to have uncertainty, we are able to fit the seasonal component back in time. Let's look at the fit of the model:

tnew = np.linspace(11, 32, 500) * 0.01
mu, var = gp.predict(tnew[:,None], point=mp, diag=True)
# rescale
mu_s = y_sd * mu + y_mu
sd = np.sqrt(var)*y_sd
plt.figure(figsize=(12,5))

plt.plot(100*(t_n[:111] - mp["t_diff"]), y[:111], "oy", alpha=0.5, label="shifted observations");
plt.plot(t, y, 'k.', label="observed data");
plt.plot(air_test.year.values, air_test.CO2.values, ".", color="orange");

plt.plot(100*tnew, mu_s, alpha=0.5);
plt.fill_between(100*tnew, mu_s - 2*sd, mu_s + 2*sd, alpha=0.5, label="2σ uncertainty");
plt.legend(loc="upper center");
plt.ylabel("CO2 [ppm]")
plt.xlabel("year")
plt.xlim([12, 31]);
plt.xticks(np.arange(12,32));
plt.ylim([272, 283]);

[Figure: model backcast around years 12-31, with the shifted ice core observations]

We can see that far back in time, we can backcast even the seasonal behavior. The roughly half-year of uncertainty in the x locations allows each ice core point to be shifted slightly onto the nearest part of the seasonal oscillation for that year. The magnitude of the oscillation is the same as it is now.

tnew = np.linspace(-20, 0, 300) * 0.01
mu, var = gp.predict(tnew[:,None], point=mp, diag=True)
# rescale
mu_s = y_sd * mu + y_mu
sd = np.sqrt(var)*y_sd
plt.figure(figsize=(12,5))

plt.plot(100*tnew, mu_s, alpha=0.5);
plt.fill_between(100*tnew, mu_s - 2*sd, mu_s + 2*sd, alpha=0.5, label="2σ uncertainty");
plt.legend();
plt.ylabel("CO2 [ppm]")
plt.xlabel("year")
plt.xlim([-20, 0]);
plt.ylim([272, 283]);

[Figure: backcast with the 2σ interval, years -20 to 0]

As we go back to before the Common Era, the backcast seasonality remains intact, as one would expect.