Lab 04: Regression with Bayesian Methods

Module 1, Labs

Published Fri., Sep. 15

1 Problem statement

Use linear regression to relate the mean annual runoff of several streams (runoff) to nine other variables:

  1. the precipitation falling at the gage (precip),
  2. the drainage area of the basin (area),
  3. the average slope of the basin (slope),
  4. the length of the drainage basin (length),
  5. the perimeter of the basin (perim),
  6. the diameter of the largest circle which could be inscribed within the drainage basin (diameter),
  7. the shape factor of the basin (shape_factor),
  8. the stream frequency – the ratio of the number of streams in the basin to the basin area (stream_freq), and
  9. the relief ratio for the basin (relief_ratio).
Credit

This is a light modification of Example 11.3 from

Helsel, D. R., Hirsch, R. M., Ryberg, K. R., Archfield, S. A., & Gilroy, E. J. (2020). Statistical methods in water resources. Techniques and Methods. U.S. Geological Survey. https://doi.org/10.3133/tm4A3

2 Setup

First, we load the packages that we will use

using CSV
using DataFrames
using Distributions
using DynamicHMC
using LaTeXStrings
using LinearAlgebra: I
using Plots
using Statistics: cor # explicit import; cor is used for the pairwise correlations below
using StatsPlots
using Turing

2.1 Data

Next, we need to load in the data.

runoff_df = CSV.read("data/runoff.csv", DataFrame)
display(size(runoff_df))
first(runoff_df, 3)
(13, 10)
3×10 DataFrame
 Row │ runoff   precip   area     slope  length   perim    diameter  shape_factor  stream_freq  relief_ratio
     │ Float64  Float64  Float64  Int64  Float64  Float64  Float64   Float64       Float64      Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │   17.38    44.37     2.21     50     2.38     7.93      0.91          0.38         1.36           332
   2 │   14.62    44.09     2.53      7     2.55     7.65      1.23          0.48         2.37            55
   3 │   15.48    41.25     5.63     19     3.11    11.61      2.11          0.57         2.31            77

3 Model explanation and exploration

3.1 Model specification

We are going to fit models of the form \[ y_i \sim \mathcal{N} \left( \beta X_i , \sigma \right). \] We have only 13 observations of 10 variables, so we will need to be careful about how much model complexity the data can support. Before diving into the model, let's look at the data.
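A quick numeric summary is useful before plotting; describe from DataFrames reports the mean, extrema, missing counts, and element type of each column (nothing here is model-specific):

describe(runoff_df) # per-column mean, min, median, max, number missing, and type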

3.2 Pairwise correlations

One simple way to explore the data is to examine the pairwise correlations between the variables. The following function plot_correlation plots a heatmap of the pairwise correlations.

Code
function plot_correlation(df::AbstractDataFrame)
    varnames = names(df)
    correlations = cor(Matrix(df))
    P = size(correlations, 1)

    p = heatmap(
        varnames, # x labels
        varnames, # y labels
        correlations; # variable
        aspect_ratio=1, # 1:1 aspect ratio
        xrotation=30, # rotate x ticks
        size=(600, 600), # specify the size of the plot
        title="Pairwise Correlations",
        clims=(-1, 1),
        cmap=:PuOr, # select a diverging color map
    )

    # add the exact text
    ann = [
        (i - 0.5, j - 0.5, text(round(correlations[i, j]; digits=2), 8, :white, :center))
        for i in 1:P for j in 1:P
    ]
    annotate!(p, ann; linecolor=:white)
    return p
end
1. cor expects a Matrix input and will throw an error if we give it a DataFrame. Matrix(df) just extracts the numeric values from our DataFrame. This won't work well if we have non-numeric columns, but we don't have any here.
plot_correlation(runoff_df)
Task

Describe what you learn from this plot in the specific context of modeling the relationship between runoff and the other variables. What can you not learn from this plot?

4 Linear regression

We can now fit a linear regression model to a single variable. You can pick whichever one you like; describe why you picked it.

We need priors on the intercept, slope, and standard deviation. Making some plausible choices, our full model is: \[ \begin{align} \alpha &\sim \mathcal{N}(0, 10) \\ \beta &\sim \mathcal{N}(0, 10) \\ \sigma &\sim \mathcal{N}^+(0, 10) \\ \mu &= \alpha + \beta X \\ y &\sim \mathcal{N}(\mu, \sigma) \end{align} \] where \(\mathcal{N}^+\) is the half-normal distribution, i.e., a Normal distribution truncated below at zero.
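Before implementing this in Turing, you can sanity-check the scale of these priors directly with Distributions. Here is a minimal sketch assuming the priors above, with 40 used purely as an illustrative precipitation value:

α_draws = rand(Normal(0, 10), 1_000)                      # intercept prior
β_draws = rand(Normal(0, 10), 1_000)                      # slope prior
σ_draws = rand(truncated(Normal(0, 10); lower=0), 1_000)  # half-normal prior on σ
μ_draws = α_draws .+ β_draws .* 40                        # implied mean runoff at precip = 40
extrema(μ_draws)                                          # are these runoff values physically plausible?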

Task

Update the above model to reflect the priors you end up choosing. You are welcome to use other distributions or parameter values. You will need to modify the LaTeX math, but only very minor edits are required.

4.1 Define model

We can implement this in Turing as follows:

#| output: false
@model function linear_reg(X::AbstractVector, y::AbstractVector)

    # First, define all parameters
    # they need priors so we use a ~
    α # the intercept
    β # the slope
    σ # the standard deviation

    # We can calculate intermediate quantities in our function
    # Turing knows this isn't a parameter because we use = not ~
    μ = α .+ β .* X

    # likelihood model
    y ...
end

Task

Implement the model above so that it is consistent with the mathematical model written above. You will iterate between this and the next step a few times. You only need to keep your final model, but briefly describe what you did and why.

4.2 Prior predictive check

Before we fit the model, we run a prior predictive check. This lets us check whether the prior assumptions we have encoded into the model are consistent with our beliefs. First, we sample from the prior:

y = vec(runoff_df[!, :runoff])
X = vec(runoff_df[!, :YOUR_VAR_HERE]) # which variable did you choose?
ppc_linear = let
    model = linear_reg(X, y) # call the model
    sampler = Prior() # we want to sample from the prior
    nsamples = 10_000
    sample(model, sampler, nsamples; drop_warmup=true)
end
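Before plotting implied regression lines, it can help to glance at the sampled parameters themselves; summarystats works on a prior chain just as it does on a posterior one:

summarystats(ppc_linear) # check that the draws match the priors you wrote down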

We can use the lm_predict function to generate predictions from the prior.

Code
function lm_predict(chn::Turing.MCMCChains.Chains, X::AbstractVecOrMat, N::Int)
    p = get_params(chn)
    α = vec(p[:α])
    β = if isa(p[:β], Tuple)
        β_matrix = hcat(vec.(p[:β])...)
        [β_matrix[i, :] for i in 1:size(β_matrix, 1)]
    else
        vec(p[:β])
    end
    idx = rand(1:length(α), N)
    ŷ = [
        α[i] .+ X * β[i] for i in idx
    ]
    if N == 1
        ŷ = ŷ[1]
    end
    return ŷ
end
1. This function extracts the parameters from the chain.
2. If we use multiple chains, then p[:α] is a Matrix. We convert it to a vector so that it's “flat”.
3. If we have multiple predictors (multivariate linear regression), then p[:β] is a tuple of vectors. We convert it into a vector of vectors so that β[i] is a vector of coefficients for the \(i\)th sample.
4. We randomly select N samples from the chain (these may be prior or posterior draws).
5. We calculate the predictions for each of the N samples.
6. If N is 1, then we return a vector, not a vector of vectors.

We leverage this function to generate predictions from the prior and plot them.

X_pred = ... # specify the range of predictor values at which to generate predictions
p1 = plot(xlabel="Precip", ylabel="Runoff", title="Prior predictive check")
for i in 1:250
    y_pred = lm_predict(ppc_linear, X_pred, 1)
    plot!(p1, X_pred, y_pred; color=:black, alpha=0.1, label=false)
end
p1
Task

Fill in the missing lines in the above blocks and run.

4.3 Inference

Once we have a prior that we're happy with, we can use our model to generate samples from the posterior distribution. We draw 4 chains of 5000 samples each. If you get a strange error here, it may be caused by threading; rather than wasting time debugging, try the mapreduce-based approach sketched after the code block below.

chn_linear = let
    model = ... # FILL IN
    sampler = externalsampler(DynamicHMC.NUTS())
    n_per_chain = 5000
    nchains = 4
    sample(model, sampler, MCMCThreads(), n_per_chain, nchains; drop_warmup=true)
end
summarystats(chn_linear)
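If the threaded call above fails, a serial alternative using mapreduce and chainscat (a pattern used in the Turing documentation) looks roughly like this sketch; it assumes model has already been constructed, e.g. model = linear_reg(X, y):

chn_linear = mapreduce(
    c -> sample(model, externalsampler(DynamicHMC.NUTS()), 5000; drop_warmup=true),
    chainscat, # concatenate the individual chains into one Chains object
    1:4,       # four chains, sampled one after another
)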

Based on \(\hat{R}\) values close to 1, we have some evidence that our chains have converged. We can also check the trace plots:

... # Fill in to generate the trace plot
Task

Fill in the missing lines in the above blocks. Describe what you learn from the trace plot.

4.4 Posterior predictive checks

We can again plot our predictions, this time using the posterior samples. We add the data to the plot as well.

p2 = plot(xlabel="Precip", ylabel="Runoff", title="Posterior") # blank plot
pred_linear = lm_predict(chn_linear, X_pred, 500) # generate predictions
for y_pred in pred_linear
    plot!(p2, X_pred, y_pred; color=:black, alpha=0.1, label=false) # add each prediction to a plot
end
# add the observations with scatter!
p2
Task

Add the observations to the plot above with scatter!

5 Multiple predictors

By using a model with only one predictor, we may be throwing away useful information from the other variables.

#| output: false
@model function linear_reg(X::Matrix, y::AbstractVector)

    N, P = size(X) # number of observations, number of predictors

    # priors
    α ~ Normal(0, 5) # intercept
    β ~ MvNormal(zeros(P), 5.0 * I)
    σ ~ LogNormal(2, 0.5)

    # intermediate quantity
    # deterministic transformation
    μ = α .+ X * β

    # likelihood
    y .~ Normal.(μ, σ)

    return μ
end
1. β is a vector, so we use a multivariate normal distribution. Here, we give the coefficients independent priors, so we use a diagonal covariance matrix.
2. We use a log-normal distribution for \(\sigma\) because it is a positive quantity. This is an alternative to a truncated Normal.
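To see what these particular choices imply on the original scale, you can query the distributions directly; a quick check of the σ prior and the marginal spread of each coefficient (values rounded):

quantile.(LogNormal(2, 0.5), [0.05, 0.5, 0.95]) # σ prior: roughly 3.2, 7.4, and 16.8
sqrt(5.0)                                       # marginal prior std of each β: about 2.2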
Task

Uncomment the above model, then modify it to use better priors, iterating with the next step.

5.1 Prior predictive check

X = Matrix(runoff_df[:, Not(:runoff)]) # all columns except runoff
y = vec(runoff_df[!, :runoff]) # runoff 

ppc_multi = let
    ... # FILL IN
end

How should we visualize the prior predictive check? A simple way is to visualize the implied distribution of \(y\) given observed \(X\):

histogram(
    vcat(lm_predict(ppc_multi, X, 500)...),
    xlabel=L"$y$",
    ylabel=L"Implied $p(y)$",
    label=false,
    normalize=:pdf,
    yticks=false,
    title="Prior predictive check",
)
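One concrete thing to check alongside the histogram is how much prior mass lands on physically impossible (negative) runoff; a short sketch reusing the same prior draws:

y_prior = vcat(lm_predict(ppc_multi, X, 500)...) # same draws as the histogram above
mean(y_prior .< 0)                               # fraction of prior-implied runoff values below zero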
Task

Modify the above code to draw samples from the prior and save as ppc_multi.

5.2 Inference

Now that our priors seem plausible (we could always do better, but that's not the point of this lab), we can fit the model.

chn_multi = let
    ... # FILL IN
end
# summary stats

Task

Draw the samples from the posterior and save as chn_multi. Display the summary statistics. What do you learn?

5.3 Posterior predictive check

How should we visualize these inferences? It’s hard to plot a single \(x\) vs \(y\) when we have multiple \(x\). There are many possible plots, but a good one is to plot the predicted vs actual values of \(y\). To make the plot more readable, we’ll add some jitter to the data. Here’s a function to do that:

Code
function jitter(x, y; ϵx=0.1, ϵy=0.1)
    x_jit = x .+ rand(Normal(0, ϵx), size(x))
    y_jit = y .+ rand(Normal(0, ϵy), size(y))
    return x_jit, y_jit
end

Now we can create our scatter plot.

p4 = plot(; xlabel="Predicted Runoff", ylabel="Actual Runoff", aspect_ratio=1) # blank plot
for i in 1:500 # loop through
    pred = lm_predict(chn_multi, X, 1)
    xplt, yplt = jitter(pred, y; ϵx=0.1, ϵy=0.1)
    scatter!(xplt, yplt; color=:black, alpha=0.5, label=false, markersize=0.5)
end
Plots.abline!(p4, 1, 0; color=:red, label="1:1 line")
p4
Task

The above block shouldn’t need to be modified; add the curly brackets to the code cell when you are ready to run it. What do you learn from this plot?

6 Submission

As you did for previous labs, you will submit your lab by pushing it to GitHub. In addition, submit the DOCX or PDF file (as for lab 2) to Canvas.