using Pkg
lab_dir = dirname(@__FILE__) # Get the directory path where this notebook file is located
Pkg.activate(lab_dir)
# Pkg.instantiate() # uncomment this the first time you run the lab to install packages, then comment it back
Iterative prior specification, Bayesian GEV workflow, Two-station comparison with ArviZ.jl
CEVE 543 Fall 2025
Fri., Sep. 19
Run julia --project=. in the terminal, then run using Pkg; Pkg.instantiate() to install the required packages. The easiest method is to type ] in the REPL to enter package mode, then type instantiate. To render this notebook, run quarto render index.qmd in your terminal; it should create both PDF and HTML documents.
This lab guides you through implementing a complete Bayesian workflow for extreme value analysis: iterative prior specification, fitting a Bayesian GEV model to your station, and a two-station comparison with ArviZ.jl.
As with previous labs, you do not need to write code from scratch – focus on understanding the workflow, modifying parameters, and interpreting results.
To submit your work, push your changes to GitHub, render to PDF, and upload the PDF to Canvas.
Delete this entire section (including the “Do Before Lab” and “Do During Lab” sections above) and replace it with your analysis responses. Use the numbered format below for your final submission.
Response 1: Prior Evolution and Justification. How did your understanding of Houston’s extreme rainfall evolve through the iterative prior specification process? What knowledge sources proved most valuable for extreme value analysis?
Response 2: Data Learning and Posterior Insights. How did the observed data change your prior beliefs about GEV parameters? What does this reveal about the balance between domain knowledge and empirical evidence in Bayesian analysis?
Response 3: Two-Station Bayesian Comparison. Should the same priors apply to both stations? How do posterior differences relate to geographic and climatic factors, and what are the implications for regional analysis?
Response 4: Uncertainty Quantification and Model Validation. How does Bayesian uncertainty quantification compare to MLE approaches? What do the diagnostic results tell you about model reliability?
Response 5: Bayesian Workflow for PS1. How will this iterative workflow experience help you tackle regional extreme value analysis in PS1? What challenges do you anticipate?
Instead of using the package manager to activate the environment each time, we can do it programmatically. This approach ensures we’re using the correct package versions and may help you avoid some common environment issues.
We can load in some packages for Bayesian analysis. See Turing and ArviZ docs for more details.
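A minimal sketch of what this cell might look like (the exact package list is an assumption):

using Turing          # Bayesian models and MCMC sampling
using ArviZ           # InferenceData containers and diagnostics
using Distributions   # GeneralizedExtremeValue, InverseGamma, etc.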
We use Makie with the CairoMakie backend for high-quality plotting and set it to output images as SVG files for crisp rendering:
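A sketch of the likely setup (assuming the standard CairoMakie options):

using CairoMakie
CairoMakie.activate!(type = "svg")   # vector (SVG) output for crisp figures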
We load some utility packages for data manipulation and downloading files. See the TidierData docs for more details on the @chain macro syntax.
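A sketch of the likely imports (assumed, not the lab's exact list):

using TidierData   # provides @chain, @filter, @arrange used below
using DataFrames
using Unitful      # provides the u"inch" unit and ustrip used below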
Next, we include some utility functions from previous labs. Note that this file loads several additional packages that are not explicitly loaded here.
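For example (the file name here is hypothetical; use the utilities file distributed with the lab):

include(joinpath(lab_dir, "util.jl"))   # hypothetical name for the shared lab utilities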
Finally, it’s good practice to set a random seed for reproducible results across different runs:
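For example (the seed value here is arbitrary):

using Random
rng = Random.MersenneTwister(543)   # any fixed seed works; 543 is just an example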
We’ll use the same Texas precipitation dataset from Labs 2 and 3, using a let...end block to encapsulate the download and reading logic so that variables don’t leak into the global scope.
let...end blocks keep variables scoped locally, so if you try to access precip_fname or url outside the block, you’ll get an error.
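As a quick illustration of the scoping behavior (not the lab's download code):

demo_value = let
    tmp = 1 + 1   # tmp exists only inside this block
    tmp^2         # the last expression becomes the value of the block
end
# demo_value is 4 here, but referencing tmp outside the block throws an UndefVarError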
Next, choose the same station you used in Lab 3 for your analysis.
my_stnid = 111 # REQUIRED: replace with your chosen station ID
my_station = @chain stations begin
@filter(stnid == !!my_stnid)
first
end
my_rainfall = @chain rainfall_data begin
@filter(stnid == !!my_stnid)
@arrange(date)
end
station_time_series = let
fig = Figure(size = (1000, 400))
ax = Axis(fig[1, 1], xlabel = "Year", ylabel = "Annual Max Rainfall (inches)",
title = "Annual Maximum Rainfall at $(my_station.name), TX")
years = my_rainfall.year
rain = ustrip.(u"inch", my_rainfall.rainfall)
lines!(ax, years, rain, color = :blue)
scatter!(ax, years, rain, color = :blue, markersize = 8)
fig
end
Before looking at any data, think about what you know about extreme rainfall in Houston. What’s your frame of reference, and what kinds of storms or benchmarks are you thinking about?
As a starting point, I wrote down some initial very weak priors. However, they are likely not very realistic for Houston’s climate. We will iteratively refine them based on prior predictive checks and domain knowledge.
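That cell isn't shown here; as a rough sketch, a weak-prior GEV model in Turing might look like the following (the function name gev_model_weak_priors and the specific prior choices are placeholders, not the lab's actual values):

@model function gev_model_weak_priors(y)
    # Deliberately vague priors over the GEV parameters (placeholder values)
    μ ~ Normal(0.0, 10.0)      # location, in inches
    log_σ ~ Normal(0.0, 2.0)   # log of the scale parameter
    ξ ~ Normal(0.0, 0.5)       # shape parameter
    σ = exp(log_σ)
    dist = GeneralizedExtremeValue(μ, σ, ξ)
    if length(y) > 0
        y .~ dist   # likelihood, skipped when sampling from the prior only
    end
end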
Prior predictive checking asks us to simulate data from our prior distributions and see if the simulated data looks reasonable. While the typical approach is to simulate y_{\text{rep}} \sim p(y | \theta), we can also look at the return level curves, which we can compute directly from the parameters.
First, we need to draw samples from our prior distribution. Turing helps us with this by allowing us to run MCMC with no data, which effectively samples from the prior only.
function load_or_sample(fname, model; overwrite = false, n_chains = 4, samples_per_chain = 2000, sampler = NUTS(), threading = MCMCThreads(), rng = rng)
idata = try
@assert !overwrite "Reading from cache disabled by overwrite=true"
idata = ArviZ.from_netcdf(fname)
@info "Loaded cached prior samples from $fname"
return idata
catch
chains = sample(
rng, # pass the reproducible RNG so the seed actually controls sampling
model,
sampler,
threading,
Int(ceil(samples_per_chain * n_chains)),
n_chains, # number of chains
verbose = false,
)
idata = ArviZ.from_mcmcchains(chains)
ArviZ.to_netcdf(idata, fname)
@info "Sampled and cached prior samples to $fname"
return idata
end
end
Set overwrite=true to force re-sampling from the prior; set it to false to load cached samples.
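The corresponding sampling cell would look roughly like this (the variable name, cache file name, and model name are assumptions that mirror the v2 cell below):

prior_idata_v1 = let
    fname = joinpath(lab_dir, "prior_v1.nc")
    model = gev_model_weak_priors([])   # no observations: samples from the prior only
    load_or_sample(fname, model; overwrite = true)
end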
We can see that this returns an object of type InferenceData. Click on the triangles to expand and see more. (Annoyingly, even though we sampled from the prior only, ArviZ still calls the group “posterior” unless we do a rather painful workaround. Let’s not get hung up on that.) Next, extract the parameter samples and create the prior predictive visualization; each line represents a different possible return level curve from our prior beliefs.
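A sketch of that extraction, mirroring the v2 version later in the lab:

prior_GEVs_v1 = vec(GeneralizedExtremeValue.(prior_idata_v1.posterior.μ, exp.(prior_idata_v1.posterior.log_σ), prior_idata_v1.posterior.ξ))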
let
fig = Figure(size = (1200, 600))
ax = Axis(fig[1, 1], xlabel = "Return Period (years)", ylabel = "Return Level (inches)",
title = "Prior v1 Predictive Distribution", xscale = log10, xticks = [1, 2, 5, 10, 25, 50, 100, 250])
rts = logrange(1.1, 250, 500)
for i in rand(1:length(prior_GEVs_v1), 250)
gev = prior_GEVs_v1[i]
add_return_level_curve!(ax, gev, rts; color = (:blue, 0.125))
end
ylims!(ax, 0, 75)
fig
end
Are these return level curves reasonable for Houston’s climate, based on your domain knowledge?
The first iteration showed that parameter-based priors can produce unrealistic scenarios. A key challenge is that we write priors over the parameters themselves, but our beliefs about them are not really independent: which scale and shape values are plausible depends on the location, and it is the combination that determines the return levels we actually care about. Instead of thinking about abstract GEV parameters (\mu, \sigma, \xi), let’s specify priors on quantities we understand: return levels.
Return levels are much more intuitive for domain experts: rather than reasoning about abstract parameters, we can state a best guess and an uncertainty for, say, the 10-year or 100-year storm.
We encode these beliefs as a Julia data type, called a struct, and pass in the return period in years, the mean return level, and the standard deviation (uncertainty) around that mean. Under the hood, it converts these to an Inverse Gamma distribution, which is bounded below at zero and has a long right tail, making it appropriate for rainfall amounts.
Start by defining your return level beliefs.
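The struct definition is not shown here; a sketch of what it might look like, assuming a moment-matched Inverse Gamma (the type name ReturnLevelPrior and the example numbers are placeholders; replace the beliefs with your own):

# Hypothetical sketch of the return-level prior type
struct ReturnLevelPrior
    quantile::Float64            # non-exceedance probability, 1 - 1/T
    distribution::InverseGamma   # belief about the return level, in inches
end

# Convert (return period, mean, std) to an InverseGamma by moment matching:
# for InverseGamma(α, θ), mean = θ/(α-1) and var = mean^2/(α-2)
function ReturnLevelPrior(T_years::Real, mean_rl::Real, std_rl::Real)
    α = (mean_rl / std_rl)^2 + 2
    θ = mean_rl * (α - 1)
    return ReturnLevelPrior(1 - 1 / T_years, InverseGamma(α, θ))
end

# Example beliefs (illustrative numbers only)
return_level_priors = [
    ReturnLevelPrior(10, 7.0, 2.0),    # roughly 7 ± 2 inches for the 10-year storm
    ReturnLevelPrior(100, 13.0, 4.0),  # roughly 13 ± 4 inches for the 100-year storm
]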
We can pass these into our Turing model
@model function gev_model_quantile_priors(y; return_level_priors = [])
μ ~ Normal(3.0, 3.0)
log_σ ~ Normal(0.0, 1.0)
ξ ~ Normal(0.0, 0.3)
σ = exp(log_σ)
dist = GeneralizedExtremeValue(μ, σ, ξ)
# Apply return level constraints
for prior in return_level_priors
rl = quantile(dist, prior.quantile)
if rl > 0.1
Turing.@addlogprob!(loglikelihood(prior.distribution, rl))
else
Turing.@addlogprob!(-Inf)
end
end
# Data likelihood
if length(y) > 0
y .~ dist
end
end
As before, we generate these samples
prior_idata_v2 = let
fname = joinpath(lab_dir, "prior_v2.nc")
model = gev_model_quantile_priors([]; return_level_priors = return_level_priors)
overwrite = true
load_or_sample(fname, model; overwrite = overwrite)
end
prior_GEVs_v2 = vec(GeneralizedExtremeValue.(prior_idata_v2.posterior.μ, exp.(prior_idata_v2.posterior.log_σ), prior_idata_v2.posterior.ξ))
We can compare these two prior versions side by side
fig_prior_comparison = let
rts = logrange(1.1, 250, 500)
xticks = [1, 2, 5, 10, 25, 50, 100, 250]
fig = Figure(size = (1200, 600))
ax1 = Axis(fig[1, 1], xlabel = "Return Period (years)", ylabel = "Return Level (inches)", title = "Prior v1 Predictive Distribution", xscale = log10, xticks = xticks)
posterior_bands!(ax1, prior_GEVs_v1, rts; color = (:blue, 0.2), ci = 0.90)
posterior_mean_curve!(ax1, prior_GEVs_v1, rts; color = :blue, linewidth = 3)
ax2 = Axis(fig[1, 2], xlabel = "Return Period (years)", ylabel = "Return Level (inches)", title = "Prior v2 Predictive Distribution", xscale = log10, xticks = xticks)
posterior_bands!(ax2, prior_GEVs_v2, rts; color = (:blue, 0.2), ci = 0.90)
posterior_mean_curve!(ax2, prior_GEVs_v2, rts; color = :blue, linewidth = 3)
linkaxes!(ax1, ax2)
fig
end
Do these make sense? Are the uncertainties too tight or too wide? Go back and revise your return_level_priors if needed. When you’re happy with your answers, set overwrite=false so you can load cached results in the future.
Now we move from prior-only analysis to combining our prior beliefs with the observed rainfall data!
First, let’s prepare the actual rainfall data for our primary station. We’ll define the model with the return level priors you specified above. It is very important to drop the missing values from your observed data – otherwise Turing will treat the missing values as parameters to sample, which is useful for imputation but not what we want here.
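A sketch of that cell, mirroring the comparison-station version later in the lab (the cache file name is an assumption):

posterior_idata = let
    fname = joinpath(lab_dir, "posterior_data.nc")
    y_obs = collect(skipmissing(ustrip.(u"inch", my_rainfall.rainfall)))   # drop missing values!
    model = gev_model_quantile_priors(y_obs; return_level_priors = return_level_priors)
    load_or_sample(fname, model; overwrite = true)
end
posterior_GEVs = vec(GeneralizedExtremeValue.(posterior_idata.posterior.μ, exp.(posterior_idata.posterior.log_σ), posterior_idata.posterior.ξ))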
Before comparing the full posterior distribution to our prior beliefs, let’s also compute point estimates for comparison. Maximum likelihood estimation (MLE) finds the single parameter values that maximize the likelihood of observing our data. Maximum a posteriori (MAP) estimation finds the parameter values that maximize the posterior probability (combining likelihood and prior).
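The estimation cell is not shown; a sketch using Turing's mode estimation (newer Turing versions export maximum_likelihood, older ones use optimize(model, MLE()) with Optim.jl loaded; variable names are assumptions):

# Sketch: MLE for the GEV model, reusing y_obs from the cell above
mle_estimate = maximum_likelihood(gev_model_quantile_priors(y_obs))   # priors do not enter the MLE
optim_result = mle_estimate.optim_result   # the underlying optimization result, shown below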
We can look at optim_result from the MLE optimization to see if it thinks it converged.
retcode: Success
u: 3-element Vector{Float64}:
2.9449769752950496
-0.2716554451713945
0.338027952805801
Similarly for MAP. This is working fine, but it’s sometimes helpful to provide an initial guess for the parameters to help the optimization. Just make sure your initial guess is reasonable. Also, if your optimization routinely fails to converge without an initial guess, it may indicate problems with your model.
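A sketch of the MAP cell; the initial_params keyword and the starting values are assumptions, so check your Turing version's documentation:

map_estimate = maximum_a_posteriori(
    gev_model_quantile_priors(y_obs; return_level_priors = return_level_priors);
    initial_params = [3.0, 0.0, 0.1],   # rough starting guess for (μ, log_σ, ξ)
)
map_estimate.optim_result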
retcode: Success
u: 3-element Vector{Float64}:
2.962608886867419
-0.25580321837713194
0.3051870639511596
For comparison, let’s also compute MAP estimates using the original model without return level priors:
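A sketch, assuming “the original model” means the same Turing model with its default (empty) return_level_priors, so that only the generic parameter priors apply:

map_estimate_original = maximum_a_posteriori(gev_model_quantile_priors(y_obs))   # generic priors only
map_estimate_original.optim_result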
retcode: Success
u: 3-element Vector{Float64}:
2.9725070624159926
-0.25055114127355005
0.2932796794199498
We can save the resulting GEV distributions for later plotting
mle_dist = GeneralizedExtremeValue(mle_estimate.values[1], exp(mle_estimate.values[2]), mle_estimate.values[3])
map_dist = GeneralizedExtremeValue(map_estimate.values[1], exp(map_estimate.values[2]), map_estimate.values[3])
map_dist_original = GeneralizedExtremeValue(map_estimate_original.values[1], exp(map_estimate_original.values[2]), map_estimate_original.values[3])
Let’s compare the prior and posterior distributions visually.
let
fig = Figure(size = (900, 500))
rts = logrange(1.1, 250, 500)
ax1 = Axis(fig[1, 1], xlabel = "Return Period (years)", ylabel = "Return Level (inches)",
title = "Return Level Uncertainty: Prior vs Posterior", xscale = log10, xticks = [1, 2, 5, 10, 25, 50, 100, 250])
posterior_bands!(ax1, prior_GEVs_v2, rts; color = (:blue, 0.2), ci = 0.90, label = "Prior 90% CI")
posterior_bands!(ax1, posterior_GEVs, rts; color = (:orange, 0.4), ci = 0.90, label = "Posterior 90% CI")
posterior_mean_curve!(ax1, posterior_GEVs, rts; color = :blue, linewidth = 3, label = "Posterior Mean")
mean_return_levels = [quantile(mle_dist, 1 - 1 / T) for T in rts]
lines!(ax1, rts, mean_return_levels, color = :red, linewidth = 3, label = "MLE")
axislegend(ax1; position = :lt)
fig
end
Let’s also compare the MAP estimates from both models:
let
fig = Figure(size = (900, 500))
rts = logrange(1.1, 250, 500)
ax1 = Axis(fig[1, 1], xlabel = "Return Period (years)", ylabel = "Return Level (inches)",
title = "Point Estimates Comparison", xscale = log10, xticks = [1, 2, 5, 10, 25, 50, 100, 250])
posterior_bands!(ax1, prior_GEVs_v2, rts; color = (:blue, 0.1), ci = 0.90, label = "Prior 90% CI")
posterior_bands!(ax1, posterior_GEVs, rts; color = (:orange, 0.3), ci = 0.90, label = "Posterior 90% CI")
mle_return_levels = [quantile(mle_dist, 1 - 1 / T) for T in rts]
map_return_levels = [quantile(map_dist, 1 - 1 / T) for T in rts]
map_original_return_levels = [quantile(map_dist_original, 1 - 1 / T) for T in rts]
lines!(ax1, rts, mle_return_levels, color = :red, linewidth = 3, label = "MLE")
lines!(ax1, rts, map_return_levels, color = :green, linewidth = 3, label = "MAP (with return level priors)")
lines!(ax1, rts, map_original_return_levels, color = :purple, linewidth = 3, label = "MAP (generic priors only)")
axislegend(ax1; position = :lt)
fig
end
We can see that, in this case, the MAP estimate is remarkably insensitive to the return level priors: the two MAP curves nearly coincide, though both differ slightly from the MLE.
To make sure that our model is reliable, we need to check that our MCMC sampling converged and that our model fits the data well. We can use automatically generated summary statistics to do this:
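The exact call may come from the lab utilities; with plain ArviZ, something like the following produces a similar table:

summarystats(posterior_idata)   # mean, std, credible interval, ESS, and R-hat per parameter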
| | mean | std | eti94 | ess_tail | ess_bulk | rhat | mcse_mean | mcse_std |
|---|---|---|---|---|---|---|---|---|
| μ | 2.966 | 0.111 | 2.77 .. 3.19 | 16936 | 13548 | 1.00 | 0.00096 | 0.00067 |
| ξ | 0.298 | 0.091 | 0.128 .. 0.472 | 11790 | 11925 | 1.00 | 0.00084 | 0.00059 |
| log_σ | -0.230 | 0.120 | -0.449 .. -0.00173 | 14234 | 13160 | 1.00 | 0.0010 | 0.00072 |
We can look for:
- \hat{R} values close to 1 for all parameters
- Effective sample sizes (ess_tail and ess_bulk) that are reasonably large (as close to the total number of samples as possible)

These are not guarantees of convergence, but they indicate the absence of obvious problems. \hat{R} > 1.01 or low effective sample sizes suggest the chains haven’t mixed well.
We can also look at trace plots and marginal densities for each parameter.
function plot_trace_diagnostics(idata, title_prefix = "")
fig = Figure(size = (1200, 400))
# Extract draw x chain arrays for each parameter
μ_arr = Array(idata.posterior.μ)
log_σ_arr = Array(idata.posterior.log_σ)
ξ_arr = Array(idata.posterior.ξ)
ndraws, nchains = size(μ_arr)
# Trace plots per chain
ax1 = Axis(fig[1, 1], xlabel = "Iteration", ylabel = "μ",
title = "$title_prefix Location Parameter Trace")
for c in 1:nchains
lines!(ax1, 1:ndraws, μ_arr[:, c], label = "Chain $(c)")
end
axislegend(ax1; position = :lb)
ax2 = Axis(fig[1, 2], xlabel = "Iteration", ylabel = "log(σ)",
title = "$title_prefix Log-Scale Parameter Trace")
for c in 1:nchains
lines!(ax2, 1:ndraws, log_σ_arr[:, c])
end
ax3 = Axis(fig[1, 3], xlabel = "Iteration", ylabel = "ξ",
title = "$title_prefix Shape Parameter Trace")
for c in 1:nchains
lines!(ax3, 1:ndraws, ξ_arr[:, c])
end
return fig
end
fig_trace_primary = plot_trace_diagnostics(posterior_idata, "Primary Station:")
function plot_marginal_densities(idata, title_prefix = "")
fig = Figure(size = (1200, 400))
# Extract draw x chain arrays for each parameter
μ_arr = Array(idata.posterior.μ)
log_σ_arr = Array(idata.posterior.log_σ)
ξ_arr = Array(idata.posterior.ξ)
# Marginal densities aggregated across chains
ax1 = Axis(fig[1, 1], xlabel = "μ", ylabel = "Density",
title = "$title_prefix Location Posterior")
hist!(ax1, vec(μ_arr), bins = 50, normalization = :pdf, color = (:blue, 0.7))
ax2 = Axis(fig[1, 2], xlabel = "log(σ)", ylabel = "Density",
title = "$title_prefix Log-Scale Posterior")
hist!(ax2, vec(log_σ_arr), bins = 50, normalization = :pdf, color = (:blue, 0.7))
ax3 = Axis(fig[1, 3], xlabel = "ξ", ylabel = "Density",
title = "$title_prefix Shape Posterior")
hist!(ax3, vec(ξ_arr), bins = 50, normalization = :pdf, color = (:blue, 0.7))
return fig
end
fig_densities_primary = plot_marginal_densities(posterior_idata, "Primary Station:")
In Lab 3, you compared MLE return levels between two stations. Now we’ll do a full Bayesian comparison using the same framework.
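The selection cell is not shown; one rough way to find nearby stations is a simple Euclidean distance in latitude/longitude (the lab may use a different method):

nearest_stations = let
    d = [hypot(row.latitude - my_station.latitude, row.longitude - my_station.longitude)
         for row in eachrow(stations)]
    stations[sortperm(d)[2:6], :]   # indices 2:6 skip the chosen station itself
end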
| Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data |
|---|---|---|---|---|---|---|---|
| | Int64 | String | String | String | Float64 | Float64 | Int64 |
| 1 | 742 | 41-9916 | WRIGHT PATMAN DM & LK | TX | 33.3039 | -94.1583 | 70 |
| 2 | 451 | 41-5229 | LINDEN | TX | 33.0161 | -94.3675 | 77 |
| 3 | 478 | 41-5667 | MAUD | TX | 33.3322 | -94.3436 | 78 |
| 4 | 682 | 41-8942 | TEXARKANA | TX | 33.4367 | -94.0772 | 49 |
| 5 | 518 | 41-6270 | NEW BOSTON | TX | 33.4547 | -94.4089 | 44 |
We will pick one of these five nearest stations for comparison
comparison_stnid = 478 # MODIFY THIS AND CHOOSE YOUR COMPARISON
comparison_station = @chain stations begin
@filter(stnid == !!comparison_stnid)
first
end
comparison_data = @chain rainfall_data begin
@filter(stnid == !!comparison_stnid)
@arrange(date)
end
# plot both stations' data for visual comparison
function plot_two_stations(station1, data1, station2, data2)
fig = Figure(size = (800, 400))
ax = Axis(fig[1, 1], xlabel = "Year", ylabel = "Annual Max Rainfall (inches)",
title = "Annual Maximum Rainfall: $(station1.name) vs $(station2.name)")
years1 = data1.year
rain1 = ustrip.(u"inch", data1.rainfall)
lines!(ax, years1, rain1, color = :purple, label = station1.name)
scatter!(ax, years1, rain1, color = :purple, markersize = 8)
years2 = data2.year
rain2 = ustrip.(u"inch", data2.rainfall)
lines!(ax, years2, rain2, color = :orange, label = station2.name)
scatter!(ax, years2, rain2, color = :orange, markersize = 8)
axislegend(ax; position = :rt)
return fig
end
plot_two_stations(my_station, my_rainfall, comparison_station, comparison_data)
We will use the same model and priors for the comparison station, assuming similar climate.
posterior_idata_comp = let
fname = joinpath(lab_dir, "posterior_data_comp.nc")
y_obs_comp = collect(skipmissing(ustrip.(u"inch", comparison_data.rainfall)))
model = gev_model_quantile_priors(y_obs_comp; return_level_priors = return_level_priors)
overwrite = true
load_or_sample(fname, model; overwrite = overwrite)
end
posterior_GEVs_comp = vec(GeneralizedExtremeValue.(posterior_idata_comp.posterior.μ, exp.(posterior_idata_comp.posterior.log_σ), posterior_idata_comp.posterior.ξ))
We conduct the same diagnostics for the comparison station
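For example, reusing the plotting functions defined above:

fig_trace_comp = plot_trace_diagnostics(posterior_idata_comp, "Comparison Station:")
fig_densities_comp = plot_marginal_densities(posterior_idata_comp, "Comparison Station:")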
We can compare the return level uncertainties between the two stations
fig_rl_comp = let
rts = logrange(1.1, 250, 500)
xticks = [1, 2, 5, 10, 25, 50, 100, 250]
fig = Figure(size = (1200, 600))
ax1 = Axis(fig[1, 1], xlabel = "Return Period (years)", ylabel = "Return Level (inches)",
title = "$(my_station.name)", xscale = log10, xticks = xticks)
posterior_bands!(ax1, posterior_GEVs, rts; color = (:blue, 0.2), ci = 0.90)
posterior_mean_curve!(ax1, posterior_GEVs, rts; color = :blue, linewidth = 3)
ax2 = Axis(fig[1, 2], xlabel = "Return Period (years)", ylabel = "Return Level (inches)",
title = "$(comparison_station.name)", xscale = log10, xticks = xticks)
posterior_bands!(ax2, posterior_GEVs_comp, rts; color = (:orange, 0.2), ci = 0.90)
posterior_mean_curve!(ax2, posterior_GEVs_comp, rts; color = :orange, linewidth = 3)
linkaxes!(ax1, ax2)
fig
end
Compare the posterior distributions between the two stations. Do the stations show similar extreme rainfall patterns? Which parameters differ most between stations? How might geographic or climatic factors explain these differences? Should we have used different priors for each station? Do the return level uncertainties overlap significantly, or are they distinct? What does this tell you about spatial variability in extreme precipitation?