using Pkg
lab_dir = dirname(@__FILE__)
Pkg.activate(lab_dir)
# Pkg.instantiate() # uncomment this the first time you run the lab to install packages, then comment it backCEVE 543 Fall 2025 Lab 5: Regional vs Local Parameter Estimation
Single-station vs regional models, Trend estimation, Uncertainty quantification
Do Before Lab
- Accept the assignment. Use the GitHub Classroom link posted on Canvas to “accept” the assignment, which will give you your own GitHub repository.
- Clone your repository to your computer using GitHub Desktop or the command line.
- Open the repository in VS Code directly from VS Code, GitHub Desktop, or your command line.
- Activate the Julia environment. In VS Code, open the command palette (View → Command Palette) and type “Julia: Start REPL”. Alternatively, run
julia --project=.in the terminal, then runusing Pkg; Pkg.instantiate()to install required packages. The easiest method is to type]in the REPL to enter package mode, then typeinstantiate. - Review previous concepts including Bayesian GEV inference, return level calculations, and multi-station comparisons.
- Verify rendering works. Run
quarto render index.qmdin your terminal - it should create both PDF and HTML documents.
Do During Lab
This lab explores why regional analysis is essential for reliable extreme value analysis:
- Individual station trend tests give inconsistent, implausible results
- Single-station nonstationary models have excessive uncertainty
- Regional models that share some parameters across stations can reduce uncertainty
- Comparing approaches reveals bias-variance trade-offs in parameter estimation
As with previous labs, you do not need to write code from scratch – focus on understanding the regional analysis workflow, interpreting results, and connecting to PS1.
To submit your work, push your changes to GitHub, render to PDF, and upload the PDF to Canvas.
Your Analysis
Delete this entire section (including the “Do Before Lab” and “Do During Lab” sections above) and replace it with your analysis responses. Use the numbered format below for your final submission.
Response 1: Spatial Trend Inconsistencies What patterns do you observe in the Mann-Kendall results across nearby stations? Are these results climatologically plausible? What does this suggest about single-station trend analysis?
Response 2: Nonstationary Model Uncertainty How does posterior uncertainty in return levels compare across the two nonstationary GEV models? Which model structure provides the most reliable estimates and why?
Response 3: Regional vs Single-Station Trade-offs Compare the bias-variance trade-offs between single-station and regional parameter estimation approaches. Under what conditions would you choose each strategy?
Response 4: Parameter Sharing Effects How does regional parameter sharing affect trend estimates and uncertainty? What are the practical implications for stations with limited data?
Response 5: Regional Analysis for PS1 How will these regional analysis techniques enhance your PS1 approach? What challenges do you anticipate in implementing regional models for your selected stations?
1 Data Setup and Package Loading
1.1 Loading Required Packages
We’ll load packages for data analysis, Bayesian modeling, and visualization.
Load the core packages:
using TidierFiles
using DataFrames
using Downloads
using NCDatasets
using TidierData
using LinearAlgebra: I
using ArviZ
using Distributions
using Optim
using Random
using Statistics
using Turing
using CairoMakie
using GeoMakie
using LaTeXStrings
CairoMakie.activate!(type="svg")
ENV["DATAFRAMES_ROWS"] = 5Load utility functions and set random seed:
include("util.jl")
rng = MersenneTwister(543)1.2 Loading Houston Precipitation Data
Use the same Texas precipitation dataset from previous labs:
stations, rainfall_data = let
precip_fname = joinpath(lab_dir, "dur01d_ams_na14v11.txt")
url = "https://hdsc.nws.noaa.gov/pub/hdsc/data/tx/dur01d_ams_na14v11.txt"
if !isfile(precip_fname)
Downloads.download(url, precip_fname)
end
read_noaa_data(precip_fname)
end1.3 Loading CO2 Data
Load global mean CO2 concentrations. The data comes from Mauna Loa (1959–2024) and from ice core reconstructions (pre-1959).
co2_data = let
co2_fname = joinpath(lab_dir, "logCo2.csv")
TidierFiles.read_csv(co2_fname) |> DataFrame
endSelect your primary station that has substantial data and potential trend signal:
my_stnid = 31 # Station 41-4309 (Houston Addicks), selected for potential trend signal
my_station = @chain stations begin
@filter(stnid == !!my_stnid)
first
end
my_rainfall = @chain rainfall_data begin
@filter(stnid == !!my_stnid)
@arrange(date)
@full_join(co2_data, "year") # Join with CO2 data
@arrange(year)
end
my_rainfall_nomissing = @chain my_rainfall begin
@filter(!ismissing(rainfall) && !ismissing(log_CO2))
end
station_time_series = let
fig = Figure(size=(1200, 600))
# Top plot: Rainfall vs Year
ax1 = Axis(fig[1, 1], xlabel="Year", ylabel="Annual Max Rainfall (inches)",
title="Annual Maximum Rainfall at $(my_station.name), TX")
years = my_rainfall.year
rain = ustrip.(u"inch", my_rainfall.rainfall)
lines!(ax1, years, rain, color=:blue)
scatter!(ax1, years, rain, color=:blue, markersize=8)
# Middle plot: CO2 vs Year
ax2 = Axis(fig[2, 1], xlabel="Year", ylabel="CO₂ Concentration (ppm)",
title="Global Mean CO₂ Concentration")
lines!(ax2, years, my_rainfall.log_CO2, color=:red, linewidth=2)
# Bottom plot: Rainfall vs log(CO2)
ax3 = Axis(fig[1:2, 2], xlabel="log(CO₂) Concentration", ylabel="Annual Max Rainfall (inches)",
title="Rainfall vs log(CO₂)")
scatter!(ax3, my_rainfall_nomissing.log_CO2, ustrip.(u"inch", my_rainfall_nomissing.rainfall), color=:green, markersize=8, alpha=0.7)
fig
end2 Spatial Inconsistencies in Single-Station Analysis
One of the key motivations for regional analysis is that individual station analyses often give inconsistent results that aren’t climatologically plausible. Let’s demonstrate this by analyzing trends across nearby stations.
2.1 Finding Regional Stations
First, let’s identify stations near our primary station for regional analysis:
nearby_stations = find_nearest_stations(my_station, stations, 50)
nearby_stations| Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data |
|---|---|---|---|---|---|---|---|
| Int64 | String | String | String | Float64 | Float64 | Int64 | |
| 1 | 31 | 41-4476 | IREDELL | TX | 31.9808 | -97.8731 | 45 |
| 2 | 368 | 41-4137 | HICO | TX | 31.9844 | -98.0311 | 93 |
| 3 | 194 | 41-1625 | CHALK MTN | TX | 32.1561 | -97.9369 | 48 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 49 | 329 | 41-3614 | GOLDTHWAITE 1 WSW | TX | 31.4403 | -98.5903 | 82 |
| 50 | 813 | 79-0157 | MINERAL WELLS AP | TX | 32.7817 | -98.0603 | 61 |
2.2 Implementing Trend Tests
We’ll implement the Mann-Kendall test, a commonly used non-parametric test for monotonic trends in data, assuming no serial correlation.
function mann_kendall_test(x::AbstractVector)
"""Mann-Kendall test for monotonic trend detection."""
n = length(x)
# Test statistic: sum of signs of all pairwise differences
S = sum(sign(x[j] - x[i]) for i in 1:(n-1) for j in (i+1):n)
# For n>10, under Null Hypothesis of no trend,
# S is Normally distributed with mean 0 and
# variance V = (n/18) * (n-1) * (2n+5)
var_S = n * (n - 1) * (2n + 5) / 18
# Standardized test statistic with continuity correction
Z = if S > 0
(S - 1) / sqrt(var_S)
elseif S < 0
(S + 1) / sqrt(var_S)
else
0.0
end
# Two-tailed p-value
p_value = 2 * (1 - cdf(Normal(0, 1), abs(Z)))
return S, p_value
endNow apply to our actual data - testing for trends in rainfall with respect to log(CO2):
prcp_obs = ustrip.(u"inch", my_rainfall_nomissing.rainfall)
mk_S, mk_p = mann_kendall_test(prcp_obs)(139.0, 0.17702877314693977)
Now let’s apply these tests to all nearby stations:
let
stnids = nearby_stations.stnid
raw_results = map(stnids) do stnid
df = @chain rainfall_data begin
@filter(stnid == !!stnid)
@filter(!ismissing(rainfall))
end
prcp = ustrip.(u"inch", df.rainfall)
mk_S, mk_p = mann_kendall_test(prcp)
return mk_S, mk_p
end
nearby_stations[!, :mk_S] = getindex.(raw_results, 1)
nearby_stations[!, :mk_pvalue] = getindex.(raw_results, 2)
end
first(nearby_stations, 3)| Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data | mk_S | mk_pvalue |
|---|---|---|---|---|---|---|---|---|---|
| Int64 | String | String | String | Float64 | Float64 | Int64 | Float64 | Float64 | |
| 1 | 31 | 41-4476 | IREDELL | TX | 31.9808 | -97.8731 | 45 | 139.0 | 0.177029 |
| 2 | 368 | 41-4137 | HICO | TX | 31.9844 | -98.0311 | 93 | 489.0 | 0.105321 |
| 3 | 194 | 41-1625 | CHALK MTN | TX | 32.1561 | -97.9369 | 48 | 171.0 | 0.130797 |
2.3 Visualizing Spatial Inconsistencies
Let’s create visualizations that show the spatial inconsistency in trend results:
fig_trends = let
fig = Figure(size=(1200, 600))
# Create 1x2 layout with GeoAxis, each with colorbar to the right
ax1 = GeoAxis(fig[1, 1]; source="+proj=latlong", dest="+proj=merc",
title="Mann-Kendall S Statistic", xgridvisible=false, ygridvisible=false,
xticksvisible=false, yticksvisible=false, xticklabelsvisible=false, yticklabelsvisible=false)
ax2 = GeoAxis(fig[1, 3]; source="+proj=latlong", dest="+proj=merc",
title="Mann-Kendall p-values", xgridvisible=false, ygridvisible=false,
xticksvisible=false, yticksvisible=false, xticklabelsvisible=false, yticklabelsvisible=false)
# Add background layers for each axis
counties = GeoMakie.naturalearth("admin_2_counties_lakes", 10)
for ax in [ax1, ax2]
# Add US counties (white with gray borders)
poly!(ax, counties.geometry; strokecolor=:lightgray, strokewidth=1.5, color=:white)
end
# Set Texas extent
Δ = 0.5
for ax in [ax1, ax2]
xlims!(ax, minimum(nearby_stations.longitude) - Δ, maximum(nearby_stations.longitude) + Δ)
ylims!(ax, minimum(nearby_stations.latitude) - Δ, maximum(nearby_stations.latitude) + Δ)
end
# Plot data with appropriate colormaps and individual colorbars
s1 = scatter!(ax1, nearby_stations.longitude, nearby_stations.latitude,
color=nearby_stations.mk_S, colormap=:RdBu, markersize=18)
Colorbar(fig[1, 2], s1, label="S Statistic")
s2 = scatter!(ax2, nearby_stations.longitude, nearby_stations.latitude,
color=nearby_stations.mk_pvalue, colormap=:viridis, markersize=18)
Colorbar(fig[1, 4], s2, label="p-value")
# Overlay my station as a star on ax1 only, using the same colormap
my_mk_S = nearby_stations[nearby_stations.stnid.==my_stnid, :mk_S][1]
scatter!(ax1, [my_station.longitude], [my_station.latitude],
color=my_mk_S, colormap=:RdBu, markersize=24, marker=:star5, strokewidth=2, strokecolor=:white)
fig
endThe spatial inconsistency in trend results highlights problems with single-station analysis: nearby stations show contradictory trends that are not climatologically plausible, suggesting that individual station records contain too much noise to reliably detect regional climate signals.
3 Single-Station Nonstationary Models
3.1 Understanding Nonstationarity
Traditional extreme value analysis assumes parameters remain constant over time (stationarity). However, climate change may alter the distribution of extreme precipitation. Nonstationarity means the statistical properties of extremes change systematically with time or other covariates.
We’ll use atmospheric CO₂ concentration as our covariate x = \log(\text{CO}_2) because: - CO₂ is a primary driver of global warming - Higher temperatures can increase atmospheric moisture capacity - This may intensify extreme precipitation events
3.2 Model Specifications
We implement two models that allow different GEV parameters to vary with CO₂:
- Location-varying model: Only the location parameter changes with CO₂
- Location and scale varying model: Both location and scale parameters change with CO₂
@model function nonstationary_gev_model1(y, x)
# Model 1: μ(x) = α_μ + β_μ*x where x = log(CO2)
α_μ ~ Normal(3.0, 2.0) # baseline location parameter
β_μ ~ Normal(0.0, 2.0) # location trend parameter (inches per log(ppm))
log_σ ~ Normal(0.0, 1.0) # log-scale parameter
ξ ~ Normal(0.0, 0.3) # shape parameter
σ = exp(log_σ)
# Location parameter varies with CO2
for i in eachindex(y)
x_centered = x[i] - log(380) # center around ~380 ppm
μ_x = α_μ + β_μ * x_centered
dist = GeneralizedExtremeValue(μ_x, σ, ξ)
y[i] ~ dist
end
end
@model function nonstationary_gev_model2(y, x)
# Model 2: μ(x) = α_μ + β_μ*x, σ(x) = α_σ + β_σ*x
α_μ ~ Normal(3.0, 2.0) # baseline location parameter
β_μ ~ Normal(0.0, 2.0) # location trend parameter
α_σ ~ LogNormal(0.0, 1.0) # baseline scale parameter
β_σ ~ Normal(0.0, 0.2) # scale trend parameter (small prior)
ξ ~ Normal(0.0, 0.3) # shape parameter
for i in eachindex(y)
x_centered = x[i] - log(380)
μ_x = α_μ + β_μ * x_centered
σ_x = α_σ + β_σ * x_centered
# Ensure positive scale parameter
if σ_x > 0.1
dist = GeneralizedExtremeValue(μ_x, σ_x, ξ)
y[i] ~ dist
else
Turing.@addlogprob!(-Inf)
end
end
end3.3 Model Fitting and Diagnostics
Let’s fit these models to our primary station:
# Prepare data
y_obs = ustrip.(u"inch", my_rainfall_nomissing.rainfall)
x_obs = my_rainfall_nomissing.log_CO2 # x = log(CO2)
# Fit the two models
models = [
("Location trend", nonstationary_gev_model1(y_obs, x_obs)),
("Location + Scale trends", nonstationary_gev_model2(y_obs, x_obs)),
]
# Sample from posteriors and check diagnostics
posterior_results = []
for (name, model) in models
fname = joinpath(lab_dir, "nonstat_$(replace(name, " " => "_")).nc")
overwrite = false
idata = load_or_sample(fname, model; overwrite=overwrite, samples_per_chain=1000)
push!(posterior_results, (name=name, idata=idata))
# Check diagnostics immediately after fitting
println("=== Diagnostics for $name ===")
display(ArviZ.summarize(idata))
endCreate traceplots to visualize MCMC chain behavior for key parameters:
# Traceplots for Model 1
model1_traceplots = let
fig = Figure(size=(1200, 600))
param_names = [:α_μ, :β_μ, :log_σ, :ξ]
for (i, param) in enumerate(param_names)
ax = Axis(fig[i, 1], xlabel=i == length(param_names) ? "Iteration" : "",
ylabel=string(param), title=i == 1 ? "Model 1 (Location Trend)" : "")
traceplot!(ax, posterior_results[1].idata, param)
end
fig
end# Traceplots for Model 2
model2_traceplots = let
fig = Figure(size=(1200, 800))
param_names = [:α_μ, :β_μ, :α_σ, :β_σ, :ξ]
for (i, param) in enumerate(param_names)
ax = Axis(fig[i, 1], xlabel=i == length(param_names) ? "Iteration" : "",
ylabel=string(param), title=i == 1 ? "Model 2 (Location + Scale Trends)" : "")
traceplot!(ax, posterior_results[2].idata, param)
end
fig
endCheck convergence: effective sample size should exceed 400, R-hat should be near 1.0, and traceplots should show good mixing across chains.
3.4 Extracting GEV Distributions
First, let’s define functions to extract GEV distributions for any year across our two nonstationary models:
# Model 1: Location trend only
function extract_model1_gevs(idata, x)
x_centered = x - log(380) # center around ~380 ppm
α_μ = Array(idata.posterior[:α_μ])
β_μ = Array(idata.posterior[:β_μ])
σ = exp.(Array(idata.posterior[:log_σ]))
ξ = Array(idata.posterior[:ξ])
μ_x = α_μ .+ β_μ .* x_centered
vec(GeneralizedExtremeValue.(μ_x, σ, ξ))
end
# Model 2: Location + Scale trends
function extract_model2_gevs(idata, x)
x_centered = x - log(380)
α_μ = Array(idata.posterior[:α_μ])
β_μ = Array(idata.posterior[:β_μ])
α_σ = Array(idata.posterior[:α_σ])
β_σ = Array(idata.posterior[:β_σ])
ξ = Array(idata.posterior[:ξ])
μ_x = α_μ .+ β_μ .* x_centered
σ_x = α_σ .+ β_σ .* x_centered
# Filter out negative scale parameters
valid = σ_x .> 0.1
vec(GeneralizedExtremeValue.(μ_x[valid], σ_x[valid], ξ[valid]))
endNow extract GEV distributions for both 1950 and 2025 using appropriate CO2 levels:
# Extract data for each model
model1_name, model1_idata = posterior_results[1].name, posterior_results[1].idata
model2_name, model2_idata = posterior_results[2].name, posterior_results[2].idata
# Approximate CO2 levels (x = log(CO2))
x_1950 = co2_data.log_CO2[co2_data.year.==1950][1] # ~310 ppm in 1950
x_2025 = co2_data.log_CO2[co2_data.year.==2024][1] # ~425 ppm projected for 2025
# Extract GEV distributions for both time periods
gevs_1950 = [
extract_model1_gevs(model1_idata, x_1950),
extract_model2_gevs(model2_idata, x_1950),
]
gevs_2025 = [
extract_model1_gevs(model1_idata, x_2025),
extract_model2_gevs(model2_idata, x_2025),
]3.5 Model Comparison and Uncertainty
# Create comprehensive comparison: 1950 vs 2025 across both models
fig_comprehensive = let
fig = Figure(size=(1000, 700))
rts = logrange(1.1, 250, 100)
xticks = [2, 5, 10, 25, 50, 100, 250]
# Top row: 1950 vs 2025 comparison for each model (adjust column widths)
ax1 = Axis(fig[1, 1], xlabel="Return Period (years)", ylabel="Return Level (inches)",
title="Location Trend Model", xscale=log10, xticks=xticks)
ax2 = Axis(fig[1, 2], xlabel="Return Period (years)", ylabel="Return Level (inches)",
title="Location + Scale Trends Model", xscale=log10, xticks=xticks)
# Make columns equal width
colsize!(fig.layout, 1, Relative(0.5))
colsize!(fig.layout, 2, Relative(0.5))
top_axes = [ax1, ax2]
for (i, (ax, gevs_50, gevs_25)) in enumerate(zip(top_axes, gevs_1950, gevs_2025))
posterior_bands!(ax, gevs_50, rts; ci=0.90, color=(:blue, 0.3))
posterior_mean_curve!(ax, gevs_50, rts; color=:blue, linewidth=2, label="1950")
posterior_bands!(ax, gevs_25, rts; ci=0.90, color=(:red, 0.3))
posterior_mean_curve!(ax, gevs_25, rts; color=:red, linewidth=2, label="2025")
if i == 1
axislegend(ax, position=:rb)
end
end
# Bottom: Direct model comparison for 2025
ax3 = Axis(fig[2, 1:2], xlabel="Return Period (years)", ylabel="Return Level (inches)",
title="Model Comparison for 2025 (Note: Models Show Similar Behavior)",
xscale=log10, xticks=xticks)
colors = [:blue, :red]
labels = ["Location Trend", "Location + Scale"]
for (i, (gevs, color, label)) in enumerate(zip(gevs_2025, colors, labels))
posterior_bands!(ax3, gevs, rts; ci=0.68, color=(color, 0.3))
posterior_mean_curve!(ax3, gevs, rts; color=color, linewidth=2, label=label)
end
axislegend(ax3, position=:lt)
linkyaxes!(top_axes...)
fig
endThe two models show remarkably similar behavior, which is actually quite interesting - it suggests that for this dataset, allowing the scale parameter to vary with CO₂ doesn’t dramatically change the return level projections. Let’s examine the 100-year return period distributions more closely:
fig_rl100_comparison = let
fig = Figure(size=(800, 400))
titles = ["Location Trend", "Location + Scale"]
for i in 1:2
ax = Axis(fig[1, i], xlabel="100-year RL (inches)", ylabel="Count", title=titles[i])
# Calculate 100-year return levels
rl_1950 = [quantile(gev, 0.99) for gev in gevs_1950[i]]
rl_2025 = [quantile(gev, 0.99) for gev in gevs_2025[i]]
# Plot histograms
hist!(ax, rl_1950, bins=25, color=(:purple, 0.5), label="1950")
hist!(ax, rl_2025, bins=25, color=(:orange, 0.5), label="2025")
i == 1 && axislegend(ax, position=:rt)
end
fig
endThe large posterior uncertainties in single-station models, particularly for return level projections, indicate that individual stations lack sufficient data to reliably estimate trends in extreme precipitation.
4 Regional Parameter Estimation
The large uncertainties in single-station analyses motivate regional approaches. Let’s implement a regional model where some parameters are shared across stations while others remain station-specific.
4.1 Regional Model Design
Our regional GEV model uses a simple approach where some parameters are shared regionally while others vary by station.
For station i in year t:
Y_{i,t} \sim \text{GEV}(\mu_{i,t}, \sigma_i, \xi_{\text{region}})
where:
- Regional parameters: \beta_{\text{region}} (trend) and \xi_{\text{region}} (shape) are the same for all stations
- Station-specific parameters: \alpha_{\mu,i} (baseline location) and \sigma_i (scale) differ by station
The location parameter varies with our covariate x = \log(\text{CO}_2): \mu_{i,t} = \alpha_{\mu,i} + \beta_{\text{region}} \cdot (x_t - \log(380))
This approach assumes climate change affects trend and extreme value tail behavior similarly across the region, while allowing for local differences in baseline precipitation amounts and variability.
We’ll select the 8 closest stations with at least 40 years of data
analysis_stations = let
lon = my_station.longitude
lat = my_station.latitude
@chain nearby_stations begin
@filter(years_of_data >= 40)
@mutate(distance = calc_distance(!!lat, !!lon, latitude, longitude))
@arrange(distance)
first(8)
end
end
analysis_stnids = analysis_stations.stnidTo analyze the data, we’re going to converte it to a matrix, where each row corresponds to a year and each column corresponds to a station.
years_vec, rainfall_matrix = let
rainfall_matrix_data = @chain rainfall_data begin
@filter(in(stnid, !!analysis_stnids))
@mutate(rainfall_inch = ifelse(ismissing(rainfall), missing, ustrip(u"inch", rainfall)))
@select(year, stnid, rainfall_inch)
@pivot_wider(names_from = stnid, values_from = rainfall_inch)
@arrange(year)
end
years = rainfall_matrix_data.year
matrix = Matrix(rainfall_matrix_data[:, 2:end])
years, matrix
end4.2 Data Preparation
First, prepare the data matrices for our hierarchical analysis:
# Prepare matrices for regional model
y_matrix, x_vector = let
# Get rainfall matrix: [year, station]
rainfall_wide = @chain rainfall_data begin
@filter(in(stnid, !!analysis_stnids))
@mutate(rainfall_inch = ustrip(u"inch", rainfall))
@select(year, stnid, rainfall_inch)
@pivot_wider(names_from = stnid, values_from = rainfall_inch)
@arrange(year)
end
# Extract years and matrix
years = rainfall_wide.year
y_mat = Matrix(rainfall_wide[:, 2:end]) # Drop year column
# Get x vector (log CO2) for the same years
x_vec = @chain co2_data begin
@filter(in(year, !!years))
@arrange(year)
@select(log_CO2)
end
y_mat, x_vec.log_CO2
end4.3 Regional Model Implementation
Now implement our simplified regional model:
@model function regional_nonstationary_gev(y_matrix, x_vector)
n_years, n_stations = size(y_matrix)
# Regional parameters (shared across all stations)
β_region ~ Normal(0.0, 2.0) # Regional trend (inches per log(ppm))
ξ_region ~ Normal(0.0, 0.2) # Regional shape parameter
# Station-specific parameters (independent for each station)
α_μ_stations ~ MvNormal(fill(3.0, n_stations), I * 2.0) # Baseline location for each station
log_σ_stations ~ MvNormal(zeros(n_stations), I * 0.5) # Scale parameter for each station
σ_stations = exp.(log_σ_stations)
# Data likelihood - loop over matrix, skip missing values
for i in 1:n_years
x_centered = x_vector[i] - log(380) # Center x around ~380 ppm CO2
for j in 1:n_stations
if !ismissing(y_matrix[i, j])
μ_ij = α_μ_stations[j] + β_region * x_centered
dist = GeneralizedExtremeValue(μ_ij, σ_stations[j], ξ_region)
y_matrix[i, j] ~ dist
end
end
end
end
# Fit regional model with diagnostics
regional_idata = let
regional_fname = joinpath(lab_dir, "regional_nonstat.nc")
regional_model = regional_nonstationary_gev(y_matrix, x_vector)
overwrite = false
idata = load_or_sample(regional_fname, regional_model; overwrite=overwrite, samples_per_chain=1500)
# Check diagnostics immediately after fitting
println("=== Regional Model Diagnostics ===")
display(ArviZ.summarize(idata))
idata
end# Traceplots for regional parameters
regional_traceplots = let
fig = Figure(size=(1200, 400))
param_names = [:β_region, :ξ_region]
for (i, param) in enumerate(param_names)
ax = Axis(fig[i, 1], xlabel=i == length(param_names) ? "Iteration" : "",
ylabel=string(param), title=i == 1 ? "Regional Model Parameters" : "")
traceplot!(ax, regional_idata, param)
end
fig
endThis takes about a minute on my laptop, but may take longer on your machine. Using overwrite=false is helpful - you can run it once and then load the results later without re-running the sampling.
5 Comparing Regional vs Single-Station Approaches
Let’s compare how the regional model performs versus the single-station nonstationary models for our primary station.
my_station_idx = findfirst(x -> x == my_stnid, analysis_stnids)
regional_my_station = let
# Extract regional parameters (shared)
β_samples = vec(Array(regional_idata.posterior[:β_region]))
ξ_samples = vec(Array(regional_idata.posterior[:ξ_region]))
# Extract station-specific parameters for our station
α_μ_samples = vec(Array(regional_idata.posterior[:α_μ_stations])[:, :, my_station_idx])
σ_samples = exp.(vec(Array(regional_idata.posterior[:log_σ_stations])[:, :, my_station_idx]))
(α_μ=α_μ_samples, β_μ=β_samples, σ=σ_samples, ξ=ξ_samples)
end
# Extract single-station model results (Model 1: Location trend only)
single_station = let
model1_idata = posterior_results[1].idata
α_μ_samples = vec(Array(model1_idata.posterior[:α_μ]))
β_μ_samples = vec(Array(model1_idata.posterior[:β_μ]))
σ_samples = exp.(vec(Array(model1_idata.posterior[:log_σ])))
ξ_samples = vec(Array(model1_idata.posterior[:ξ]))
(α_μ=α_μ_samples, β_μ=β_μ_samples, σ=σ_samples, ξ=ξ_samples)
end
# Prepare data for comparison plots
rts = logrange(1.1, 250, 100)
xticks = [2, 5, 10, 25, 50, 100, 250]
x_2025 = co2_data.log_CO2[co2_data.year.==2024][1]
x_centered = x_2025 - log(380)
# Create GEV distributions for 2025
μ_single_2025 = single_station.α_μ .+ single_station.β_μ .* x_centered
gevs_single = GeneralizedExtremeValue.(μ_single_2025, single_station.σ, single_station.ξ)
μ_regional_2025 = regional_my_station.α_μ .+ regional_my_station.β_μ .* x_centered
gevs_regional = GeneralizedExtremeValue.(μ_regional_2025, regional_my_station.σ, regional_my_station.ξ)24000-element Vector{GeneralizedExtremeValue{Float64}}:
Distributions.GeneralizedExtremeValue{Float64}(μ=2.6233953214834433, σ=0.7349168095512729, ξ=0.1381538464839199)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0324987852353398, σ=1.179924993429768, ξ=0.07738383772907395)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.978508264653534, σ=0.7775008778145074, ξ=0.12067192319552812)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0602462079616157, σ=0.7598425336679961, ξ=0.1238709806645203)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9897165601500633, σ=1.1850864777752341, ξ=0.1125668206659867)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0455838922258693, σ=0.7171532077465685, ξ=0.0879952571395228)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.106761127103365, σ=0.9602381160443091, ξ=0.06951913030288262)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9956127662046477, σ=0.794772142721128, ξ=0.11163525200029273)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0159966769591895, σ=0.8011006828448223, ξ=0.11182302019183858)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9405193509661207, σ=0.7897597872959153, ξ=0.16830939725598495)
⋮
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0773263666157566, σ=1.2464805687909462, ξ=0.04347860356883834)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.8232379745634204, σ=0.6941724043306227, ξ=0.06966360494980378)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.3446858554168046, σ=1.2772377051453498, ξ=0.06906435173298438)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.274580428487241, σ=0.7576985374174522, ξ=0.09073136770995194)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.8604716162222443, σ=0.972821379557864, ξ=0.03883003717774308)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.587346696957743, σ=0.859596070987913, ξ=0.1256570799138009)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.583559371120404, σ=0.8640414972122636, ξ=0.1359855550852122)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.10833391995527, σ=0.7619890124446638, ξ=0.10812239649600947)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0643219885459136, σ=0.7756585546528296, ξ=0.09539543989006517)
5.1 Return Level Curves
# Return level curves comparison
return_level_fig = let
fig = Figure(size=(800, 500))
ax = Axis(fig[1, 1], xlabel="Return Period (years)", ylabel="Return Level (inches)",
title="2025 Return Level Comparison: Single-Station vs Regional",
xscale=log10, xticks=xticks)
# Plot uncertainty bands and mean curves
posterior_bands!(ax, gevs_single, rts; ci=0.90, color=(:blue, 0.3))
posterior_mean_curve!(ax, gevs_single, rts; color=:blue, linewidth=3, label="Single-Station")
posterior_bands!(ax, gevs_regional, rts; ci=0.90, color=(:red, 0.3))
posterior_mean_curve!(ax, gevs_regional, rts; color=:red, linewidth=3, label="Regional")
axislegend(ax, position=:rb)
fig
end5.2 Parameter Uncertainty
# Parameter uncertainty comparison
parameter_uncertainty_fig = let
fig = Figure(size=(800, 400))
ax = Axis(fig[1, 1], xlabel="Parameter", ylabel="Posterior Standard Deviation",
title="Parameter Uncertainty: Single-Station vs Regional")
params = [L"$\alpha_\mu$", L"$\beta_\mu$", L"$\sigma$", L"$\xi$"]
single_stds = [
std(single_station.α_μ),
std(single_station.β_μ),
std(single_station.σ),
std(single_station.ξ),
]
regional_stds = [
std(regional_my_station.α_μ),
std(regional_my_station.β_μ),
std(regional_my_station.σ),
std(regional_my_station.ξ),
]
x_pos = 1:length(params)
barplot!(ax, x_pos .- 0.2, single_stds, width=0.35, color=:blue, alpha=0.7, label="Single-Station")
barplot!(ax, x_pos .+ 0.2, regional_stds, width=0.35, color=:red, alpha=0.7, label="Regional")
ax.xticks = (x_pos, params)
axislegend(ax, position=:rt)
fig
end5.3 Return Level Distributions
# 100-year return level distributions
return_level_dist_fig = let
fig = Figure(size=(800, 400))
ax = Axis(fig[1, 1], xlabel="100-year Return Level (inches)", ylabel="Density",
title="100-year Return Level Uncertainty")
# Calculate 100-year return levels
rl100_single = [quantile(gev, 0.99) for gev in gevs_single]
rl100_regional = [quantile(gev, 0.99) for gev in gevs_regional]
hist!(ax, rl100_single, bins=30, color=(:blue, 0.5), label="Single-Station", normalization=:pdf)
hist!(ax, rl100_regional, bins=30, color=(:red, 0.5), label="Regional", normalization=:pdf)
axislegend(ax, position=:rt)
fig
end5.4 Trend Parameter Analysis
Let’s examine how trend estimates compare between the single-station and regional approaches:
# Compare trend estimates from single-station vs regional models
trend_comparison_fig = let
fig = Figure(size=(1000, 400))
# Re-extract needed variables for proper scoping
single_idata = posterior_results[1].idata
regional_fname = joinpath(lab_dir, "regional_nonstat.nc")
regional_idata = ArviZ.from_netcdf(regional_fname)
# Extract trend parameters
single_β_μ = vec(Array(single_idata.posterior[:β_μ]))
regional_β_μ = vec(Array(regional_idata.posterior[:β_region]))
# Left plot: trend parameter distributions
ax1 = Axis(fig[1, 1], xlabel=L"Location Trend Parameter $\beta_\mu$ (inches per log(ppm))", ylabel="Density",
title="Trend Parameter Estimates")
hist!(ax1, single_β_μ, bins=30, color=(:blue, 0.5), label="Single-Station", normalization=:pdf)
hist!(ax1, regional_β_μ, bins=30, color=(:red, 0.5), label="Regional", normalization=:pdf)
axislegend(ax1, position=:rt)
# Right plot: uncertainty comparison
ax2 = Axis(fig[1, 2], xlabel="Model", ylabel="Standard Deviation",
title="Trend Parameter Uncertainty")
trend_stds = [std(single_β_μ), std(regional_β_μ)]
barplot!(ax2, 1:2, trend_stds, color=[:blue, :red], alpha=0.7)
ax2.xticks = (1:2, ["Single-Station", "Regional"])
# Add text labels on bars
for (i, val) in enumerate(trend_stds)
text!(ax2, i, val - 0.25, text=string(round(val, digits=2)),
align=(:center, :bottom), color=:white, fontsize=16)
end
fig
endThe regional approach appears to provide a somewhat more precise estimate of the trend parameter, even though we didn’t explicitly pool the trend parameter across stations. This improved precision occurs because by pooling the shape parameter and constraining the station-specific parameters, we get better estimates of the other model components, which in turn help constrain our trend estimate. This demonstrates how regional modeling can sharpen parameter estimates through better overall model specification.
6 Wrapup
This lab demonstrates several key concepts essential for PS1:
- Extremes are noisy and random. Trend tests and nonstationary model inferences conducted at individual stations can be implausible and inconsistent across nearby stations.
- Just because we don’t unambiguously detect a trend at a single station doesn’t mean we should assume that no trend exists.
- Regional models provide a straightforward way to share information across stations while preserving local differences
- Regionalization can improve parameter estimates and reduce uncertainty, even for parameters that aren’t explicitly pooled.