using Pkg
= dirname(@__FILE__)
lab_dir Pkg.activate(lab_dir)
# Pkg.instantiate() # uncomment this the first time you run the lab to install packages, then comment it back
CEVE 543 Fall 2025 Lab 5: Regional vs Local Parameter Estimation
Single-station vs regional models, Trend estimation, Uncertainty quantification
Do Before Lab
- Accept the assignment. Use the GitHub Classroom link posted on Canvas to “accept” the assignment, which will give you your own GitHub repository.
- Clone your repository to your computer using GitHub Desktop or the command line.
- Open the repository in VS Code directly from VS Code, GitHub Desktop, or your command line.
- Activate the Julia environment. In VS Code, open the command palette (View → Command Palette) and type “Julia: Start REPL”. Alternatively, run
julia --project=.
in the terminal, then runusing Pkg; Pkg.instantiate()
to install required packages. The easiest method is to type]
in the REPL to enter package mode, then typeinstantiate
. - Review previous concepts including Bayesian GEV inference, return level calculations, and multi-station comparisons.
- Verify rendering works. Run
quarto render index.qmd
in your terminal - it should create both PDF and HTML documents.
Do During Lab
This lab explores why regional analysis is essential for reliable extreme value analysis:
- Individual station trend tests give inconsistent, implausible results
- Single-station nonstationary models have excessive uncertainty
- Regional models that share some parameters across stations can reduce uncertainty
- Comparing approaches reveals bias-variance trade-offs in parameter estimation
As with previous labs, you do not need to write code from scratch – focus on understanding the regional analysis workflow, interpreting results, and connecting to PS1.
To submit your work, push your changes to GitHub, render to PDF, and upload the PDF to Canvas.
Your Analysis
Delete this entire section (including the “Do Before Lab” and “Do During Lab” sections above) and replace it with your analysis responses. Use the numbered format below for your final submission.
Response 1: Spatial Trend Inconsistencies What patterns do you observe in the Mann-Kendall results across nearby stations? Are these results climatologically plausible? What does this suggest about single-station trend analysis?
Response 2: Nonstationary Model Uncertainty How does posterior uncertainty in return levels compare across the two nonstationary GEV models? Which model structure provides the most reliable estimates and why?
Response 3: Regional vs Single-Station Trade-offs Compare the bias-variance trade-offs between single-station and regional parameter estimation approaches. Under what conditions would you choose each strategy?
Response 4: Parameter Sharing Effects How does regional parameter sharing affect trend estimates and uncertainty? What are the practical implications for stations with limited data?
Response 5: Regional Analysis for PS1 How will these regional analysis techniques enhance your PS1 approach? What challenges do you anticipate in implementing regional models for your selected stations?
1 Data Setup and Package Loading
1.1 Loading Required Packages
We’ll load packages for data analysis, Bayesian modeling, and visualization.
Load the core packages:
using TidierFiles
using DataFrames
using Downloads
using NCDatasets
using TidierData
using LinearAlgebra: I
using ArviZ
using Distributions
using Optim
using Random
using Statistics
using Turing
using CairoMakie
using GeoMakie
using LaTeXStrings
activate!(type="svg")
CairoMakie.
ENV["DATAFRAMES_ROWS"] = 5
Load utility functions and set random seed:
include("util.jl")
= MersenneTwister(543) rng
1.2 Loading Houston Precipitation Data
Use the same Texas precipitation dataset from previous labs:
= let
stations, rainfall_data = joinpath(lab_dir, "dur01d_ams_na14v11.txt")
precip_fname = "https://hdsc.nws.noaa.gov/pub/hdsc/data/tx/dur01d_ams_na14v11.txt"
url
if !isfile(precip_fname)
Downloads.download(url, precip_fname)
end
read_noaa_data(precip_fname)
end
1.3 Loading CO2 Data
Load global mean CO2 concentrations. The data comes from Mauna Loa (1959–2024) and from ice core reconstructions (pre-1959).
= let
co2_data = joinpath(lab_dir, "logCo2.csv")
co2_fname read_csv(co2_fname) |> DataFrame
TidierFiles.end
Select your primary station that has substantial data and potential trend signal:
= 31 # Station 41-4309 (Houston Addicks), selected for potential trend signal
my_stnid = @chain stations begin
my_station @filter(stnid == !!my_stnid)
firstend
= @chain rainfall_data begin
my_rainfall @filter(stnid == !!my_stnid)
@arrange(date)
@full_join(co2_data, "year") # Join with CO2 data
@arrange(year)
end
= @chain my_rainfall begin
my_rainfall_nomissing @filter(!ismissing(rainfall) && !ismissing(log_CO2))
end
= let
station_time_series = Figure(size=(1200, 600))
fig
# Top plot: Rainfall vs Year
= Axis(fig[1, 1], xlabel="Year", ylabel="Annual Max Rainfall (inches)",
ax1 ="Annual Maximum Rainfall at $(my_station.name), TX")
title= my_rainfall.year
years = ustrip.(u"inch", my_rainfall.rainfall)
rain lines!(ax1, years, rain, color=:blue)
scatter!(ax1, years, rain, color=:blue, markersize=8)
# Middle plot: CO2 vs Year
= Axis(fig[2, 1], xlabel="Year", ylabel="CO₂ Concentration (ppm)",
ax2 ="Global Mean CO₂ Concentration")
titlelines!(ax2, years, my_rainfall.log_CO2, color=:red, linewidth=2)
# Bottom plot: Rainfall vs log(CO2)
= Axis(fig[1:2, 2], xlabel="log(CO₂) Concentration", ylabel="Annual Max Rainfall (inches)",
ax3 ="Rainfall vs log(CO₂)")
titlescatter!(ax3, my_rainfall_nomissing.log_CO2, ustrip.(u"inch", my_rainfall_nomissing.rainfall), color=:green, markersize=8, alpha=0.7)
figend
2 Spatial Inconsistencies in Single-Station Analysis
One of the key motivations for regional analysis is that individual station analyses often give inconsistent results that aren’t climatologically plausible. Let’s demonstrate this by analyzing trends across nearby stations.
2.1 Finding Regional Stations
First, let’s identify stations near our primary station for regional analysis:
= find_nearest_stations(my_station, stations, 100)
nearby_stations nearby_stations
Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data |
---|---|---|---|---|---|---|---|
Int64 | String | String | String | Float64 | Float64 | Int64 | |
1 | 31 | 41-4476 | IREDELL | TX | 31.9808 | -97.8731 | 45 |
2 | 368 | 41-4137 | HICO | TX | 31.9844 | -98.0311 | 93 |
3 | 194 | 41-1625 | CHALK MTN | TX | 32.1561 | -97.9369 | 48 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
99 | 145 | 41-0996 | BOYD | TX | 33.08 | -97.5639 | 59 |
100 | 166 | 41-1239 | BURKETT | TX | 31.9917 | -99.2203 | 58 |
2.2 Implementing Trend Tests
We’ll implement the Mann-Kendall test, a commonly used non-parametric test for monotonic trends in data, assuming no serial correlation.
function mann_kendall_test(x::AbstractVector)
"""Mann-Kendall test for monotonic trend detection."""
= length(x)
n
# Test statistic: sum of signs of all pairwise differences
= sum(sign(x[j] - x[i]) for i in 1:(n-1) for j in (i+1):n)
S
# For n>10, under Null Hypothesis of no trend,
# S is Normally distributed with mean 0 and
# variance V = (n/18) * (n-1) * (2n+5)
= n * (n - 1) * (2n + 5) / 18
var_S
# Standardized test statistic with continuity correction
= if S > 0
Z - 1) / sqrt(var_S)
(S elseif S < 0
+ 1) / sqrt(var_S)
(S else
0.0
end
# Two-tailed p-value
= 2 * (1 - cdf(Normal(0, 1), abs(Z)))
p_value
return S, p_value
end
Now apply to our actual data - testing for trends in rainfall with respect to log(CO2):
= ustrip.(u"inch", my_rainfall_nomissing.rainfall)
prcp_obs = mann_kendall_test(prcp_obs) mk_S, mk_p
(139.0, 0.17702877314693977)
Now let’s apply these tests to all nearby stations:
let
= nearby_stations.stnid
stnids = map(stnids) do stnid
raw_results = @chain rainfall_data begin
df @filter(stnid == !!stnid)
@filter(!ismissing(rainfall))
end
= ustrip.(u"inch", df.rainfall)
prcp = mann_kendall_test(prcp)
mk_S, mk_p return mk_S, mk_p
end
:mk_S] = getindex.(raw_results, 1)
nearby_stations[!, :mk_pvalue] = getindex.(raw_results, 2)
nearby_stations[!, end
first(nearby_stations, 3)
Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data | mk_S | mk_pvalue |
---|---|---|---|---|---|---|---|---|---|
Int64 | String | String | String | Float64 | Float64 | Int64 | Float64 | Float64 | |
1 | 31 | 41-4476 | IREDELL | TX | 31.9808 | -97.8731 | 45 | 139.0 | 0.177029 |
2 | 368 | 41-4137 | HICO | TX | 31.9844 | -98.0311 | 93 | 489.0 | 0.105321 |
3 | 194 | 41-1625 | CHALK MTN | TX | 32.1561 | -97.9369 | 48 | 171.0 | 0.130797 |
2.3 Visualizing Spatial Inconsistencies
Let’s create visualizations that show the spatial inconsistency in trend results:
= let
fig_trends = Figure(size=(1200, 600))
fig
# Create 1x2 layout with GeoAxis, each with colorbar to the right
= GeoAxis(fig[1, 1]; source="+proj=latlong", dest="+proj=merc",
ax1 ="Mann-Kendall S Statistic", xgridvisible=false, ygridvisible=false,
title=false, yticksvisible=false, xticklabelsvisible=false, yticklabelsvisible=false)
xticksvisible= GeoAxis(fig[1, 3]; source="+proj=latlong", dest="+proj=merc",
ax2 ="Mann-Kendall p-values", xgridvisible=false, ygridvisible=false,
title=false, yticksvisible=false, xticklabelsvisible=false, yticklabelsvisible=false)
xticksvisible
# Add background layers for each axis
= GeoMakie.naturalearth("admin_2_counties_lakes", 10)
counties for ax in [ax1, ax2]
# Add US counties (white with gray borders)
poly!(ax, counties.geometry; strokecolor=:lightgray, strokewidth=1.5, color=:white)
end
# Set Texas extent
= 0.5
Δ for ax in [ax1, ax2]
xlims!(ax, minimum(nearby_stations.longitude) - Δ, maximum(nearby_stations.longitude) + Δ)
ylims!(ax, minimum(nearby_stations.latitude) - Δ, maximum(nearby_stations.latitude) + Δ)
end
# Plot data with appropriate colormaps and individual colorbars
= scatter!(ax1, nearby_stations.longitude, nearby_stations.latitude,
s1 =nearby_stations.mk_S, colormap=:RdBu, markersize=18)
colorColorbar(fig[1, 2], s1, label="S Statistic")
= scatter!(ax2, nearby_stations.longitude, nearby_stations.latitude,
s2 =nearby_stations.mk_pvalue, colormap=:viridis, markersize=18)
colorColorbar(fig[1, 4], s2, label="p-value")
figend
The spatial inconsistency in trend results highlights problems with single-station analysis: nearby stations show contradictory trends that are not climatologically plausible, suggesting that individual station records contain too much noise to reliably detect regional climate signals.
3 Single-Station Nonstationary Models
3.1 Understanding Nonstationarity
Traditional extreme value analysis assumes parameters remain constant over time (stationarity). However, climate change may alter the distribution of extreme precipitation. Nonstationarity means the statistical properties of extremes change systematically with time or other covariates.
We’ll use atmospheric CO₂ concentration as our covariate x = \log(\text{CO}_2) because: - CO₂ is a primary driver of global warming - Higher temperatures can increase atmospheric moisture capacity - This may intensify extreme precipitation events
3.2 Model Specifications
We implement two models that allow different GEV parameters to vary with CO₂:
- Location-varying model: Only the location parameter changes with CO₂
- Location and scale varying model: Both location and scale parameters change with CO₂
@model function nonstationary_gev_model1(y, x)
# Model 1: μ(x) = α_μ + β_μ*x where x = log(CO2)
~ Normal(3.0, 2.0) # baseline location parameter
α_μ ~ Normal(0.0, 2.0) # location trend parameter (inches per log(ppm))
β_μ ~ Normal(0.0, 1.0) # log-scale parameter
log_σ ~ Normal(0.0, 0.3) # shape parameter
ξ
= exp(log_σ)
σ
# Location parameter varies with CO2
for i in eachindex(y)
= x[i] - log(380) # center around ~380 ppm
x_centered = α_μ + β_μ * x_centered
μ_x = GeneralizedExtremeValue(μ_x, σ, ξ)
dist ~ dist
y[i] end
end
@model function nonstationary_gev_model2(y, x)
# Model 2: μ(x) = α_μ + β_μ*x, σ(x) = α_σ + β_σ*x
~ Normal(3.0, 2.0) # baseline location parameter
α_μ ~ Normal(0.0, 2.0) # location trend parameter
β_μ ~ LogNormal(0.0, 1.0) # baseline scale parameter
α_σ ~ Normal(0.0, 0.2) # scale trend parameter (small prior)
β_σ ~ Normal(0.0, 0.3) # shape parameter
ξ
for i in eachindex(y)
= x[i] - log(380)
x_centered = α_μ + β_μ * x_centered
μ_x = α_σ + β_σ * x_centered
σ_x
# Ensure positive scale parameter
if σ_x > 0.1
= GeneralizedExtremeValue(μ_x, σ_x, ξ)
dist ~ dist
y[i] else
@addlogprob!(-Inf)
Turing.end
end
end
3.3 Model Fitting and Diagnostics
Let’s fit these models to our primary station:
# Prepare data
= ustrip.(u"inch", my_rainfall_nomissing.rainfall)
y_obs = my_rainfall_nomissing.log_CO2 # x = log(CO2)
x_obs
# Fit the two models
= [
models "Location trend", nonstationary_gev_model1(y_obs, x_obs)),
("Location + Scale trends", nonstationary_gev_model2(y_obs, x_obs)),
(
]
# Sample from posteriors and check diagnostics
= []
posterior_results for (name, model) in models
= joinpath(lab_dir, "nonstat_$(replace(name, " " => "_")).nc")
fname = false
overwrite = load_or_sample(fname, model; overwrite=overwrite, samples_per_chain=1000)
idata push!(posterior_results, (name=name, idata=idata))
# Check diagnostics immediately after fitting
println("=== Diagnostics for $name ===")
display(ArviZ.summarize(idata))
end
Create traceplots to visualize MCMC chain behavior for key parameters:
# Traceplots for Model 1
= let
model1_traceplots = Figure(size=(1200, 600))
fig = [:α_μ, :β_μ, :log_σ, :ξ]
param_names
for (i, param) in enumerate(param_names)
= Axis(fig[i, 1], xlabel=i == length(param_names) ? "Iteration" : "",
ax =string(param), title=i == 1 ? "Model 1 (Location Trend)" : "")
ylabeltraceplot!(ax, posterior_results[1].idata, param)
end
figend
# Traceplots for Model 2
= let
model2_traceplots = Figure(size=(1200, 800))
fig = [:α_μ, :β_μ, :α_σ, :β_σ, :ξ]
param_names
for (i, param) in enumerate(param_names)
= Axis(fig[i, 1], xlabel=i == length(param_names) ? "Iteration" : "",
ax =string(param), title=i == 1 ? "Model 2 (Location + Scale Trends)" : "")
ylabeltraceplot!(ax, posterior_results[2].idata, param)
end
figend
Check convergence: effective sample size should exceed 400, R-hat should be near 1.0, and traceplots should show good mixing across chains.
3.4 Extracting GEV Distributions
First, let’s define functions to extract GEV distributions for any year across our two nonstationary models:
# Model 1: Location trend only
function extract_model1_gevs(idata, x)
= x - log(380) # center around ~380 ppm
x_centered = Array(idata.posterior[:α_μ])
α_μ = Array(idata.posterior[:β_μ])
β_μ = exp.(Array(idata.posterior[:log_σ]))
σ = Array(idata.posterior[:ξ])
ξ = α_μ .+ β_μ .* x_centered
μ_x vec(GeneralizedExtremeValue.(μ_x, σ, ξ))
end
# Model 2: Location + Scale trends
function extract_model2_gevs(idata, x)
= x - log(380)
x_centered = Array(idata.posterior[:α_μ])
α_μ = Array(idata.posterior[:β_μ])
β_μ = Array(idata.posterior[:α_σ])
α_σ = Array(idata.posterior[:β_σ])
β_σ = Array(idata.posterior[:ξ])
ξ = α_μ .+ β_μ .* x_centered
μ_x = α_σ .+ β_σ .* x_centered
σ_x # Filter out negative scale parameters
= σ_x .> 0.1
valid vec(GeneralizedExtremeValue.(μ_x[valid], σ_x[valid], ξ[valid]))
end
Now extract GEV distributions for both 1950 and 2025 using appropriate CO2 levels:
# Extract data for each model
= posterior_results[1].name, posterior_results[1].idata
model1_name, model1_idata = posterior_results[2].name, posterior_results[2].idata
model2_name, model2_idata
# Approximate CO2 levels (x = log(CO2))
= co2_data.log_CO2[co2_data.year.==1950][1] # ~310 ppm in 1950
x_1950 = co2_data.log_CO2[co2_data.year.==2024][1] # ~425 ppm projected for 2025
x_2025
# Extract GEV distributions for both time periods
= [
gevs_1950 extract_model1_gevs(model1_idata, x_1950),
extract_model2_gevs(model2_idata, x_1950),
]
= [
gevs_2025 extract_model1_gevs(model1_idata, x_2025),
extract_model2_gevs(model2_idata, x_2025),
]
3.5 Model Comparison and Uncertainty
# Create comprehensive comparison: 1950 vs 2025 across both models
= let
fig_comprehensive = Figure(size=(1000, 700))
fig
= logrange(1.1, 250, 100)
rts = [2, 5, 10, 25, 50, 100, 250]
xticks
# Top row: 1950 vs 2025 comparison for each model (adjust column widths)
= Axis(fig[1, 1], xlabel="Return Period (years)", ylabel="Return Level (inches)",
ax1 ="Location Trend Model", xscale=log10, xticks=xticks)
title= Axis(fig[1, 2], xlabel="Return Period (years)", ylabel="Return Level (inches)",
ax2 ="Location + Scale Trends Model", xscale=log10, xticks=xticks)
title
# Make columns equal width
colsize!(fig.layout, 1, Relative(0.5))
colsize!(fig.layout, 2, Relative(0.5))
= [ax1, ax2]
top_axes
for (i, (ax, gevs_50, gevs_25)) in enumerate(zip(top_axes, gevs_1950, gevs_2025))
posterior_bands!(ax, gevs_50, rts; ci=0.90, color=(:blue, 0.3))
posterior_mean_curve!(ax, gevs_50, rts; color=:blue, linewidth=2, label="1950")
posterior_bands!(ax, gevs_25, rts; ci=0.90, color=(:red, 0.3))
posterior_mean_curve!(ax, gevs_25, rts; color=:red, linewidth=2, label="2025")
if i == 1
axislegend(ax, position=:rb)
end
end
# Bottom: Direct model comparison for 2025
= Axis(fig[2, 1:2], xlabel="Return Period (years)", ylabel="Return Level (inches)",
ax3 ="Model Comparison for 2025 (Note: Models Show Similar Behavior)",
title=log10, xticks=xticks)
xscale
= [:blue, :red]
colors = ["Location Trend", "Location + Scale"]
labels
for (i, (gevs, color, label)) in enumerate(zip(gevs_2025, colors, labels))
posterior_bands!(ax3, gevs, rts; ci=0.68, color=(color, 0.3))
posterior_mean_curve!(ax3, gevs, rts; color=color, linewidth=2, label=label)
end
axislegend(ax3, position=:lt)
linkyaxes!(top_axes...)
figend
The two models show remarkably similar behavior, which is actually quite interesting - it suggests that for this dataset, allowing the scale parameter to vary with CO₂ doesn’t dramatically change the return level projections. Let’s examine the 100-year return period distributions more closely:
= let
fig_rl100_comparison = Figure(size=(800, 400))
fig
= ["Location Trend", "Location + Scale"]
titles
for i in 1:2
= Axis(fig[1, i], xlabel="100-year RL (inches)", ylabel="Count", title=titles[i])
ax
# Calculate 100-year return levels
= [quantile(gev, 0.99) for gev in gevs_1950[i]]
rl_1950 = [quantile(gev, 0.99) for gev in gevs_2025[i]]
rl_2025
# Plot histograms
hist!(ax, rl_1950, bins=25, color=(:purple, 0.5), label="1950")
hist!(ax, rl_2025, bins=25, color=(:orange, 0.5), label="2025")
== 1 && axislegend(ax, position=:rt)
i end
figend
The large posterior uncertainties in single-station models, particularly for return level projections, indicate that individual stations lack sufficient data to reliably estimate trends in extreme precipitation.
4 Regional Parameter Estimation
The large uncertainties in single-station analyses motivate regional approaches. Let’s implement a regional model where some parameters are shared across stations while others remain station-specific.
4.1 Regional Model Design
Our regional GEV model uses a simple approach where some parameters are shared regionally while others vary by station.
For station i in year t:
Y_{i,t} \sim \text{GEV}(\mu_{i,t}, \sigma_i, \xi_{\text{region}})
where:
- Regional parameters: \beta_{\text{region}} (trend) and \xi_{\text{region}} (shape) are the same for all stations
- Station-specific parameters: \alpha_{\mu,i} (baseline location) and \sigma_i (scale) differ by station
The location parameter varies with our covariate x = \log(\text{CO}_2): \mu_{i,t} = \alpha_{\mu,i} + \beta_{\text{region}} \cdot (x_t - \log(380))
This approach assumes climate change affects trend and extreme value tail behavior similarly across the region, while allowing for local differences in baseline precipitation amounts and variability.
We’ll select the 8 closest stations with at least 40 years of data
= let
analysis_stations = my_station.longitude
lon = my_station.latitude
lat @chain nearby_stations begin
@filter(years_of_data >= 40)
@mutate(distance = calc_distance(!!lat, !!lon, latitude, longitude))
@arrange(distance)
first(8)
end
end
= analysis_stations.stnid analysis_stnids
To analyze the data, we’re going to converte it to a matrix, where each row corresponds to a year and each column corresponds to a station.
= let
years_vec, rainfall_matrix = @chain rainfall_data begin
rainfall_matrix_data @filter(in(stnid, !!analysis_stnids))
@mutate(rainfall_inch = ifelse(ismissing(rainfall), missing, ustrip(u"inch", rainfall)))
@select(year, stnid, rainfall_inch)
@pivot_wider(names_from = stnid, values_from = rainfall_inch)
@arrange(year)
end
= rainfall_matrix_data.year
years = Matrix(rainfall_matrix_data[:, 2:end])
matrix
years, matrixend
4.2 Data Preparation
First, prepare the data matrices for our hierarchical analysis:
# Prepare matrices for regional model
= let
y_matrix, x_vector # Get rainfall matrix: [year, station]
= @chain rainfall_data begin
rainfall_wide @filter(in(stnid, !!analysis_stnids))
@mutate(rainfall_inch = ustrip(u"inch", rainfall))
@select(year, stnid, rainfall_inch)
@pivot_wider(names_from = stnid, values_from = rainfall_inch)
@arrange(year)
end
# Extract years and matrix
= rainfall_wide.year
years = Matrix(rainfall_wide[:, 2:end]) # Drop year column
y_mat
# Get x vector (log CO2) for the same years
= @chain co2_data begin
x_vec @filter(in(year, !!years))
@arrange(year)
@select(log_CO2)
end
y_mat, x_vec.log_CO2end
4.3 Regional Model Implementation
Now implement our simplified regional model:
@model function regional_nonstationary_gev(y_matrix, x_vector)
= size(y_matrix)
n_years, n_stations
# Regional parameters (shared across all stations)
~ Normal(0.0, 2.0) # Regional trend (inches per log(ppm))
β_region ~ Normal(0.0, 0.2) # Regional shape parameter
ξ_region
# Station-specific parameters (independent for each station)
~ MvNormal(fill(3.0, n_stations), I * 2.0) # Baseline location for each station
α_μ_stations ~ MvNormal(zeros(n_stations), I * 0.5) # Scale parameter for each station
log_σ_stations
= exp.(log_σ_stations)
σ_stations
# Data likelihood - loop over matrix, skip missing values
for i in 1:n_years
= x_vector[i] - log(380) # Center x around ~380 ppm CO2
x_centered for j in 1:n_stations
if !ismissing(y_matrix[i, j])
= α_μ_stations[j] + β_region * x_centered
μ_ij = GeneralizedExtremeValue(μ_ij, σ_stations[j], ξ_region)
dist ~ dist
y_matrix[i, j] end
end
end
end
# Fit regional model with diagnostics
= let
regional_idata = joinpath(lab_dir, "regional_nonstat.nc")
regional_fname = regional_nonstationary_gev(y_matrix, x_vector)
regional_model = false
overwrite = load_or_sample(regional_fname, regional_model; overwrite=overwrite, samples_per_chain=1500)
idata
# Check diagnostics immediately after fitting
println("=== Regional Model Diagnostics ===")
display(ArviZ.summarize(idata))
idataend
# Traceplots for regional parameters
= let
regional_traceplots = Figure(size=(1200, 400))
fig = [:β_region, :ξ_region]
param_names
for (i, param) in enumerate(param_names)
= Axis(fig[i, 1], xlabel=i == length(param_names) ? "Iteration" : "",
ax =string(param), title=i == 1 ? "Regional Model Parameters" : "")
ylabeltraceplot!(ax, regional_idata, param)
end
figend
This takes about a minute on my laptop, but may take longer on your machine. Using overwrite=false
is helpful - you can run it once and then load the results later without re-running the sampling.
5 Comparing Regional vs Single-Station Approaches
Let’s compare how the regional model performs versus the single-station nonstationary models for our primary station.
= findfirst(x -> x == my_stnid, analysis_stnids)
my_station_idx = let
regional_my_station # Extract regional parameters (shared)
= vec(Array(regional_idata.posterior[:β_region]))
β_samples = vec(Array(regional_idata.posterior[:ξ_region]))
ξ_samples
# Extract station-specific parameters for our station
= vec(Array(regional_idata.posterior[:α_μ_stations])[:, :, my_station_idx])
α_μ_samples = exp.(vec(Array(regional_idata.posterior[:log_σ_stations])[:, :, my_station_idx]))
σ_samples
=α_μ_samples, β_μ=β_samples, σ=σ_samples, ξ=ξ_samples)
(α_μend
# Extract single-station model results (Model 1: Location trend only)
= let
single_station = posterior_results[1].idata
model1_idata = vec(Array(model1_idata.posterior[:α_μ]))
α_μ_samples = vec(Array(model1_idata.posterior[:β_μ]))
β_μ_samples = exp.(vec(Array(model1_idata.posterior[:log_σ])))
σ_samples = vec(Array(model1_idata.posterior[:ξ]))
ξ_samples
=α_μ_samples, β_μ=β_μ_samples, σ=σ_samples, ξ=ξ_samples)
(α_μend
# Prepare data for comparison plots
= logrange(1.1, 250, 100)
rts = [2, 5, 10, 25, 50, 100, 250]
xticks = co2_data.log_CO2[co2_data.year.==2024][1]
x_2025 = x_2025 - log(380)
x_centered
# Create GEV distributions for 2025
= single_station.α_μ .+ single_station.β_μ .* x_centered
μ_single_2025 = GeneralizedExtremeValue.(μ_single_2025, single_station.σ, single_station.ξ)
gevs_single
= regional_my_station.α_μ .+ regional_my_station.β_μ .* x_centered
μ_regional_2025 = GeneralizedExtremeValue.(μ_regional_2025, regional_my_station.σ, regional_my_station.ξ) gevs_regional
24000-element Vector{GeneralizedExtremeValue{Float64}}:
Distributions.GeneralizedExtremeValue{Float64}(μ=2.6233953214834433, σ=0.7349168095512729, ξ=0.1381538464839199)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0324987852353398, σ=1.179924993429768, ξ=0.07738383772907395)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.978508264653534, σ=0.7775008778145074, ξ=0.12067192319552812)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0602462079616157, σ=0.7598425336679961, ξ=0.1238709806645203)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9897165601500633, σ=1.1850864777752341, ξ=0.1125668206659867)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0455838922258693, σ=0.7171532077465685, ξ=0.0879952571395228)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.106761127103365, σ=0.9602381160443091, ξ=0.06951913030288262)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9956127662046477, σ=0.794772142721128, ξ=0.11163525200029273)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0159966769591895, σ=0.8011006828448223, ξ=0.11182302019183858)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9405193509661207, σ=0.7897597872959153, ξ=0.16830939725598495)
⋮
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0773263666157566, σ=1.2464805687909462, ξ=0.04347860356883834)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.8232379745634204, σ=0.6941724043306227, ξ=0.06966360494980378)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.3446858554168046, σ=1.2772377051453498, ξ=0.06906435173298438)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.274580428487241, σ=0.7576985374174522, ξ=0.09073136770995194)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.8604716162222443, σ=0.972821379557864, ξ=0.03883003717774308)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.587346696957743, σ=0.859596070987913, ξ=0.1256570799138009)
Distributions.GeneralizedExtremeValue{Float64}(μ=2.583559371120404, σ=0.8640414972122636, ξ=0.1359855550852122)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.10833391995527, σ=0.7619890124446638, ξ=0.10812239649600947)
Distributions.GeneralizedExtremeValue{Float64}(μ=3.0643219885459136, σ=0.7756585546528296, ξ=0.09539543989006517)
5.1 Return Level Curves
# Return level curves comparison
= let
return_level_fig = Figure(size=(800, 500))
fig
= Axis(fig[1, 1], xlabel="Return Period (years)", ylabel="Return Level (inches)",
ax ="2025 Return Level Comparison: Single-Station vs Regional",
title=log10, xticks=xticks)
xscale
# Plot uncertainty bands and mean curves
posterior_bands!(ax, gevs_single, rts; ci=0.90, color=(:blue, 0.3))
posterior_mean_curve!(ax, gevs_single, rts; color=:blue, linewidth=3, label="Single-Station")
posterior_bands!(ax, gevs_regional, rts; ci=0.90, color=(:red, 0.3))
posterior_mean_curve!(ax, gevs_regional, rts; color=:red, linewidth=3, label="Regional")
axislegend(ax, position=:rb)
figend
5.2 Parameter Uncertainty
# Parameter uncertainty comparison
= let
parameter_uncertainty_fig = Figure(size=(800, 400))
fig
= Axis(fig[1, 1], xlabel="Parameter", ylabel="Posterior Standard Deviation",
ax ="Parameter Uncertainty: Single-Station vs Regional")
title
= [L"$\alpha_\mu$", L"$\beta_\mu$", L"$\sigma$", L"$\xi$"]
params = [
single_stds std(single_station.α_μ),
std(single_station.β_μ),
std(single_station.σ),
std(single_station.ξ),
]= [
regional_stds std(regional_my_station.α_μ),
std(regional_my_station.β_μ),
std(regional_my_station.σ),
std(regional_my_station.ξ),
]
= 1:length(params)
x_pos barplot!(ax, x_pos .- 0.2, single_stds, width=0.35, color=:blue, alpha=0.7, label="Single-Station")
barplot!(ax, x_pos .+ 0.2, regional_stds, width=0.35, color=:red, alpha=0.7, label="Regional")
= (x_pos, params)
ax.xticks axislegend(ax, position=:rt)
figend
5.3 Return Level Distributions
# 100-year return level distributions
= let
return_level_dist_fig = Figure(size=(800, 400))
fig
= Axis(fig[1, 1], xlabel="100-year Return Level (inches)", ylabel="Density",
ax ="100-year Return Level Uncertainty")
title
# Calculate 100-year return levels
= [quantile(gev, 0.99) for gev in gevs_single]
rl100_single = [quantile(gev, 0.99) for gev in gevs_regional]
rl100_regional
hist!(ax, rl100_single, bins=30, color=(:blue, 0.5), label="Single-Station", normalization=:pdf)
hist!(ax, rl100_regional, bins=30, color=(:red, 0.5), label="Regional", normalization=:pdf)
axislegend(ax, position=:rt)
figend
5.4 Trend Parameter Analysis
Let’s examine how trend estimates compare between the single-station and regional approaches:
# Compare trend estimates from single-station vs regional models
= let
trend_comparison_fig = Figure(size=(1000, 400))
fig
# Re-extract needed variables for proper scoping
= posterior_results[1].idata
single_idata = joinpath(lab_dir, "regional_nonstat.nc")
regional_fname = ArviZ.from_netcdf(regional_fname)
regional_idata
# Extract trend parameters
= vec(Array(single_idata.posterior[:β_μ]))
single_β_μ = vec(Array(regional_idata.posterior[:β_region]))
regional_β_μ
# Left plot: trend parameter distributions
= Axis(fig[1, 1], xlabel=L"Location Trend Parameter $\beta_\mu$ (inches per log(ppm))", ylabel="Density",
ax1 ="Trend Parameter Estimates")
title
hist!(ax1, single_β_μ, bins=30, color=(:blue, 0.5), label="Single-Station", normalization=:pdf)
hist!(ax1, regional_β_μ, bins=30, color=(:red, 0.5), label="Regional", normalization=:pdf)
axislegend(ax1, position=:rt)
# Right plot: uncertainty comparison
= Axis(fig[1, 2], xlabel="Model", ylabel="Standard Deviation",
ax2 ="Trend Parameter Uncertainty")
title
= [std(single_β_μ), std(regional_β_μ)]
trend_stds barplot!(ax2, 1:2, trend_stds, color=[:blue, :red], alpha=0.7)
= (1:2, ["Single-Station", "Regional"])
ax2.xticks
# Add text labels on bars
for (i, val) in enumerate(trend_stds)
text!(ax2, i, val - 0.25, text=string(round(val, digits=2)),
=(:center, :bottom), color=:white, fontsize=16)
alignend
figend
The regional approach appears to provide a somewhat more precise estimate of the trend parameter, even though we didn’t explicitly pool the trend parameter across stations. This improved precision occurs because by pooling the shape parameter and constraining the station-specific parameters, we get better estimates of the other model components, which in turn help constrain our trend estimate. This demonstrates how regional modeling can sharpen parameter estimates through better overall model specification.
6 Wrapup
This lab demonstrates several key concepts essential for PS1:
- Extremes are noisy and random. Trend tests and nonstationary model inferences conducted at individual stations can be implausible and inconsistent across nearby stations.
- Just because we don’t unambiguously detect a trend at a single station doesn’t mean we should assume that no trend exists.
- Regional models provide a straightforward way to share information across stations while preserving local differences
- Regionalization can improve parameter estimates and reduce uncertainty, even for parameters that aren’t explicitly pooled.