Regionalization and Spatial Pooling




Figure 1: Fagnant et al. (2020)


  1. Nearby stations should (usually) have similar precipitation probabilities
    1. Alternatively: flood, fire, etc.
  2. Reduce estimation error by pooling information
  3. Reduce sampling error from random variation between nearby stations
  4. DOES NOT reduce sampling error of major regional events


NOAA Atlas 14 for stations with \(29.25 \leq ~\text{lat}~ \leq 30.25\) and \(-96 \leq ~\text{lon}~ \leq -95\)

annmax_precip = CSV.read("data/dur01d_ams_na14v11_houston.csv", DataFrame)
stations = combine(groupby(annmax_precip, :stnid), :lat => first => :lat, :lon => first => :lon, :name => first => :name, :stnid => length => :n)
scatter(stations.lon, stations.lat, zcolor=stations.n, xlabel="Longitude", ylabel="Latitude", colorbar_title="Number of Years", title="Locations of $(nrow(stations)) Stations")

Four longest records

stn_plots = []
for stnid in sort(stations, :n, rev=true)[!, :stnid][1:4]
    sub = annmax_precip[annmax_precip.stnid.==stnid, :]
    name = stations[stations.stnid.==stnid, :name][1]
    pᵢ = plot(sub.year, sub.precip_in, marker=:circ, xlabel="Year", ylabel="Annual Maximum Precipitation [in]", label=name)
    push!(stn_plots, pᵢ)
end
plot(stn_plots..., layout=(2, 2), size=(1600, 700), link=:all)

All stations

precip_df = sort(unstack(annmax_precip, :year, :stnid, :precip_in), :year)
plot(precip_df.year, Matrix(precip_df[!, Not(:year)]), label=false, yscale=:log10, yticks=([2, 5, 10, 15, 20, 25], ["2", "5", "10", "15", "20", "25"]), ylabel="Annual Maximum Precip [in]", linewidth=0.5)

Learning objectives

  1. Discuss the motivation for regionalization
  2. Describe key assumptions of different regionalization approaches
  3. Outline several specific models that implement regionalization

Classical approaches


  1. Classical approaches

  2. Hierarchical models

Recall: \(L\) moment estimators

  1. Linear combinations of order statistics
  2. Can choose parameters of a distribution so that the theoretical \(L\) moments of the distribution “match” the empirical \(L\) moments of the data
  3. Pros:
    1. Computationally efficient
    2. Work well in practice
  4. Cons:
    1. Inflexible
    2. Difficult to quantify parametric uncertainty

See Hosking (1990) for details.
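To make the idea concrete, here is a minimal sketch (not from a library) of the first two sample \(L\) moments computed as linear combinations of order statistics, following Hosking (1990). The function name `sample_lmoments` is our own.

```julia
# Sample L-moment estimators from order statistics (Hosking 1990).
# Returns the first two L-moments (λ₁, λ₂) of a data vector `x`.
function sample_lmoments(x::AbstractVector)
    xs = sort(x)
    n = length(xs)
    b0 = sum(xs) / n
    b1 = sum((i - 1) / (n - 1) * xs[i] for i in 1:n) / n
    λ1 = b0          # L-location (equals the sample mean)
    λ2 = 2b1 - b0    # L-scale
    return λ1, λ2
end

sample_lmoments([1.0, 2.0, 3.0, 4.0])  # (2.5, 0.8333…)
```

Parameter estimation then amounts to solving for the distribution whose theoretical \(L\) moments match these sample values.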

Regional Frequency Analysis

  1. Assign sites to regions
  2. Estimate \(L\) moments for each site
  3. Check for homogeneity
  4. Take regional \(L\) moments as the weighted mean of the site \(L\) moments
    1. For floods: apply scaling factor (e.g., average annual maximum flood)

Best implemented in the R lmomRFA package. (Can use RCall to call R from Julia.)
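Step 4 above can be sketched in a few lines. This is an illustration of the weighting idea only, not the `lmomRFA` implementation; the function and argument names are ours.

```julia
# RFA step 4 (sketch): regional L-moment ratio as the record-length-weighted
# mean of the site ratios. `t_site` holds each site's L-moment ratio
# (e.g., L-CV); `n_site` holds each site's record length in years.
function regional_lmoment_ratio(t_site::AbstractVector, n_site::AbstractVector)
    return sum(n_site .* t_site) / sum(n_site)
end
```

Sites with longer records thus contribute more to the regional estimate.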

Region of Influence

  1. RFA assumes every site belongs to exactly one region
    1. But often regions are not distinct!
  2. ROI: define a “similarity” between each pair of sites. E.g., distance, land use, elevation, climate.
  3. To make estimates at site \(i\), define its “region of influence” as the most similar sites (similar to KNN)
  4. Estimate \(L\) moments for each site and compute weighted average as in RFA
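A toy version of the ROI selection step, using only geographic distance as the similarity measure (real applications would fold in elevation, land use, or climate covariates); the function name is ours.

```julia
# Toy region of influence: for site i, return the indices of the k most
# similar sites by Euclidean distance in (lat, lon) space, KNN-style.
function region_of_influence(lat::AbstractVector, lon::AbstractVector, i::Int, k::Int)
    d = sqrt.((lat .- lat[i]) .^ 2 .+ (lon .- lon[i]) .^ 2)
    return sortperm(d)[1:(k + 1)]  # first index is site i itself (distance 0)
end
```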

Hierarchical models


  1. Classical approaches

  2. Hierarchical models

Bayesian GEV in Turing

We’ve used Extremes.jl, but we need our own model for customization. Here’s a GEV model for a single site.

@model function stationary_gev(y::AbstractVector)
    # priors
    μ ~ Normal(5, 5)        # wide prior on the location parameter for now
    σ ~ LogNormal(0, 2)     # work on the log scale for σ to ensure positivity
    ξ ~ Uniform(-0.5, 0.5)  # restrict the shape parameter to (-0.5, 0.5)

    # likelihood
    y .~ GeneralizedExtremeValue(μ, σ, ξ)
end


We draw samples as

chn_stationary_gev = let # variables defined in a let...end block are temporary
    sub = vec(annmax_precip[annmax_precip.stnid.=="41-4321", :precip_in])
    model = stationary_gev(sub)
    sampler = Turing.NUTS()
    n_per_chain = 5000
    nchains = 4
    sample(model, sampler, MCMCThreads(), n_per_chain, nchains; drop_warmup=true)
end
Summary statistics:

  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat
           μ    3.4930    0.1880    0.0023   6509.0259   7713.4040    1.0005
           σ    1.2617    0.1629    0.0019   7289.1721   9834.1796    1.0004
           ξ    0.2697    0.1052    0.0013   6436.0820   4114.3823    1.0007


dists = [
    GeneralizedExtremeValue(μ, σ, ξ) for (μ, σ, ξ) in
    zip(vec(chn_stationary_gev[:μ]), vec(chn_stationary_gev[:σ]), vec(chn_stationary_gev[:ξ]))
]
histogram(quantile.(dists, 0.99), normalize=:pdf, label=false, xlabel="Annual Maximum Precipitation [in]", ylabel="Density", title="100 Year Return Level Posterior", linewidth=0)

Full pooling: concept

Stations should “pool” information

  1. Assume: within a region, all sites have the same distribution
  2. Estimate a single distribution for the entire region
  3. Analogous to regional frequency analysis

Full pooling: model

@model function gev_fully_pooled(y::AbstractMatrix)
    N_yr, N_stn = size(y)
    μ ~ Normal(5, 5)
    σ ~ LogNormal(0, 2)
    ξ ~ Uniform(-0.5, 0.5)
    for s in 1:N_stn
        for t in 1:N_yr
            if !ismissing(y[t, s])
                y[t, s] ~ GeneralizedExtremeValue(μ, σ, ξ)
            end
        end
    end
end

Full pooling: sampling

We have a lot of data per parameter, so sampling is pretty fast.

precip_array = Matrix(precip_df[!, Not(:year)])
chn_gev_fully_pooled = let # variables defined in a let...end block are temporary
    model = gev_fully_pooled(precip_array)
    sampler = Turing.NUTS()
    n_per_chain = 5000
    nchains = 4
    sample(model, sampler, MCMCThreads(), n_per_chain, nchains; drop_warmup=true)
end
Summary statistics:

  parameters      mean       std      mcse     ess_bulk     ess_tail      rhat
           μ    3.5082    0.0351    0.0003   11946.4544   13458.1452    1.0003
           σ    1.3949    0.0286    0.0003   12932.6177   14075.0610    1.0002
           ξ    0.2396    0.0192    0.0002   14374.2644   11676.1996    1.0003

Important caveat!

  1. We weight each observation equally, regardless of site or year
  2. If we have more observations for some years than others, we are implicitly weighting those years more heavily
  3. A better model would correct for this and weight each year equally

Partial pooling: concept

What if we believe there should be some variation between stations, but that they should still share information?

  1. In between the two extremes of “full pooling” and “no pooling”
  2. Model the parameters at each site as being drawn from a common distribution

This leads to models that look like this: \[ \begin{aligned} y_{s, t} &\sim \text{GEV}(\mu_s, \sigma_s, \xi_s) \\ \mu_s &\sim \text{Normal}(\mu^0, \tau^\mu) \\ \ldots \end{aligned} \]

where \(s\) is the site index and \(t\) is the year index.


In machine learning (e.g., Random Forests) we studied hyperparameters that the user must “tune”. In Bayesian statistics, hyperparameters are learned as part of the model.

In our model, the hyperparameters are \(\mu^0\) and \(\tau^\mu\). These describe the distribution from which the \(\mu_s\) are drawn. (And similarly for \(\sigma\) and, optionally, \(\xi\).)


We implement full pooling on \(\xi\) and partial pooling on \(\mu\) and \(\sigma\).

@model function gev_partial_pool(y::AbstractMatrix)
    N_yr, N_stn = size(y)

    # First define the hyperparameters. Stronger priors are helpful!
    μ₀ ~ Normal(5, 3)
    τμ ~ LogNormal(0, 0.5)
    σ₀ ~ LogNormal(0.5, 0.5)
    τσ ~ LogNormal(0, 0.5)

    # Parameters depend on the hyperparameters
    μ ~ filldist(Normal(μ₀, τμ), N_stn)
    σ ~ filldist(truncated(Normal(σ₀, τσ), 0, Inf), N_stn)

    # Parameters that don't depend on hyperparameters
    ξ ~ Uniform(-0.5, 0.5)

    # Likelihood
    for s in 1:N_stn
        for t in 1:N_yr
            if !ismissing(y[t, s])
                y[t, s] ~ GeneralizedExtremeValue(μ[s], σ[s], ξ)
            end
        end
    end
end


Downside: we have a lot of parameters to estimate! If we have \(N\) stations, then we have \(4 + N + N + 1 = 5+2N\) parameters to estimate. This makes sampling slower.

Sampling is left as an exercise. For an example, see Lima et al. (2016).

Spatial Models

We can also model the parameters as a function of location. This can simplify the model: instead of one set of GEV parameters per site, we only need to estimate the parameters that describe how the GEV parameters vary in space.


We can imagine a very simple toy model:

\[ \begin{aligned} \mu(s) &= \alpha^\mu + \beta^\mu_1 \cdot \text{lat}(s) + \beta^\mu_2 \cdot \text{lon}(s) \\ \sigma(s) &= \alpha^\sigma + \beta^\sigma_1 \cdot \text{lat}(s) + \beta^\sigma_2 \cdot \text{lon}(s) \\ y_{s, t} &\sim \text{GEV}(\mu(s), \sigma(s), \xi) \end{aligned} \]
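The toy model above can be sketched in Turing. This is our own illustrative implementation, not code from the lecture: the priors are placeholder assumptions, and we deviate from the equations by putting a log link on \(\sigma(s)\) to guarantee positivity.

```julia
using Turing, Distributions

# Sketch of the toy spatial model: GEV location and (log-)scale vary
# linearly with latitude and longitude. Priors are illustrative only.
@model function gev_spatial(y::AbstractMatrix, lat::AbstractVector, lon::AbstractVector)
    N_yr, N_stn = size(y)
    αμ ~ Normal(5, 5)
    βμ ~ filldist(Normal(0, 1), 2)
    ασ ~ Normal(0, 1)
    βσ ~ filldist(Normal(0, 1), 2)
    ξ ~ Uniform(-0.5, 0.5)
    for s in 1:N_stn
        μs = αμ + βμ[1] * lat[s] + βμ[2] * lon[s]
        σs = exp(ασ + βσ[1] * lat[s] + βσ[2] * lon[s])  # log link keeps σ > 0
        for t in 1:N_yr
            if !ismissing(y[t, s])
                y[t, s] ~ GeneralizedExtremeValue(μs, σs, ξ)
            end
        end
    end
end
```

However many stations we have, the parameter count stays fixed at seven, which is the simplification promised above.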

Last Friday

I presented work from my group combining Bayesian and spatial models at the 2023 Statistical Hydrology conference. Slides here.


Fagnant, C., Gori, A., Sebastian, A., Bedient, P. B., & Ensor, K. B. (2020). Characterizing spatiotemporal trends in extreme precipitation in Southeast Texas. Natural Hazards, 104(2), 1597–1621.
Hosking, J. R. M. (1990). L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. Journal of the Royal Statistical Society. Series B (Methodological), 52(1), 105–124.
Lima, C. H. R., Lall, U., Troy, T., & Devineni, N. (2016). A hierarchical Bayesian GEV model for improving local and regional flood quantile estimates. Journal of Hydrology, 541, 816–823.