```julia
# Core packages
using Downloads
using TidierData
using DataFrames
using Extremes
using Turing
using Optim
using ColorSchemes
using CairoMakie
using GeoMakie
using Makie.Unitful
using Makie.Dates

ENV["DATAFRAMES_ROWS"] = 5

# Configure plotting backend
CairoMakie.activate!(type = "svg")

# Utility functions
include("util.jl")

rainfall_conversion = Makie.UnitfulConversion(u"inch")
```
# CEVE 543 Fall 2025 Lab 3: GEV Inference

*Estimating GEV parameters and characterizing uncertainty*

## Do Before Lab
- Accept the assignment. Use the GitHub Classroom link posted on Canvas to “accept” the assignment, which will give you your own GitHub repository.
- Clone your repository to your computer using GitHub Desktop or the command line.
- Open the repository in VS Code directly from VS Code, GitHub Desktop, or your command line.
- Activate the Julia environment. In VS Code, open the command palette (View → Command Palette) and type “Julia: Start REPL”. Alternatively, run `julia --project=.` in the terminal, then run `using Pkg; Pkg.instantiate()` to install the required packages. The easiest method is to type `]` in the REPL to enter package mode, then type `instantiate`.
- Verify rendering works. Run `quarto render index.qmd` in your terminal; it should create both PDF and HTML documents.
## Do During Lab
This lab guides you through comparing Bayesian and traditional approaches to GEV parameter estimation:
- Choose and analyze a station with adequate data for GEV fitting
- Compare fitting methods using both Extremes.jl and Turing.jl
- Analyze multiple stations by incorporating nearby stations into your analysis
- Explore uncertainty through a synthetic data experiment
As with previous labs, you do not need to write code from scratch – just modify and run provided code.
To submit your work, push your changes to GitHub, render to PDF, and upload the PDF to Canvas.
## Your Analysis
Delete this entire section (including the “Do Before Lab” and “Do During Lab” sections above) and replace it with your analysis responses. Use the numbered format below for your final submission.
### Response 1: Station Selection Rationale
Replace this text with 2-3 sentences explaining why you chose your particular station for GEV analysis. Address factors like data length, location relevance, and data quality.
### Response 2: Fitting Methods Comparison
Replace this text with 3-4 sentences comparing the Extremes.jl and Turing.jl approaches. Discuss parameter differences, return level estimates, and potential reasons for any discrepancies.
### Response 3: Multi-Station Geographic Analysis
Replace this text with 4-5 sentences analyzing the spatial patterns in your multi-station comparison. Address the credibility of similarities/differences and explain how geographic factors might influence extreme precipitation patterns.
### Response 4: Uncertainty and Sample Size Insights
Replace this text with 3-4 sentences interpreting your synthetic experiment results. Discuss adequate sample sizes for reliable GEV estimation and implications for infrastructure design.
### Response 5: Final Synthesis
Replace this text with 2-3 sentences synthesizing your key findings and their broader implications for climate risk assessment and engineering applications.
## 1 Data Setup and Station Selection

### 1.1 Loading Required Packages
First, we need to load all the Julia packages we’ll use in this analysis:
### 1.2 Loading NOAA Precipitation Data
Like previous labs, we’ll download and read the Texas precipitation data:
```julia
fname = "dur01d_ams_na14v11.txt"
url = "https://hdsc.nws.noaa.gov/pub/hdsc/data/tx/dur01d_ams_na14v11.txt"

if !isfile(fname)
    Downloads.download(url, fname)
end

stations, rainfall_data = read_noaa_data(fname)
display(stations)
display(rainfall_data)
```
1. Check if data file already exists locally
2. Download the file if it doesn't exist
3. Read and parse the NOAA precipitation data
| Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data |
|---|---|---|---|---|---|---|---|
| | Int64 | String | String | String | Float64 | Float64 | Int64 |
| 1 | 1 | 60-0011 | CLEAR CK AT BAY AREA BLVD | TX | 29.4977 | -95.1599 | 31 |
| 2 | 2 | 60-0019 | TURKEY CK AT FM 1959 | TX | 29.5845 | -95.1869 | 31 |
| 3 | 3 | 60-0022 | ARMAND BYU AT GENOARED BLF RD | TX | 29.6345 | -95.1123 | 31 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 815 | 815 | 87-0031 | SECO CREEK AT MILLER RANCH | TX | 29.5731 | -99.4028 | 20 |
| 816 | 816 | 99-2048 | COTULLA | TX | 28.4567 | -99.2183 | 116 |
| Row | stnid | date | year | rainfall |
|---|---|---|---|---|
| | Int64 | Date | Int64 | Quantity…? |
| 1 | 1 | 1987-06-11 | 1987 | 6.31 inch |
| 2 | 1 | 1988-09-02 | 1988 | 5.46 inch |
| 3 | 1 | 1989-08-01 | 1989 | 11.39 inch |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 61924 | 816 | 2016-08-20 | 2016 | 3.44 inch |
| 61925 | 816 | 2017-09-25 | 2017 | 2.72 inch |
### 1.3 Choosing Your Station
For GEV analysis, we need sufficient data. Let’s examine stations with longer records:
```julia
# Show stations ordered by data availability
@chain stations begin
    @arrange(desc(years_of_data))
    @slice_head(n = 10)
end
```
1. Sort stations by years of data in descending order
2. Select the top 10 stations with most data
| Row | stnid | noaa_id | name | state | latitude | longitude | years_of_data |
|---|---|---|---|---|---|---|---|
| | Int64 | String | String | String | Float64 | Float64 | Int64 |
| 1 | 112 | 41-0420 | AUSTIN | TX | 30.2682 | -97.7426 | 169 |
| 2 | 600 | 41-7622 | RIO GRANDE CITY | TX | 26.3769 | -98.8117 | 168 |
| 3 | 771 | 79-0043 | BROWNSVILLE | TX | 25.9156 | -97.4186 | 168 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 9 | 303 | 41-3265 | FT GRIFFIN | TX | 32.9236 | -99.2225 | 148 |
| 10 | 779 | 79-0055 | GALVESTON | TX | 29.3048 | -94.7934 | 146 |
Choose YOUR OWN station for analysis.
```julia
# REQUIRED: You MUST choose a different station than the default
my_stnid = 111

my_station = @chain stations begin
    @filter(stnid == !!my_stnid)
    first
end

# Extract rainfall data for your chosen station
my_precip = @chain rainfall_data begin
    @filter(stnid == !!my_stnid)
    @arrange(date)
end

println("Selected station: $(my_station.noaa_id) - $(my_station.name)")
println("Years of data: $(my_station.years_of_data)")
```
1. Filter rainfall data for your chosen station using station ID
2. Sort the data chronologically by date
```
Selected station: 41-0408 - ATLANTA
Years of data: 87
```
Explain why you chose this particular station and what makes it suitable for GEV analysis. Consider factors like data length, geographic location, and data completeness.
Let’s visualize your chosen station’s rainfall time series:
```julia
function plot_time_series(station_row, rainfall_df)
    fig = Figure(size = (800, 400))
    ax = Axis(fig[1, 1],
        ylabel = "Annual Maximum 24-Hour Rainfall [inch]",
        title = "$(station_row.noaa_id): $(station_row.name)",
        dim2_conversion = rainfall_conversion)

    lines!(ax, rainfall_df.date, rainfall_df.rainfall, color = :blue, linewidth = 2)
    scatter!(ax, rainfall_df.date, rainfall_df.rainfall, markersize = 10, marker = :circle, strokewidth = 2, color = :transparent)

    fig
end

plot_time_series(my_station, my_precip)
```
## 2 GEV Fitting: Multiple Approaches
We’ll compare different approaches to fitting GEV distributions to understand how estimation methods affect our results.
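All of these methods feed the same downstream quantity: the $T$-year return level, the quantile of the fitted GEV that is exceeded with probability $1/T$ in any given year. For $\xi \neq 0$, the standard GEV quantile function gives

$$
z_T = \mu + \frac{\sigma}{\xi}\left[\left(-\log\left(1 - \frac{1}{T}\right)\right)^{-\xi} - 1\right],
$$

and in the Gumbel limit $\xi \to 0$ this reduces to $z_T = \mu - \sigma \log\left(-\log\left(1 - 1/T\right)\right)$. This is exactly what `quantile(dist, 1 - 1/T)` computes throughout this lab, so any disagreement in return levels across methods traces back to differences in the estimated $(\mu, \sigma, \xi)$.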
### 2.1 MLE with Extremes.jl
First, let's fit a GEV distribution using maximum likelihood estimation (MLE) with Extremes.jl:
```julia
# Extract clean data for fitting
y = collect(skipmissing(ustrip.(u"inch", my_precip.rainfall)))

# Fit GEV using maximum likelihood estimation (MLE)
extremes_fit = gevfit(y)

# Extract parameters
μ_extremes = location(extremes_fit)[1]
σ_extremes = scale(extremes_fit)[1]
ξ_extremes = shape(extremes_fit)[1]

# Display parameters in a DataFrame
params_extremes = DataFrame(
    Parameter = ["Location (μ)", "Scale (σ)", "Shape (ξ)"],
    Value = [round(μ_extremes, digits = 3), round(σ_extremes, digits = 3), round(ξ_extremes, digits = 3)],
)
println("Extremes.jl GEV parameters:")
params_extremes

# Create distribution object for analysis
extremes_dist = GeneralizedExtremeValue(μ_extremes, σ_extremes, ξ_extremes)
```
1. Convert rainfall data to plain numbers, removing units and missing values
2. Fit GEV distribution using maximum likelihood estimation (MLE)
3. Extract location parameter (μ) from the fitted model
4. Extract scale parameter (σ) from the fitted model
5. Extract shape parameter (ξ) from the fitted model
6. Create a distribution object for further analysis and plotting
```
Extremes.jl GEV parameters:
Distributions.GeneralizedExtremeValue{Float64}(μ=2.954978095617213, σ=0.7711826954655844, ξ=0.3417501165764326)
```
### 2.2 Method-of-Moments with Extremes.jl
Now let’s try the method-of-moments approach using probability-weighted moments (PWM):
```julia
# Fit GEV using probability-weighted moments (method-of-moments)
extremes_pwm_fit = gevfitpwm(y)

# Extract parameters
μ_extremes_pwm = location(extremes_pwm_fit)[1]
σ_extremes_pwm = scale(extremes_pwm_fit)[1]
ξ_extremes_pwm = shape(extremes_pwm_fit)[1]

# Display parameters in a DataFrame
params_extremes_pwm = DataFrame(
    Parameter = ["Location (μ)", "Scale (σ)", "Shape (ξ)"],
    Value = [round(μ_extremes_pwm, digits = 3), round(σ_extremes_pwm, digits = 3), round(ξ_extremes_pwm, digits = 3)],
)
println("Extremes.jl PWM parameters:")
params_extremes_pwm

# Create distribution object
extremes_pwm_dist = GeneralizedExtremeValue(μ_extremes_pwm, σ_extremes_pwm, ξ_extremes_pwm)
```
1. Fit GEV distribution using probability-weighted moments (PWM) method
2. Extract location parameter (μ) from the PWM fitted model
3. Extract scale parameter (σ) from the PWM fitted model
4. Extract shape parameter (ξ) from the PWM fitted model
5. Create distribution object for PWM approach
```
Extremes.jl PWM parameters:
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9771973838711747, σ=0.8211598914692136, ξ=0.25901290436114777)
```
### 2.3 Bayesian Approach with Turing.jl
Now let’s implement a Bayesian approach using Turing.jl with wide priors:
```julia
# Define Bayesian GEV model
@model function gev_model(y)
    # Priors informed by data characteristics
    μ ~ Normal(0, 10)
    log_σ ~ Normal(0, 10)
    ξ ~ Normal(0.0, 0.25)

    σ = exp(log_σ)

    y .~ GeneralizedExtremeValue(μ, σ, ξ)
end

# Fit the model using MLE
turing_fit = maximum_likelihood(gev_model(y), NelderMead(); initial_params = [0.0, 1.0, 0.0])

# Extract parameters
μ_turing = turing_fit.values[:μ]
σ_turing = exp(turing_fit.values[:log_σ])
ξ_turing = turing_fit.values[:ξ]

# Display parameters in a DataFrame
params_turing = DataFrame(
    Parameter = ["Location (μ)", "Scale (σ)", "Shape (ξ)"],
    Value = [round(μ_turing, digits = 3), round(σ_turing, digits = 3), round(ξ_turing, digits = 3)],
)
println("Turing.jl GEV parameters (MLE):")
params_turing

# Create distribution object
turing_dist = GeneralizedExtremeValue(μ_turing, σ_turing, ξ_turing)
```
1. Location parameter with wide prior
2. Log-scale parameter with wide prior (ensures positive scale)
3. Shape parameter with conservative prior around zero
4. Transform to positive scale parameter
5. Vectorized likelihood for the GEV distribution
6. Maximum likelihood estimation using the updated Turing.jl syntax
7. Extract location parameter
8. Extract and transform scale parameter
9. Extract shape parameter
10. Create distribution object for analysis
```
Turing.jl GEV parameters (MLE):
Distributions.GeneralizedExtremeValue{Float64}(μ=2.9549821524838222, σ=0.7711833602977148, ξ=0.3417462762770801)
```
### 2.4 Comparing All Three Methods
Let’s compare the fitted parameters and return level estimates across all three approaches:
```julia
# Compare parameters
params_comparison = DataFrame(
    Method = ["Extremes MLE", "Extremes PWM", "Turing MLE"],
    μ = [round(μ_extremes, digits = 3), round(μ_extremes_pwm, digits = 3), round(μ_turing, digits = 3)],
    σ = [round(σ_extremes, digits = 3), round(σ_extremes_pwm, digits = 3), round(σ_turing, digits = 3)],
    ξ = [round(ξ_extremes, digits = 3), round(ξ_extremes_pwm, digits = 3), round(ξ_turing, digits = 3)],
)
params_comparison

# Compare return levels
return_periods = [5, 10, 25, 50, 100]
return_levels_comparison = DataFrame(
    T_years = return_periods,
    Extremes_MLE = [round(quantile(extremes_dist, 1 - 1 / T), digits = 2) for T in return_periods],
    Extremes_PWM = [round(quantile(extremes_pwm_dist, 1 - 1 / T), digits = 2) for T in return_periods],
    Turing_MLE = [round(quantile(turing_dist, 1 - 1 / T), digits = 2) for T in return_periods],
)
return_levels_comparison
```
| Row | T_years | Extremes_MLE | Extremes_PWM | Turing_MLE |
|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 |
| 1 | 5 | 4.47 | 4.48 | 4.47 |
| 2 | 10 | 5.57 | 5.49 | 5.57 |
| 3 | 25 | 7.43 | 7.07 | 7.43 |
| 4 | 50 | 9.26 | 8.52 | 9.26 |
| 5 | 100 | 11.57 | 10.24 | 11.57 |
Now let’s visualize all three fits together:
```julia
function plot_gev_comparison(station_data, extremes_dist, extremes_pwm_dist, turing_dist, station_info)
    fig = Figure()
    ax = Axis(fig[1, 1],
        xlabel = "Return Period (years)",
        ylabel = "Return Level (inches)",
        title = "GEV Fit Comparison\n$(station_info.noaa_id): $(station_info.name)",
        xscale = log10)

    # Plot theoretical curves
    T_smooth = create_return_period_range(1.1, 250, 100)

    # Extremes.jl MLE curve
    levels_extremes = [quantile(extremes_dist, 1 - 1 / T) for T in T_smooth]
    lines!(ax, T_smooth, levels_extremes, color = :blue, linewidth = 2, label = "Extremes MLE")

    # Extremes.jl PWM curve
    levels_extremes_pwm = [quantile(extremes_pwm_dist, 1 - 1 / T) for T in T_smooth]
    lines!(ax, T_smooth, levels_extremes_pwm, color = :green, linewidth = 2, linestyle = :dot, label = "Extremes PWM")

    # Turing.jl curve
    levels_turing = [quantile(turing_dist, 1 - 1 / T) for T in T_smooth]
    lines!(ax, T_smooth, levels_turing, color = :red, linewidth = 2, linestyle = :dash, label = "Turing MLE")

    # Empirical data points
    emp_levels, emp_periods = weibull_plotting_positions(station_data.rainfall)
    scatter!(ax, emp_periods, emp_levels,
        color = :black, markersize = 8, marker = :circle,
        label = "Observed Data")

    # Standard return periods
    return_periods = [5, 10, 25, 50, 100, 250]
    ax.xticks = return_periods

    axislegend(ax, position = :rb)
    return fig
end

plot_gev_comparison(my_precip, extremes_dist, extremes_pwm_dist, turing_dist, my_station)
```
1. Use logarithmic scale for x-axis (return periods)
2. Create smooth range of return periods for plotting curves
3. Calculate return levels for Extremes.jl MLE fit across all return periods
4. Calculate return levels for Extremes.jl PWM fit across all return periods
5. Calculate return levels for Turing.jl fit across all return periods
6. Calculate empirical plotting positions from observed data
7. Set x-axis tick marks at standard return periods
8. Add legend in bottom-right corner
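The observed data points in this plot are placed with `weibull_plotting_positions`, defined in `util.jl`. A minimal sketch of the idea (an assumption about the helper's internals; the lab's version also handles Unitful quantities) assigns the $i$-th smallest of $n$ observations the non-exceedance probability $p_i = i/(n+1)$, and converts each to an empirical return period $T_i = 1/(1 - p_i)$:

```julia
# Illustrative stand-in for the weibull_plotting_positions helper in util.jl
# (an assumption: the actual implementation may differ in details).
function weibull_positions_sketch(x::AbstractVector{<:Real})
    sorted = sort(x)                        # ascending order
    n = length(sorted)
    p = [i / (n + 1) for i in 1:n]          # Weibull plotting positions
    T = [1 / (1 - prob) for prob in p]      # empirical return periods [years]
    return sorted, T                        # (levels, periods), as in the lab code
end

levels, periods = weibull_positions_sketch([2.1, 4.7, 3.0, 6.2])
# the largest value (6.2) gets p = 4/5, an empirical return period of ~5 years
```

With $n$ years of data, the largest observation plots at roughly $T = n + 1$ years, which is why short records say little about 100-year events.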
Analyze how the three fitting approaches compare. Address parameter differences, return level estimates, and provide explanations for any discrepancies you observe between the MLE, PWM, and Bayesian methods.
## 3 Multi-Station Analysis
Individual station analyses can be limited by sample size. Let’s expand our analysis by incorporating data from nearby stations.
### 3.1 Finding Nearest Stations
```julia
# Find the 5 nearest stations to your chosen station
nearest_stations = find_nearest_stations(my_station, stations, 5)

target_lon = my_station.longitude
target_lat = my_station.latitude
nearest_with_distance = @chain nearest_stations begin
    @mutate(distance_km = calc_distance(longitude, latitude, !!target_lon, !!target_lat))
    @select(noaa_id, name, distance_km, years_of_data)
end

println("Nearest stations to $(my_station.noaa_id):")
display(nearest_with_distance)
```
1. Find the 5 geographically closest stations to your chosen station
2. Calculate distance in kilometers from your station to each nearby station
3. Select relevant columns for display: station ID, name, distance, and data years
Nearest stations to 41-0408:
| Row | noaa_id | name | distance_km | years_of_data |
|---|---|---|---|---|
| | String | String | Quantity… | Int64 |
| 1 | 41-9916 | WRIGHT PATMAN DM & LK | 19.9946 km | 70 |
| 2 | 41-5229 | LINDEN | 22.3227 km | 77 |
| 3 | 41-5667 | MAUD | 28.4298 km | 78 |
| 4 | 41-8942 | TEXARKANA | 35.7352 km | 49 |
| 5 | 41-6270 | NEW BOSTON | 43.1546 km | 44 |
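The distances above come from `calc_distance` in `util.jl`. A plausible sketch of such a helper (an assumption, not the lab's actual implementation, which may use a different formula or return a Unitful quantity) is the haversine great-circle formula, which is more than accurate enough at station-to-station scales:

```julia
# Hypothetical stand-in for calc_distance from util.jl: haversine
# great-circle distance between two (lon, lat) points, in kilometers.
function haversine_km(lon1, lat1, lon2, lat2)
    R = 6371.0                                  # mean Earth radius [km]
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δλ = deg2rad(lon2 - lon1)
    a = sin((φ2 - φ1) / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    return 2R * asin(sqrt(a))
end

# One degree of latitude is roughly 111 km anywhere on the globe
haversine_km(-95.0, 29.0, -95.0, 30.0)
```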
### 3.2 Fitting GEV to Multiple Stations
Let’s fit GEV distributions to all stations (your chosen station plus the 5 nearest):
```julia
# Function to fit GEV to a single station
function fit_station_gev(station_row)
    target_stnid = station_row.stnid
    st_precip = @chain rainfall_data begin
        @filter(stnid == !!target_stnid)
        @arrange(date)
    end

    y_st = collect(skipmissing(ustrip.(u"inch", st_precip.rainfall)))

    # Fit using Extremes.jl
    extremes_fit_st = gevfit(y_st)
    μ_st = location(extremes_fit_st)[1]
    σ_st = scale(extremes_fit_st)[1]
    ξ_st = shape(extremes_fit_st)[1]

    return (
        distribution = GeneralizedExtremeValue(μ_st, σ_st, ξ_st),
        info = (noaa_id = station_row.noaa_id, n_years = length(y_st)),
    )
end

# Fit all stations using map
station_results = map(fit_station_gev, eachrow(nearest_stations))
station_fits = [r.distribution for r in station_results]
station_info = [r.info for r in station_results]
```
1. Extract station ID first to avoid variable scoping issues
2. Filter rainfall data for this specific station
3. Sort the data chronologically by date
4. Apply the fitting function to each of the nearest stations
5. Extract the fitted GEV distribution objects from results
6. Extract station information (ID and years of data) from results
Let’s visualize the geographic distribution of these stations:
```julia
function plot_station_map(my_station, nearest_stations)
    # Create map plot with all stations
    fig = Figure(size = (800, 600))
    ga = GeoAxis(fig[1, 1]; source = "+proj=latlong", dest = "+proj=merc",
        title = "Selected Stations for GEV Analysis", xgridvisible = false, ygridvisible = false)

    # Add US states (white with black borders)
    states = GeoMakie.naturalearth("admin_1_states_provinces_lakes", 110)
    poly!(ga, states.geometry;
        strokecolor = :black, strokewidth = 1, color = :white)

    # Plot all stations as small dots colored by years of data
    scatter!(ga, stations.longitude, stations.latitude;
        color = stations.years_of_data, colormap = :viridis, markersize = 4, alpha = 0.7,
        label = "All stations")

    # Plot nearest stations with distinct colors
    colors = ColorSchemes.Set1_5
    scatter!(ga, nearest_stations.longitude, nearest_stations.latitude,
        color = colors[1:length(nearest_stations.longitude)],
        markersize = 12, strokewidth = 2, strokecolor = :black,
        label = "Analysis stations")

    # Highlight the chosen station
    scatter!(ga, [my_station.longitude], [my_station.latitude],
        color = :gold, markersize = 15, marker = :star5,
        strokewidth = 2, strokecolor = :black,
        label = "My station")

    # Set plot bounds based on data bounds plus padding
    delta = 0.3
    min_lon, max_lon = extrema(stations.longitude)
    min_lat, max_lat = extrema(stations.latitude)

    xlims!(ga, min_lon - delta, max_lon + delta)
    ylims!(ga, min_lat - delta, max_lat + delta)

    # Add colorbar for years of data
    Colorbar(fig[1, 2], label = "Years of Data", colormap = :viridis,
        limits = (minimum(stations.years_of_data), maximum(stations.years_of_data)))

    axislegend(ga, position = :lt)
    return fig
end

plot_station_map(my_station, nearest_stations)
```
1. Create geographic axis with map projection (lat/lon to Mercator)
2. Load and display US state boundaries for geographic context
3. Plot all stations as small dots, colored by years of available data
4. Highlight the 5 nearest stations with larger, distinct colored markers
5. Mark your chosen station with a gold star for clear identification
6. Set map bounds with padding around all station locations
7. Add colorbar showing the years of data scale
Now let’s compare the GEV fits across these stations:
```julia
function plot_multi_station_comparison(station_fits, station_info)
    fig = Figure(size = (900, 600))
    ax = Axis(fig[1, 1],
        xlabel = "Return Period (years)",
        ylabel = "Return Level (inches)",
        title = "Multi-Station GEV Comparison",
        xscale = log10)

    colors = ColorSchemes.Set1_5
    T_smooth = create_return_period_range(1.1, 250, 100)

    for (i, (dist, info)) in enumerate(zip(station_fits, station_info))
        # Plot return levels using quantile function
        levels = [quantile(dist, 1 - 1 / T) for T in T_smooth]

        lines!(ax, T_smooth, levels,
            color = colors[i], linewidth = 2,
            label = "$(info.noaa_id) ($(info.n_years) yrs)")
    end

    return_periods = [5, 10, 25, 50, 100]
    ax.xticks = return_periods
    axislegend(ax, position = :rb)
    return fig
end

plot_multi_station_comparison(station_fits, station_info)
```
1. Use distinct color scheme for different stations
2. Create smooth range of return periods for plotting
3. Calculate return levels for each station's fitted GEV distribution
4. Plot each station with unique color, labeled with station ID and years of data
Analyze the spatial patterns in your multi-station comparison. Address whether the station similarities or differences are credible, what this reveals about spatial variability in extreme precipitation, and how geographical factors (elevation, proximity to coast, urban effects, etc.) might explain the patterns you observe.
## 4 Uncertainty Analysis: Synthetic Experiment
Understanding parameter uncertainty is crucial for risk assessment. Let’s explore how sample size affects GEV parameter estimation through a controlled experiment.
### 4.1 Setting Up the Synthetic Experiment
```julia
using Random
Random.seed!(543)

# Define "true" GEV parameters for our synthetic data
true_μ = 4.0
true_σ = 1.2
true_ξ = 0.1

true_gev = GeneralizedExtremeValue(true_μ, true_σ, true_ξ)

# Display true parameters in a DataFrame
true_params = DataFrame(
    Parameter = ["Location (μ)", "Scale (σ)", "Shape (ξ)"],
    Value = [true_μ, true_σ, true_ξ],
)
println("True GEV parameters:")
true_params
```
1. Set random seed for reproducible synthetic data generation
2. True location parameter (center of the distribution)
3. True scale parameter (spread of the distribution)
4. True shape parameter (tail heaviness, positive = heavy tail)
5. Create the "true" GEV distribution for generating synthetic data
True GEV parameters:
| Row | Parameter | Value |
|---|---|---|
| | String | Float64 |
| 1 | Location (μ) | 4.0 |
| 2 | Scale (σ) | 1.2 |
| 3 | Shape (ξ) | 0.1 |
### 4.2 Running the Experiment
For each sample size, we’ll generate synthetic data, fit GEV parameters, and repeat this process many times:
```julia
# Experimental setup
sample_sizes = [30, 50, 100]
n_replicates = 500
return_period = 100

# Storage for results
experiment_results = Dict()

for n in sample_sizes
    # Storage for this sample size
    fitted_params = []
    return_levels = []

    for rep in 1:n_replicates
        # Generate synthetic data
        synthetic_data = rand(true_gev, n)

        # Fit GEV using gevfit (faster for synthetic experiments)
        try
            fit = gevfit(synthetic_data)

            # Extract parameters
            μ_fit = location(fit)[1]
            σ_fit = scale(fit)[1]
            ξ_fit = shape(fit)[1]

            # Calculate 100-year return level
            fitted_dist = GeneralizedExtremeValue(μ_fit, σ_fit, ξ_fit)
            rl_100 = quantile(fitted_dist, 1 - 1 / return_period)

            # Store results
            push!(fitted_params, (μ = μ_fit, σ = σ_fit, ξ = ξ_fit))
            push!(return_levels, rl_100)
        catch e
            # Skip failed fits (can happen with synthetic data)
            continue
        end
    end

    experiment_results[n] = (params = fitted_params, return_levels = return_levels)
end
```
1. Sample sizes to test (years of synthetic data)
2. Number of Monte Carlo replications for each sample size
3. Focus analysis on 100-year return level estimates
4. Generate n random samples from the true GEV distribution
5. Fit GEV parameters to each synthetic dataset
6. Calculate 100-year return level from fitted parameters
7. Skip any fits that fail due to numerical issues
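Before plotting, it helps to quantify the spread directly. The sketch below is a simplified, self-contained stand-in for the experiment above: it estimates a high quantile empirically rather than via GEV fitting, and it draws GEV samples by inverse-CDF using only the standard library, but it illustrates the same qualitative effect of estimator spread shrinking roughly like $1/\sqrt{n}$:

```julia
using Random, Statistics

Random.seed!(543)

# Inverse-CDF sampler for the GEV (ξ ≠ 0), using the quantile function
# z(p) = μ + (σ/ξ) * ((-log(p))^(-ξ) - 1) with the "true" parameters above.
gev_quantile(p; μ = 4.0, σ = 1.2, ξ = 0.1) = μ + σ / ξ * ((-log(p))^(-ξ) - 1)
sample_gev(n) = [gev_quantile(rand()) for _ in 1:n]

# For each sample size, repeatedly estimate the 0.99 empirical quantile
# and record the spread (standard deviation) of those estimates.
spread = Dict{Int, Float64}()
for n in [30, 50, 100]
    estimates = [quantile(sample_gev(n), 0.99) for _ in 1:500]
    spread[n] = std(estimates)
end

spread  # spread shrinks as the sample size grows
```

The full experiment extrapolates further into the tail than the data reach, so its spread is even larger than this simplified version suggests.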
We can now visualize these results:
```julia
function plot_uncertainty_experiment(experiment_results, true_gev, return_period)
    fig = Figure(size = (1200, 800))

    # Calculate true 100-year return level for reference
    true_rl_100 = quantile(true_gev, 1 - 1 / return_period)

    sample_sizes = [30, 50, 100]
    colors = [:blue, :red, :green]

    # Top row: Histograms of 100-year return levels for each sample size
    top_axes = []  # Store axes for linking
    for (i, n) in enumerate(sample_sizes)
        ax_top = Axis(fig[1, i],
            xlabel = "100-Year Return Level (inches)",
            ylabel = i == 1 ? "Frequency" : "",
            title = "N = $n years",
            yticklabelsvisible = i == 1)

        return_levels = experiment_results[n].return_levels

        # Create histogram
        hist!(ax_top, return_levels, bins = 15, color = (colors[i], 0.7),
            strokewidth = 1, strokecolor = colors[i])

        # Add true value line
        vlines!(ax_top, [true_rl_100], color = :black, linestyle = :dash, linewidth = 2)
        push!(top_axes, ax_top)
    end

    # Link x-axes for top row only
    linkxaxes!(top_axes...)

    # Bottom row: Return period curves for each sample size
    T_smooth = create_return_period_range(2, 250, 50)
    true_levels = [quantile(true_gev, 1 - 1 / T) for T in T_smooth]

    bottom_axes = []  # Store axes for linking
    for (i, n) in enumerate(sample_sizes)
        ax_bottom = Axis(fig[2, i],
            xlabel = "Return Period (years)",
            ylabel = i == 1 ? "Return Level (inches)" : "",
            xscale = log10,
            yticklabelsvisible = i == 1)

        # Plot true curve
        lines!(ax_bottom, T_smooth, true_levels, color = :black, linewidth = 3, label = "True GEV")

        # Plot fitted curves for this sample size
        params = experiment_results[n].params

        for (j, p) in enumerate(params)
            if j <= 30  # Plot first 30 to show uncertainty
                fitted_dist = GeneralizedExtremeValue(p.μ, p.σ, p.ξ)
                levels = [quantile(fitted_dist, 1 - 1 / T) for T in T_smooth]
                lines!(ax_bottom, T_smooth, levels,
                    color = colors[i], alpha = 0.2, linewidth = 1)
            end
        end

        # Set consistent x-axis
        return_periods = [5, 10, 25, 50, 100, 250]
        ax_bottom.xticks = return_periods

        # Add legend only to first plot
        if i == 1
            axislegend(ax_bottom, position = :rb)
        end
        push!(bottom_axes, ax_bottom)
    end

    # Link y-axes for bottom row
    linkyaxes!(bottom_axes...)

    return fig
end

plot_uncertainty_experiment(experiment_results, true_gev, return_period)
```
1. Calculate the true 100-year return level for reference
2. Show y-axis labels only on the leftmost histogram
3. Create histogram of 100-year return level estimates
4. Add vertical line showing the true 100-year return level
5. Link x-axes so all histograms have the same x-range
6. Create return period range for plotting curves
7. Calculate true return levels across all return periods
8. Show y-axis labels only on the leftmost curve plot
9. Plot subset of fitted curves to show uncertainty without cluttering
10. Link y-axes so all curve plots have the same y-range
What sample size would you consider adequate for reliable GEV parameter estimation? How does this synthetic experiment inform your interpretation of the real station analyses from earlier in the lab? Is the uncertainty observed in this experiment an upper or a lower bound for the uncertainty in real-world applications, and why?
Synthesize your key findings from all analyses and discuss the broader implications for infrastructure design and climate risk assessment.