Parametric uncertainty and Monte Carlo


Lecture

2023-09-11

Motivation

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Recall: Bayesian Decision Theory

Recall: \[ \mathbb{E}\left[L(a, \theta) \right] = \int_\theta L(a, \theta) p(\theta) d\theta \] where \(\theta\) is a vector of parameters, \(a\) is some action or decision, and \(L\) is the loss function.

Note

We previously called \(\theta\) a “state of the world” and \(L\) a “reward function”.
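As a numerical refresher, here is a minimal sketch of estimating this expected loss from samples; the quadratic loss function and the samples below are purely illustrative, not from the lecture.

```julia
using Statistics

# illustrative quadratic loss for taking action a when the state is θ
L(a, θ) = (a - θ)^2

θ_samples = randn(10_000) .+ 1.0                     # stand-in samples from p(θ)
expected_loss(a) = mean(L(a, θ) for θ in θ_samples)  # Monte Carlo estimate of E[L(a, θ)]

expected_loss(0.5)                                   # ≈ 1.25 for these samples
```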

Problem statement

You have been commissioned by a client to assess their exposure to flooding. Specifically, they want to know the probability distribution of annual flood damages at their property if they do not elevate or floodproof their building.

  1. \(p(h)\): probability distribution of annual maximum flood heights at their property
  2. \(d(h)\): flood damages as a deterministic function of flood height
  3. \(p(d)\): the induced distribution of damages, obtained by pushing \(p(h)\) through \(d(\cdot)\); in particular, \(\mathbb{E}[d] = \int_h d(h) \, p(h) \, dh\)

With this information, they can compute metrics that will help them make a decision, such as the expected annual damage, the 99th percentile annual damage, and the probability of any damage occurring.

Flood depths

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Data

We fold this long code block to save space.

3×2 DataFrame
 Row │ year   lsl
     │ Int64  Quantity…
─────┼───────────────────
   1 │ 1928   5.05343 ft
   2 │ 1929   3.90728 ft
   3 │ 1930   4.60245 ft
  1. Recall the *= syntax: x *= 2 is equivalent to x = x * 2. We use .*= to work element-wise on the vector.
  2. This adds the points to the plot.
  3. normalize=:pdf normalizes the histogram so that the area under the curve is 1.
  4. The \(y\) axis of the histogram matches that of the line plot, so we remove any y ticks from the histogram.
  5. We can define very flexible layouts for combining multiple plots. See the docs.
  6. suptitle adds a title to the entire figure.

This data comes from the NOAA Tides and Currents database, specifically a gauge at Sewell’s Point, VA, with sea level rise removed.

Flood depths model

We want a probability distribution for flood depths \(p(h)\). We can work with the log of the flood depths and treat them as normally distributed: \[ \log h_i \sim \mathcal{N}(\mu, \sigma^2) \]

We call this a lognormal distribution: \[ h_i \sim \text{LN}(\mu, \sigma) \]

Distribution and histogram

  1. normalize=:pdf normalizes the histogram so that the area under the curve is 1.
  2. The fit_mle function requires a vector of plain numbers, without units. We use ustrip to strip the units, using feet (u"ft") as the reference unit.
  3. This accesses the :lsl_ft column we created.
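Since the fitting code is folded, here is a minimal sketch of what it might look like, using a hypothetical three-row stand-in for the data and the fit_mle/ustrip calls described above.

```julia
using DataFrames, Distributions, Unitful

# hypothetical stand-in for the annual-maximum series from the NOAA gauge
df = DataFrame(year = [1928, 1929, 1930], lsl = [5.05343, 3.90728, 4.60245]u"ft")

# fit_mle needs plain numbers: strip units, using feet as the reference unit
df.lsl_ft = ustrip.(u"ft", df.lsl)

flood_dist = fit_mle(LogNormal, df.lsl_ft)   # h ~ LN(μ, σ)
```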

Note

We’ll spend a whole module on extreme value distributions. They should do a better job of modeling annual maxima but require some subtlety.

Return periods and levels

Flood probabilities are often plotted as return periods. This is just a visualization of the CDF on a log (or log-log) scale.

  1. This is a trick to get the ticks on the axes right when we plot on a log scale.
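Concretely, the return period is the reciprocal of the annual exceedance probability, \(T(h) = 1 / \left(1 - F(h)\right)\), so the return level for a given \(T\) can be read off the fitted distribution. A minimal sketch, assuming the hypothetical `flood_dist` fit from the earlier sketch:

```julia
using Distributions

# return level: the depth exceeded with probability 1/T in any given year
return_level(dist, T) = quantile(dist, 1 - 1 / T)

return_level(flood_dist, 100)   # the "100-year flood" depth, in feet
```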

Note

Work through this code and make sure you understand it.

Plotting position

It is common to add the data points as dots on the return period curve. This raises the question: what return period should be assigned to each point? This is a subjective choice, but a common one is the Weibull plotting position:

  1. We are again using the unitless vector of depths in feet (:lsl_ft) that we created earlier.
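For reference, the Weibull plotting position assigns the \(i\)-th largest of \(n\) annual maxima an exceedance probability of \(i / (n + 1)\), i.e. an empirical return period of \((n + 1) / i\). A minimal sketch of how those empirical return periods might be computed (the function name is hypothetical):

```julia
# Weibull plotting position: the i-th largest of n annual maxima gets
# exceedance probability i / (n + 1), i.e. return period (n + 1) / i
function weibull_return_periods(h)
    n = length(h)
    rank_order = sortperm(h; rev=true)   # indices of h, largest value first
    T = similar(h, Float64)
    T[rank_order] .= (n + 1) ./ (1:n)    # largest observation → longest return period
    return T
end

weibull_return_periods([4.6, 5.1, 3.9, 6.2])   # ≈ [1.67, 2.5, 1.25, 5.0]
```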

🤔

How well does this fit the data?

Functions of random variables

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Depth-damage model

A bounded logistic function provides a plausible depth-damage model for now: \[ d(h) = \mathbb{I}\left[h > 0 \right] \frac{L}{1 + \exp(-k(h - x_0))} \] where \(d\) is damage as a fraction of total value, \(h\) is water depth, \(\mathbb{I}\) is the indicator function, \(L\) is the maximum loss, \(k\) is the slope, and \(x_0\) is the inflection point. We fix \(L=1\) (a known upper bound: 100% damage), so the only free parameters are \(k\) and \(x_0\).

Plotting fit

We will use \(x_0 = 4\) and \(k = 0.75\).

  1. This is another syntax for defining a function. It is equivalent to f(x) = blogistic(x, x0, k). It’s similar to lambda functions in Python.
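A minimal sketch of this depth-damage function; the name `blogistic` follows the annotation above, but treat the implementation details as illustrative.

```julia
# bounded logistic depth-damage curve: zero damage below the waterline,
# approaching L (fixed at 1, i.e. 100% of value) for deep flooding
blogistic(h, x0, k; L=1) = (h > 0) * L / (1 + exp(-k * (h - x0)))

x0, k = 4.0, 0.75                 # values used on this slide
dmg = h -> blogistic(h, x0, k)    # anonymous-function syntax from the note above
dmg(5.0)                          # ≈ 0.68: damage fraction at 5 ft of water
```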

Analytic approach

Plugging in our bounded logistic model for \(d(h)\) and our lognormal model for \(h\): \[ \begin{align} \mathbb{E}[d] &= \int_h d(h) \, p(h) \, dh \\ &= \int_0^\infty \mathbb{I}[h > 0] \, \text{logistic}(h) \, \text{LN}(h \mid \mu, \sigma) \, dh \\ &= \int_0^\infty \frac{1}{1 + \exp(-k(h - x_0))} \, \frac{1}{h \sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{\log h - \mu}{\sigma} \right)^2 \right\} \, dh \end{align} \]

Limitations: analytic approach

We might be able to solve this analytically (Wolfram Alpha can’t…). But…

  1. Numerous simplifying assumptions and approximations.
  2. What if we want to use a different distribution?
  3. A different damage model?

What is Monte Carlo?

Monte Carlo methods are a set of computational techniques for:

  1. Generating samples from a target distribution
  2. Approximating expectations of quantities of interest under that distribution.

Monte Carlo: Theory

We want to approximate the quantity \[ \int_0^\infty \text{logistic}(h) \, \text{LN}(h \mid \mu, \sigma) \, dh \]

A deterministic strategy:

  1. sample \(h^s = 0, \Delta h, 2\Delta h, \ldots, (S-1)\Delta h\)
  2. compute \(\text{logistic}(h^s) \mathcal{N}(h^s | \mu, \sigma^2)\) at each point and sum
  3. drawbacks: we have to go to \(\infty\) and select \(\Delta h\).

A sampling strategy:

  1. sample \(h^1, h^2, \ldots, h^S \sim p(h)\) – which we can do because we have a model for \(p(h)\)
  2. for each value: compute \(\mathbb{I}(h^s > 0) \text{logistic}(h^s)\) and take the average
  3. this converges to the correct expectation!


More formally

If \(\theta^s \sim p(\theta)\), then \[ \mathbb{E}\left[ f(\theta) \right] = \int_\theta f(\theta) p(\theta) d\theta \approx \frac{1}{S} \sum_{s=1}^S f(\theta^s) \]
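In code, this estimator is essentially a one-liner; `mc_expectation` below is a hypothetical helper, not something defined in the lecture.

```julia
using Distributions, Statistics

# Monte Carlo estimate of E[f(θ)] for θ ~ p: average f over S draws from p
mc_expectation(f, p, S) = mean(f, rand(p, S))

mc_expectation(θ -> θ^2, Normal(0, 1), 100_000)   # ≈ 1, the variance of a standard normal
```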

References

See chapter 10 of Gelman et al. (2014) for more details or section 5 of Betancourt (2018) for a more precise mathematical treatment.

Monte Carlo: Implementation

A deceptively simple idea:
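The folded code isn’t reproduced here; a minimal sketch of the idea, assuming the hypothetical `flood_dist` fit and `dmg` depth-damage function from the earlier sketches:

```julia
using Distributions

S = 100_000                      # number of Monte Carlo samples
h_samples = rand(flood_dist, S)  # draw flood depths from the fitted lognormal (in feet)
d_samples = dmg.(h_samples)      # push each depth through the depth-damage curve
```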

Monte Carlo Expectations

Given these samples, we can compute expectations of any function of the sampled quantity. For example, we can compute the mean damage, the 99th percentile of damage, and the probability of any damage occurring:
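A sketch of those three summaries, using the hypothetical `d_samples` from the previous sketch (the numbers below come from the lecture’s own samples):

```julia
using Statistics

[
    mean(d_samples),            # average annual loss (fraction of value)
    quantile(d_samples, 0.99),  # 99th-percentile annual loss
    mean(d_samples .> 0),       # probability of any loss in a given year
]
```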

3-element Vector{Float64}:
 0.00235
 0.07008
 0.03486

Parameter uncertainty

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Overview

We have been working with a single probability distribution for flood depths, which we computed by maximum likelihood.

These fitted values are point estimates, not exact truths. What happens if we consider a lognormal distribution with slightly different, but still plausible, parameters?

What about the depth-damage parameters \(x_0\) and \(k\)?
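A sketch of one way to probe this, assuming `flood_dist`, `blogistic`, and `S` from the earlier sketches; the perturbed values here are purely illustrative and are not the ones behind the tables that follow.

```julia
using Distributions

# alternative flood distribution: nudge the MLE parameters (illustrative shifts)
mu_hat, sigma_hat = params(flood_dist)
alt_dist = LogNormal(mu_hat + 0.05, sigma_hat * 1.2)

# alternative depth-damage curve: shift the inflection point and steepen the slope
alt_dmg = h -> blogistic(h, 3.5, 1.0)

# re-run the same Monte Carlo pipeline under the perturbed assumptions
alt_damages = alt_dmg.(rand(alt_dist, S))
```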

Parameter uncertainty: flood distribution

Flood distribution ➡️ damages

3×3 DataFrame
 Row │ params               MLE      alt_dist
     │ String               Float64  Float64
─────┼────────────────────────────────────────
   1 │ Average Annual Loss  0.00235  0.00975
   2 │ Q99 Annual Loss      0.07008  0.16531
   3 │ Probability of Loss  0.03486  0.10076

Parameter uncertainty: depth-damage curve

Depth-damage curve ➡️ damages

3×5 DataFrame
 Row │ params               MLE      alt_dist  alt_curve  alt_both
     │ String               Float64  Float64   Float64    Float64
─────┼──────────────────────────────────────────────────────────────
   1 │ Average Annual Loss  0.00235  0.00975   0.00407    0.01529
   2 │ Q99 Annual Loss      0.07008  0.16531   0.12715    0.24093
   3 │ Probability of Loss  0.03486  0.10076   0.03308    0.09746

Wrapup

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Reflection

  1. Common problem: \(\mathbb{E} \left[ f(x) \right]\) where \(x \sim p(x)\)
  2. Solution: \(\mathbb{E} \left[ f(x) \right] = \int_x p(x) f(x) \, dx\)
  3. Monte Carlo approach:
    1. Sample \(x^1, x^2, \ldots, x^S \sim p(x)\)
    2. Compute \(\frac{1}{S} \sum_{s=1}^S f(x^s)\)
  4. Uncertainties in our model parameters propagate to uncertainties in the things we care about.

Up next

  • Tomorrow 9/12 at 10AM: Office hours (Ryon 215 or Zoom?)
  • Wednesday: intro to Bayesian inference
  • Friday: Bayes lab

References

Betancourt, M. (2018, October). Probability Theory (For Scientists and Engineers). Retrieved from https://betanalpha.github.io/assets/case_studies/probability_theory.html#7_conclusion
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: Chapman & Hall/CRC.