Parametric uncertainty and Monte Carlo


Lecture

2023-09-11

Motivation

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Recall: Bayesian Decision Theory

Recall: \[ \mathbb{E}\left[L(a, \theta) \right] = \int_\theta L(a, \theta) p(\theta) d\theta \] where \(\theta\) is a vector of parameters, \(a\) is some action or decision, and \(L\) is the loss function.

Note

We previously called \(\theta\) a “state of the world” and \(L\) a “reward function”.
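As a numerical refresher, here is a minimal sketch of estimating this expected loss from samples; the quadratic loss function and the samples below are purely illustrative, not from the lecture.

```julia
using Statistics

# illustrative quadratic loss for taking action a when the state is θ
L(a, θ) = (a - θ)^2

θ_samples = randn(10_000) .+ 1.0                     # stand-in samples from p(θ)
expected_loss(a) = mean(L(a, θ) for θ in θ_samples)  # Monte Carlo estimate of E[L(a, θ)]

expected_loss(0.5)                                   # ≈ 1.25 for these samples
```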

Problem statement

You have been commissioned by a client to assess their exposure to flooding. Specifically, they want to know the probability distribution of annual flood damages at their property if they do not elevate or floodproof their building.

  1. \(p(h)\): probability distribution of annual maximum flood heights at their property
  2. \(d(h)\): flood damages as a deterministic function of flood height
  3. \(p(d)\): the induced distribution of damages, obtained by pushing \(p(h)\) through \(d(\cdot)\); in particular, \(\mathbb{E}[d] = \int_h d(h) \, p(h) \, dh\)

With this information, they can compute metrics that will help them make a decision, such as the expected annual damage, the 99th percentile annual damage, and the probability of any damage occurring.

Flood depths

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Data

We fold this long code block to save space.

3×2 DataFrame
 Row │ year   lsl
     │ Int64  Quantity…
─────┼───────────────────
   1 │ 1928   5.05343 ft
   2 │ 1929   3.90728 ft
   3 │ 1930   4.60245 ft
  1. Recall the *= syntax: x *= 2 is equivalent to x = x * 2. We use .*= to work element-wise on the vector.
  2. This adds the points to the plot.
  3. normalize=:pdf normalizes the histogram so that the area under the curve is 1.
  4. The \(y\) axis of the histogram matches that of the line plot, so we remove any y ticks from the histogram.
  5. We can define very flexible layouts for combining multiple plots. See the docs.
  6. suptitle adds a title to the entire figure.

This data comes from the NOAA Tides and Currents database, specifically a gauge at Sewell’s Point, VA, with sea level rise removed.

Flood depths model

We want a probability distribution for flood depths \(p(h)\). We can work with the log of the flood depths and treat them as normally distributed: \[ \log h_i \sim \mathcal{N}(\mu, \sigma^2) \]

We call this a lognormal distribution: \[ h_i \sim \text{LN}(\mu, \sigma) \]

Distribution and histogram

  1. normalize=:pdf normalizes the histogram so that the area under the curve is 1.
  2. The fit_mle function requires a vector of plain numbers, without units. We use ustrip to strip the units, using feet (u"ft") as the reference unit.
  3. This accesses the :lsl_ft column we created.
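Since the fitting code is folded, here is a minimal sketch of what it might look like, using a hypothetical three-row stand-in for the data and the fit_mle/ustrip calls described above.

```julia
using DataFrames, Distributions, Unitful

# hypothetical stand-in for the annual-maximum series from the NOAA gauge
df = DataFrame(year = [1928, 1929, 1930], lsl = [5.05343, 3.90728, 4.60245]u"ft")

# fit_mle needs plain numbers: strip units, using feet as the reference unit
df.lsl_ft = ustrip.(u"ft", df.lsl)

flood_dist = fit_mle(LogNormal, df.lsl_ft)   # h ~ LN(μ, σ)
```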

Note

We’ll spend a whole module on extreme value distributions. They should do a better job of modeling annual maxima but require some subtlety.

Return periods and levels

Flood probabilities are often plotted as return periods. This is just a visualization of the CDF on a log (or log-log) scale.

  1. This is a trick to get the ticks on the axes right when we plot on a log scale.
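Concretely, the return period is the reciprocal of the annual exceedance probability, \(T(h) = 1 / \left(1 - F(h)\right)\), so the return level for a given \(T\) can be read off the fitted distribution. A minimal sketch, assuming the hypothetical `flood_dist` fit from the earlier sketch:

```julia
using Distributions

# return level: the depth exceeded with probability 1/T in any given year
return_level(dist, T) = quantile(dist, 1 - 1 / T)

return_level(flood_dist, 100)   # the "100-year flood" depth, in feet
```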

Note

Work through this code and make sure you understand it.

Plotting position

It is common to add the data points as dots on the return period curve. This raises the question: what return period should be assigned to each point? This is a subjective choice, but a common one is the Weibull plotting position:

  1. We are again using the unitless vector of depths in feet (:lsl_ft) that we created earlier.
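For reference, the Weibull plotting position assigns the \(i\)-th largest of \(n\) annual maxima an exceedance probability of \(i / (n + 1)\), i.e. an empirical return period of \((n + 1) / i\). A minimal sketch of how those empirical return periods might be computed (the function name is hypothetical):

```julia
# Weibull plotting position: the i-th largest of n annual maxima gets
# exceedance probability i / (n + 1), i.e. return period (n + 1) / i
function weibull_return_periods(h)
    n = length(h)
    rank_order = sortperm(h; rev=true)   # indices of h, largest value first
    T = similar(h, Float64)
    T[rank_order] .= (n + 1) ./ (1:n)    # largest observation → longest return period
    return T
end

weibull_return_periods([4.6, 5.1, 3.9, 6.2])   # ≈ [1.67, 2.5, 1.25, 5.0]
```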

🤔

How well does this fit the data?

Functions of random variables

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Depth-damage model

A bounded logistic function provides a plausible depth-damage model for now: \[ d(h) = \mathbb{I}\left[h > 0 \right] \frac{L}{1 + \exp(-k(h - x_0))} \] where \(d\) is damage as a fraction of total value, \(h\) is water depth, \(\mathbb{I}\) is the indicator function, \(L\) is the maximum loss, \(k\) is the slope, and \(x_0\) is the inflection point. We fix \(L=1\) (a known upper bound: 100% damage), so the only free parameters are \(k\) and \(x_0\).

Plotting fit

We will use \(x_0 = 4\) and \(k = 0.75\).

  1. This is another syntax for defining a function. It is equivalent to f(x) = blogistic(x, x0, k). It’s similar to lambda functions in Python.
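A minimal sketch of this depth-damage function; the name `blogistic` follows the annotation above, but treat the implementation details as illustrative.

```julia
# bounded logistic depth-damage curve: zero damage below the waterline,
# approaching L (fixed at 1, i.e. 100% of value) for deep flooding
blogistic(h, x0, k; L=1) = (h > 0) * L / (1 + exp(-k * (h - x0)))

x0, k = 4.0, 0.75                 # values used on this slide
dmg = h -> blogistic(h, x0, k)    # anonymous-function syntax from the note above
dmg(5.0)                          # ≈ 0.68: damage fraction at 5 ft of water
```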

Analytic approach

Plugging in our bounded logistic model for \(d(h)\) and our lognormal model for \(h\): \[ \begin{align} \mathbb{E}[d] &= \int_h d(h) \, p(h) \, dh \\ &= \int_0^\infty \mathbb{I}[h > 0] \, \text{logistic}(h) \, \text{LN}(h \mid \mu, \sigma) \, dh \\ &= \int_0^\infty \frac{1}{1 + \exp(-k(h - x_0))} \, \frac{1}{h \sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{\log h - \mu}{\sigma} \right)^2 \right\} \, dh \end{align} \]

Limitations: analytic approach

We might be able to solve this analytically (Wolfram Alpha can’t…). But…

  1. Numerous simplifying assumptions and approximations.
  2. What if we want to use a different distribution?
  3. A different damage model?

What is Monte Carlo?

Monte Carlo methods are a set of computational techniques for:

  1. Generating samples from a target distribution
  2. Approximating expectations of quantities of interest under that distribution.

Monte Carlo: Theory

We want to approximate the quantity \[ \int_0^\infty \text{logistic}(h) \, \text{LN}(h \mid \mu, \sigma) \, dh \]

A deterministic strategy:

  1. sample \(h^s = 0, \Delta h, 2\Delta h, \ldots, (S-1)\Delta h\)
  2. compute \(\text{logistic}(h^s) \mathcal{N}(h^s | \mu, \sigma^2)\) at each point and sum
  3. drawbacks: we have to go to \(\infty\) and select \(\Delta h\).

A sampling strategy:

  1. sample \(h^1, h^2, \ldots, h^S \sim p(h)\) – which we can do because we have a model for \(p(h)\)
  2. for each value: compute \(\mathbb{I}(h^s > 0) \text{logistic}(h^s)\) and take the average
  3. this converges to the correct expectation!


More formally

If \(\theta^s \sim p(\theta)\), then \[ \mathbb{E}\left[ f(\theta) \right] = \int_\theta f(\theta) p(\theta) d\theta \approx \frac{1}{S} \sum_{s=1}^S f(\theta^s) \]
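In code, this estimator is essentially a one-liner; `mc_expectation` below is a hypothetical helper, not something defined in the lecture.

```julia
using Distributions, Statistics

# Monte Carlo estimate of E[f(θ)] for θ ~ p: average f over S draws from p
mc_expectation(f, p, S) = mean(f, rand(p, S))

mc_expectation(θ -> θ^2, Normal(0, 1), 100_000)   # ≈ 1, the variance of a standard normal
```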

References

See chapter 10 of Gelman et al. (2014) for more details or section 5 of Betancourt (2018) for a more precise mathematical treatment.

Monte Carlo: Implementation

A deceptively simple idea:
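The folded code isn’t reproduced here; a minimal sketch of the idea, assuming the hypothetical `flood_dist` fit and `dmg` depth-damage function from the earlier sketches:

```julia
using Distributions

S = 100_000                      # number of Monte Carlo samples
h_samples = rand(flood_dist, S)  # draw flood depths from the fitted lognormal (in feet)
d_samples = dmg.(h_samples)      # push each depth through the depth-damage curve
```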

Monte Carlo Expectations

Given these samples, we can compute expectations of any function of the sampled quantity. For example, we can compute the mean damage, the 99th percentile of damage, and the probability of any damage occurring:
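A sketch of those three summaries, using the hypothetical `d_samples` from the previous sketch (the numbers below come from the lecture’s own samples):

```julia
using Statistics

[
    mean(d_samples),            # average annual loss (fraction of value)
    quantile(d_samples, 0.99),  # 99th-percentile annual loss
    mean(d_samples .> 0),       # probability of any loss in a given year
]
```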

3-element Vector{Float64}:
 0.00235
 0.07008
 0.03486

Parameter uncertainty

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Overview

We have been working with a single probability distribution for flood depths, which we computed by maximum likelihood.

These fitted values are point estimates, not exact truths. What happens if we consider a lognormal distribution with slightly different, but still plausible, parameters?

What about the depth-damage parameters \(x_0\) and \(k\)?
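A sketch of one way to probe this, assuming `flood_dist`, `blogistic`, and `S` from the earlier sketches; the perturbed values here are purely illustrative and are not the ones behind the tables that follow.

```julia
using Distributions

# alternative flood distribution: nudge the MLE parameters (illustrative shifts)
mu_hat, sigma_hat = params(flood_dist)
alt_dist = LogNormal(mu_hat + 0.05, sigma_hat * 1.2)

# alternative depth-damage curve: shift the inflection point and steepen the slope
alt_dmg = h -> blogistic(h, 3.5, 1.0)

# re-run the same Monte Carlo pipeline under the perturbed assumptions
alt_damages = alt_dmg.(rand(alt_dist, S))
```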

Parameter uncertainty: flood distribution

Flood distribution ➡️ damages

3×3 DataFrame
 Row │ params               MLE      alt_dist
     │ String               Float64  Float64
─────┼────────────────────────────────────────
   1 │ Average Annual Loss  0.00235  0.00975
   2 │ Q99 Annual Loss      0.07008  0.16531
   3 │ Probability of Loss  0.03486  0.10076

Parameter uncertainty: depth-damage curve

Depth-damage curve ➡️ damages

3×5 DataFrame
 Row │ params               MLE      alt_dist  alt_curve  alt_both
     │ String               Float64  Float64   Float64    Float64
─────┼──────────────────────────────────────────────────────────────
   1 │ Average Annual Loss  0.00235  0.00975   0.00407    0.01529
   2 │ Q99 Annual Loss      0.07008  0.16531   0.12715    0.24093
   3 │ Probability of Loss  0.03486  0.10076   0.03308    0.09746

Wrapup

Today

  1. Motivation

  2. Flood depths

  3. Functions of random variables

  4. Parameter uncertainty

  5. Wrapup

Reflection

  1. Common problem: \(\mathbb{E} \left[ f(x) \right]\) where \(x \sim p(x)\)
  2. Solution: \(\mathbb{E} \left[ f(x) \right] = \int_x p(x) f(x) \, dx\)
  3. Monte Carlo approach:
    1. Sample \(x^1, x^2, \ldots, x^S \sim p(x)\)
    2. Compute \(\frac{1}{S} \sum_{s=1}^S f(x^s)\)
  4. Uncertainties in our model parameters propagate to uncertainties in the things we care about.

Up next

  • Tomorrow 9/12 at 10AM: Office hours (Ryon 215 or Zoom?)
  • Wednesday: intro to Bayesian inference
  • Friday: Bayes lab

References

Betancourt, M. (2018, October). Probability Theory (For Scientists and Engineers). Retrieved from https://betanalpha.github.io/assets/case_studies/probability_theory.html#7_conclusion
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: Chapman & Hall/CRC.