
Lecture

2023-09-11

Today

Motivation

Flood depths

Functions of random variables

Parameter uncertainty

Wrapup

Recall: \[ \mathbb{E}\left[L(a, \theta) \right] = \int_\theta L(a, \theta) p(\theta) d\theta \] Where \(\theta\) is a vector of parameters, \(a\) is some action or decision, and \(L\) is the loss function.

**Note**

We previously called \(\theta\) a “state of the world” and \(L\) a “reward function”.

You have been commissioned by a client to assess their exposure to flooding. Specifically, they want to know the probability distribution of annual flood damages at their property if they do not elevate or floodproof their building.

- \(p(h)\): probability distribution of annual maximum flood heights at their property
- \(d(h)\): flood damages as a deterministic function of flood height
- \(p(d) = \int_h d(h) p(h) \, dh\)

With this information, they can compute metrics like the expected annual damage, the 99th percentile annual damage, and the probability of any flood occurring that will help them make a decision.

Flood depths

We fold this long code block to save space.

3×2 DataFrame

| Row | year | lsl |
|---|---|---|
| | Int64 | Quantity… |
| 1 | 1928 | 5.05343 ft |
| 2 | 1929 | 3.90728 ft |
| 3 | 1930 | 4.60245 ft |

- Recall the `*=` syntax: `x *= 2` is equivalent to `x = x * 2`. We use `.*=` to work element-wise on the vector.
- This adds the points to the plot.
- `normalize=:pdf` normalizes the histogram so that the area under the curve is 1.
- The \(y\) axis of the histogram matches that of the line plot, so we remove any y ticks from the histogram.
- We can define very flexible layouts for combining multiple plots. See the docs.
- `suptitle` adds a title to the entire figure.

This data comes from the NOAA Tides and Currents database, specifically a gauge at Sewell’s Point, VA, with sea level rise removed.

We want a probability distribution for flood depths \(p(h)\). We can work with the log of the flood depths and treat them as normally distributed: \[ \log h_i \sim \mathcal{N}(\mu, \sigma^2) \]

We call this a lognormal distribution: \[ h_i \sim \text{LN}(\mu, \sigma) \]

- `normalize=:pdf` normalizes the histogram so that the area under the curve is 1.
- The `fit_mle` function requires a vector of numbers, without units. We use `ustrip` to convert the units to a number, using feet (`u"ft"`) as the reference unit.
- This accesses the `:lsl_ft` column we created.
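A minimal sketch of this fitting step, assuming Distributions.jl and Unitful.jl are installed (the data values stand in for the DataFrame column; variable names are hypothetical):

```julia
using Distributions, Unitful

# hypothetical annual-maximum sea levels carrying units, as in the DataFrame
lsl = [5.05343, 3.90728, 4.60245] .* u"ft"

# ustrip.(u"ft", x) converts each Quantity to a plain number, using feet
# as the reference unit
lsl_ft = ustrip.(u"ft", lsl)

# fit_mle needs a plain numeric vector; fit a lognormal by maximum likelihood
flood_dist = fit_mle(LogNormal, lsl_ft)
```

For a lognormal, the MLE of \(\mu\) is simply the mean of the log-transformed data, which is a quick sanity check on the fit.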

**Note**

We’ll spend a whole module on extreme value distributions. They should do a better job of modeling annual maxima but require some subtlety.

Flood probabilities are often plotted as return periods. This is just a visualization of the CDF on a log (or log-log) scale: the return period is the reciprocal of the annual exceedance probability, \(T(h) = 1 / \left(1 - F(h)\right)\).

- This is a trick to get the ticks on the axes right when we plot on a log scale.

**Note**

Work through this code and make sure you understand it.

It is common to add the data points as dots on the return period curve. This raises the question: what return period is assigned to each point? This is a subjective choice, but a common one is the Weibull plotting position: the \(i\)-th smallest of \(n\) observations is assigned non-exceedance probability \(i / (n+1)\), and hence return period \(1 / \left(1 - i/(n+1)\right)\).

- We are again using the vector that we converted to a scalar, in feet.
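The Weibull positions can be sketched as follows (a sketch; the data vector and variable names are hypothetical stand-ins for the unit-stripped observations):

```julia
# hypothetical unit-stripped annual maxima, in feet
lsl_ft = [5.05, 3.91, 4.60, 6.20, 4.10]

n = length(lsl_ft)
h_sorted = sort(lsl_ft)              # smallest to largest

# Weibull plotting position: the i-th smallest value gets
# non-exceedance probability i / (n + 1)
p_nonexc = (1:n) ./ (n + 1)

# return period = 1 / (exceedance probability)
T = 1 ./ (1 .- p_nonexc)
```

Note that the largest observation gets return period \(n + 1\), so no point is ever assigned an infinite return period.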

**🤔**

How well does this fit the data?

Functions of random variables

A bounded logistic function provides a plausible *depth-damage model* for now: \[
d(h) = \mathbb{I}\left[h > 0 \right] \frac{L}{1 + \exp(-k(h - x_0))}
\] where \(d\) is damage as a percent of total value, \(h\) is water depth, \(\mathbb{I}\) is the indicator function, \(L\) is the maximum loss, \(k\) is the slope, and \(x_0\) is the inflection point. We fix \(L=1\) (known upper bound: 100% damage), so the model has two free parameters, \(x_0\) and \(k\).

We will use \(x_0 = 4\) and \(k = 0.75\).

- This is another syntax for defining a function. It is equivalent to `f(x) = blogistic(x, x0, k)`. It’s similar to lambda functions in Python.
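The model and the anonymous-function syntax can be sketched like this (`blogistic` follows the equation above with \(L = 1\); the name is hypothetical):

```julia
# bounded logistic depth-damage model with L = 1:
# zero damage for h <= 0, logistic curve otherwise
blogistic(h, x0, k) = (h > 0) / (1 + exp(-k * (h - x0)))

x0 = 4.0    # inflection point, from the lecture
k = 0.75    # slope, from the lecture

# anonymous-function syntax: equivalent to f(h) = blogistic(h, x0, k)
f = h -> blogistic(h, x0, k)
```

The comparison `(h > 0)` returns a `Bool`, which Julia promotes to `0.0` or `1.0` in arithmetic, so the indicator comes for free.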

Plugging in our bounded logistic model for \(d(h)\) and our lognormal model for \(h\): \[ \begin{align} p(d) &= \int_h d(h) p(h) \, dh \\ &= \int_{-\infty}^\infty \mathbb{I}[h > 0] \, \text{logistic}(h) \, \mathcal{N}(h \mid \mu, \sigma^2) \, dh \\ &= \int_0^\infty \text{logistic}(h) \, \mathcal{N}(h \mid \mu, \sigma^2) \, dh \\ &= \int_0^\infty \frac{1}{1 + \exp(-k(h - x_0))} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left(\frac{h - \mu}{\sigma}\right)^2 \right\} \, dh \end{align} \]

We might be able to solve this analytically (Wolfram Alpha can’t…). But…

- Numerous simplifying assumptions and approximations.
- What if we want to use a different distribution?
- A different damage model?

Monte Carlo methods are a set of computational techniques for:

- Generating samples from a target distribution
- Approximating expectations of random quantities under this distribution

We want to approximate the quantity \[ \int_0^\infty \text{logistic}(h) \mathcal{N}(h \mid \mu, \sigma^2) \, dh \]

A deterministic strategy:

- sample \(h^s = 0, \Delta h, 2\Delta h, \ldots, (S-1)\Delta h\)
- compute \(\text{logistic}(h^s) \mathcal{N}(h^s \mid \mu, \sigma^2)\) at each point and sum
- drawbacks: we have to go to \(\infty\) and select \(\Delta h\).

A sampling strategy:

- sample \(h^1, h^2, \ldots, h^S \sim p(h)\) – which we can do because we have a model for \(p(h)\)
- for each value: compute \(\mathbb{I}(h^s > 0) \text{logistic}(h^s)\) and take the average
- this converges to the correct expectation!
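The two strategies can be sketched side by side (a sketch, with hypothetical \(\mu\) and \(\sigma\) and a hypothetical `logistic_d` standing in for the damage curve):

```julia
using Distributions, Statistics, Random

Random.seed!(42)

# damage curve with the built-in indicator for h <= 0
logistic_d(h; x0=4.0, k=0.75) = (h > 0) / (1 + exp(-k * (h - x0)))

μ, σ = 1.0, 2.0                 # hypothetical parameters
h_dist = Normal(μ, σ)

# deterministic strategy: must truncate the upper limit at h_max and pick Δh
Δh, h_max = 0.01, 30.0
riemann = sum(logistic_d(h) * pdf(h_dist, h) * Δh for h in 0:Δh:h_max)

# sampling strategy: draw from p(h) and average the damage term
S = 100_000
h_samples = rand(h_dist, S)
mc = mean(logistic_d.(h_samples))
```

With enough samples the two estimates agree to within Monte Carlo error, but only the sampling strategy avoids choosing a grid and a truncation point.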

If \(\theta^s \sim p(\theta)\), then \[ \mathbb{E}\left[ f(\theta) \right] = \int_\theta f(\theta) p(\theta) d\theta \approx \frac{1}{S} \sum_{s=1}^S f(\theta^s) \]

A deceptively simple idea:

Given these samples, we can **compute expectations** of any function of the samples. For example, we can compute the mean damage, the 99th percentile of damage, and the probability of any damage occurring.

```
3-element Vector{Float64}:
0.00235
0.07008
0.03486
```
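These metrics can be reproduced in outline as follows (a sketch; the distribution parameters are hypothetical stand-ins, so the values will differ from those above):

```julia
using Distributions, Statistics, Random

Random.seed!(1)

# bounded logistic damage model from the lecture (L = 1)
blogistic(h, x0, k) = (h > 0) / (1 + exp(-k * (h - x0)))

h_dist = Normal(1.0, 2.0)           # hypothetical flood-height model
h = rand(h_dist, 100_000)
d = blogistic.(h, 4.0, 0.75)

metrics = [
    mean(d),             # average annual loss
    quantile(d, 0.99),   # 99th percentile annual loss
    mean(d .> 0),        # probability of any loss
]
```

Each metric is just a different function applied to the same vector of damage samples, which is the point of the Monte Carlo approach.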

Parameter uncertainty

We have been working with a single probability distribution for flood depths, which we computed by maximum likelihood.

These values are not precise. What happens if we consider the lognormal distribution with slightly different, but still plausible, parameters?

What about the depth-damage parameters \(x_0\) and \(k\)?
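One way to sketch this sensitivity experiment (all parameter values here are hypothetical; the table below uses the lecture's actual fits):

```julia
using Distributions, Statistics, Random

Random.seed!(2)

blogistic(h, x0, k) = (h > 0) / (1 + exp(-k * (h - x0)))

# annual-loss metrics for a given flood distribution and damage curve
function loss_metrics(h_dist, x0, k; S=100_000)
    d = blogistic.(rand(h_dist, S), x0, k)
    (aal = mean(d), q99 = quantile(d, 0.99), p_loss = mean(d .> 0))
end

mle      = loss_metrics(Normal(1.0, 2.0), 4.0, 0.75)  # stand-in for the MLE fit
alt_dist = loss_metrics(Normal(1.3, 2.2), 4.0, 0.75)  # perturbed distribution
alt_both = loss_metrics(Normal(1.3, 2.2), 3.5, 0.90)  # perturb the curve too
```

Rerunning the same pipeline under each plausible parameter set is all that "propagating parameter uncertainty" requires here.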

3×3 DataFrame

| Row | params | MLE | alt_dist |
|---|---|---|---|
| | String | Float64 | Float64 |
| 1 | Average Annual Loss | 0.00235 | 0.00975 |
| 2 | Q99 Annual Loss | 0.07008 | 0.16531 |
| 3 | Probability of Loss | 0.03486 | 0.10076 |

3×5 DataFrame

| Row | params | MLE | alt_dist | alt_curve | alt_both |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | Average Annual Loss | 0.00235 | 0.00975 | 0.00407 | 0.01529 |
| 2 | Q99 Annual Loss | 0.07008 | 0.16531 | 0.12715 | 0.24093 |
| 3 | Probability of Loss | 0.03486 | 0.10076 | 0.03308 | 0.09746 |

Wrapup

- Common problem: \(\mathbb{E} \left[ f(x) \right]\) where \(x \sim p(x)\)
- Solution: \(\mathbb{E} \left[ f(x) \right] = \int_x p(x) f(x) \, dx\)
- Monte Carlo approach:
  - Sample \(x^1, x^2, \ldots, x^S \sim p(x)\)
  - Compute \(\frac{1}{S} \sum_{s=1}^S f(x^s)\)

**Uncertainties in our model parameters propagate to uncertainties in the things we care about.**

- Tomorrow 9/12 at 10AM: Office hours (Ryon 215 or Zoom?)
- Wednesday: intro to Bayesian inference
- Friday: Bayes lab
