Marginal, conditional, and joint distributions


Lecture

2023-08-30

PDFs and CDFs

Today

  1. PDFs and CDFs

  2. Joint, marginal, and conditional distributions

  3. Example: linear regression

  4. Example: negative binomial as a mixture

  5. Wrapup

PDF and CDF

If \(F_X\) is the cumulative distribution function (CDF) of \(X\) and \(f_X\) is the probability density function (PDF) of \(X\), then: \[ F_X ( x ) = \int_{-\infty}^x f_X(u) \, du, \] and (if \(f_X\) is continuous at \(x\), which it typically will be) \[ f_X(x) = \frac{d}{dx} F_X(x). \] A useful property is \[ \Pr[a \leq X \leq b] = \int_a^b f_X(x) \, dx. \]

Important

For a continuous random variable \(y\), we can only talk about the probability that \(y\) falls in some interval \([a, b]\), which is given by the integral of the PDF over that interval. The probability that \(y\) takes on any exact value \(y^*\), written \(p(y = y^*)\), is zero.

PDF example

A simple example to illustrate that \[ F_X(2) = \int_{-\infty}^2 f_X(u) \, du \]

We will use a standard Normal distribution as an example.
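One possible version of this computation, sketched to match the annotations below (the lower integration bound of -10 and the 50 points are arbitrary choices):

    using Distributions

    d = Normal()                                    # (1) mean 0, standard deviation 1 by default

    # trapezoidal rule: integrate f over [a, b] using n equally spaced points
    function quad_trap(f, a, b, n)                  # (3)
        xs = range(a, b; length = n)
        h = step(xs)
        return h * (sum(f, xs) - (f(a) + f(b)) / 2)
    end

    approx = quad_trap(x -> pdf(d, x), -10, 2, 50)  # (2) density of d evaluated at x
    exact  = cdf(d, 2)
    (approx, exact)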

(0.9771562639858903, 0.9772498680518208)
  1. Mean 0 and standard deviation 1 by default
  2. pdf(d, x) tells us the probability density function of distribution d evaluated at x
  3. quad_trap is a trapezoidal approximation of the integral with arguments: function, lower bound, upper bound, and number of points

PMFs

  • Discrete distributions (like the Poisson) have a probability mass function (PMF) instead of a PDF
  • For PMFs, \(p(y = y^*)\) is the probability that \(y\) takes on the value \(y^*\); unlike a density value, it is a genuine probability and can be strictly positive
  1. In the Distributions package, both PDFs and PMFs are called pdf
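For example, a minimal illustration (the Poisson rate of 3 is arbitrary):

    using Distributions

    d = Poisson(3)
    pdf(d, 2)                       # the PMF p(y = 2): a genuine probability
    sum(pdf(d, y) for y in 0:100)   # ≈ 1: PMF values sum to one over the support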

Joint, marginal, and conditional distributions

Today

  1. PDFs and CDFs

  2. Joint, marginal, and conditional distributions

  3. Example: linear regression

  4. Example: negative binomial as a mixture

  5. Wrapup

Bayes’ Rule

\[ p(\theta, y) = p(\theta) p(y | \theta) \] and thus \[ p(\theta | y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta) p(y | \theta)}{p(y)}. \] Since \(p(y)\) does not depend on \(\theta\), we generally write: \[ p(\theta | y) \propto p(\theta) p(y | \theta) \]

Marginal probability

Probability of event \(A\): \(\Pr(A)\)

We will write the marginal probability density function as \[ p(\theta) \quad \text{or} \quad p(y) \]

Joint probability

Probability of events \(A\) and \(B\): \(\Pr(A \& B)\)

\[ p(\theta, y) \]

Conditional probability

Probability of event \(A\) given event \(B\): \(\Pr(A | B)\)

\[ p(\theta | y) \quad \text{or} \quad p(y | \theta) \]

Example: two-dice wager

A gambler presents you with an even-money wager. You will roll two dice, and if the highest number showing is one, two, three, or four, then you win. If the highest number on either die is five or six, then she wins. Should you take the bet?
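Analytically, you win only when both dice show four or less, so \(\Pr(\text{win}) = (4/6)^2 = 4/9 \approx 0.44 < 1/2\). A quick simulation sketch confirms this (the number of trials is arbitrary):

    n = 100_000
    wins = count(_ -> max(rand(1:6), rand(1:6)) <= 4, 1:n)
    wins / n    # ≈ 0.44, so the even-money wager favors the gambler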

Example: linear regression

Today

  1. PDFs and CDFs

  2. Joint, marginal, and conditional distributions

  3. Example: linear regression

  4. Example: negative binomial as a mixture

  5. Wrapup

Overview

Consider the standard linear regression model; for simplicity, assume a single predictor \(x \in \mathbb{R}\): \[ y_i = ax_i + b + \epsilon_i \] where \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\).

Conditional distribution of \(y_i\)

The conditional probability density of \(y_i\) given \(x_i\) is \[ p(y_i | x_i, a, b, \sigma) = \mathcal{N}(ax_i + b, \sigma^2), \] which is a shorthand for writing out the full equation for the Normal PDF. We can (and often will) write this as \[ y_i \sim \mathcal{N}(ax_i + b, \sigma^2). \] Finally, we will sometimes write \(p(y_i | x_i)\) as a shorthand for \(p(y_i | x_i, a, b, \sigma)\). This is fine in many circumstances, but we should be extremely clear about which parameters we are conditioning on.

Marginal distribution of \(y_i\)

The marginal probability density of \(y_i\) is \[ p(y_i | a, b, \sigma) = \int p(y_i | x_i, a, b, \sigma) p(x_i) \, dx_i \] where \(p(x_i)\) is the probability density of \(x_i\).

Joint distribution of \(y_i\) and \(x_i\)

The joint probability density of \(y_i\) and \(x_i\) is \[ p(y_i, x_i | a, b, \sigma) = p(y_i | x_i, a, b, \sigma) p(x_i) \] where \(p(x_i)\) is the probability density of \(x_i\).

Simulation

If \(x=2\), we can simulate from the conditional distribution of \(y\):
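A minimal sketch (the values of \(a\), \(b\), and \(\sigma\) are illustrative assumptions):

    using Distributions

    a, b, σ = 0.5, 1.0, 0.3          # illustrative parameter values
    x = 2
    y = rand(Normal(a * x + b, σ))   # one draw from p(y | x = 2)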

If \(x \sim \mathcal{N}(0, 1)\), then we can simulate from the joint distribution of \(x\) and \(y\):
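A sketch using a list comprehension, reusing the parameters above (the sample size is arbitrary):

    n = 1_000
    x = rand(Normal(0, 1), n)                      # draws of x ~ N(0, 1)
    y = [rand(Normal(a * xi + b, σ)) for xi in x]  # (1) draw y | x for each x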

  1. A list comprehension here is less elegant than writing rand.(Normal.(a .* x .+ b, σ)), but it is easy to read. The results are the same.

Finally, assuming the same distribution, we can simulate from the marginal distribution of \(y\):
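Simulating \(x\), then \(y \mid x\), and keeping only the \(y\) values gives draws from the marginal (same assumptions as above):

    x = rand(Normal(0, 1), n)
    y_marginal = rand.(Normal.(a .* x .+ b, σ))   # discard x: these are draws from p(y)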

Example: negative binomial as a mixture

Today

  1. PDFs and CDFs

  2. Joint, marginal, and conditional distributions

  3. Example: linear regression

  4. Example: negative binomial as a mixture

  5. Wrapup

Overview

The Negative Binomial distribution (see last lecture) can be interpreted as a Gamma-Poisson mixture:

\[ \begin{align} y &\sim \textrm{Poisson}(\lambda) \\ \lambda &\sim \textrm{Gamma}\left(r, \frac{p}{1-p} \right) \end{align} \] (Here the second Gamma argument is a rate, matching the derivation below.)

Mathematical derivation

We can show mathematically that \(y \sim \textrm{Negative Binomial}(r, p)\) is equivalent to the mixture model \(y \sim \textrm{Poisson}(\lambda)\) with \(\lambda \sim \textrm{Gamma}(r, p/(1-p))\): \[ \begin{align} & \int_0^{\infty} f_{\text{Poisson}(\lambda)}(y) \times f_{\operatorname{Gamma}\left(r, \frac{p}{1-p}\right)}(\lambda) \, \mathrm{d}\lambda \\ &= \int_0^{\infty} \frac{\lambda^y}{y!} e^{-\lambda} \times \frac{1}{\Gamma(r)} \left(\frac{p}{1-p} \lambda\right)^{r-1} e^{-\frac{p}{1-p} \lambda} \left(\frac{p}{1-p}\right) \mathrm{d}\lambda \\ &\;\;\vdots \\ &= f_{\text{Negative Binomial}(r, p)}(y) \end{align} \] For all the steps, see Wikipedia.

Simulation example

We can see this with simulation. First we define a function to simulate from the Gamma-Poisson mixture:
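A sketch of such a function. Note that Distributions.jl parameterizes Gamma(shape, scale), so the rate \(p/(1-p)\) from the math becomes the scale \((1-p)/p\) here:

    using Distributions

    # one draw from the Gamma-Poisson mixture
    function gamma_poisson(r, p)
        λ = rand(Gamma(r, (1 - p) / p))   # shape r, scale (1-p)/p, i.e. rate p/(1-p)
        return rand(Poisson(λ))
    end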

gamma_poisson (generic function with 1 method)

Then we can simulate from the mixture and compare to the Negative Binomial distribution:
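For instance (the values of \(r\) and \(p\) and the sample size are arbitrary):

    using Statistics

    r, p = 5, 0.4
    mix = [gamma_poisson(r, p) for _ in 1:100_000]
    nb  = rand(NegativeBinomial(r, p), 100_000)
    (mean(mix), mean(nb))   # the two means should agree up to Monte Carlo error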

So what?

I don’t need you to know all the details of this particular mixture model. What I do want you to understand is:

  1. We can model data using combinations of simpler distributions
  2. We can use simple simulation approaches to approximate more complex relationships
    1. For example, if we wanted to know \(\Pr(y > 10)\) when \(y \sim \text{Negative Binomial}(r, p)\) but our software package didn't provide a Negative Binomial distribution, we could still estimate our quantity of interest by simulating from the mixture (see the sketch after this list)
    2. This isn’t very interesting for this model (there is an analytic solution!) but lots of models we might want to write down don’t have analytic solutions

Wrapup

Today

  1. PDFs and CDFs

  2. Joint, marginal, and conditional distributions

  3. Example: linear regression

  4. Example: negative binomial as a mixture

  5. Wrapup

Key ideas

  • Conditional probability
  • Joint probability
  • Marginal probability
  • Bayes’ Rule
  • Likelihood
  • Posterior
  • Simulation methods