Marginal, conditional, and joint distributions


Lecture

2023-08-30

PDFs and CDFs

Today

  1. PDFs and CDFs

  2. Joint, marginal, and conditional distributions

  3. Example: linear regression

  4. Example: negative binomial as a mixture

  5. Wrapup

PDF and CDF

If \(F_X\) is the cumulative distribution function (CDF) of \(X\) and \(f_X\) is the probability density function (PDF) of \(X\), then \[ F_X ( x ) = \int_{-\infty}^x f_X(u) \, du, \] and (if \(f_X\) is continuous at \(x\), which it typically will be) \[ f_{X}(x)={\frac {d}{dx}}F_{X}(x). \] A useful property is \[ \Pr[a\leq X\leq b]=\int _{a}^{b}f_{X}(x)\,dx = F_X(b) - F_X(a). \]

Important

For a continuous random variable \(y\), we can only talk about the probability that \(y\) lies in some interval \([a, b]\), which is given by the integral of the PDF over that interval. The probability that \(y\) takes on any single value \(y^*\), written \(p(y=y^*)\), is zero.
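A short Julia sketch of this point, assuming the Distributions package and a standard Normal variable. The helper name interval_prob and the chosen intervals are illustrative, not from the lecture. Interval probabilities are differences of the CDF, and they shrink toward zero as the interval narrows around a point.

```julia
using Distributions

d = Normal()  # standard Normal, chosen only for illustration

# Pr[a <= X <= b] is the integral of the PDF over [a, b], i.e. F(b) - F(a)
interval_prob(a, b) = cdf(d, b) - cdf(d, a)

p_wide = interval_prob(-1.0, 1.0)  # roughly 0.68

# Shrink an interval centered at y* = 1: the probability goes to zero
widths = [1.0, 0.1, 0.01, 0.001]
probs = [interval_prob(1 - w / 2, 1 + w / 2) for w in widths]

# For small widths, probability is approximately width times density at y*
ratio = probs[end] / widths[end]  # close to pdf(d, 1.0)
```

The density itself is not a probability: it only becomes one after multiplying by (or integrating over) an interval width.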

PDF example

Simple example to illustrate that \[ F_X(2) = \int_{-\infty}^2 f_X(u) \, du \]

We will use a standard Normal distribution as an example.

Output: (0.9771562639858903, 0.9772498680518208)

The first value is the trapezoidal approximation of the integral; the second is the CDF evaluated at 2.

  1. The Normal distribution has mean 0 and standard deviation 1 by default
  2. pdf(d, x) evaluates the probability density function of distribution d at x
  3. quad_trap is a trapezoidal approximation of the integral, with arguments: function, lower bound, upper bound, and number of points
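The computation described above can be sketched as follows. This is one plausible implementation of quad_trap, not necessarily the one used in the lecture. The lower bound of -10 (standing in for \(-\infty\)) and the 1,000 grid points are assumptions, so the approximation need not match the output above digit for digit.

```julia
using Distributions

# Trapezoidal rule: approximate the integral of f over [a, b]
# using n evenly spaced points (including both endpoints).
function quad_trap(f, a, b, n)
    x = range(a, b; length=n)
    h = step(x)
    # Interior points get weight h, the two endpoints get weight h / 2
    return h * (sum(f, x) - (f(a) + f(b)) / 2)
end

d = Normal()  # standard Normal: mean 0, standard deviation 1

# Assumed finite stand-in for the lower limit of -Inf
approx = quad_trap(x -> pdf(d, x), -10.0, 2.0, 1_000)
exact = cdf(d, 2.0)

(approx, exact)
```

Truncating the lower limit is harmless here because the standard Normal density is negligible below -10.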

PMFs

  • Discrete distributions (like the Poisson) have a probability mass function (PMF) instead of a PDF
  • For PMFs, \(p(y=y^*)\) is the probability that \(y\) takes on the value \(y^*\); unlike in the continuous case, it is well defined and can be nonzero
  • In the Distributions package, both PDFs and PMFs are called pdf
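As a minimal illustration of the last point, assuming the Julia Distributions package, the same pdf function returns a probability when applied to a discrete distribution. The Poisson rate of 3 is an arbitrary choice.

```julia
using Distributions

d = Poisson(3)   # rate 3 is an arbitrary illustrative choice

# For a discrete distribution, pdf returns Pr(y = 2), a genuine probability
p2 = pdf(d, 2)

# The same value from the Poisson PMF formula: exp(-λ) λ^k / k!
by_hand = exp(-3) * 3^2 / factorial(2)

# A PMF sums to one over its support (truncated here at 100)
total = sum(pdf(d, k) for k in 0:100)
```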

Joint, marginal, and conditional distributions

Bayes’ Rule

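By the definition of conditional probability, the joint density factors as \[ p(\theta, y) = p(\theta) p(y | \theta) \] and thus \[ p(\theta | y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta) p(y | \theta)}{p(y)}. \] Because \(p(y)\) does not depend on \(\theta\), we will generally write \[ p(\theta | y) \propto p(\theta) p(y | \theta). \]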
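To make the proportionality concrete, here is a small grid-based sketch in Julia. Everything specific in it is a hypothetical choice for illustration: a \(\mathcal{N}(0, 1)\) prior on \(\theta\), a \(\mathcal{N}(\theta, 1)\) likelihood, a single observation \(y = 1\), and the grid limits.

```julia
using Distributions

prior = Normal(0, 1)                      # hypothetical prior p(θ)
likelihood(θ, y) = pdf(Normal(θ, 1), y)   # hypothetical likelihood p(y | θ)
y_obs = 1.0                               # hypothetical observation

θs = range(-6, 6; length=2401)            # grid over θ

# Unnormalized posterior: p(θ) p(y | θ) at each grid point
unnorm = [pdf(prior, θ) * likelihood(θ, y_obs) for θ in θs]

# p(y) is the integral of the unnormalized posterior over θ
p_y = step(θs) * sum(unnorm)

# Dividing by p(y) gives the normalized posterior density p(θ | y)
posterior = unnorm ./ p_y
post_mean = step(θs) * sum(θs .* posterior)
```

For this particular Normal-Normal setup the posterior is known in closed form, \(\mathcal{N}(0.5, 0.5)\), so post_mean should land very close to 0.5.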

Marginal probability

Probability of event \(A\): \(\Pr(A)\)

We will write the marginal probability density function as \[ p(\theta) \quad \text{or} \quad p(y) \]

Joint probability

Probability of events \(A\) and \(B\): \(\Pr(A \& B)\)

We will write the joint probability density function as \[ p(\theta, y) \]

Conditional probability

Probability of event \(A\) given event \(B\): \(\Pr(A | B)\)

We will write the conditional probability density function as \[ p(\theta | y) \quad \text{or} \quad p(y | \theta) \]

Example: two-dice wager

A gambler presents you with an even-money wager. You will roll two dice, and if the highest number showing is one, two, three or four, then you win. If the highest number on either die is five or six, then she wins. Should you take the bet?
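One way to check the wager is to enumerate all 36 equally likely outcomes, assuming two fair six-sided dice. The sketch below uses plain Julia with illustrative variable names.

```julia
# All ordered outcomes of two fair six-sided dice
outcomes = [(a, b) for a in 1:6, b in 1:6]

# You win when the highest number showing is at most four
wins = count(o -> maximum(o) <= 4, outcomes)
p_win = wins / length(outcomes)

# Equivalent calculation: both dice must independently show four or less
p_direct = (4 / 6)^2
```

Both routes give \(16/36 = 4/9\), which is below one half, so at even money the bet favors the gambler.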

Example: linear regression

Overview

Consider the standard linear regression model. For simplicity, assume a single predictor, \(x_i \in \mathbb{R}\): \[ y_i = ax_i + b + \epsilon_i \] where \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\).

Conditional distribution of \(y_i\)

The conditional probability density of \(y_i\) given \(x_i\) is \[ p(y_i | x_i, a, b, \sigma) = \mathcal{N}(ax_i + b, \sigma^2), \] which is shorthand for writing out the full equation for the Normal PDF. We can (and often will) write this as \[ y_i \sim \mathcal{N}(ax_i + b, \sigma^2). \] Finally, we will sometimes write \(p(y_i | x_i)\) as shorthand for \(p(y_i | x_i, a, b, \sigma)\). This is fine in many circumstances, but we should take care to be clear about which quantities we are conditioning on.

Marginal distribution of \(y_i\)

Integrating the conditional density over \(x_i\) gives the marginal probability density of \(y_i\) (still conditional on the parameters): \[ p(y_i | a, b, \sigma) = \int p(y_i | x_i, a, b, \sigma) p(x_i) \, dx_i \] where \(p(x_i)\) is the probability density of \(x_i\).
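The integral can be checked numerically. This sketch assumes \(x_i \sim \mathcal{N}(0, 1)\) and hypothetical parameter values \(a = 2\), \(b = 1\), \(\sigma = 0.5\), none of which come from the lecture. Under those assumptions the marginal has the closed form \(y_i \sim \mathcal{N}(b, a^2 + \sigma^2)\), which gives something to compare against.

```julia
using Distributions

a, b, σ = 2.0, 1.0, 0.5   # hypothetical parameter values
px = Normal(0, 1)         # assumed density p(x) of the predictor
y0 = 1.5                  # point at which to evaluate the marginal density

# Integrand p(y0 | x, a, b, σ) p(x) as a function of x
integrand(x) = pdf(Normal(a * x + b, σ), y0) * pdf(px, x)

# Riemann sum on a fine grid; the integrand is negligible beyond ±8
xs = range(-8, 8; length=4001)
marginal_numeric = step(xs) * sum(integrand, xs)

# Closed form under these assumptions: y ~ Normal(b, sqrt(a^2 + σ^2))
marginal_exact = pdf(Normal(b, sqrt(a^2 + σ^2)), y0)

(marginal_numeric, marginal_exact)
```

Note that Normal in the Distributions package takes the standard deviation as its second argument, not the variance.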

Joint distribution of \(y_i\) and \(x_i\)

The joint probability density of \(y_i\) and \(x_i\) is \[ p(y_i, x_i | a, b, \sigma) = p(y_i | x_i, a, b, \sigma) p(x_i) \] where \(p(x_i)\) is the probability density of \(x_i\).

Simulation

If \(x=2\), we can simulate from the conditional distribution of \(y\):
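A minimal sketch of this simulation in Julia, assuming the Distributions package. The parameter values \(a = 2\), \(b = 1\), \(\sigma = 0.5\), the seed, and the number of draws are hypothetical choices, not the lecture's.

```julia
using Distributions, Random, Statistics

Random.seed!(2023)         # fixed seed so the run is repeatable
a, b, σ = 2.0, 1.0, 0.5    # hypothetical parameter values
x = 2.0

# y | x ~ Normal(a x + b, σ); Normal takes the standard deviation, not σ^2
cond_dist = Normal(a * x + b, σ)
y_draws = rand(cond_dist, 100_000)

# Sample mean and standard deviation should be near a*x + b = 5.0 and σ = 0.5
(mean(y_draws), std(y_draws))
```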

If \(x \sim N(0, 1)\), then we can simulate from the joint distribution of \(x\) and \(y\):
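A sketch of the joint simulation, following the factorization \(p(y, x) = p(y | x) p(x)\): draw \(x\) first, then \(y\) given \(x\). As before, \(a = 2\), \(b = 1\), \(\sigma = 0.5\), the seed, and the sample size are hypothetical.

```julia
using Distributions, Random, Statistics

Random.seed!(2023)         # fixed seed so the run is repeatable
a, b, σ = 2.0, 1.0, 0.5    # hypothetical parameter values
n = 100_000

# Step 1: draw x from its marginal p(x)
x = rand(Normal(0, 1), n)

# Step 2: draw y from p(y | x) by adding Normal(0, σ) noise to a x + b
y = a .* x .+ b .+ rand(Normal(0, σ), n)

# Each pair (x[i], y[i]) is one draw from the joint distribution
cor(x, y)
```

Discarding x and keeping only y yields draws from the marginal distribution of \(y\), which under these assumptions has mean \(b\) and variance \(a^2 + \sigma^2\).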