Lecture
2023-10-11
Today
Overview
Practical Tips
More
Wrapup
The loading vector \(\phi_1\) defines a direction in feature space along which the data vary the most
Principal components analysis (PCA) scores and vectors for climate, soil, topography, and land cover variables. Sites are colored by estimated baseflow yield, and the percent of variance explained by each axis is indicated in the axis titles.
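Consider representing the data \(X = (X_1, X_2, \ldots, X_p)\) as a linear model \[ f(Z) = \mu + \phi_q Z \] where \(\mu\) is a location vector of length \(p\), \(\phi_q\) is a \(p \times q\) matrix whose columns are orthonormal loading vectors, and \(Z\) is a vector of \(q\) scores.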
Choose \(\phi_q\) and the scores \(Z_i\) for each observation to minimize the reconstruction error: \[ \min_{\phi_q, Z_i} \sum_{i=1}^n \| X_i - \phi_q Z_i \|_2^2 \] assuming \(\mu = 0\) (centered data – more later)
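The reconstruction error can be checked numerically. The following is a minimal sketch on synthetic data, assuming NumPy; the helper name `reconstruction_error` and the random data are illustrative and not part of the lecture. It projects centered data onto the first \(q\) loading vectors and measures what is lost.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5

# Synthetic data with correlated columns (illustrative only)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)  # center the columns so that mu = 0

# Rows of Vt are the loading vectors of the centered data
U, s, Vt = np.linalg.svd(X, full_matrices=False)


def reconstruction_error(X, Vt, q):
    """Sum of squared residuals after projecting onto the first q loadings."""
    phi_q = Vt[:q].T      # (p, q) matrix of loading vectors
    Z = X @ phi_q         # (n, q) matrix of scores
    X_hat = Z @ phi_q.T   # rank-q reconstruction of X
    return np.sum((X - X_hat) ** 2)


# Error for q = 0, 1, ..., p components
errors = [reconstruction_error(X, Vt, q) for q in range(p + 1)]
```

In this sketch the error never increases as \(q\) grows, and it equals the sum of the squared singular values that were left out, so it reaches zero (up to rounding) at \(q = p\).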
We can write the solution as a singular value decomposition (SVD) of the empirical covariance matrix. This reference explains things quite straightforwardly.
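As a rough check of this statement, the sketch below (assuming NumPy and synthetic data, with illustrative variable names) computes the components two ways: from the eigendecomposition of the empirical covariance matrix, which for a symmetric positive semidefinite matrix coincides with its SVD, and from the SVD of the centered data matrix itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 4

# Synthetic correlated data (illustrative only)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the empirical covariance matrix
S = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = s**2 / (n - 1)                  # variances implied by singular values

# Fraction of total variance explained by each component
explained = eigvals / eigvals.sum()
```

Both routes should give the same variances, and the same loading vectors up to sign. Working from the SVD of the data matrix avoids forming the covariance matrix explicitly, which can matter when \(p\) is large.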
Since we are using the covariance matrix, we are implicitly assuming that variance is a good way to measure variability
When might this be a poor assumption?
Each principal component loading vector is unique up to a sign flip, provided the corresponding eigenvalue of the covariance matrix is not repeated.
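The sign ambiguity is easy to see numerically. In this sketch (NumPy assumed, synthetic data), flipping the sign of both the loading vector and its scores leaves the reconstruction and the explained variance unchanged. The `fix_sign` helper is a hypothetical convention for making results comparable across runs, not something prescribed in the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic centered data (illustrative only)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
phi1 = Vt[0]      # first loading vector
z1 = X @ phi1     # first principal component scores

# Flip the sign of both the loading vector and the scores
phi1_flipped, z1_flipped = -phi1, -z1

# The rank-1 reconstruction and the score variance are unchanged
same_reconstruction = np.allclose(
    np.outer(z1, phi1), np.outer(z1_flipped, phi1_flipped)
)
same_variance = np.isclose(z1.var(), z1_flipped.var())


def fix_sign(phi):
    """Hypothetical convention: make the largest-magnitude entry positive."""
    return phi * np.sign(phi[np.argmax(np.abs(phi))])
```

Because either sign is equally valid, different software or different runs may return opposite signs, so the sign of a pattern should not be interpreted on its own.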
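Because we often use space-time data in climate science, we can interpret the principal components as spatial patterns and time series: with rows indexing time and columns indexing locations, each loading vector is a spatial pattern and the corresponding scores form a time series.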
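A minimal sketch of this space-time interpretation, assuming NumPy and a synthetic gridded field with dimensions (time, lat, lon). The grid sizes, the Gaussian spatial pattern, and the sinusoidal amplitude are all made up for illustration. The field is flattened so that each row is a time step and each column a grid cell, and the results are reshaped back into maps.

```python
import numpy as np

rng = np.random.default_rng(3)
n_time, n_lat, n_lon = 120, 6, 8

# Synthetic field: one fixed spatial pattern modulated in time, plus noise
lat = np.linspace(-1, 1, n_lat)[:, None]
lon = np.linspace(-1, 1, n_lon)[None, :]
pattern = np.exp(-(lat**2 + lon**2))                     # (n_lat, n_lon)
amplitude = np.sin(2 * np.pi * np.arange(n_time) / 12)   # (n_time,)
noise = 0.05 * rng.normal(size=(n_time, n_lat, n_lon))
field = amplitude[:, None, None] * pattern + noise

# Flatten space: rows are time steps, columns are grid cells
X = field.reshape(n_time, n_lat * n_lon)
X = X - X.mean(axis=0)   # remove the time mean at each grid cell

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Each loading vector reshapes to a map; each score column is a time series
spatial_patterns = Vt.reshape(-1, n_lat, n_lon)
time_series = U * s
```

Here `spatial_patterns[0]` should resemble the imposed pattern and `time_series[:, 0]` the imposed amplitude, each possibly with the sign reversed.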
It is common in climate science to deconstruct a time series into a mean and anomalies: \[ x(t) = \overline{x}(t) + x'(t) \] where \(\overline{x}(t)\) is the climatology and \(x'(t)\) is the anomaly. Typically, this is defined at each location separately.
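The decomposition can be sketched for a single location as follows, assuming NumPy, a synthetic monthly series, and one particular definition of the climatology: the long-term mean for each calendar month. That definition is an assumption made for illustration, and other choices are possible.

```python
import numpy as np

rng = np.random.default_rng(4)
n_years = 30

# Calendar month index (0..11) for each time step of a monthly series
months = np.tile(np.arange(12), n_years)

# Synthetic series: constant mean + seasonal cycle + noise (illustrative only)
seasonal_cycle = 10 * np.cos(2 * np.pi * months / 12)
x = 15 + seasonal_cycle + rng.normal(scale=1.0, size=months.size)

# Climatology (assumed definition): mean of each calendar month over all years
monthly_means = np.array([x[months == m].mean() for m in range(12)])
climatology = monthly_means[months]   # expand back to the full time axis

# Anomaly: departure from the climatology at each time step
anomaly = x - climatology
```

By construction the anomalies average to zero within each calendar month, so a PCA of the anomalies is not dominated by the seasonal cycle.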
How to define the climatology? Common approaches include:
PCA is a versatile tool for dimensionality reduction, data visualization, and compression. By understanding its underlying principles and practical applications, we can effectively analyze and interpret complex datasets.