The Covariance Matrix
Variance measures the spread of one uncertain quantity. When several quantities vary together, we need to
know not just how much each spreads but also how they co-vary. The
covariance of two variables is the average product of their deviations from their means:
\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big].
Positive covariance: they tend to rise together. Negative: one rises as the other falls. Zero:
no linear relationship. Dividing by the two standard deviations gives the
correlation \rho = \operatorname{cov}(X,Y)/(\sigma_X\sigma_Y) \in [-1, 1].
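As a rough numerical illustration of the two definitions above, the sketch below replaces the expectation with an average over samples and divides by N (the population convention; many libraries default to N - 1 instead). The function names and the data are made up for this example and assume NumPy is available.

```python
import numpy as np

def covariance(x, y):
    """Mean product of deviations from the means (divides by N)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation(x, y):
    """Covariance divided by both standard deviations."""
    # np.std also divides by N by default, matching covariance() above.
    return covariance(x, y) / (np.std(x) * np.std(y))

# Illustrative data (hypothetical): y tends to rise with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)   # positive: the two rise together
rho_xy = correlation(x, y)  # same sign, rescaled to lie in [-1, 1]
print(cov_xy, rho_xy)
```

The correlation carries the same sign as the covariance but does not depend on the units of X and Y, which is why it is the easier of the two to compare across problems.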
Packing it into a matrix
For a vector of variables \mathbf{x} = (X_1, \dots, X_n), all the
pairwise covariances assemble into the covariance matrix
\Sigma = \mathbb{E}\big[(\mathbf{x} - \boldsymbol\mu)(\mathbf{x} - \boldsymbol\mu)^{\mathsf T}\big], \qquad \Sigma_{ij} = \operatorname{cov}(X_i, X_j).
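Its diagonal holds the variances \Sigma_{ii} = \sigma_i^2; the
off-diagonal entries hold the covariances. Two structural facts matter throughout inverse
theory: \Sigma is symmetric (\Sigma_{ij} = \Sigma_{ji})
and positive semidefinite: for any vector \mathbf{a}, \mathbf{a}^{\mathsf T}\Sigma\,\mathbf{a} = \operatorname{var}(\mathbf{a}^{\mathsf T}\mathbf{x}) \ge 0, so no direction has negative variance. Geometrically,
\Sigma describes an ellipse of uncertainty (an ellipsoid when n > 2): its
eigenvectors are the principal axes and its eigenvalues are the variances along them, so the
axis half-lengths are the square roots of the eigenvalues.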
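These properties can be checked numerically. The sketch below estimates \Sigma from synthetic samples and verifies symmetry, positive semidefiniteness, and the eigenvalue interpretation. The mixing matrix A, the seed, and the sample count are arbitrary choices for illustration, and dividing by N rather than N - 1 is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Illustrative samples (hypothetical): 3 variables, 500 draws, rows are draws.
# Mixing independent noise through A makes the variables co-vary.
A = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.5, 0.0],
              [0.0, 0.3, 0.2]])
samples = rng.standard_normal((500, 3)) @ A.T

# Sigma = E[(x - mu)(x - mu)^T], with the expectation replaced by a sample mean.
centered = samples - samples.mean(axis=0)
Sigma = centered.T @ centered / len(samples)

variances = np.diag(Sigma)                   # Sigma_ii = sigma_i^2
is_symmetric = np.allclose(Sigma, Sigma.T)   # Sigma_ij = Sigma_ji

# eigh is meant for symmetric matrices: it returns real eigenvalues in
# ascending order and orthonormal eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
is_psd = bool(np.all(eigenvalues >= -1e-12))  # small tolerance for round-off

# The variance of the data projected onto each eigenvector equals its eigenvalue.
projected_var = (centered @ eigenvectors).var(axis=0)

print(is_symmetric, is_psd)
print(eigenvalues, projected_var)
```

Using `eigh` rather than a general eigensolver is a deliberate choice here: it exploits the symmetry of \Sigma and guarantees real eigenvalues.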
The ellipse of uncertainty
The ellipse is the one-standard-deviation contour of a 2-D distribution with covariance
\Sigma = \begin{psmallmatrix}\sigma_x^2 & \rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y & \sigma_y^2\end{psmallmatrix}.
Increasing \sigma_x or \sigma_y widens the spread along that axis. Moving
the correlation \rho away from zero tilts the ellipse away from the coordinate axes and makes it narrower: the variables become
entangled, and knowing one tells you about the other.
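To make the geometry concrete, here is a minimal sketch that turns \sigma_x, \sigma_y and \rho into the semi-axes and tilt of the one-standard-deviation ellipse via the eigendecomposition of \Sigma. The function name `uncertainty_ellipse` and the parameter values are hypothetical, chosen only for this example.

```python
import numpy as np

def uncertainty_ellipse(sigma_x, sigma_y, rho):
    """Return (semi_major, semi_minor, tilt_degrees) of the 1-sigma ellipse."""
    off_diagonal = rho * sigma_x * sigma_y
    Sigma = np.array([[sigma_x**2, off_diagonal],
                      [off_diagonal, sigma_y**2]])
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
    # Semi-axes are standard deviations along the principal axes;
    # clip guards against tiny negative round-off when |rho| is near 1.
    semi_minor, semi_major = np.sqrt(np.clip(eigenvalues, 0.0, None))
    major_axis = eigenvectors[:, 1]
    # An axis has no preferred sign, so report the tilt modulo 180 degrees.
    tilt = np.degrees(np.arctan2(major_axis[1], major_axis[0])) % 180.0
    return semi_major, semi_minor, tilt

for rho in (0.0, 0.5, 0.9):
    a, b, tilt = uncertainty_ellipse(2.0, 1.0, rho)
    print(f"rho={rho:.1f}: semi-axes {a:.3f} and {b:.3f}, tilt {tilt:.1f} deg")
```

When \sigma_x = \sigma_y and \rho = 0 the ellipse is a circle, so the tilt returned in that case is arbitrary and should not be interpreted.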
- \Sigma_{ij} = \operatorname{cov}(X_i, X_j); the diagonal is the variances.
- \Sigma is symmetric and positive semidefinite.
- Its eigenvectors/eigenvalues are the axes/variances of the uncertainty ellipse.