The Covariance Matrix

Variance measures the spread of one uncertain quantity. When several quantities are uncertain at once, we need to know not just how much each spreads but how they vary together. The covariance of two variables is the expected product of their deviations from their means:

\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big].

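Positive covariance means the two variables tend to rise together. Negative covariance means one tends to rise as the other falls. Zero covariance means there is no linear relationship, although the variables may still depend on each other nonlinearly. Dividing by the two standard deviations (both assumed nonzero) gives the dimensionless correlation \rho = \operatorname{cov}(X,Y)/(\sigma_X\sigma_Y) \in [-1, 1].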
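As a minimal sketch of these two definitions, the snippet below estimates covariance and correlation from paired samples with NumPy. The function names `covariance` and `correlation` and the sample data are illustrative assumptions, and the expectation is approximated by a plain average over the samples (dividing by the sample count, not by the count minus one).

```python
import numpy as np

def covariance(x, y):
    """Average product of deviations from the means (divides by N)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation(x, y):
    """Covariance scaled by both standard deviations (assumed nonzero)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.std defaults to dividing by N, matching covariance() above.
    return covariance(x, y) / (x.std() * y.std())

# Illustrative data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

cov_xy = covariance(x, y)   # positive: the two rise together
rho_xy = correlation(x, y)  # at the upper end of [-1, 1] for an exact linear relation
```

Reversing the order of `y` flips the sign of both quantities, which matches the "one rises as the other falls" case.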

Packing it into a matrix

For a vector of variables \mathbf{x} = (X_1, \dots, X_n), all the pairwise covariances assemble into the covariance matrix

\Sigma = \mathbb{E}\big[(\mathbf{x} - \boldsymbol\mu)(\mathbf{x} - \boldsymbol\mu)^{\mathsf T}\big], \qquad \Sigma_{ij} = \operatorname{cov}(X_i, X_j).

Its diagonal holds the variances \Sigma_{ii} = \sigma_i^2; the off-diagonal entries hold the covariances. Two structural facts matter throughout inverse theory: \Sigma is symmetric (\Sigma_{ij} = \Sigma_{ji}) and positive semidefinite (\mathbf{a}^{\mathsf T}\Sigma\,\mathbf{a} = \operatorname{var}(\mathbf{a}^{\mathsf T}\mathbf{x}) \ge 0 for every vector \mathbf{a}, so no direction has negative variance). Geometrically, \Sigma describes an ellipse of uncertainty (an ellipsoid when n > 2): its eigenvectors are the principal axes and its eigenvalues are the variances along them.
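The structural facts above can be checked numerically. The following is a sketch, assuming samples are stored as rows of an array; the helper name `covariance_matrix` and the synthetic data are hypothetical choices for illustration. The third variable is deliberately built as an exact linear combination of the first two, so one eigenvalue is zero up to rounding, which shows why the property is semidefinite and not strictly definite.

```python
import numpy as np

def covariance_matrix(samples):
    """Sample covariance of an (m, n) array: m observations of n variables."""
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean(axis=0)        # subtract the mean vector
    return centered.T @ centered / samples.shape[0]  # average outer product

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
# Make X_3 an exact linear combination of X_1 and X_2 (a degenerate direction).
samples[:, 2] = samples[:, 0] + 0.5 * samples[:, 1]

sigma = covariance_matrix(samples)

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(sigma)

is_symmetric = np.allclose(sigma, sigma.T)
is_psd = bool(np.all(eigenvalues >= -1e-10))  # tolerance for rounding error
variances = np.diag(sigma)                    # the diagonal holds the variances
```

The columns of `eigenvectors` are the principal axes, and `eigenvalues` are the variances along them; the smallest one here is zero to within floating-point error.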

The ellipse of uncertainty

The ellipse is the one-standard-deviation contour of a 2-D distribution with covariance \Sigma = \begin{psmallmatrix}\sigma_x^2 & \rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y & \sigma_y^2\end{psmallmatrix}. Stretch \sigma_x or \sigma_y to change the spread along that axis; move the correlation \rho away from zero and the ellipse tilts and narrows, in a direction set by the sign of \rho. The variables become entangled, and knowing one tells you about the other.
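To make the geometry concrete, here is a small sketch that builds the 2-D covariance from \sigma_x, \sigma_y and \rho and reads the ellipse off its eigendecomposition. The function names `covariance_2d` and `ellipse_axes` and the parameter values are assumptions for illustration. The semi-axis lengths of the one-standard-deviation contour are taken as the square roots of the eigenvalues, and the tilt is the angle of the major axis measured from the x-axis.

```python
import numpy as np

def covariance_2d(sigma_x, sigma_y, rho):
    """2x2 covariance matrix from two standard deviations and a correlation."""
    off_diagonal = rho * sigma_x * sigma_y
    return np.array([[sigma_x**2, off_diagonal],
                     [off_diagonal, sigma_y**2]])

def ellipse_axes(sigma):
    """Semi-axis lengths (major first) and major-axis tilt in [0, pi) radians."""
    eigenvalues, eigenvectors = np.linalg.eigh(sigma)  # ascending eigenvalues
    major = eigenvectors[:, 1]                         # direction of largest variance
    # An eigenvector's sign is arbitrary, so reduce the angle modulo pi.
    tilt = np.arctan2(major[1], major[0]) % np.pi
    semi_axes = np.sqrt(eigenvalues[::-1])             # standard deviations along axes
    return semi_axes, tilt

# Illustrative values: equal spreads with a moderate positive correlation.
sigma = covariance_2d(sigma_x=1.0, sigma_y=1.0, rho=0.5)
semi_axes, tilt = ellipse_axes(sigma)
```

With equal spreads, the eigenvalues are \sigma^2(1 \pm \rho), so the major axis lies along the diagonal and the minor axis shrinks as \rho grows. With \rho = 0 the off-diagonal entries vanish and the axes align with the coordinate axes.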

\Sigma_{ij} = \operatorname{cov}(X_i, X_j); the diagonal is the variances.
\Sigma is symmetric and positive semidefinite.
Its eigenvectors/eigenvalues are the axes/variances of the uncertainty ellipse.