Priors Are Regularization

Here is the punchline that unites the two halves of this course. Take the Bayesian MAP objective with a zero-mean Gaussian prior m \sim N(0, C_M) and uncorrelated Gaussian noise of equal variance, C_D = \sigma^2 I. The negative log-posterior is, up to an additive constant,

-\log p(m\mid d) \;=\; \frac{1}{2\sigma^2}\|d - Gm\|^2 \;+\; \frac{1}{2}\, m^{\mathsf T} C_M^{-1} m \;+\; \text{const}.

That is exactly a general Tikhonov functional. The data-fit term is the likelihood; the penalty term is the prior. Maximising the posterior is regularized least squares.
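To make the Tikhonov form explicit, it may help to specialise to an isotropic prior, C_M = \tau^2 I. This special case is an assumption made here for illustration, with \tau the prior standard deviation of each model component. Writing the Gaussian exponents with their factor of 1/2 and multiplying through by 2\sigma^2, which does not move the minimum, gives a sketch of the correspondence:

```latex
\begin{aligned}
-\log p(m\mid d)
  &= \frac{1}{2\sigma^2}\,\|d - Gm\|^2 + \frac{1}{2\tau^2}\,\|m\|^2 + \text{const}, \\
2\sigma^2\bigl(-\log p(m\mid d)\bigr)
  &= \|d - Gm\|^2 + \alpha\,\|m\|^2 + \text{const},
  \qquad \alpha = \frac{\sigma^2}{\tau^2}.
\end{aligned}
```

The second line is the standard-form Tikhonov functional in the convention where the penalty weight multiplies \|m\|^2 directly. Under a convention that writes the weight as \alpha^2, the same ratio would appear as \alpha = \sigma/\tau.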

The dictionary

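Term by term, the Bayesian and deterministic pictures are the same object in two languages: the likelihood is the data-fit term, the prior is the penalty, the MAP estimate is the regularized least-squares solution, and the noise-to-prior variance ratio is the regularization parameter \alpha.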

So the regularization parameter was never arbitrary. For an isotropic prior C_M = \tau^2 I and the functional \|d - Gm\|^2 + \alpha\|m\|^2, \alpha = \sigma^2/\tau^2: the ratio of how noisy you think the data are to how large you expect the model to be. And the penalty was never a mere mathematical trick: it is a precise statement of prior belief. The negative log-prior is the penalty, and for a single model component m the parabola m^2/(2\tau^2) is the L2 penalty drawn out.
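The equivalence can be checked numerically. The following is a minimal sketch, assuming an isotropic prior C_M = \tau^2 I and a small random linear problem. The matrix sizes, the seed, and the values of sigma and tau are all hypothetical choices for illustration. It computes the MAP estimate from the posterior normal equations and the Tikhonov solution with \alpha = \sigma^2/\tau^2, then compares them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear problem d = G m + noise (sizes and values are illustrative).
n_data, n_model = 20, 5
G = rng.normal(size=(n_data, n_model))
m_true = rng.normal(size=n_model)
sigma = 0.1  # assumed noise standard deviation, C_D = sigma^2 I
tau = 2.0    # assumed prior standard deviation, C_M = tau^2 I
d = G @ m_true + sigma * rng.normal(size=n_data)

# Bayesian route: the MAP estimate solves
# (G^T G / sigma^2 + C_M^{-1}) m = G^T d / sigma^2
C_M_inv = np.eye(n_model) / tau**2
m_map = np.linalg.solve(G.T @ G / sigma**2 + C_M_inv, G.T @ d / sigma**2)

# Deterministic route: Tikhonov, minimising ||d - G m||^2 + alpha ||m||^2,
# whose normal equations are (G^T G + alpha I) m = G^T d
alpha = sigma**2 / tau**2
m_tik = np.linalg.solve(G.T @ G + alpha * np.eye(n_model), G.T @ d)

# The two estimates should agree to rounding error.
same = np.allclose(m_map, m_tik)
print("alpha =", alpha)
print("MAP and Tikhonov agree:", same)
```

The two linear systems differ only by an overall factor of \sigma^2, which is why the solutions coincide. Changing sigma or tau while holding their ratio fixed should leave the estimate unchanged.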

The penalty drawn from the prior

The bell curve is a zero-mean Gaussian prior on a model component; the upward parabola is its negative logarithm — the penalty m^2/(2\tau^2) that regularization adds. Narrow the prior (smaller \tau, more confident) and the parabola steepens — a stronger pull toward zero, i.e. a larger \alpha.
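The relation between the bell curve and the parabola can be verified directly. This is a small sketch using only the standard library. The function names and the sample values of m and tau are hypothetical, chosen for illustration. It takes the negative logarithm of a zero-mean Gaussian density, subtracts its value at m = 0 to remove the normalisation constant, and compares the result with m^2/(2\tau^2).

```python
import math


def gaussian_pdf(m, tau):
    """Zero-mean Gaussian prior density with standard deviation tau."""
    return math.exp(-m**2 / (2 * tau**2)) / (tau * math.sqrt(2 * math.pi))


def penalty(m, tau):
    """Quadratic penalty m^2 / (2 tau^2)."""
    return m**2 / (2 * tau**2)


def neg_log_prior_shifted(m, tau):
    """Negative log-prior minus its value at m = 0 (drops the constant)."""
    return -math.log(gaussian_pdf(m, tau)) + math.log(gaussian_pdf(0.0, tau))


m = 1.5  # illustrative model value
for tau in (2.0, 1.0, 0.5):
    # The last two columns should match; both grow as tau shrinks.
    print(tau, neg_log_prior_shifted(m, tau), penalty(m, tau))
```

The curvature of the parabola is 1/\tau^2, so halving \tau makes the penalty at any fixed m four times larger.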