Priors Are Regularization

Here is the punchline that unites the two halves of this course. Take the Bayesian MAP objective with a zero-mean Gaussian prior m \sim N(0, C_M) and independent, equal-variance Gaussian noise, C_D = \sigma^2 I. The negative log-posterior is, up to an additive constant,

-\log p(m\mid d) \;=\; \frac{1}{2\sigma^2}\|d - Gm\|^2 \;+\; \frac{1}{2}\, m^{\mathsf T} C_M^{-1} m.

That is exactly a general Tikhonov functional. The data-fit term is the likelihood; the penalty term is the prior. Maximising the posterior is regularized least squares.
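To see that minimising this functional really is a regularized least-squares problem, here is a minimal NumPy sketch. The problem sizes, the random G, and the particular C_M are made up for illustration, and the objective is written with the conventional factors of one half, which do not move the minimiser. Setting the gradient to zero gives the linear system (G^T G/\sigma^2 + C_M^{-1}) m = G^T d/\sigma^2; the code checks that its solution is a stationary point and that perturbing it only raises the objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy problem (all sizes and values are assumptions).
n_data, n_model = 8, 5
G = rng.normal(size=(n_data, n_model))
sigma = 0.1

# Any symmetric positive-definite matrix can serve as a prior covariance.
A = rng.normal(size=(n_model, n_model))
C_M = A @ A.T + n_model * np.eye(n_model)
C_M_inv = np.linalg.inv(C_M)

m_true = rng.normal(size=n_model)
d = G @ m_true + sigma * rng.normal(size=n_data)


def neg_log_posterior(m):
    """Data misfit plus negative log-prior, additive constants dropped."""
    r = d - G @ m
    return 0.5 * (r @ r) / sigma**2 + 0.5 * (m @ C_M_inv @ m)


# Zero gradient <=> (G^T G / sigma^2 + C_M^{-1}) m = G^T d / sigma^2
H = G.T @ G / sigma**2 + C_M_inv
rhs = G.T @ d / sigma**2
m_map = np.linalg.solve(H, rhs)

# Gradient of the objective evaluated at the MAP estimate.
grad = H @ m_map - rhs

# Small random perturbations should all increase the objective.
increases = [
    neg_log_posterior(m_map + 0.01 * rng.normal(size=n_model))
    > neg_log_posterior(m_map)
    for _ in range(10)
]

print("relative gradient norm:", np.linalg.norm(grad) / np.linalg.norm(rhs))
print("all perturbations increase the objective:", all(increases))
```

The matrix H is the Hessian of the objective; it is positive definite whenever C_M is, which is why the stationary point is the unique minimum.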

The dictionary

Term by term, the Bayesian and deterministic pictures are the same object in two languages:

A prior N(0, \tau^2 I) ⇒ the penalty \|m\|^2/(2\tau^2): standard Tikhonov, \|d - Gm\|^2 + \alpha^2\|m\|^2 with \alpha^2 = \sigma^2/\tau^2, once the objective is multiplied through by 2\sigma^2.
A smoothness prior (covariance favouring smooth fields) ⇒ a derivative penalty \|Lm\|^2: general Tikhonov, with L^{\mathsf T}L = C_M^{-1}.
A confident prior (small \tau) ⇒ a large \alpha: heavy regularization.
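The dictionary can be checked numerically. The sketch below is an illustration under assumptions, not a reference implementation: the helper names map_estimate and tikhonov, the random test problem, and the exponential covariance are all hypothetical choices. It takes the Tikhonov functional to be \|d - Gm\|^2 + \alpha^2\|Lm\|^2, so when L^T L = C_M^{-1} exactly, the noise level enters as \alpha = \sigma.

```python
import numpy as np


def map_estimate(G, d, sigma, C_M):
    """Bayesian route: solve (G^T G / sigma^2 + C_M^{-1}) m = G^T d / sigma^2."""
    C_M_inv = np.linalg.inv(C_M)
    return np.linalg.solve(G.T @ G / sigma**2 + C_M_inv, G.T @ d / sigma**2)


def tikhonov(G, d, alpha, L):
    """Deterministic route: minimise ||d - G m||^2 + alpha^2 ||L m||^2
    by solving the stacked least-squares problem [G; alpha L] m ~ [d; 0]."""
    A = np.vstack([G, alpha * L])
    b = np.concatenate([d, np.zeros(L.shape[0])])
    return np.linalg.lstsq(A, b, rcond=None)[0]


rng = np.random.default_rng(1)
G = rng.normal(size=(6, 4))
d = rng.normal(size=6)
sigma = 0.5
n = G.shape[1]

# Row 1: prior N(0, tau^2 I)  <->  standard Tikhonov with alpha = sigma / tau.
tau = 2.0
m_bayes = map_estimate(G, d, sigma, tau**2 * np.eye(n))
m_tikh = tikhonov(G, d, sigma / tau, np.eye(n))

# Row 2: correlated prior  <->  general Tikhonov with L^T L = C_M^{-1}.
# An exponential covariance is one simple choice that favours smooth models.
idx = np.arange(n)
C_M = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)
chol = np.linalg.cholesky(np.linalg.inv(C_M))  # chol @ chol.T == C_M^{-1}
L = chol.T                                     # hence L.T @ L == C_M^{-1}
m_bayes_smooth = map_estimate(G, d, sigma, C_M)
m_tikh_smooth = tikhonov(G, d, sigma, L)

# Row 3: a more confident prior (smaller tau) pulls the model harder toward zero.
taus = (10.0, 1.0, 0.1)
norms = [np.linalg.norm(tikhonov(G, d, sigma / t, np.eye(n))) for t in taus]

print("row 1 agrees:", np.allclose(m_bayes, m_tikh))
print("row 2 agrees:", np.allclose(m_bayes_smooth, m_tikh_smooth))
print("model norm as tau shrinks:", norms)
```

Solving the stacked system with lstsq avoids forming G^T G explicitly, which is usually the better-conditioned way to compute a Tikhonov solution; the normal-equations form is kept in map_estimate only because it mirrors the Bayesian formula.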

So the regularization parameter was never arbitrary: \alpha = \sigma/\tau is the ratio of how noisy you think the data are to how large you expect the model to be. Nor was the penalty a mere mathematical trick: it is a precise statement of prior belief. The negative log-prior is the penalty, and for a single model component the parabola m^2/(2\tau^2) is the L2 penalty drawn out.

The penalty drawn from the prior

The bell curve is a zero-mean Gaussian prior on a model component; the upward parabola is its negative logarithm — the penalty m^2/(2\tau^2) that regularization adds. Narrow the prior (smaller \tau, more confident) and the parabola steepens — a stronger pull toward zero, i.e. a larger \alpha.
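The same relationship can be verified for a single component with nothing more than the standard library. This is a small sketch; the function names and the sample values of m and \tau are arbitrary choices for illustration. It confirms that the negative log of the Gaussian density differs from m^2/(2\tau^2) only by a constant that does not depend on m, and that the parabola steepens as \tau shrinks.

```python
import math


def gaussian_prior(m, tau):
    """Density of N(0, tau^2) evaluated at m."""
    return math.exp(-m**2 / (2 * tau**2)) / (tau * math.sqrt(2 * math.pi))


def neg_log_prior(m, tau):
    """Negative logarithm of the prior density."""
    return -math.log(gaussian_prior(m, tau))


def penalty(m, tau):
    """The L2 penalty the prior contributes: m^2 / (2 tau^2)."""
    return m**2 / (2 * tau**2)


taus = (2.0, 1.0, 0.5)
ms = (-3.0, -1.0, 0.0, 0.5, 2.0)

for tau in taus:
    # The offset log(tau * sqrt(2 pi)) is independent of m, so it never
    # affects where the minimum of the objective lies.
    offset = neg_log_prior(0.0, tau)
    worst = max(abs(neg_log_prior(m, tau) - offset - penalty(m, tau)) for m in ms)
    print(f"tau={tau}: max deviation from the parabola = {worst:.1e}, "
          f"curvature 1/tau^2 = {1 / tau**2}")
```

The curvature 1/\tau^2 is what the regularization strength measures: halving \tau quadruples the pull toward zero.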

Negative log-posterior = data misfit + (negative log-prior) = a Tikhonov functional.
Gaussian prior N(0,\tau^2 I) ⇒ L2 penalty with \alpha^2 = \sigma^2/\tau^2; smoothness prior ⇒ derivative penalty.
\alpha is the noise-to-prior ratio: regularization strength is a statement of belief.