The Bayesian Formulation
The Bayesian view reframes the whole enterprise. The answer to an inverse problem is not a single
model but a probability distribution over models: the posterior p(m \mid d), which captures both
the best estimate and how sure we are of it. Bayes' theorem assembles it:
p(m \mid d) \;\propto\; \underbrace{p(d \mid m)}_{\text{likelihood}}\; \underbrace{p(m)}_{\text{prior}}.
The likelihood carries the physics and the noise; the prior
carries everything we knew before the experiment (smoothness, positivity, a rough magnitude).
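To make the product of likelihood and prior concrete, here is a minimal sketch that evaluates both on a grid for a single model parameter and normalises the result. The scalar forward model d = g·m + noise and every number in it (g, d_obs, sigma_d, the prior mean and width) are assumptions chosen for illustration, not values from this page.

```python
import numpy as np

# Hypothetical one-parameter problem: d = g * m + noise (all values assumed).
g = 2.0                        # scalar forward operator
d_obs = 3.0                    # observed datum
sigma_d = 0.5                  # noise standard deviation
m_prior, sigma_m = 1.0, 1.0    # prior mean and standard deviation

m = np.linspace(-4.0, 6.0, 2001)   # grid of candidate models
dm = m[1] - m[0]

# Unnormalised Gaussian likelihood p(d | m) and prior p(m) on the grid.
likelihood = np.exp(-0.5 * ((d_obs - g * m) / sigma_d) ** 2)
prior = np.exp(-0.5 * ((m - m_prior) / sigma_m) ** 2)

# Bayes' theorem: posterior is proportional to likelihood times prior.
posterior = likelihood * prior
posterior /= posterior.sum() * dm   # normalise so it integrates to 1

m_map = m[np.argmax(posterior)]                       # posterior peak
post_mean = (m * posterior).sum() * dm                # posterior mean
post_std = np.sqrt(((m - post_mean) ** 2 * posterior).sum() * dm)

print(f"MAP ~ {m_map:.3f}, mean ~ {post_mean:.3f}, std ~ {post_std:.3f}")
```

The grid approach only scales to a handful of parameters, but it makes no Gaussian assumption: any likelihood or prior that can be evaluated pointwise could be substituted.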
Gaussian everything
Take Gaussian noise e \sim N(0, C_D) in the linear model d = Gm + e, and a Gaussian prior
m \sim N(m_{\text{prior}}, C_M). Both factors are Gaussian in m, so the posterior is Gaussian too,
and its peak, the MAP estimate, minimises the sum of two Mahalanobis terms:
\hat m = \arg\min_m\Big[(d - Gm)^{\mathsf T}C_D^{-1}(d - Gm) + (m - m_{\text{prior}})^{\mathsf T}C_M^{-1}(m - m_{\text{prior}})\Big],
which has the closed-form solution
\hat m = m_{\text{prior}} + \big(G^{\mathsf T}C_D^{-1}G + C_M^{-1}\big)^{-1}G^{\mathsf T}C_D^{-1}\,(d - G m_{\text{prior}}).
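The prior precision C_M^{-1} is added to G^{\mathsf T}C_D^{-1}G, and that addition is exactly
what makes the matrix invertible: G^{\mathsf T}C_D^{-1}G can be singular when the data leave some
model directions unconstrained, but C_M^{-1} is positive definite, so the sum is too. The prior is
the regularizer, a point the next page makes precise.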
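The closed-form estimate can be sketched in a few lines of NumPy. The function name map_estimate and the tiny test problem (one datum, two unknowns) are assumptions made for illustration. The sketch also returns the posterior covariance, which for a linear-Gaussian problem is the inverse of the same matrix, (G^{\mathsf T}C_D^{-1}G + C_M^{-1})^{-1}.

```python
import numpy as np

def map_estimate(G, d, C_D, m_prior, C_M):
    """MAP estimate and posterior covariance for d = G m + e (linear-Gaussian).

    Assumes C_D and C_M are symmetric positive definite.
    """
    CD_inv_G = np.linalg.solve(C_D, G)        # C_D^{-1} G, without forming C_D^{-1}
    CM_inv = np.linalg.inv(C_M)               # prior precision C_M^{-1}
    H = G.T @ CD_inv_G + CM_inv               # G^T C_D^{-1} G + C_M^{-1}
    # C_D is symmetric, so (C_D^{-1} G)^T equals G^T C_D^{-1}.
    rhs = CD_inv_G.T @ (d - G @ m_prior)
    m_hat = m_prior + np.linalg.solve(H, rhs)
    C_post = np.linalg.inv(H)                 # posterior covariance
    return m_hat, C_post

# Hypothetical underdetermined problem: one measurement of m_1 + m_2.
G = np.array([[1.0, 1.0]])
d = np.array([2.0])
C_D = np.array([[0.01]])
m_prior = np.zeros(2)
C_M = np.eye(2)

data_term = G.T @ np.linalg.solve(C_D, G)
print("rank of data term:", np.linalg.matrix_rank(data_term))   # rank-deficient

m_hat, C_post = map_estimate(G, d, C_D, m_prior, C_M)
print("MAP estimate:", m_hat)
print("posterior covariance:\n", C_post)
```

In this example the data term alone has rank 1 and cannot be inverted; adding the prior precision makes the system solvable, and the estimate splits the measured sum evenly between the two components because the prior treats them identically.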
Prior and data combine
The one-parameter picture: a prior belief about a model component, the
likelihood the data provides, and the posterior that fuses them.
Tighten the prior (small C_M) to lean on prior knowledge; tighten the
data (small C_D) to lean on the measurement. Either way the posterior is
narrower than both: in this Gaussian setting the precisions add, so combining
information always reduces uncertainty.
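The one-parameter picture reduces to precision-weighted averaging. The sketch below assumes the datum measures the model component directly (forward operator equal to 1); the function name fuse and the numbers passed to it are illustrative assumptions.

```python
def fuse(mu_prior, var_prior, mu_data, var_data):
    """Combine a Gaussian prior and a Gaussian likelihood for one parameter.

    Assumes the datum observes the parameter directly (G = 1).
    """
    precision = 1.0 / var_prior + 1.0 / var_data      # precisions add
    var_post = 1.0 / precision
    mu_post = var_post * (mu_prior / var_prior + mu_data / var_data)
    return mu_post, var_post

# Prior centred at 0, measurement at 2; vary which one is tighter.
cases = [(1.0, 1.0), (0.01, 1.0), (1.0, 0.01)]
results = []
for var_prior, var_data in cases:
    mu_post, var_post = fuse(0.0, var_prior, 2.0, var_data)
    results.append((mu_post, var_post))
    print(f"var_prior={var_prior}, var_data={var_data} -> "
          f"mean={mu_post:.3f}, var={var_post:.4f}")
```

With equal variances the posterior mean sits halfway between prior and measurement; a tight prior pulls it towards 0 and tight data pull it towards 2. In every case the posterior variance is smaller than both inputs.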
- The solution is the posterior p(m\mid d) \propto p(d\mid m)\,p(m).
- Gaussian likelihood + Gaussian prior ⇒ Gaussian posterior; its mean/mode is the MAP estimate.
- The prior precision C_M^{-1} added to G^{\mathsf T}C_D^{-1}G stabilises the inversion — the prior is the regularizer.