The Bayesian Formulation

The Bayesian view reframes the whole enterprise. The answer to an inverse problem is not a single model but a probability distribution over models — the posterior p(m \mid d), which captures both the best estimate and how sure we are of it. Bayes' theorem assembles it:

p(m \mid d) \;\propto\; \underbrace{p(d \mid m)}_{\text{likelihood}}\; \underbrace{p(m)}_{\text{prior}}.

The likelihood carries the physics and the noise; the prior carries everything we knew before the experiment (smoothness, positivity, a rough magnitude).
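To make the proportionality concrete, here is a small sketch that evaluates prior × likelihood for a single scalar model parameter on a grid and then normalizes the product numerically. The toy problem is an assumption for illustration only: a scalar forward model d = g m + e with a hypothetical gain g = 2, one observation d = 3, noise standard deviation 0.5, and a prior N(0, 1). The helper name `grid_posterior` is likewise hypothetical.

```python
import numpy as np

def grid_posterior(m_grid, log_likelihood, log_prior):
    """Posterior on a uniform grid: p(m|d) is proportional to p(d|m) p(m), normalized numerically."""
    log_post = log_likelihood(m_grid) + log_prior(m_grid)
    log_post -= log_post.max()            # shift before exponentiating to avoid underflow
    post = np.exp(log_post)
    dm = m_grid[1] - m_grid[0]
    return post / (post.sum() * dm)       # Riemann-sum normalization so it integrates to 1

# Hypothetical toy problem: d = g*m + e, e ~ N(0, sigma_d^2), prior m ~ N(0, sigma_m^2)
g, d_obs, sigma_d, sigma_m = 2.0, 3.0, 0.5, 1.0
m_grid = np.linspace(-5.0, 5.0, 2001)
post = grid_posterior(
    m_grid,
    lambda m: -0.5 * ((d_obs - g * m) / sigma_d) ** 2,   # log-likelihood up to a constant
    lambda m: -0.5 * (m / sigma_m) ** 2,                  # log-prior up to a constant
)
dm = m_grid[1] - m_grid[0]
post_mean = (m_grid * post).sum() * dm
print(post_mean)
```

The constants dropped from the log-densities do not matter, because normalization removes them; that is exactly what the ∝ in Bayes' theorem allows. For this Gaussian toy case the posterior mean should come out near 24/17 ≈ 1.41, between the prior mean 0 and the data-only estimate d/g = 1.5.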

Gaussian everything

Take a linear forward model d = Gm + e with Gaussian noise e \sim N(0, C_D), and a Gaussian prior m \sim N(m_{\text{prior}}, C_M). Because G is linear, both factors are Gaussian in m, so the posterior is Gaussian too, and its peak, the MAP estimate, minimises the sum of two squared Mahalanobis distances:

\hat m = \arg\min_m\Big[(d - Gm)^{\mathsf T}C_D^{-1}(d - Gm) + (m - m_{\text{prior}})^{\mathsf T}C_M^{-1}(m - m_{\text{prior}})\Big].

Setting the gradient of this objective to zero gives the closed form

\hat m = m_{\text{prior}} + \big(G^{\mathsf T}C_D^{-1}G + C_M^{-1}\big)^{-1}G^{\mathsf T}C_D^{-1}\,(d - G m_{\text{prior}}).
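One way to check the closed form is to compare it with a direct minimisation of the same objective. The sketch below assumes NumPy and uses two hypothetical helpers: `map_estimate` evaluates the formula above (solving the linear system rather than forming the inverse), and `map_by_stacked_lstsq` whitens both terms with Cholesky factors (C_D = L_D L_D^T, C_M = L_M L_M^T) and solves the stacked least-squares problem, whose squared residual norm is exactly the bracketed objective. The matrix sizes and random values are illustrative only.

```python
import numpy as np

def map_estimate(G, d, C_D, C_M, m_prior):
    """Closed-form MAP estimate for d = G m + e, e ~ N(0, C_D), m ~ N(m_prior, C_M)."""
    CDinv = np.linalg.inv(C_D)
    CMinv = np.linalg.inv(C_M)
    A = G.T @ CDinv @ G + CMinv                  # half the Hessian of the objective
    rhs = G.T @ CDinv @ (d - G @ m_prior)
    return m_prior + np.linalg.solve(A, rhs)     # solve instead of forming A^{-1}

def map_by_stacked_lstsq(G, d, C_D, C_M, m_prior):
    """Same minimiser via whitening: stack L_D^{-1}(d - G m) and L_M^{-1}(m - m_prior)."""
    LD = np.linalg.cholesky(C_D)                 # C_D = LD @ LD.T
    LM = np.linalg.cholesky(C_M)                 # C_M = LM @ LM.T
    A = np.vstack([np.linalg.solve(LD, G), np.linalg.inv(LM)])
    b = np.concatenate([np.linalg.solve(LD, d), np.linalg.solve(LM, m_prior)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Illustrative problem: 4 data, 6 unknowns (underdetermined without the prior)
rng = np.random.default_rng(42)
G = rng.standard_normal((4, 6))
d = rng.standard_normal(4)
C_D = 0.1 * np.eye(4)
C_M = np.diag(rng.uniform(0.5, 2.0, 6))
m_prior = np.zeros(6)

m_closed = map_estimate(G, d, C_D, C_M, m_prior)
m_lstsq = map_by_stacked_lstsq(G, d, C_D, C_M, m_prior)
print(np.allclose(m_closed, m_lstsq))           # True
```

The stacked form is often the numerically preferable route in practice, since it never squares the condition number by forming G^{\mathsf T}C_D^{-1}G explicitly; the closed form is shown here because it mirrors the equation term by term.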

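The prior precision C_M^{-1} is added to G^{\mathsf T}C_D^{-1}G, and that addition is what guarantees the matrix can be inverted. On its own, G^{\mathsf T}C_D^{-1}G is only positive semidefinite: it is singular whenever the data leave some direction in model space unconstrained (for instance, when there are fewer data than unknowns). C_M^{-1} is positive definite, so the sum is positive definite and therefore invertible. The prior is the regularizer, a point the next page makes precise.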
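A quick numerical illustration of the invertibility argument, as a sketch with NumPy: for a hypothetical 3 × 5 forward operator (fewer data than unknowns), the smallest eigenvalue of G^{\mathsf T}C_D^{-1}G is zero up to rounding, while adding C_M^{-1} lifts every eigenvalue by at least the smallest eigenvalue of C_M^{-1}. The helper `smallest_eigenvalues` and the chosen covariances are assumptions for illustration.

```python
import numpy as np

def smallest_eigenvalues(G, C_D, C_M):
    """Smallest eigenvalue of G^T C_D^{-1} G, without and with the prior precision C_M^{-1}."""
    H_data = G.T @ np.linalg.solve(C_D, G)       # G^T C_D^{-1} G: symmetric positive semidefinite
    H_post = H_data + np.linalg.inv(C_M)         # add the positive-definite prior precision
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(H_data)[0], np.linalg.eigvalsh(H_post)[0]

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 5))                  # 3 data, 5 unknowns (hypothetical sizes)
lam_data, lam_post = smallest_eigenvalues(G, 0.01 * np.eye(3), np.eye(5))
print(lam_data, lam_post)
# lam_data is zero up to rounding: G^T C_D^{-1} G is singular.
# lam_post is about 1, the smallest eigenvalue of C_M^{-1} = I.
```

With C_M = I the shift is exactly 1; for a general C_M the lower bound is 1/\lambda_{\max}(C_M), so a looser prior (larger C_M) regularizes less strongly.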

Prior and data combine

The one-parameter picture: a prior belief about a model component, the likelihood the data provide, and the posterior that fuses them. Tighten the prior (small C_M) to lean on prior knowledge; tighten the data (small C_D) to lean on the measurement. Either way, the posterior is narrower than either input, because precisions add (with G = 1, 1/\sigma_{\text{post}}^2 = 1/C_M + 1/C_D). In this Gaussian setting, combining information always reduces uncertainty.
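The scalar case can be written out in a few lines. The sketch below assumes a direct observation of the component (G = 1), so the likelihood, viewed as a function of m, is a Gaussian centred on d with variance C_D. The function name `fuse_1d` and the numbers in the usage line are hypothetical.

```python
import math

def fuse_1d(mean_prior, var_prior, d, var_data):
    """Fuse a prior N(mean_prior, var_prior) with a direct observation d ~ N(m, var_data), G = 1."""
    prec_post = 1.0 / var_prior + 1.0 / var_data    # precisions add
    var_post = 1.0 / prec_post
    # Precision-weighted mean; equals the scalar MAP formula
    # m_prior + (1/C_D + 1/C_M)^{-1} (1/C_D) (d - m_prior)
    mean_post = var_post * (mean_prior / var_prior + d / var_data)
    return mean_post, var_post

# Broad prior (variance 4) centred at 0, sharper measurement d = 3 (variance 1)
m_post, var_post = fuse_1d(0.0, 4.0, 3.0, 1.0)
print(m_post, var_post)
```

Here the posterior mean is 2.4, pulled most of the way toward the more precise measurement, and the posterior variance is 0.8, smaller than both the prior variance 4 and the data variance 1.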