Smoothing and General Tikhonov
Plain Tikhonov penalises \|m\|^2: it prefers a small model. But "small" is
rarely what we actually want; usually we want smooth. General Tikhonov
swaps the penalty for the norm of a linear operator L applied to the
model:
\hat m = \arg\min_m \Big( \|Gm - d\|^2 + \alpha^2\,\|L m\|^2 \Big).
Choosing L is choosing what "simple" means — it encodes the prior
belief about the solution.
- L = I: penalise size → a small solution (standard Tikhonov).
- L = D_1 (first difference, a discrete derivative): penalise slope → a flat solution.
- L = D_2 (second difference): penalise curvature → a smooth solution.
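The three choices of L above can be written down as small matrices. The following is a minimal sketch, assuming a model sampled on a uniform grid with unit spacing; the helper names `first_difference` and `second_difference` are illustrative and do not come from the text. It also checks what each operator leaves unpenalised: D_1 ignores constants, and D_2 ignores constants and straight lines.

```python
import numpy as np


def first_difference(n):
    """(n-1) x n matrix D_1 with rows [-1, 1]: (D_1 m)[i] = m[i+1] - m[i]."""
    return np.diff(np.eye(n), n=1, axis=0)


def second_difference(n):
    """(n-2) x n matrix D_2 with rows [1, -2, 1]: a discrete second derivative."""
    return np.diff(np.eye(n), n=2, axis=0)


n = 6
D1 = first_difference(n)
D2 = second_difference(n)

constant = np.ones(n)              # a flat model
ramp = np.arange(n, dtype=float)   # a straight line with unit slope

# D_1 sends a constant model to zero, so a flat model costs nothing.
print(np.allclose(D1 @ constant, 0.0))
# D_2 sends any straight line to zero, so only curvature is penalised.
print(np.allclose(D2 @ constant, 0.0), np.allclose(D2 @ ramp, 0.0))
# The ramp has nonzero slope, so D_1 does penalise it.
print(np.linalg.norm(D1 @ ramp) ** 2)
```

Because D_1 and D_2 have fewer rows than columns, L^{\mathsf T}L is singular on its own; the data term G^{\mathsf T}G has to pin down the constant (and, for D_2, the linear trend) that the penalty ignores.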
The penalty is a prior
Setting the gradient of the objective to zero again gives a modified normal equation,
(G^{\mathsf T}G + \alpha^2 L^{\mathsf T}L)\,\hat m = G^{\mathsf T}d.
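As a rough illustration of how this equation might be solved in practice, here is a minimal sketch in NumPy. Everything about the test problem is an assumption made for the example: the Gaussian blurring kernel used for G, the sine-wave true model, the noise level, and the value of alpha. The function names are hypothetical. The second function solves the same minimisation as a stacked least-squares problem, which avoids forming G^{\mathsf T}G explicitly.

```python
import numpy as np


def general_tikhonov(G, d, L, alpha):
    """Solve (G^T G + alpha^2 L^T L) m = G^T d directly."""
    A = G.T @ G + alpha**2 * (L.T @ L)
    return np.linalg.solve(A, G.T @ d)


def general_tikhonov_stacked(G, d, L, alpha):
    """Same minimiser, via least squares on the stacked system [G; alpha L] m ~ [d; 0]."""
    A = np.vstack([G, alpha * L])
    b = np.concatenate([d, np.zeros(L.shape[0])])
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m


# Hypothetical toy problem: a Gaussian blur of a sine wave plus a little noise.
rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 1.0, n)
G = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2)) / n
m_true = np.sin(2 * np.pi * x)
d = G @ m_true + 1e-3 * rng.standard_normal(n)

L = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator D_2
alpha = 1e-2                         # assumed value, not tuned

m_hat = general_tikhonov(G, d, L, alpha)
m_stacked = general_tikhonov_stacked(G, d, L, alpha)

# The two routes should agree up to rounding error.
print(np.max(np.abs(m_hat - m_stacked)))
```

The stacked form is often preferred numerically because its conditioning is governed by the stacked matrix itself rather than by its square, but for a small, well-regularised problem either route should do.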
The derivative penalty makes oscillatory, jagged solutions expensive, so the minimiser comes out
smooth, which suppresses the high-frequency noise that ill-posedness amplifies. This is the
same instinct as a smoothing prior, and it foreshadows the
Bayesian reading:
the penalty operator L acts as the inverse square root of a prior covariance
(L^{\mathsf T}L is proportional to the inverse covariance), a precise statement of "I expect the answer to be smooth".
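One way to make the Bayesian reading concrete is the following sketch, under Gaussian assumptions that the text does not spell out. The noise variance \sigma^2, the noise vector \varepsilon, and the prior covariance C_m are symbols introduced here for illustration only.

```latex
% Assumed model: independent Gaussian noise and a zero-mean Gaussian prior.
d = Gm + \varepsilon, \qquad
\varepsilon \sim \mathcal N(0, \sigma^2 I), \qquad
m \sim \mathcal N(0, C_m)

% Negative log-posterior, up to an additive constant:
-\log p(m \mid d) = \frac{1}{2\sigma^2}\,\|Gm - d\|^2
                  + \frac{1}{2}\, m^{\mathsf T} C_m^{-1} m + \text{const}

% Multiplying by 2\sigma^2 gives the maximum a posteriori estimate:
\hat m = \arg\min_m \Big( \|Gm - d\|^2 + \sigma^2\, m^{\mathsf T} C_m^{-1} m \Big)

% Matching this with the general Tikhonov objective:
\alpha^2\, L^{\mathsf T} L = \sigma^2\, C_m^{-1}
```

For L = D_1 or L = D_2 the matrix L^{\mathsf T}L is singular, so no proper covariance C_m exists; the correspondence then holds with an improper prior that is flat over constants (and, for D_2, over straight lines).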
Smooth vs jagged
Consider two candidate solutions that fit the data about equally well: a smooth one and a jagged one
carrying high-frequency wiggles. A first- or second-derivative penalty
\|Lm\|^2 is small for the smooth curve and large for the jagged one, so
general Tikhonov picks the smooth candidate. The wiggles are the amplified noise we want to
remove.
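The comparison can be checked numerically. The sketch below uses two made-up candidates, assumed purely for illustration: a sine wave, and the same sine wave plus a small wiggle that alternates sign at every grid point. It evaluates \|Lm\|^2 for L = I, D_1 and D_2; the helper name `penalty` is hypothetical.

```python
import numpy as np

n = 50
x = np.linspace(0.0, 1.0, n)

smooth = np.sin(2 * np.pi * x)
# Jagged candidate: the smooth curve plus a small wiggle that flips sign at every sample.
jagged = smooth + 0.05 * (-1.0) ** np.arange(n)

D1 = np.diff(np.eye(n), n=1, axis=0)  # first difference
D2 = np.diff(np.eye(n), n=2, axis=0)  # second difference


def penalty(L, m):
    """Squared norm ||L m||^2 of the penalty term."""
    return float(np.sum((L @ m) ** 2))


for name, L in [("I", np.eye(n)), ("D1", D1), ("D2", D2)]:
    print(name, penalty(L, smooth), penalty(L, jagged))
```

With these candidates the size penalty (L = I) barely separates the two curves, the slope penalty separates them moderately, and the curvature penalty separates them by a large factor. This is why a derivative operator, rather than the identity, is the natural choice when the goal is smoothness.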
- Minimise \|Gm-d\|^2 + \alpha^2\|Lm\|^2; solve (G^{\mathsf T}G + \alpha^2 L^{\mathsf T}L)\,\hat m = G^{\mathsf T}d.
- L = I → small; L = D_1 → flat; L = D_2 → smooth.
- The choice of L encodes a prior about the solution.