Smoothing and General Tikhonov

Plain Tikhonov penalises \|m\|^2, so it prefers a small model. But "small" is rarely what we actually want; usually we want smooth. General Tikhonov replaces that penalty with the squared norm of a linear operator L applied to the model:

\hat m = \arg\min_m \Big( \|Gm - d\|^2 + \alpha^2\,\|L m\|^2 \Big).

Choosing L is choosing what "simple" means — it encodes the prior belief about the solution.

L = I: penalise size → a small solution (standard Tikhonov).
L = D_1 (first difference, a discrete derivative): penalise slope → a flat solution.
L = D_2 (second difference): penalise curvature → a smooth solution.
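
To make the three choices concrete, here is a minimal NumPy sketch of the difference operators. The function names first_difference and second_difference are illustrative and do not come from the text, and dropping the boundary rows (so D_1 is (n-1) x n and D_2 is (n-2) x n) is an assumed convention; other boundary treatments are possible.

```python
import numpy as np

def first_difference(n):
    """(n-1) x n matrix D_1 with rows [-1, 1]: (D_1 m)_i = m_{i+1} - m_i."""
    return np.diff(np.eye(n), n=1, axis=0)

def second_difference(n):
    """(n-2) x n matrix D_2 with rows [1, -2, 1] (discrete curvature)."""
    return np.diff(np.eye(n), n=2, axis=0)

n = 6
D1 = first_difference(n)
D2 = second_difference(n)

m_const = np.ones(n)      # a flat model
m_line = np.arange(n)     # a straight-line model

# D_1 does not penalise a constant; D_2 does not penalise a straight line.
print(D1 @ m_const)
print(D2 @ m_line)
```

A flat model costs nothing under D_1 and a straight line costs nothing under D_2, which is what "penalise slope" and "penalise curvature" mean in matrix form.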

The penalty is a prior

The minimiser again satisfies a modified normal equation, (G^{\mathsf T}G + \alpha^2 L^{\mathsf T}L)\,\hat m = G^{\mathsf T}d, which has a unique solution whenever G and L share no nonzero null vector. The derivative penalty makes oscillatory, jagged solutions expensive, so the minimiser comes out smooth, which suppresses the high-frequency noise that ill-posedness amplifies. This is the same instinct as a smoothing prior, and it foreshadows the Bayesian reading: L^{\mathsf T}L plays the role of an inverse prior covariance (up to a scale factor), so L acts as an inverse square root of that covariance. That is a precise statement of "I expect the answer to be smooth".
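
The modified normal equation can be solved directly. The sketch below is one possible implementation, not a reference one: the function names, the Gaussian-blur forward operator G, the noise level, and the value of \alpha are all assumptions chosen for illustration. It also solves the equivalent stacked least-squares problem, whose minimiser is the same.

```python
import numpy as np

def general_tikhonov(G, d, L, alpha):
    """Solve the modified normal equation (G^T G + alpha^2 L^T L) m = G^T d."""
    A = G.T @ G + alpha**2 * (L.T @ L)
    return np.linalg.solve(A, G.T @ d)

def general_tikhonov_stacked(G, d, L, alpha):
    """Same minimiser via least squares on the stacked system [G; alpha L] m ~ [d; 0]."""
    A = np.vstack([G, alpha * L])
    b = np.concatenate([d, np.zeros(L.shape[0])])
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m

# Hypothetical toy problem: a Gaussian blur as the forward operator G.
n = 50
idx = np.arange(n)
G = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)
m_true = np.sin(2 * np.pi * idx / n)
rng = np.random.default_rng(0)
d = G @ m_true + 0.01 * rng.standard_normal(n)

L = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator D_2
alpha = 0.1                           # assumed value, not tuned

m_normal = general_tikhonov(G, d, L, alpha)
m_stacked = general_tikhonov_stacked(G, d, L, alpha)

# The two routes should agree up to rounding error.
print(np.max(np.abs(m_normal - m_stacked)))
```

The stacked form avoids building G^{\mathsf T}G explicitly, which squares the condition number, so it is often the safer route when G is badly conditioned.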

Smooth vs jagged

Consider two candidate solutions that fit the data about equally well: a smooth one and a jagged one carrying high-frequency wiggles. A first- or second-derivative penalty \|Lm\|^2 is small for the smooth curve and large for the jagged one, so general Tikhonov picks the smooth candidate. The wiggles are exactly the amplified noise we want gone.
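
The comparison is easy to check numerically. In this sketch the smooth candidate is a sine wave and the jagged candidate adds a small alternating grid-scale wiggle; both curves, the wiggle amplitude, and the grid size are assumed for illustration only. The penalty \|Lm\|^2 is evaluated for L = I, D_1 and D_2.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)
smooth = np.sin(2 * np.pi * x)
# Hypothetical jagged candidate: same curve plus an alternating +/-0.05 wiggle.
jagged = smooth + 0.05 * (-1.0) ** np.arange(x.size)

operators = {
    "I": np.eye(x.size),
    "D_1": np.diff(np.eye(x.size), n=1, axis=0),
    "D_2": np.diff(np.eye(x.size), n=2, axis=0),
}

penalties = {}
for name, L in operators.items():
    p_smooth = np.sum((L @ smooth) ** 2)   # ||L m||^2 for the smooth curve
    p_jagged = np.sum((L @ jagged) ** 2)   # ||L m||^2 for the jagged curve
    penalties[name] = (p_smooth, p_jagged)
    print(f"{name}: smooth={p_smooth:.4g}, jagged={p_jagged:.4g}, "
          f"ratio={p_jagged / p_smooth:.4g}")
```

With L = I the two candidates cost almost the same, so the size penalty barely tells them apart. The derivative penalties separate them clearly, and D_2 does so far more strongly than D_1.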

Minimise \|Gm-d\|^2 + \alpha^2\|Lm\|^2; solve (G^{\mathsf T}G + \alpha^2 L^{\mathsf T}L)m = G^{\mathsf T}d.
L = I → small; L = D_1 → flat; L = D_2 → smooth.
The choice of L encodes a prior about the solution.