Choosing α: the L-Curve

Regularization is only as good as its parameter. Too small an \alpha and noise survives; too large and the solution is over-smoothed into mush. How do we choose well, without knowing the true answer? Three classic strategies:

The L-curve. Plot the model size \|m_\alpha\| against the data misfit \|Gm_\alpha - d\| on log–log axes as \alpha sweeps over a range of values. The graph is L-shaped; the chosen \alpha sits at the corner, the point of maximum curvature.
The discrepancy principle. If you know the noise level, pick the \alpha for which the misfit \|Gm_\alpha - d\| equals the expected norm of the noise: fit the data as well as the noise allows, no better.
Generalized cross-validation (GCV). Choose the \alpha that best predicts left-out data, needing no noise estimate.
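
All three strategies start from the same two numbers per \alpha: the misfit and the model norm. The sketch below shows one way to compute them and then apply the discrepancy principle. It assumes zeroth-order Tikhonov regularization, minimizing \|Gm - d\|^2 + \alpha^2 \|m\|^2 (the text above does not fix a particular form). The function names, the Gaussian-blur test problem, and the noise level are hypothetical choices for illustration only.

```python
import numpy as np

def tikhonov_sweep(G, d, alphas):
    """Misfit ||G m_a - d|| and model norm ||m_a|| for each alpha.

    Assumes m_a minimizes ||G m - d||^2 + alpha^2 ||m||^2 (an assumption,
    not something fixed by the text). Uses one SVD for the whole sweep.
    """
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    beta = U.T @ d                                   # data in the singular basis
    # Part of d outside the range of G: it cannot be fit at any alpha.
    r_perp2 = max(float(d @ d - beta @ beta), 0.0)
    misfit = np.empty(len(alphas))
    model_norm = np.empty(len(alphas))
    for k, a in enumerate(alphas):
        denom = s**2 + a**2
        model_norm[k] = np.linalg.norm(s * beta / denom)
        misfit[k] = np.sqrt(np.sum((a**2 * beta / denom) ** 2) + r_perp2)
    return misfit, model_norm

def discrepancy_alpha(alphas, misfit, delta):
    """Largest alpha whose misfit does not exceed the noise level delta.

    alphas must be increasing. Falls back to the smallest alpha if no
    value reaches the target (a choice made for this sketch).
    """
    ok = np.flatnonzero(misfit <= delta)
    return alphas[ok[-1]] if ok.size else alphas[0]

# Hypothetical noisy test problem: Gaussian blur of a sine wave.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
G = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2))
m_true = np.sin(2 * np.pi * x)
sigma = 0.05                                         # assumed known noise std
d = G @ m_true + sigma * rng.standard_normal(x.size)

alphas = np.logspace(-4, 2, 61)
misfit, model_norm = tikhonov_sweep(G, d, alphas)
delta = sigma * np.sqrt(d.size)                      # expected noise norm
alpha_dp = discrepancy_alpha(alphas, misfit, delta)
print(f"discrepancy principle picks alpha ~ {alpha_dp:.3g}")
```

Because the misfit never decreases as \alpha grows, taking the largest \alpha that still meets the noise level gives the smoothest model consistent with the data.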

Why the corner is the sweet spot

The L-curve has two arms. The steep vertical arm (small \alpha) is the under-regularized regime: shrinking \alpha further barely improves the misfit while the model norm rockets up, so you are fitting noise. The flat horizontal arm (large \alpha) is the over-regularized regime: the model is tiny but the misfit grows as real signal is thrown away. The corner is the balance point, where you have taken the largest drop in misfit for the smallest growth in model size.
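
One common way to make "the corner" precise is the point of maximum curvature of the log–log curve. The sketch below estimates that curvature with finite differences. The function name lcurve_corner and the small test problem, written directly in terms of singular values, are hypothetical. The misfit and norm formulas assume the Tikhonov form \|Gm - d\|^2 + \alpha^2 \|m\|^2.

```python
import numpy as np

def lcurve_corner(alphas, misfit, model_norm):
    """Index of maximum curvature of the log-log L-curve.

    alphas must be increasing; misfit and model_norm hold
    ||G m_alpha - d|| and ||m_alpha|| at each alpha.
    """
    t = np.log(alphas)                 # curve parameter
    x = np.log(misfit)                 # horizontal axis: log misfit
    y = np.log(model_norm)             # vertical axis: log model norm
    dx, dy = np.gradient(x, t), np.gradient(y, t)
    ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)
    # Signed curvature of the parametric curve (x(t), y(t)). With alpha
    # increasing, the turn from the vertical arm onto the horizontal arm
    # is counterclockwise, so the corner is the most positive value.
    kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    return int(np.argmax(kappa)), kappa

# Hypothetical test problem in the SVD basis: decaying singular values,
# a true model with unit coefficients, and white noise on the data.
rng = np.random.default_rng(0)
s = np.logspace(0, -6, 20)                            # singular values of G
sigma = 1e-3                                          # noise std
beta = s * 1.0 + sigma * rng.standard_normal(s.size)  # U^T d

alphas = np.logspace(-5, 0, 101)
A = alphas[:, None]                                   # broadcast over s
misfit = np.linalg.norm(A**2 / (s**2 + A**2) * beta, axis=1)
model_norm = np.linalg.norm(s / (s**2 + A**2) * beta, axis=1)

i, kappa = lcurve_corner(alphas, misfit, model_norm)
print(f"L-curve corner at alpha ~ {alphas[i]:.2e}")
```

Finite-difference curvature is sensitive to a coarse or noisy \alpha grid, so a dense logarithmic grid is the safer choice here.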

Find the corner

The curve is the L-curve for a noisy test problem, with log misfit on the horizontal axis and log model norm on the vertical axis. Slide \alpha and the marker travels along it: up the vertical arm as \alpha shrinks (noise amplified), out along the horizontal arm as \alpha grows (over-smoothed). Park it at the corner: that is the \alpha the L-curve criterion picks.

L-curve: the corner of log \|m_\alpha\| vs log \|Gm_\alpha - d\|.
Discrepancy principle: make the misfit match the known noise level.
GCV: minimize the predicted error on left-out data; no noise estimate needed.
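
For GCV, a minimal sketch is given below. It assumes the same Tikhonov form, \|Gm - d\|^2 + \alpha^2 \|m\|^2, and uses the standard GCV function n \|Gm_\alpha - d\|^2 / \mathrm{trace}(I - H_\alpha)^2, where H_\alpha maps the data d to the predicted data Gm_\alpha and n is the number of data. The function name gcv_curve and the overdetermined blur problem are hypothetical.

```python
import numpy as np

def gcv_curve(G, d, alphas):
    """GCV function g(alpha) = n * ||G m_a - d||^2 / trace(I - H_a)^2.

    Assumes m_a minimizes ||G m - d||^2 + alpha^2 ||m||^2, so that
    H_a = U diag(f) U^T with filter factors f = s^2 / (s^2 + alpha^2).
    """
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    beta = U.T @ d
    r_perp2 = max(float(d @ d - beta @ beta), 0.0)   # unfittable part of d
    n = d.size
    g = np.empty(len(alphas))
    for k, a in enumerate(alphas):
        f = s**2 / (s**2 + a**2)
        rss = np.sum(((1 - f) * beta) ** 2) + r_perp2
        g[k] = n * rss / (n - np.sum(f)) ** 2
    return g

# Hypothetical overdetermined test problem: 60 blurred data, 30 unknowns.
rng = np.random.default_rng(1)
x_data = np.linspace(0.0, 1.0, 60)
x_model = np.linspace(0.0, 1.0, 30)
G = np.exp(-((x_data[:, None] - x_model[None, :]) ** 2) / (2 * 0.05**2))
m_true = np.sin(2 * np.pi * x_model)
d = G @ m_true + 0.05 * rng.standard_normal(x_data.size)

alphas = np.logspace(-3, 2, 51)
g = gcv_curve(G, d, alphas)
alpha_gcv = alphas[np.argmin(g)]
print(f"GCV picks alpha ~ {alpha_gcv:.3g}")
```

The noise level never enters the computation, which is the practical appeal of GCV. The price is that the minimum of g can be shallow, so the chosen \alpha is worth checking against the L-curve.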