Choosing α: the L-Curve
Regularization is only as good as its parameter. Too small an \alpha and noise
survives; too large and the solution is over-smoothed into mush. How do we choose well, without
knowing the true answer? Three classic strategies:
- The L-curve. Plot the model norm \|m_\alpha\| against the data misfit \|Gm_\alpha - d\| on log–log axes as \alpha sweeps from small to large. The curve is L-shaped, and the criterion picks the \alpha at its corner.
- The discrepancy principle. If you know the noise level, pick the \alpha whose misfit \|Gm_\alpha - d\| equals it: fit the data exactly as well as the noise allows, no better.
- Generalized cross-validation (GCV). Choose the \alpha that best predicts left-out data, needing no noise estimate.
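As a rough illustration of the sweep behind the L-curve, the sketch below assumes zeroth-order Tikhonov regularization, where m_\alpha minimizes \|Gm - d\|^2 + \alpha^2 \|m\|^2. The text does not fix the functional, so that form is an assumption. Both norms are evaluated through the SVD of G. The blur kernel G, the sine model, the noise level, and the function name tikhonov_lcurve are hypothetical choices made for the demo.

```python
import numpy as np

def tikhonov_lcurve(G, d, alphas):
    """Model norm and misfit of m_alpha = argmin ||G m - d||^2 + alpha^2 ||m||^2."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    beta = U.T @ d                                   # data in the left-singular basis
    r_perp2 = max(float(d @ d - beta @ beta), 0.0)   # part of d no model can reach
    model_norms, misfits = [], []
    for a in alphas:
        coeff = s * beta / (s**2 + a**2)             # m_alpha expressed in the V basis
        resid = a**2 / (s**2 + a**2) * beta          # unfitted part inside range(G)
        model_norms.append(np.linalg.norm(coeff))
        misfits.append(np.sqrt(resid @ resid + r_perp2))
    return np.array(model_norms), np.array(misfits)

# Hypothetical test problem: Gaussian blur of a sine wave plus white noise.
rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 1.0, n)
G = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2)) / n
d = G @ np.sin(2 * np.pi * x) + 1e-3 * rng.standard_normal(n)

alphas = np.logspace(-6, 1, 50)
model_norms, misfits = tikhonov_lcurve(G, d, alphas)
# Plotting misfits (x) against model_norms (y) on log-log axes draws the L-curve.
```

Under this assumed functional the model norm can only fall and the misfit can only rise as \alpha grows, which is what gives the curve its two arms.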
Why the corner is the sweet spot
The L-curve has two arms. The steep vertical arm (small
\alpha) is the under-regularized regime: as \alpha shrinks, the misfit barely improves
while the model norm rockets up, because you are fitting noise. The flat horizontal arm
(large \alpha) is over-regularized: the model is tiny but the misfit
grows as real signal is thrown away. The corner is the balance point, where the
misfit is nearly as small as it gets and the model norm has not yet blown up.
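One common way to make "the corner" precise is to take the point of maximum curvature of the log-log curve. The text does not say which corner rule it uses, so treat that choice as an assumption. The sketch parameterizes the curve by log \alpha, differentiates with finite differences, and returns the index of largest curvature. The function name lcurve_corner and the toy curve (misfit 1 + \alpha, model norm 1 + 1/\alpha, whose corner sits at \alpha = 1 by symmetry) are hypothetical.

```python
import numpy as np

def lcurve_corner(alphas, model_norms, misfits):
    """Index of maximum curvature of (log misfit, log model norm); alphas must increase."""
    t = np.log(alphas)
    xi = np.log(misfits)        # horizontal axis of the L-curve
    eta = np.log(model_norms)   # vertical axis of the L-curve
    dxi, deta = np.gradient(xi, t), np.gradient(eta, t)
    d2xi, d2eta = np.gradient(dxi, t), np.gradient(deta, t)
    # Signed curvature: positive where the curve bends from the
    # vertical arm into the horizontal arm as alpha increases.
    kappa = (dxi * d2eta - deta * d2xi) / (dxi**2 + deta**2) ** 1.5
    # Skip two points at each end, where one-sided differences are least accurate.
    return 2 + int(np.argmax(kappa[2:-2]))

# Toy L-shaped curve with a known corner (not a real inverse problem).
alphas = np.logspace(-3, 3, 61)
misfits = 1.0 + alphas
model_norms = 1.0 + 1.0 / alphas
k = lcurve_corner(alphas, model_norms, misfits)
alpha_corner = alphas[k]
```

On real, noisy L-curves the finite-difference curvature can show several local peaks, so in practice the curve is often smoothed or the search restricted to a plausible range of \alpha first.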
Find the corner
The curve is the L-curve for a noisy test problem (axes are log model-norm vs log misfit). Slide
\alpha and the marker travels along it: up the vertical arm as
\alpha shrinks (noise amplified), out along the horizontal arm as
\alpha grows (over-smoothed). Park it at the corner: that is the
\alpha the L-curve criterion picks.
- L-curve: the corner of log \|m_\alpha\| vs log \|Gm_\alpha - d\|.
- Discrepancy principle: make the misfit match the known noise level.
- GCV: minimize predicted error on left-out data; no noise estimate needed.
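The other two rules can be sketched in the same spirit. The code below assumes zeroth-order Tikhonov regularization (minimizing \|Gm - d\|^2 + \alpha^2 \|m\|^2), a square G, and white noise of known standard deviation sigma. The text specifies none of these, and the test problem and all names are hypothetical. The discrepancy principle bisects on log \alpha until the misfit equals \delta = \sigma \sqrt{n}. GCV minimizes the usual score n \|Gm_\alpha - d\|^2 / \mathrm{trace}(I - A_\alpha)^2 over a grid, where A_\alpha maps d to the predicted data Gm_\alpha.

```python
import numpy as np

# Hypothetical test problem: Gaussian blur of a sine wave plus white noise.
rng = np.random.default_rng(1)
n = 40
x = np.linspace(0.0, 1.0, n)
G = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2)) / n
sigma = 1e-3                                   # assumed known noise standard deviation
d = G @ np.sin(2 * np.pi * x) + sigma * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(G)                    # G is square, so U spans all of data space
beta = U.T @ d

def misfit(alpha):
    """||G m_alpha - d|| for the assumed Tikhonov solution."""
    return np.linalg.norm(alpha**2 / (s**2 + alpha**2) * beta)

def gcv(alpha):
    """GCV score n * misfit^2 / trace(I - A_alpha)^2."""
    f = s**2 / (s**2 + alpha**2)               # filter factors; trace(A_alpha) = f.sum()
    return n * misfit(alpha) ** 2 / (n - f.sum()) ** 2

# Discrepancy principle: misfit grows with alpha, so bisect on log(alpha).
delta = sigma * np.sqrt(n)                     # expected noise norm
lo, hi = np.log(1e-12), np.log(1e2)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if misfit(np.exp(mid)) < delta:
        lo = mid
    else:
        hi = mid
alpha_dp = np.exp(0.5 * (lo + hi))

# GCV: plain grid search, no noise level required.
alphas = np.logspace(-6, 1, 200)
alpha_gcv = alphas[np.argmin([gcv(a) for a in alphas])]
```

The bisection relies on the misfit being a monotone function of \alpha, which holds for this assumed functional. The grid search for GCV is the simplest option and could be replaced by a one-dimensional minimizer.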