Resolution
Regularization buys stability — but at a price. The recovered model is not the true one; it is a
blurred version of it. The model resolution matrix measures
exactly how blurred. Writing the regularized solution as
\hat m = G^{\sharp} d for some inverse operator
G^{\sharp} (the pseudoinverse, or a Tikhonov-filtered version), and
substituting the noise-free data d = G m_{\text{true}},
\hat m = G^{\sharp} G\, m_{\text{true}} = R\, m_{\text{true}}, \qquad R = G^{\sharp} G.
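So R maps the truth to what we actually recover. If
R = I, resolution is perfect: every parameter is recovered exactly. In
practice R \neq I, because any regularization strong enough to stabilize the inversion also pulls R away from the identity.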
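To make the definition concrete, here is a minimal NumPy sketch that builds R = G^{\sharp} G for one particular choice of inverse operator. Everything specific in it is an assumption for illustration, not something fixed by the text: the forward operator is a hypothetical Gaussian smoothing matrix, G^{\sharp} is taken to be the zeroth-order Tikhonov inverse (G^T G + \alpha^2 I)^{-1} G^T, and the names `smoothing_operator`, `tikhonov_inverse`, `alpha` and `width` are made up for the example.

```python
import numpy as np

def smoothing_operator(n, width=3.0):
    """Hypothetical forward operator: each datum is a Gaussian-weighted
    average of the model (rows normalized to sum to 1)."""
    x = np.arange(n)
    G = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    return G / G.sum(axis=1, keepdims=True)

def tikhonov_inverse(G, alpha):
    """Assumed inverse operator: G_sharp = (G^T G + alpha^2 I)^{-1} G^T."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + alpha**2 * np.eye(n), G.T)

n, alpha = 40, 0.1
G = smoothing_operator(n)
G_sharp = tikhonov_inverse(G, alpha)
R = G_sharp @ G                      # model resolution matrix

# A single spike as the "true" model, and its noise-free data.
m_true = np.zeros(n)
m_true[n // 2] = 1.0
d = G @ m_true
m_hat = G_sharp @ d                  # regularized solution

print("m_hat equals R @ m_true:", np.allclose(m_hat, R @ m_true))
print("R equals the identity:  ", np.allclose(R, np.eye(n)))
print("trace(R) =", np.trace(R), "out of n =", n)
```

The trace of R is sometimes read as an effective number of resolved parameters; with this filter it is always below n whenever alpha is nonzero.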
Averaging kernels and the bias it buys
Each row of R is an averaging kernel: the
recovered value \hat m_i = \sum_j R_{ij}\, m_{\text{true},j} is a weighted average of nearby true values,
not the single value m_{\text{true},i}. A sharp, spiky row (close to a row of the
identity) means good local resolution; a broad, smeared row means you can only see a blurred
regional average. The width of the kernel is your resolution length.
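One way to put a number on "width" is to pull out a row of R and measure its full width at half maximum. The sketch below does that under stated assumptions: the forward operator is a hypothetical Gaussian smoother, the inverse is zeroth-order Tikhonov with an arbitrary `alpha`, and the helper `fwhm` is an illustrative definition of resolution length (counted in grid samples), not the only reasonable one.

```python
import numpy as np

# Hypothetical setup: Gaussian smoothing operator, Tikhonov-regularized inverse.
n, width, alpha = 40, 3.0, 0.1
x = np.arange(n)
G = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
G /= G.sum(axis=1, keepdims=True)
R = np.linalg.solve(G.T @ G + alpha**2 * np.eye(n), G.T @ G)

def fwhm(kernel, i):
    """Number of contiguous samples around index i whose value is at least
    half of kernel[i]. One simple proxy for the resolution length."""
    half = 0.5 * kernel[i]
    left = i
    while left > 0 and kernel[left - 1] >= half:
        left -= 1
    right = i
    while right < len(kernel) - 1 and kernel[right + 1] >= half:
        right += 1
    return right - left + 1

i = n // 2
kernel = R[i]                        # averaging kernel for parameter i
identity_row = np.eye(n)[i]          # what perfect resolution would look like

print("kernel width (samples):      ", fwhm(kernel, i))
print("identity-row width (samples):", fwhm(identity_row, i))  # 1 by construction
```

A row of the identity has width one sample; the regularized kernel is wider, and that extra width is the blurring described above.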
This is the bias–variance trade-off in disguise. Heavy regularization broadens the kernels (more bias, more
blurring) but suppresses noise (less variance); light regularization sharpens
resolution but lets the noise back in. There is no free lunch — only a choice of where to sit.
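The trade-off can be seen numerically by sweeping the regularization strength. The sketch below is illustrative only and rests on several assumptions: a hypothetical Gaussian smoothing operator, a zeroth-order Tikhonov inverse, independent data noise with standard deviation `sigma`, the Frobenius norm of R - I as a rough proxy for blurring, and the trace of the model covariance \sigma^2 G^{\sharp} G^{\sharp T} as the total noise variance.

```python
import numpy as np

# Hypothetical forward operator: row-normalized Gaussian smoothing.
n, width = 40, 3.0
x = np.arange(n)
G = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
G /= G.sum(axis=1, keepdims=True)

def blur_and_variance(G, alpha, sigma=1.0):
    """Return (||R - I||_F, trace of model covariance) for a Tikhonov inverse.
    Assumes iid data noise with standard deviation sigma."""
    n = G.shape[1]
    G_sharp = np.linalg.solve(G.T @ G + alpha**2 * np.eye(n), G.T)
    R = G_sharp @ G
    blur = np.linalg.norm(R - np.eye(n))                 # distance of R from I
    variance = sigma**2 * np.trace(G_sharp @ G_sharp.T)  # summed parameter variances
    return blur, variance

alphas = [0.01, 0.1, 1.0]
results = [blur_and_variance(G, a) for a in alphas]
blurs = [b for b, _ in results]
variances = [v for _, v in results]

for a, b, v in zip(alphas, blurs, variances):
    print(f"alpha={a:5.2f}  ||R - I||_F={b:8.3f}  total variance={v:12.3f}")
```

As alpha grows, the blur measure rises while the total variance falls; picking alpha is picking a point along that curve.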
A blurred identity
A heatmap of a resolution matrix R. Perfect resolution would be a clean
bright diagonal (R = I). Regularization smears each diagonal entry into
its neighbours — the bright band has width, and that width is exactly how far apart two features
must be before you can tell them apart.
- R = G^{\sharp}G maps the true model to the recovered one: \hat m = R\,m_{\text{true}}.
- R = I is perfect resolution; rows of R are averaging kernels whose width is the resolution length.
- More regularization → broader kernels (more bias, less noise): the bias–variance trade-off.