With the data now random, we can ask which model makes the observations most probable — the
Maximising this is minimising the exponent. So maximum likelihood is weighted least squares:
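As a sketch of this step in symbols, assume a linear forward model $y = Ax + \varepsilon$ with zero-mean Gaussian noise $\varepsilon \sim \mathcal{N}(0, \Gamma)$. The names $A$, $x$, $y$ and $\Gamma$ are notation assumed here for illustration and may differ from the symbols used elsewhere in the text.

```latex
% Assumed notation: A (forward operator), x (model), y (data), \Gamma (noise covariance).
\begin{align}
  p(y \mid x) &\propto \exp\!\Big(-\tfrac{1}{2}\,(y - Ax)^{\top}\,\Gamma^{-1}\,(y - Ax)\Big), \\
  x_{\mathrm{ML}} &= \arg\max_{x}\; p(y \mid x)
     = \arg\min_{x}\; \tfrac{1}{2}\,(y - Ax)^{\top}\,\Gamma^{-1}\,(y - Ax)
     = \arg\min_{x}\; \tfrac{1}{2}\,\big\lVert \Gamma^{-1/2}(y - Ax) \big\rVert_2^{2}.
\end{align}
```

The normalising constant of the Gaussian does not depend on $x$, so it drops out of the maximisation, and the minus sign in the exponent turns the maximum into a minimum.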
The inverse covariance
This is the bridge between the deterministic and statistical stories: the least-squares method we used for stability was, all along, the maximum-likelihood estimate under Gaussian noise. It still offers no help with ill-posedness, though — for that we must add a prior, which is the next step.
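The equivalence can also be checked numerically. The following is a minimal sketch, assuming a small linear forward model with a diagonal noise covariance. The names `A`, `y`, `cov`, `weighted_least_squares` and `neg_log_likelihood` are hypothetical, introduced only for this illustration. It solves the weighted normal equations and confirms that the result is a stationary point of the Gaussian negative log-likelihood.

```python
import numpy as np


def weighted_least_squares(A, y, cov):
    """Minimise (y - A x)^T cov^{-1} (y - A x) via the weighted normal equations."""
    WA = np.linalg.solve(cov, A)  # cov^{-1} A, without forming the inverse
    Wy = np.linalg.solve(cov, y)  # cov^{-1} y
    return np.linalg.solve(A.T @ WA, A.T @ Wy)


def neg_log_likelihood(x, A, y, cov):
    """Negative log-likelihood of y under y = A x + noise, noise ~ N(0, cov)."""
    r = y - A @ x
    _, logdet = np.linalg.slogdet(cov)
    const = 0.5 * (logdet + y.size * np.log(2.0 * np.pi))  # independent of x
    return 0.5 * r @ np.linalg.solve(cov, r) + const


# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -2.0, 0.5])
sigma = np.linspace(0.1, 2.0, 20)      # per-observation noise levels
cov = np.diag(sigma**2)
y = A @ x_true + sigma * rng.normal(size=20)

x_wls = weighted_least_squares(A, y, cov)

# Gradient of the negative log-likelihood with respect to x, evaluated at x_wls.
grad = -A.T @ np.linalg.solve(cov, y - A @ x_wls)
print("weighted least-squares estimate:", x_wls)
print("max |gradient of NLL| at estimate:", np.abs(grad).max())
```

Because the inverse covariance weights each residual, observations with small noise variance pull the estimate harder than noisy ones. Nothing in this computation regularises the problem: if `A.T @ WA` were singular or badly conditioned, the solve would fail or amplify noise, which is the ill-posedness that a prior is meant to address.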