Data Analysis and Fitting
You have spent an afternoon in the lab hanging weights on a spring and reading its stretch off a
ruler. Hooke's law promises a straight line — the extension should be proportional to the force —
but your points do not fall on any line. They scatter. The ruler has a finite width, your
eye wobbles, a draught nudges the pan. Every real measurement arrives wrapped in noise, and yet the
clean physical law is hiding in there somewhere. Data analysis is the craft of
pulling that law back out: given noisy points, what are the best model parameters, and — just as
important — how sure are we of them?
This page is about the single most useful tool for that job: least-squares fitting.
We will take a set of measured (x_i, y_i) points, assume a model — often
the straight line y = m x + c — and find the slope and intercept that fit
the data best. Then we will make "best" precise, learn to weight points by their error bars
with the \chi^2 statistic, judge whether the fit is any
good at all, and finally squeeze out the uncertainties
\sigma_m and \sigma_c on the fitted numbers —
because a slope quoted without its error bar is only half a measurement.
What does "best fit" even mean?
Draw any candidate line through the cloud of points. For each measured point it misses by a vertical
gap — the residual
r_i = y_i - f(x_i),
the difference between what you measured and what the model predicts at that
x. A good line makes all the residuals small; a bad line leaves big gaps.
So we need a single number that summarises "how badly the whole line misses". You might first reach
for the sum of the residuals — but positive and negative gaps would cancel, and a wild line that
happened to balance above and below would score zero. Absolute values avoid the cancellation but have
an awkward kink at zero. The choice that is smooth, always positive, punishes big misses hardest, and
(it turns out) is exactly right when the noise is Gaussian, is the sum of squared
residuals:
S = \sum_{i=1}^{N} r_i^{\,2} = \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2.
Least squares is the rule: pick the parameters that make
S as small as possible. Squaring means a point twice as far off
contributes four times the penalty, so the fit leans hard against outliers and settles into the
compromise line that threads the cloud.
On New Year's Day 1801 the astronomer Giuseppe Piazzi spotted a faint new object — the dwarf
planet Ceres — tracked it for six weeks, and then lost it in the Sun's glare. He
had only a short, noisy arc of positions and no idea where in the whole sky it would re-emerge
months later. The 24-year-old Carl Friedrich Gauss
took the scattered observations, fitted an orbit to them by minimising the squared residuals, and
predicted where Ceres would reappear. Astronomers pointed their telescopes there — and found it.
Least squares was so spectacularly useful that Gauss became famous overnight, and the method has
been the backbone of experimental science ever since. Every time you fit a line to data, you are
using Gauss's asteroid trick.
The straight line, solved in closed form
For the model f(x) = m x + c we do not have to hunt for the minimum by
trial and error — calculus hands it to us exactly. The sum of squares is
S(m, c) = \sum_i \bigl(y_i - m x_i - c\bigr)^2.
At the minimum, the slope of S with respect to each parameter is
zero. Setting \partial S/\partial m = 0 and
\partial S/\partial c = 0 gives two linear equations — the
normal equations — which we can solve for m and
c. Writing N for the number of points, the
answer is:
-
Slope.
m = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{N\sum x_i^2 - \left(\sum x_i\right)^2}.
-
Intercept. Once the slope is known, the line passes through the centroid
(\bar{x}, \bar{y}) of the data:
c = \bar{y} - m\,\bar{x},
where \bar{x} = \tfrac{1}{N}\sum x_i and
\bar{y} = \tfrac{1}{N}\sum y_i.
So the whole fit reduces to accumulating five running sums as you sweep through the data once:
\sum x_i, \sum y_i,
\sum x_i y_i, \sum x_i^2, and the count
N. No iteration, no guessing — a single pass and two divisions. Let's turn
that recipe into code.
The fit, in code
Here is the centrepiece: a complete linear least-squares fit on a small hardcoded dataset. It builds
the five sums, plugs them into the normal equations, and then prints the residual left at every
point so you can see how well the line threads the data. The x values are
the force in newtons, the y values the spring's extension in centimetres —
Hooke's law says these should lie on a line through a slope of roughly
2\ \text{cm/N}.
// Linear least-squares fit: y = m*x + c
const x: number[] = [1, 2, 3, 4, 5, 6];
const y: number[] = [2.1, 3.9, 6.3, 7.8, 10.2, 11.7];
const N = x.length;
let Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
for (let i = 0; i < N; i++) {
Sx += x[i];
Sy += y[i];
Sxx += x[i] * x[i];
Sxy += x[i] * y[i];
}
const denom = N * Sxx - Sx * Sx;
const m = (N * Sxy - Sx * Sy) / denom; // slope
const c = Sy / N - m * (Sx / N); // intercept, through the centroid
console.log("slope m =", m.toFixed(4));
console.log("interc c =", c.toFixed(4));
// Residuals: how far each point sits from the fitted line
for (let i = 0; i < N; i++) {
const predicted = m * x[i] + c;
const r = y[i] - predicted;
console.log(`x=${x[i]} y=${y[i]} fit=${predicted.toFixed(2)} resid=${r.toFixed(3)}`);
}
Run it. The fitted slope comes out close to 1.94\ \text{cm/N} and the
residuals are all just a few hundredths of a centimetre — small and hopping between plus and minus,
which is exactly what random measurement noise around a good model looks like. If the residuals
showed a clear pattern (all positive in the middle, say, and negative at the ends), that would be a
warning that a straight line is the wrong model and the data is secretly curved.
Seeing the fit
The chart below shows the same idea graphically: the noisy measurements as a scattered curve, and the
clean least-squares line drawn through them. The line is the single straight path that keeps the
total squared vertical gap as small as it can possibly be — no other line scores lower.
This picture — data points with a fitted line laid over them — is probably the single most common
figure in all of experimental physics. Behind every such plot sits the little sum-crunching fit you
just ran.
Weighting by error bars: the χ² statistic
Plain least squares treats every point as equally trustworthy. But real measurements come with
different uncertainties: a point you measured carefully with a good instrument deserves more
say than one you rushed at the end of the day. If point i has a standard
uncertainty \sigma_i (its error bar), we should divide each residual by
its own error bar before squaring, so that a miss is measured in units of that point's own
uncertainty. The weighted sum of squares gets its own name — chi-squared:
\chi^2 = \sum_{i=1}^{N} \left(\frac{y_i - f(x_i)}{\sigma_i}\right)^2.
A residual of 0.2 on a point whose error bar is
0.1 counts as being "two sigma off" and contributes
4 to \chi^2; the same residual on a sloppy point
with error bar 0.4 is only "half a sigma off" and contributes just
0.25. Minimising \chi^2 instead of
S is a weighted fit, and it automatically leans on your
best points.
The real magic of \chi^2 is that its size tells you whether the
fit is believable. Each point that is "typically about one error bar away" contributes roughly
1. But we also spent some of the data's freedom pinning down the
parameters — two of them for a straight line — so the fair yardstick is the number of
degrees of freedom, \nu = N - 2 (data points minus fitted
parameters). Comparing \chi^2 to \nu gives the
reduced chi-squared:
-
Definition.
\chi^2_\nu = \frac{\chi^2}{\nu}, \qquad \nu = N - (\text{number of fitted parameters}).
-
A good fit has \chi^2_\nu \approx 1: the model misses
each point by about one error bar, exactly as random noise should.
-
\chi^2_\nu \gg 1: the points scatter more
than their error bars allow — either the model is wrong, or the error bars were underestimated.
-
\chi^2_\nu \ll 1: the points hug the line
closer than their error bars — usually a sign the error bars were overestimated
(not, alas, a sign of a wonderful measurement).
χ² in code
Take the same spring fit and suppose each extension was read to about
\sigma = 0.15\ \text{cm}. This block computes
\chi^2, divides by the degrees of freedom, and prints a verdict.
// Reduced chi-squared for the spring fit (per-point sigma = 0.15 cm)
const x: number[] = [1, 2, 3, 4, 5, 6];
const y: number[] = [2.1, 3.9, 6.3, 7.8, 10.2, 11.7];
const sigma = 0.15;
// Fitted line from the least-squares pass (see the earlier block)
const m = 1.9429;
const c = 0.1667;
let chi2 = 0;
for (let i = 0; i < x.length; i++) {
const resid = y[i] - (m * x[i] + c);
chi2 += (resid / sigma) ** 2;
}
const dof = x.length - 2; // N data points minus 2 fitted parameters
const reduced = chi2 / dof;
console.log("chi-squared =", chi2.toFixed(3));
console.log("degrees of freedom =", dof);
console.log("reduced chi2 =", reduced.toFixed(3));
if (reduced > 2) console.log("verdict: poor fit (or error bars too small)");
else if (reduced < 0.5) console.log("verdict: suspiciously good (error bars too big?)");
else console.log("verdict: good fit, chi2/dof near 1");
Here \chi^2_\nu lands comfortably near 1, so the
straight-line model and the quoted 0.15\ \text{cm} error bars are telling a
consistent story. Try shrinking \sigma to
0.02 and re-running: \chi^2_\nu explodes, and
the verdict flips to "poor fit" — not because the data changed, but because you claimed a precision
the scatter doesn't support.
How sure are we? Uncertainties on the fit
A fit that only reported m = 1.94 would be scientifically incomplete. The
real deliverable is m = 1.94 \pm \sigma_m — the slope
and its error bar. Where does \sigma_m come from? From the
curvature of \chi^2 near its minimum. If
\chi^2 rises steeply as you nudge the slope away from its best value, the
data pins the slope down tightly and \sigma_m is small; if the minimum is a
shallow valley, many slopes fit almost as well and \sigma_m is large. This
curvature is captured by the error
propagation of the measurement noise through the normal equations, and for a straight
line it has a tidy closed form. Writing \Delta = N\sum x_i^2 - (\sum x_i)^2
for the denominator you already computed, and assuming every point shares the same uncertainty
\sigma:
-
Slope uncertainty.
\sigma_m = \sigma\,\sqrt{\frac{N}{\Delta}}.
-
Intercept uncertainty.
\sigma_c = \sigma\,\sqrt{\frac{\sum x_i^2}{\Delta}}.
-
Where \sigma comes from. If you don't know the point
error bars ahead of time, estimate them from the scatter of the fit itself:
\sigma^2 \approx \frac{1}{N-2}\sum_i \bigl(y_i - m x_i - c\bigr)^2.
Notice how the shape of the errors follows common sense: the denominator
\Delta grows when your x values are spread over
a wide range, so a long lever arm pins the slope down better — measuring over a wide
span of forces gives a sharper Hooke's-law slope than crowding all your weights together.
// Standard errors on the fitted slope and intercept
const x: number[] = [1, 2, 3, 4, 5, 6];
const y: number[] = [2.1, 3.9, 6.3, 7.8, 10.2, 11.7];
const N = x.length;
let Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
for (let i = 0; i < N; i++) { Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i]; }
const delta = N * Sxx - Sx * Sx;
const m = (N * Sxy - Sx * Sy) / delta;
const c = Sy / N - m * (Sx / N);
// Estimate the point uncertainty from the residual scatter
let ssr = 0;
for (let i = 0; i < N; i++) ssr += (y[i] - m * x[i] - c) ** 2;
const sigma = Math.sqrt(ssr / (N - 2));
const sigma_m = sigma * Math.sqrt(N / delta);
const sigma_c = sigma * Math.sqrt(Sxx / delta);
console.log(`sigma (point scatter) = ${sigma.toFixed(4)} cm`);
console.log(`m = ${m.toFixed(3)} +/- ${sigma_m.toFixed(3)} cm/N`);
console.log(`c = ${c.toFixed(3)} +/- ${sigma_c.toFixed(3)} cm`);
Now we have a complete, quotable result:
m = 1.94 \pm 0.02\ \text{cm/N} and a small intercept consistent with zero —
a proper measurement, uncertainty and all.
Not necessarily. A big reduced chi-squared,
\chi^2_\nu \gg 1, is a genuine alarm — but it does not say what is
wrong. There are two very different culprits, and beginners almost always blame the first while the
second is at least as common:
- the model is wrong (the data is really curved and you forced a line through it);
- the error bars are too small (the data is fine, but you were over-optimistic
about your instrument's precision).
A perfectly good measurement can produce a terrible \chi^2 simply because
you told the fit your uncertainties were 0.02 when honestly they were
0.15. And the reverse trap is just as sneaky:
\chi^2_\nu \ll 1 feels like a triumph — the points hug the line! — but it
usually means the opposite, that your error bars were too big. Chi-squared judges the model
and your honesty about the errors together; you can never separate the two without
thinking about where those error bars came from.
A tempting shortcut: to fit an exponential decay y = A e^{-kt}, take the
logarithm of both sides to get \ln y = \ln A - k t — a straight line! — and
run your linear least-squares machine on (t, \ln y). It works, and people do
it all the time. But there is a hidden cost. Least squares assumes each point's noise has the
same spread, yet the logarithm stretches small values and squashes large ones.
A point near y = 0.01 with a tiny absolute error can have a huge error in
\ln y, while a big point barely moves. So the "linearised" fit silently
over-weights the noisy tail and gives a biased slope. The honest fix is either to carry the transformed
error bars through properly (weight each log-point by \sigma_i / y_i) or to
fit the original nonlinear model directly. Linearising is a fine sketch — just don't quote its error
bars without a second thought.