Data Analysis and Fitting

You have spent an afternoon in the lab hanging weights on a spring and reading its stretch off a ruler. Hooke's law promises a straight line — the extension should be proportional to the force — but your points do not fall on any line. They scatter. The ruler has a finite width, your eye wobbles, a draught nudges the pan. Every real measurement arrives wrapped in noise, and yet the clean physical law is hiding in there somewhere. Data analysis is the craft of pulling that law back out: given noisy points, what are the best model parameters, and — just as important — how sure are we of them?

This page is about the single most useful tool for that job: least-squares fitting. We will take a set of measured (x_i, y_i) points, assume a model — often the straight line y = m x + c — and find the slope and intercept that fit the data best. Then we will make "best" precise, learn to weight points by their error bars with the \chi^2 statistic, judge whether the fit is any good at all, and finally squeeze out the uncertainties \sigma_m and \sigma_c on the fitted numbers — because a slope quoted without its error bar is only half a measurement.

What does "best fit" even mean?

Draw any candidate line through the cloud of points. For each measured point it misses by a vertical gap — the residual

r_i = y_i - f(x_i),

the difference between what you measured and what the model predicts at that x. A good line makes all the residuals small; a bad line leaves big gaps. So we need a single number that summarises "how badly the whole line misses". You might first reach for the sum of the residuals — but positive and negative gaps would cancel, and a wild line that happened to balance above and below would score zero. Absolute values avoid the cancellation but have an awkward kink at zero. The choice that is smooth, always positive, punishes big misses hardest, and (it turns out) is exactly right when the noise is Gaussian, is the sum of squared residuals:

S = \sum_{i=1}^{N} r_i^{\,2} = \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2.

Least squares is the rule: pick the parameters that make S as small as possible. Squaring means a point twice as far off contributes four times the penalty, so the fit leans hard against outliers and settles into the compromise line that threads the cloud.

On New Year's Day 1801 the astronomer Giuseppe Piazzi spotted a faint new object — the dwarf planet Ceres — tracked it for six weeks, and then lost it in the Sun's glare. He had only a short, noisy arc of positions and no idea where in the whole sky it would re-emerge months later. The 24-year-old Carl Friedrich Gauss took the scattered observations, fitted an orbit to them by minimising the squared residuals, and predicted where Ceres would reappear. Astronomers pointed their telescopes there — and found it. Least squares was so spectacularly useful that Gauss became famous overnight, and the method has been the backbone of experimental science ever since. Every time you fit a line to data, you are using Gauss's asteroid trick.

The straight line, solved in closed form

For the model f(x) = m x + c we do not have to hunt for the minimum by trial and error — calculus hands it to us exactly. The sum of squares is

S(m, c) = \sum_i \bigl(y_i - m x_i - c\bigr)^2.

At the minimum, the slope of S with respect to each parameter is zero. Setting \partial S/\partial m = 0 and \partial S/\partial c = 0 gives two linear equations — the normal equations — which we can solve for m and c. Writing N for the number of points, the answer is:

So the whole fit reduces to accumulating five running sums as you sweep through the data once: \sum x_i, \sum y_i, \sum x_i y_i, \sum x_i^2, and the count N. No iteration, no guessing — a single pass and two divisions. Let's turn that recipe into code.

The fit, in code

Here is the centrepiece: a complete linear least-squares fit on a small hardcoded dataset. It builds the five sums, plugs them into the normal equations, and then prints the residual left at every point so you can see how well the line threads the data. The x values are the force in newtons, the y values the spring's extension in centimetres — Hooke's law says these should lie on a line through a slope of roughly 2\ \text{cm/N}.

// Linear least-squares fit: y = m*x + c const x: number[] = [1, 2, 3, 4, 5, 6]; const y: number[] = [2.1, 3.9, 6.3, 7.8, 10.2, 11.7]; const N = x.length; let Sx = 0, Sy = 0, Sxx = 0, Sxy = 0; for (let i = 0; i < N; i++) { Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i]; } const denom = N * Sxx - Sx * Sx; const m = (N * Sxy - Sx * Sy) / denom; // slope const c = Sy / N - m * (Sx / N); // intercept, through the centroid console.log("slope m =", m.toFixed(4)); console.log("interc c =", c.toFixed(4)); // Residuals: how far each point sits from the fitted line for (let i = 0; i < N; i++) { const predicted = m * x[i] + c; const r = y[i] - predicted; console.log(`x=${x[i]} y=${y[i]} fit=${predicted.toFixed(2)} resid=${r.toFixed(3)}`); }

Run it. The fitted slope comes out close to 1.94\ \text{cm/N} and the residuals are all just a few hundredths of a centimetre — small and hopping between plus and minus, which is exactly what random measurement noise around a good model looks like. If the residuals showed a clear pattern (all positive in the middle, say, and negative at the ends), that would be a warning that a straight line is the wrong model and the data is secretly curved.

Seeing the fit

The chart below shows the same idea graphically: the noisy measurements as a scattered curve, and the clean least-squares line drawn through them. The line is the single straight path that keeps the total squared vertical gap as small as it can possibly be — no other line scores lower.

This picture — data points with a fitted line laid over them — is probably the single most common figure in all of experimental physics. Behind every such plot sits the little sum-crunching fit you just ran.

Weighting by error bars: the χ² statistic

Plain least squares treats every point as equally trustworthy. But real measurements come with different uncertainties: a point you measured carefully with a good instrument deserves more say than one you rushed at the end of the day. If point i has a standard uncertainty \sigma_i (its error bar), we should divide each residual by its own error bar before squaring, so that a miss is measured in units of that point's own uncertainty. The weighted sum of squares gets its own name — chi-squared:

\chi^2 = \sum_{i=1}^{N} \left(\frac{y_i - f(x_i)}{\sigma_i}\right)^2.

A residual of 0.2 on a point whose error bar is 0.1 counts as being "two sigma off" and contributes 4 to \chi^2; the same residual on a sloppy point with error bar 0.4 is only "half a sigma off" and contributes just 0.25. Minimising \chi^2 instead of S is a weighted fit, and it automatically leans on your best points.

The real magic of \chi^2 is that its size tells you whether the fit is believable. Each point that is "typically about one error bar away" contributes roughly 1. But we also spent some of the data's freedom pinning down the parameters — two of them for a straight line — so the fair yardstick is the number of degrees of freedom, \nu = N - 2 (data points minus fitted parameters). Comparing \chi^2 to \nu gives the reduced chi-squared:

χ² in code

Take the same spring fit and suppose each extension was read to about \sigma = 0.15\ \text{cm}. This block computes \chi^2, divides by the degrees of freedom, and prints a verdict.

// Reduced chi-squared for the spring fit (per-point sigma = 0.15 cm) const x: number[] = [1, 2, 3, 4, 5, 6]; const y: number[] = [2.1, 3.9, 6.3, 7.8, 10.2, 11.7]; const sigma = 0.15; // Fitted line from the least-squares pass (see the earlier block) const m = 1.9429; const c = 0.1667; let chi2 = 0; for (let i = 0; i < x.length; i++) { const resid = y[i] - (m * x[i] + c); chi2 += (resid / sigma) ** 2; } const dof = x.length - 2; // N data points minus 2 fitted parameters const reduced = chi2 / dof; console.log("chi-squared =", chi2.toFixed(3)); console.log("degrees of freedom =", dof); console.log("reduced chi2 =", reduced.toFixed(3)); if (reduced > 2) console.log("verdict: poor fit (or error bars too small)"); else if (reduced < 0.5) console.log("verdict: suspiciously good (error bars too big?)"); else console.log("verdict: good fit, chi2/dof near 1");

Here \chi^2_\nu lands comfortably near 1, so the straight-line model and the quoted 0.15\ \text{cm} error bars are telling a consistent story. Try shrinking \sigma to 0.02 and re-running: \chi^2_\nu explodes, and the verdict flips to "poor fit" — not because the data changed, but because you claimed a precision the scatter doesn't support.

How sure are we? Uncertainties on the fit

A fit that only reported m = 1.94 would be scientifically incomplete. The real deliverable is m = 1.94 \pm \sigma_m — the slope and its error bar. Where does \sigma_m come from? From the curvature of \chi^2 near its minimum. If \chi^2 rises steeply as you nudge the slope away from its best value, the data pins the slope down tightly and \sigma_m is small; if the minimum is a shallow valley, many slopes fit almost as well and \sigma_m is large. This curvature is captured by the error propagation of the measurement noise through the normal equations, and for a straight line it has a tidy closed form. Writing \Delta = N\sum x_i^2 - (\sum x_i)^2 for the denominator you already computed, and assuming every point shares the same uncertainty \sigma:

Notice how the shape of the errors follows common sense: the denominator \Delta grows when your x values are spread over a wide range, so a long lever arm pins the slope down better — measuring over a wide span of forces gives a sharper Hooke's-law slope than crowding all your weights together.

// Standard errors on the fitted slope and intercept const x: number[] = [1, 2, 3, 4, 5, 6]; const y: number[] = [2.1, 3.9, 6.3, 7.8, 10.2, 11.7]; const N = x.length; let Sx = 0, Sy = 0, Sxx = 0, Sxy = 0; for (let i = 0; i < N; i++) { Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i]; } const delta = N * Sxx - Sx * Sx; const m = (N * Sxy - Sx * Sy) / delta; const c = Sy / N - m * (Sx / N); // Estimate the point uncertainty from the residual scatter let ssr = 0; for (let i = 0; i < N; i++) ssr += (y[i] - m * x[i] - c) ** 2; const sigma = Math.sqrt(ssr / (N - 2)); const sigma_m = sigma * Math.sqrt(N / delta); const sigma_c = sigma * Math.sqrt(Sxx / delta); console.log(`sigma (point scatter) = ${sigma.toFixed(4)} cm`); console.log(`m = ${m.toFixed(3)} +/- ${sigma_m.toFixed(3)} cm/N`); console.log(`c = ${c.toFixed(3)} +/- ${sigma_c.toFixed(3)} cm`);

Now we have a complete, quotable result: m = 1.94 \pm 0.02\ \text{cm/N} and a small intercept consistent with zero — a proper measurement, uncertainty and all.

Not necessarily. A big reduced chi-squared, \chi^2_\nu \gg 1, is a genuine alarm — but it does not say what is wrong. There are two very different culprits, and beginners almost always blame the first while the second is at least as common:

A perfectly good measurement can produce a terrible \chi^2 simply because you told the fit your uncertainties were 0.02 when honestly they were 0.15. And the reverse trap is just as sneaky: \chi^2_\nu \ll 1 feels like a triumph — the points hug the line! — but it usually means the opposite, that your error bars were too big. Chi-squared judges the model and your honesty about the errors together; you can never separate the two without thinking about where those error bars came from.

A tempting shortcut: to fit an exponential decay y = A e^{-kt}, take the logarithm of both sides to get \ln y = \ln A - k t — a straight line! — and run your linear least-squares machine on (t, \ln y). It works, and people do it all the time. But there is a hidden cost. Least squares assumes each point's noise has the same spread, yet the logarithm stretches small values and squashes large ones. A point near y = 0.01 with a tiny absolute error can have a huge error in \ln y, while a big point barely moves. So the "linearised" fit silently over-weights the noisy tail and gives a biased slope. The honest fix is either to carry the transformed error bars through properly (weight each log-point by \sigma_i / y_i) or to fit the original nonlinear model directly. Linearising is a fine sketch — just don't quote its error bars without a second thought.