Polynomial Interpolation

You are handed a short table of numbers — the density of water at 0, 10, 20 and 30\,^\circ\text{C}, say, or the readings a satellite beamed down at four separate instants — and you need a value between the entries the table happens to list. There is no formula in the book; there is only the data. What curve should you draw through the dots?

The oldest and most natural answer is a polynomial. Polynomials are the friendliest functions a computer knows: you can evaluate one with nothing but multiplications and additions, you can differentiate and integrate one in your head, and — the fact this whole page rests on — there is exactly one polynomial of low enough degree threaded through any given set of points. Give me n+1 points with distinct x-values and there is a single, unique polynomial of degree at most n that passes through every last one of them. This page teaches you to build that polynomial two different ways — the Lagrange form and the Newton form — to understand how wrong it can be between the data, and to meet the spectacular way it can misbehave: Runge's phenomenon.

One set of points, one polynomial

Two points determine a unique straight line (degree 1). Three points that don't sit on a line determine a unique parabola (degree 2). The pattern continues forever, and it is the bedrock of everything below.

Given n+1 data points (x_0,y_0), (x_1,y_1), \ldots, (x_n,y_n) with the x_i all distinct, there exists a polynomial p of degree \le n with p(x_i) = y_i for every i.
That polynomial is unique: no other polynomial of degree \le n hits all the same points.

Why unique? Suppose two such polynomials p and q both fit. Their difference d = p - q has degree at most n, yet it is zero at all n+1 of the x_i. A non-zero polynomial of degree \le n can have at most n roots — so d having n+1 of them forces d \equiv 0, meaning p = q. The interpolant is the same object no matter how you compute it; the Lagrange and Newton forms below are just two different set of clothes on one person.

You can — and it exposes the uniqueness beautifully. Write p(x) = a_0 + a_1 x + \cdots + a_n x^n and demand p(x_i) = y_i. That is a linear system V\mathbf{a} = \mathbf{y} whose matrix

V = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & & & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}

is the famous Vandermonde matrix. Its determinant is \prod_{i<j}(x_j - x_i), which is non-zero exactly when the x_i are distinct — so the system has one and only one solution. It works, but Vandermonde systems are notoriously ill-conditioned, so in practice we build the interpolant a smarter way. That's what Lagrange and Newton give us.

The Lagrange form: build it from switches

Here is a lovely idea. Suppose, for each data point i, we could manufacture a polynomial that is equal to 1 at x_i and equal to 0 at every other node. Call it L_i(x) — a little "switch" that is on only at its own node. Then the weighted sum

p(x) = \sum_{i=0}^{n} y_i\, L_i(x)

automatically passes through every point: plug in x_k and every term vanishes except the k-th, which contributes y_k \cdot 1 = y_k. Done. The only question is how to build the switches, and that is almost obvious once you want a polynomial that is zero at all the other nodes: multiply together a factor (x - x_j) for each j \ne i, then divide by whatever it takes to make the value at x_i come out to 1:

L_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}.

The numerator kills the polynomial at every node but i; the denominator is just the numerator evaluated at x_i, so L_i(x_i) = 1. This on/off behaviour has a name — the Kronecker delta, L_i(x_j) = \delta_{ij} (1 when i=j, 0 otherwise). Let's build the whole thing in code and evaluate the interpolant through four points at a place strictly between them:

// Lagrange interpolation through (x_i, y_i), evaluated at a query point X. const xs: number[] = [0, 1, 2, 3]; const ys: number[] = [1, 2, 0, 5]; // no formula — just data function lagrange(X: number): number { let total = 0; for (let i = 0; i < xs.length; i++) { let Li = 1; // the switch L_i(X) for (let j = 0; j < xs.length; j++) { if (j !== i) Li *= (X - xs[j]) / (xs[i] - xs[j]); } total += ys[i] * Li; // weight the switch by y_i } return total; } // It must reproduce the data exactly at the nodes... for (let i = 0; i < xs.length; i++) { console.log(`p(${xs[i]}) = ${lagrange(xs[i])} (data: ${ys[i]})`); } // ...and gives a smooth value in between: console.log(`p(1.5) = ${lagrange(1.5).toFixed(4)}`); console.log(`p(2.5) = ${lagrange(2.5).toFixed(4)}`);

The interpolant reproduces the four y-values on the nose and hands us sensible in-between numbers. The Lagrange form is gorgeous for theory — the error formula further down falls out of it — but notice its practical wart: add one new data point and every single L_i changes (they all gain a factor), so you must rebuild from scratch. The Newton form fixes exactly that.

The Newton form: cheap to grow

Newton writes the same polynomial as a nested product, each term adding one more node:

p(x) = c_0 + c_1(x-x_0) + c_2(x-x_0)(x-x_1) + \cdots + c_n\!\!\prod_{j=0}^{n-1}(x-x_j).

The magic is in the coefficients c_k. Each is a divided difference — a discrete cousin of the derivative, written c_k = f[x_0, x_1, \ldots, x_k]. They are defined recursively: the zeroth-order ones are just the data, f[x_i] = y_i, and each higher order is a difference of two lower ones divided by the spread of nodes:

f[x_i,\ldots,x_{i+k}] = \frac{f[x_{i+1},\ldots,x_{i+k}] - f[x_i,\ldots,x_{i+k-1}]}{x_{i+k} - x_i}.

Two things make this form the working numerical analyst's favourite. First, the coefficients c_k = f[x_0,\ldots,x_k] are the top edge of a triangular table you fill column by column. Second — the payoff — adding a new data point costs only one extra row: the coefficients you already computed don't change, you just append c_{n+1}. And you evaluate the whole thing with Horner-style nesting, cheap as can be. Let's build the divided-difference table and print it:

// Newton divided differences for (x_i, y_i). Build the triangular table, // then read the interpolant's coefficients off the top diagonal. const xs: number[] = [0, 1, 2, 3]; const ys: number[] = [1, 2, 0, 5]; const n: number = xs.length; // dd[i][0] = y_i; dd[i][k] = f[x_i, ..., x_{i+k}] const dd: number[][] = xs.map((_, i) => [ys[i]]); for (let k = 1; k < n; k++) { for (let i = 0; i + k < n; i++) { const num = dd[i + 1][k - 1] - dd[i][k - 1]; dd[i][k] = num / (xs[i + k] - xs[i]); } } // The Newton coefficients c_k are the top row: dd[0][k]. const coef: number[] = dd[0].slice(); console.log(`Newton coefficients c_0..c_${n - 1}:`); coef.forEach((c, k) => console.log(` c_${k} = ${c}`)); // Evaluate by Horner-style nesting from the innermost bracket outward. function newtonEval(X: number): number { let p = coef[n - 1]; for (let k = n - 2; k >= 0; k--) p = p * (X - xs[k]) + coef[k]; return p; } console.log(`p(1.5) = ${newtonEval(1.5).toFixed(4)}`); // same value as Lagrange! console.log(`p(2.5) = ${newtonEval(2.5).toFixed(4)}`);

Run it and compare p(1.5) and p(2.5) with the Lagrange block above: identical to every digit. Of course they are — there is only one interpolating polynomial, and both methods found it. Newton just wrote it in a form that is cheap to evaluate and cheap to extend.

Because it is one in disguise. The first divided difference f[x_0,x_1] = \dfrac{y_1 - y_0}{x_1 - x_0} is precisely a slope — a secant line. As the two nodes slide together, x_1 \to x_0, it becomes f'(x_0) exactly. Higher divided differences approximate higher derivatives the same way: k!\,f[x_0,\ldots,x_k] \approx f^{(k)} somewhere in the range. This is why the interpolation error formula below wears an (n+1)-th derivative — the coefficient that would come next in the Newton series.

How wrong is it between the points?

At the nodes the interpolant is exact by construction. Everywhere else it can drift from the true function f it was sampled from — and there is a clean, quotable bound on that drift, structurally a twin of the remainder in Taylor's theorem.

If f is (n+1)-times differentiable and p is its degree-\le n interpolant at the nodes x_0,\ldots,x_n, then for each x there is a point \xi in the spanned interval with f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \prod_{j=0}^{n}(x - x_j).
The error is a product of two competing pieces: a smoothness factor f^{(n+1)}(\xi)/(n+1)! you don't control, and a node polynomial \prod (x - x_j) you do — it is zero at every node (error vanishes there) and swells in the gaps, worst of all near the ends.

Read the formula as a warning label. The factor 1/(n+1)! grows small fast, which tempts you to think "more points, higher degree, smaller error." But the other two factors can fight back viciously: the derivative f^{(n+1)} may explode with n, and the node polynomial \prod(x-x_j) develops huge humps between equally-spaced nodes near the ends of the interval. When those two win the tug-of-war, adding points makes things worse. That disaster has a name.

Runge's phenomenon: when more points hurt

In 1901 Carl Runge interpolated the innocent-looking bell curve

f(x) = \frac{1}{1 + 25x^2} \qquad \text{on } [-1, 1]

at equally-spaced nodes and watched, with rising degree, the interpolant grow enormous oscillations near the two ends — wiggles that get taller, not shorter, as you add more points. The chart shows the degree-10 interpolant on 11 equally-spaced nodes: through the middle it hugs the true curve, but near x = \pm 1 it flings itself up and down, overshooting wildly between the last few nodes.

Let's measure the damage directly. We build the degree-10 interpolant and compare it to the true function both in the calm middle and out near the wild end:

// Runge's function and its equally-spaced interpolant of degree 10. const f = (x: number): number => 1 / (1 + 25 * x * x); const N: number = 11; // 11 nodes -> degree 10 const xs: number[] = []; const ys: number[] = []; for (let i = 0; i < N; i++) { const x = -1 + (2 * i) / (N - 1); // equally spaced on [-1, 1] xs.push(x); ys.push(f(x)); } // Newton divided-difference coefficients (top row of the table). const c: number[] = ys.slice(); for (let k = 1; k < N; k++) { for (let i = N - 1; i >= k; i--) { c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - k]); } } const p = (X: number): number => { let v = c[N - 1]; for (let k = N - 2; k >= 0; k--) v = v * (X - xs[k]) + c[k]; return v; }; // Error in the middle vs. near the troublesome end: for (const X of [0.0, 0.3, 0.9, 0.96]) { const err = Math.abs(f(X) - p(X)); console.log(`x = ${X.toFixed(2)}: f = ${f(X).toFixed(4)} p = ${p(X).toFixed(4)} |err| = ${err.toFixed(4)}`); }

In the middle the interpolant is essentially perfect; near x = 0.96 the error is enormous — larger than the function itself. Push N to 21 in your head (or in the editor) and the end error grows past 50. This is the cautionary tale that launched modern approximation theory.

The seductive wrong intuition is "a higher-degree polynomial fits more points, so it must be more accurate." Runge's phenomenon is the counterexample burned into every numerical analyst's memory: for equally-spaced nodes on a fixed interval, raising the degree makes the interpolant diverge near the endpoints — the maximum error grows without bound as n \to \infty, even though f is perfectly smooth. The culprit is the node polynomial \prod(x - x_j) in the error formula, which grows monstrous humps between the outermost equally-spaced nodes.

There are two standard cures, and neither is "use fewer points":

Move the nodes. Cluster them toward the ends using Chebyshev nodes x_k = \cos\!\big(\tfrac{k\pi}{n}\big), which are dense where the trouble lives. This flattens the node polynomial's humps and the error behaves.
Stop chasing one high-degree polynomial. Use many low-degree pieces instead — a spline, which stitches cubics together smoothly and never oscillates.