Polynomial Interpolation
You are handed a short table of numbers — the density of water at
0, 10, 20 and 30\,^\circ\text{C}, say, or the
readings a satellite beamed down at four separate instants — and you need a value between the
entries the table happens to list. There is no formula in the book; there is only the data. What curve
should you draw through the dots?
The oldest and most natural answer is a polynomial. Polynomials are the friendliest
functions a computer knows: you can evaluate one with nothing but multiplications and additions, you can
differentiate and integrate one in your head, and — the fact this whole page rests on — there is
exactly one polynomial of low enough degree threaded through any given set of points.
Give me n+1 points with distinct x-values and there
is a single, unique polynomial of degree at most n that passes through every
last one of them. This page teaches you to build that polynomial two different ways — the
Lagrange form and the Newton form — to understand
how wrong it can be between the data, and to meet the spectacular way it can misbehave:
Runge's phenomenon.
One set of points, one polynomial
Two points determine a unique straight line (degree 1). Three points that
don't sit on a line determine a unique parabola (degree 2). The pattern
continues forever, and it is the bedrock of everything below.
-
Given n+1 data points
(x_0,y_0), (x_1,y_1), \ldots, (x_n,y_n) with the
x_i all distinct, there
exists a polynomial p of degree
\le n with p(x_i) = y_i for every
i.
-
That polynomial is unique: no other polynomial of degree
\le n hits all the same points.
Why unique? Suppose two such polynomials p and q
both fit. Their difference d = p - q has degree at most
n, yet it is zero at all n+1 of the
x_i. A non-zero polynomial of degree \le n can have
at most n roots — so d having
n+1 of them forces d \equiv 0, meaning
p = q. The interpolant is the same object no matter how you compute
it; the Lagrange and Newton forms below are just two different set of clothes on one person.
You can — and it exposes the uniqueness beautifully. Write
p(x) = a_0 + a_1 x + \cdots + a_n x^n and demand
p(x_i) = y_i. That is a linear system
V\mathbf{a} = \mathbf{y} whose matrix
V = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & & & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}
is the famous Vandermonde matrix. Its determinant is
\prod_{i<j}(x_j - x_i), which is non-zero exactly when the
x_i are distinct — so the system has one and only one solution. It works,
but Vandermonde systems are notoriously ill-conditioned, so in practice we build the
interpolant a smarter way. That's what Lagrange and Newton give us.
The Lagrange form: build it from switches
Here is a lovely idea. Suppose, for each data point i, we could manufacture a
polynomial that is equal to 1 at
x_i and equal to 0 at every other
node. Call it L_i(x) — a little "switch" that is on only at its own
node. Then the weighted sum
p(x) = \sum_{i=0}^{n} y_i\, L_i(x)
automatically passes through every point: plug in x_k and every term
vanishes except the k-th, which contributes
y_k \cdot 1 = y_k. Done. The only question is how to build the switches, and
that is almost obvious once you want a polynomial that is zero at all the other nodes: multiply
together a factor (x - x_j) for each j \ne i, then
divide by whatever it takes to make the value at x_i come out to
1:
L_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}.
The numerator kills the polynomial at every node but i; the denominator is
just the numerator evaluated at x_i, so
L_i(x_i) = 1. This on/off behaviour has a name — the
Kronecker delta, L_i(x_j) = \delta_{ij} (1 when
i=j, 0 otherwise). Let's build the whole thing in code and evaluate the
interpolant through four points at a place strictly between them:
// Lagrange interpolation through (x_i, y_i), evaluated at a query point X.
const xs: number[] = [0, 1, 2, 3];
const ys: number[] = [1, 2, 0, 5]; // no formula — just data
function lagrange(X: number): number {
let total = 0;
for (let i = 0; i < xs.length; i++) {
let Li = 1; // the switch L_i(X)
for (let j = 0; j < xs.length; j++) {
if (j !== i) Li *= (X - xs[j]) / (xs[i] - xs[j]);
}
total += ys[i] * Li; // weight the switch by y_i
}
return total;
}
// It must reproduce the data exactly at the nodes...
for (let i = 0; i < xs.length; i++) {
console.log(`p(${xs[i]}) = ${lagrange(xs[i])} (data: ${ys[i]})`);
}
// ...and gives a smooth value in between:
console.log(`p(1.5) = ${lagrange(1.5).toFixed(4)}`);
console.log(`p(2.5) = ${lagrange(2.5).toFixed(4)}`);
The interpolant reproduces the four y-values on the nose and hands us
sensible in-between numbers. The Lagrange form is gorgeous for theory — the error formula
further down falls out of it — but notice its practical wart: add one new data point and every single
L_i changes (they all gain a factor), so you must rebuild from scratch. The
Newton form fixes exactly that.
The Newton form: cheap to grow
Newton writes the same polynomial as a nested product, each term adding one more node:
p(x) = c_0 + c_1(x-x_0) + c_2(x-x_0)(x-x_1) + \cdots + c_n\!\!\prod_{j=0}^{n-1}(x-x_j).
The magic is in the coefficients c_k. Each is a divided
difference — a discrete cousin of the derivative, written
c_k = f[x_0, x_1, \ldots, x_k]. They are defined recursively: the zeroth-order
ones are just the data, f[x_i] = y_i, and each higher order is a difference of
two lower ones divided by the spread of nodes:
f[x_i,\ldots,x_{i+k}] = \frac{f[x_{i+1},\ldots,x_{i+k}] - f[x_i,\ldots,x_{i+k-1}]}{x_{i+k} - x_i}.
Two things make this form the working numerical analyst's favourite. First, the coefficients
c_k = f[x_0,\ldots,x_k] are the top edge of a triangular table
you fill column by column. Second — the payoff — adding a new data point costs only one extra
row: the coefficients you already computed don't change, you just append
c_{n+1}. And you evaluate the whole thing with
Horner-style nesting, cheap as
can be. Let's build the divided-difference table and print it:
// Newton divided differences for (x_i, y_i). Build the triangular table,
// then read the interpolant's coefficients off the top diagonal.
const xs: number[] = [0, 1, 2, 3];
const ys: number[] = [1, 2, 0, 5];
const n: number = xs.length;
// dd[i][0] = y_i; dd[i][k] = f[x_i, ..., x_{i+k}]
const dd: number[][] = xs.map((_, i) => [ys[i]]);
for (let k = 1; k < n; k++) {
for (let i = 0; i + k < n; i++) {
const num = dd[i + 1][k - 1] - dd[i][k - 1];
dd[i][k] = num / (xs[i + k] - xs[i]);
}
}
// The Newton coefficients c_k are the top row: dd[0][k].
const coef: number[] = dd[0].slice();
console.log(`Newton coefficients c_0..c_${n - 1}:`);
coef.forEach((c, k) => console.log(` c_${k} = ${c}`));
// Evaluate by Horner-style nesting from the innermost bracket outward.
function newtonEval(X: number): number {
let p = coef[n - 1];
for (let k = n - 2; k >= 0; k--) p = p * (X - xs[k]) + coef[k];
return p;
}
console.log(`p(1.5) = ${newtonEval(1.5).toFixed(4)}`); // same value as Lagrange!
console.log(`p(2.5) = ${newtonEval(2.5).toFixed(4)}`);
Run it and compare p(1.5) and p(2.5) with the
Lagrange block above: identical to every digit. Of course they are — there is only one
interpolating polynomial, and both methods found it. Newton just wrote it in a form that is cheap to
evaluate and cheap to extend.
Because it is one in disguise. The first divided difference
f[x_0,x_1] = \dfrac{y_1 - y_0}{x_1 - x_0} is precisely a slope — a secant
line. As the two nodes slide together, x_1 \to x_0, it becomes
f'(x_0) exactly. Higher divided differences approximate higher derivatives
the same way: k!\,f[x_0,\ldots,x_k] \approx f^{(k)} somewhere in the range.
This is why the interpolation error formula below wears an (n+1)-th
derivative — the coefficient that would come next in the Newton series.
How wrong is it between the points?
At the nodes the interpolant is exact by construction. Everywhere else it can drift from the true
function f it was sampled from — and there is a clean, quotable bound on that
drift, structurally a twin of the remainder in Taylor's theorem.
-
If f is (n+1)-times differentiable and
p is its degree-\le n interpolant at the nodes
x_0,\ldots,x_n, then for each x there is a
point \xi in the spanned interval with
f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \prod_{j=0}^{n}(x - x_j).
-
The error is a product of two competing pieces: a smoothness factor
f^{(n+1)}(\xi)/(n+1)! you don't control, and a node
polynomial \prod (x - x_j) you do — it is zero at every
node (error vanishes there) and swells in the gaps, worst of all near the ends.
Read the formula as a warning label. The factor 1/(n+1)! grows small fast,
which tempts you to think "more points, higher degree, smaller error." But the other two factors can
fight back viciously: the derivative f^{(n+1)} may explode with
n, and the node polynomial \prod(x-x_j) develops
huge humps between equally-spaced nodes near the ends of the interval. When those two win the tug-of-war,
adding points makes things worse. That disaster has a name.
Runge's phenomenon: when more points hurt
In 1901 Carl Runge interpolated the innocent-looking bell curve
f(x) = \frac{1}{1 + 25x^2} \qquad \text{on } [-1, 1]
at equally-spaced nodes and watched, with rising degree, the interpolant grow enormous
oscillations near the two ends — wiggles that get taller, not shorter, as you
add more points. The chart shows the degree-10 interpolant on
11 equally-spaced nodes: through the middle it hugs the true curve, but near
x = \pm 1 it flings itself up and down, overshooting wildly between the last
few nodes.
Let's measure the damage directly. We build the degree-10 interpolant and
compare it to the true function both in the calm middle and out near the wild end:
// Runge's function and its equally-spaced interpolant of degree 10.
const f = (x: number): number => 1 / (1 + 25 * x * x);
const N: number = 11; // 11 nodes -> degree 10
const xs: number[] = [];
const ys: number[] = [];
for (let i = 0; i < N; i++) {
const x = -1 + (2 * i) / (N - 1); // equally spaced on [-1, 1]
xs.push(x);
ys.push(f(x));
}
// Newton divided-difference coefficients (top row of the table).
const c: number[] = ys.slice();
for (let k = 1; k < N; k++) {
for (let i = N - 1; i >= k; i--) {
c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - k]);
}
}
const p = (X: number): number => {
let v = c[N - 1];
for (let k = N - 2; k >= 0; k--) v = v * (X - xs[k]) + c[k];
return v;
};
// Error in the middle vs. near the troublesome end:
for (const X of [0.0, 0.3, 0.9, 0.96]) {
const err = Math.abs(f(X) - p(X));
console.log(`x = ${X.toFixed(2)}: f = ${f(X).toFixed(4)} p = ${p(X).toFixed(4)} |err| = ${err.toFixed(4)}`);
}
In the middle the interpolant is essentially perfect; near x = 0.96 the error
is enormous — larger than the function itself. Push N to
21 in your head (or in the editor) and the end error grows past
50. This is the cautionary tale that launched modern approximation theory.
The seductive wrong intuition is "a higher-degree polynomial fits more points, so it must be more
accurate." Runge's phenomenon is the counterexample burned into every numerical analyst's memory: for
equally-spaced nodes on a fixed interval, raising the degree makes the interpolant
diverge near the endpoints — the maximum error grows without bound as
n \to \infty, even though f is perfectly smooth.
The culprit is the node polynomial \prod(x - x_j) in the error formula,
which grows monstrous humps between the outermost equally-spaced nodes.
There are two standard cures, and neither is "use fewer points":
-
Move the nodes. Cluster them toward the ends using
Chebyshev nodes x_k = \cos\!\big(\tfrac{k\pi}{n}\big),
which are dense where the trouble lives. This flattens the node polynomial's humps and the error
behaves.
-
Stop chasing one high-degree polynomial. Use many
low-degree pieces instead — a
spline, which stitches
cubics together smoothly and never oscillates.