Root Finding and Linear Solvers
Somewhere overhead a satellite is exactly where Kepler's laws say it should be — and yet no one can
write down, in closed form, where that is. To find a planet's position along its orbit you must solve
Kepler's equation,
M = E - e\,\sin E,
for the eccentric anomaly E, given the mean anomaly
M (which just ticks along uniformly with time) and the orbit's
eccentricity e. That little equation has haunted astronomers since the
1600s, because E is trapped both outside and inside a sine: there
is no formula that isolates it. You cannot algebra your way out. Instead you rearrange
it into "find where a function crosses zero,"
f(E) = E - e\,\sin E - M = 0,
and hand it to a machine that hunts for the crossing. This is root finding: the art of
solving f(x)=0 when no tidy formula exists. It is one of the two great
workhorses of computational
physics. The other is its cousin, the linear solver — the machinery for
cracking A\mathbf{x} = \mathbf{b}, a whole system of equations at once. This
page teaches both, and shows you the two most important root-finders in action: the slow-but-certain
bisection method and the fast-but-fickle Newton–Raphson method.
Where do these "solve for zero" problems come from?
Once you start looking, physics is full of equations you cannot solve by hand:
-
Equilibrium points. A ball rolling in a potential U(x)
rests where the force vanishes, f(x) = -U'(x) = 0. For any messy potential
that is a root-finding problem.
-
Resonance and eigen-conditions. The allowed frequencies of a vibrating drum, or the
bound-state energies of a quantum well, are the values that make some determinant or transcendental
condition equal zero.
-
Implicit time steps. Stable ("implicit") integrators for stiff problems define the
next state only implicitly — each step you must solve f(x_{n+1}) = 0
to find where you land.
-
Fitting and calibration. Matching a model to data means driving the derivative of a
cost function to zero.
A root of f is simply an input x^\*
where f(x^\*) = 0 — where the graph of f pierces
the horizontal axis. Here is the Kepler residual f(E) = E - e\sin E - M for a
modest orbit; watch for the single place the curve slices through zero.
Bisection: slow, dull, and unstoppable
The most trustworthy root-finder is also the simplest, and it rests on a fact you already believe: if a
continuous function is negative at one point and positive at another, then somewhere
in between it must pass through zero. (This is the Intermediate Value Theorem, but you knew it in your
bones long before it had a name.) So to trap a root, first find two points a
and b that bracket it — points where the function has
opposite signs:
f(a)\cdot f(b) < 0.
Now squeeze. Look at the midpoint m = (a+b)/2. Check the sign of
f(m). The root must lie in whichever half still shows a sign
change — either [a,m] or [m,b] — so throw the
other half away and repeat. Every step the bracket gets half as wide, and the root
can never escape it.
-
Requirement. A bracket [a,b] with
f(a)\,f(b) < 0 (a genuine sign change).
-
Step. Set m=(a+b)/2; keep the half whose endpoints
still straddle zero.
-
Convergence is linear. After n steps the bracket width
is the starting width divided by 2^n, so the error falls like
2^{-n} — you gain about one binary digit (roughly one decimal digit
every 3.3 steps) per step. Steady, and guaranteed.
Let's watch it find \sqrt{2} by solving f(x)=x^2-2=0.
Start with the bracket [0,2], since f(0)=-2<0
and f(2)=2>0. Press Run and watch the interval collapse
onto 1.41421\ldots
function f(x: number): number {
return x * x - 2; // a root sits at x = √2
}
let a = 0, b = 2; // f(a) < 0 and f(b) > 0, so a root is trapped inside
for (let n = 1; n <= 10; n++) {
const m = (a + b) / 2; // the midpoint
if (f(a) * f(m) < 0) {
b = m; // sign change is in [a, m]: keep the left half
} else {
a = m; // otherwise the root is in [m, b]: keep the right half
}
console.log(`step ${n}: [${a.toFixed(6)}, ${b.toFixed(6)}] width = ${(b - a).toFixed(6)}`);
}
const root = (a + b) / 2;
console.log(`root ≈ ${root.toFixed(6)} (√2 = ${Math.SQRT2.toFixed(6)})`);
Notice the width halving like clockwork: 2, 1, 0.5, 0.25,\ldots It never
speeds up and it never fails. For a rough answer that is wonderful; for many digits it is
exasperatingly slow. To reach a tolerance of 10^{-12} from a starting width
of 2 takes about 41 steps. We can do far better —
if we are willing to take a risk.
Newton–Raphson: ride the tangent
Bisection ignores everything about f except its sign. That is a
lot of thrown-away information — the function also has a slope, and the slope points
roughly toward the root. Newton's method (developed by
Isaac Newton and refined by Joseph Raphson) exploits
it. Stand at your current guess x_n, draw the tangent line
to the curve there, and slide down it to where it hits the axis. That landing point is your next,
better guess.
The tangent at x_n has slope f'(x_n) and passes
through \bigl(x_n, f(x_n)\bigr). Setting its height to zero and solving for
the crossing gives the whole method in one line:
-
The update.
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.
-
Convergence is quadratic. Near a simple root the number of correct digits
roughly doubles every step — 3\to 6\to 12 — so a handful of
iterations reaches machine precision.
-
The price. You must know the derivative f', and you
must start close enough. Far from the root, or where f'\approx 0, it can
overshoot wildly or diverge.
Take the same problem, f(x)=x^2-2, whose derivative is
f'(x)=2x. The update becomes
x_{n+1} = x_n - \dfrac{x_n^2-2}{2x_n}, which you may recognise as the
ancient Babylonian square-root rule "average your guess with 2/x."
Starting from x_0=2, watch the error nose-dive:
function f(x: number): number { return x * x - 2; }
function df(x: number): number { return 2 * x; } // the derivative f'(x)
let x = 2; // starting guess
for (let n = 1; n <= 6; n++) {
x = x - f(x) / df(x); // slide down the tangent to the axis
const error = Math.abs(x - Math.SQRT2);
console.log(`step ${n}: x = ${x.toFixed(15)} error = ${error.toExponential(2)}`);
}
Count the leading nines and zeros: the error goes roughly
10^{-1}\to 10^{-2}\to 10^{-5}\to 10^{-10}\to essentially zero. Where
bisection needed about 41 steps to reach 10^{-12},
Newton gets there in five. That is the difference between linear and quadratic
convergence — and it is why, when you can supply a derivative and a decent starting guess, Newton wins
by a landslide.
Back to the planets: Newton solves Kepler
Now we can crack the equation we opened with. To find the eccentric anomaly we hunt for the root of
f(E) = E - e\sin E - M, whose derivative is a gift:
f'(E) = 1 - e\cos E. Because 0 \le e < 1, that
derivative is always positive, so the residual rises monotonically and Newton behaves beautifully. A
classic first guess is simply E_0 = M. Solve for a body one radian of mean
anomaly into an orbit of eccentricity e=0.2:
const e = 0.2; // orbital eccentricity (0 = circle)
const M = 1.0; // mean anomaly in radians (grows uniformly with time)
// Kepler's equation: E − e·sin(E) = M. Root-find on the residual f(E).
function f(E: number): number { return E - e * Math.sin(E) - M; }
function df(E: number): number { return 1 - e * Math.cos(E); }
let E = M; // the standard starting guess
for (let n = 1; n <= 5; n++) {
E = E - f(E) / df(E);
console.log(`step ${n}: E = ${E.toFixed(12)} residual = ${f(E).toExponential(2)}`);
}
console.log(`eccentric anomaly E ≈ ${E.toFixed(9)} rad`);
Four hundred years of astronomers would have wept with joy: a problem with no closed form
solved to twelve digits in four lines and a few microseconds. Every planetarium app, every spacecraft
navigation routine, runs a loop like this millions of times.
Watch out — faster is not the same as safer. Newton's quadratic speed is real, but
it comes with sharp teeth. Feed it a bad starting point and it can hurl your guess to the far side of
the number line; land it near a spot where f'(x)\approx 0 (a nearly flat
tangent) and the step f/f' blows up, launching you off to infinity. It can
even fall into a cycle, bouncing between two wrong values forever. Try
f(x)=x^3-2x+2 from x_0=0 and you get
0\to 1\to 0\to 1\to\cdots, trapped for eternity.
Bisection cannot do any of that. Give it a genuine bracket and it is guaranteed to converge,
no derivative required, no drama. Its only demand is a real sign change — which is also its one
blind spot: it cannot find a double root that merely touches the axis
without crossing it (like f(x)=x^2 at the origin), because there is no sign
change to bracket. In practice the pros use hybrid methods (see below) that take
Newton's fast steps when they're behaving and fall back to a safe bisection step when they're not —
the best of both temperaments.
Gloriously true. Push Newton's method into the complex plane — apply it to something like
f(z)=z^3-1, which has three roots spaced around a circle — and colour each
starting point by which root it eventually tumbles into. The result is a
Newton fractal: three interlocking basins whose boundary is infinitely intricate,
swirling and self-similar at every scale, with the astonishing property that at any point on the
border, all three colours meet. The lesson is humbling: "start close enough" hides a boundary of
bottomless complexity. A tiny nudge in your initial guess can send you to a completely different
answer.
The other workhorse: solving A\mathbf{x}=\mathbf{b}
Root finding tackles one equation in one unknown. But physics is forever handing us many
equations in many unknowns, all bound together. Balance the forces in a truss, apply
Kirchhoff's laws to a circuit, or lay a grid over a region and demand that a temperature or potential
field satisfy a differential equation at every grid point — in each case you end up with a
linear system: a rectangle of coefficients A multiplying a
column of unknowns \mathbf{x} to give a column of knowns
\mathbf{b},
A\mathbf{x} = \mathbf{b}.
For a concrete taste, take three equations in three unknowns:
\begin{aligned} 2x + \phantom{1}y - \phantom{2}z &= 8,\\ -3x - \phantom{1}y + 2z &= -11,\\ -2x + \phantom{1}y + 2z &= -3. \end{aligned}
The standard hand-method is Gaussian elimination: use the first equation to wipe
x out of the other two, then the (new) second to wipe out
y, until the system is upper-triangular — a staircase with the last
equation involving only z. Solve that bottom line instantly, then
back-substitute upward: knowing z gives
y, and knowing both gives x. (Storing the
multipliers you used turns this into the celebrated LU decomposition, which factors
A = LU once and then solves for many different right-hand sides cheaply.)
Here is the whole procedure, coded and verified:
// Solve A x = b by Gaussian elimination, then back-substitution.
const A: number[][] = [
[ 2, 1, -1],
[-3, -1, 2],
[-2, 1, 2],
];
const b: number[] = [8, -11, -3];
const n = 3;
// Keep a pristine copy so we can check the answer at the end.
const A0 = A.map((row) => row.slice());
const b0 = b.slice();
// --- Forward elimination: make A upper-triangular ---
for (let k = 0; k < n; k++) {
for (let i = k + 1; i < n; i++) {
const factor = A[i][k] / A[k][k]; // how much of row k to subtract from row i
for (let j = k; j < n; j++) A[i][j] -= factor * A[k][j];
b[i] -= factor * b[k];
}
}
// --- Back-substitution: solve from the bottom row upward ---
const x: number[] = [0, 0, 0];
for (let i = n - 1; i >= 0; i--) {
let sum = b[i];
for (let j = i + 1; j < n; j++) sum -= A[i][j] * x[j];
x[i] = sum / A[i][i];
}
console.log(`solution: x = ${x[0].toFixed(3)}, y = ${x[1].toFixed(3)}, z = ${x[2].toFixed(3)}`);
// --- Verify A x ≈ b using the untouched original matrix ---
for (let i = 0; i < n; i++) {
const lhs = A0[i][0] * x[0] + A0[i][1] * x[1] + A0[i][2] * x[2];
console.log(`check row ${i}: A·x = ${lhs.toFixed(3)} (b = ${b0[i]})`);
}
Out pops x=2,\ y=3,\ z=-1, and the check confirms
A\mathbf{x}=\mathbf{b} to the last digit. For a
3\times3 system this is overkill you could do by hand — but the exact same
loop, unchanged, solves systems with thousands or millions of unknowns. Its cost grows like
n^3, which is why the big leagues of physics live and die by
sparse solvers that exploit all the zeros in A (most grid
points only talk to their neighbours) to bring that cost crashing down.
Because the universe is local and calculus is linear in the small. Discretise a
field — temperature, voltage, a quantum wavefunction — onto a grid, and each point's value gets tied
to its immediate neighbours by the differential equation (a Laplacian becomes "this point equals the
average of its neighbours," roughly). Every grid point contributes one equation; every unknown value
is one component of \mathbf{x}. A million-cell simulation is a
million-by-million matrix A — almost entirely zeros, since a point only
couples to a few neighbours. Solving A\mathbf{x}=\mathbf{b} quickly is
therefore the inner loop of computational physics, from weather models to stress analysis to
electronic-structure calculations. And when the physics is non-linear? Newton's method turns
it back into a sequence of linear solves — the two workhorses of this page, harnessed together.