Root Finding and Linear Solvers

Somewhere overhead a satellite is exactly where Kepler's laws say it should be — and yet no one can write down, in closed form, where that is. To find a planet's position along its orbit you must solve Kepler's equation,

M = E - e\,\sin E,

for the eccentric anomaly E, given the mean anomaly M (which just ticks along uniformly with time) and the orbit's eccentricity e. That little equation has haunted astronomers since the 1600s, because E is trapped both outside and inside a sine: there is no formula that isolates it. You cannot algebra your way out. Instead you rearrange it into "find where a function crosses zero,"

f(E) = E - e\,\sin E - M = 0,

and hand it to a machine that hunts for the crossing. This is root finding: the art of solving f(x)=0 when no tidy formula exists. It is one of the two great workhorses of computational physics. The other is its cousin, the linear solver — the machinery for cracking A\mathbf{x} = \mathbf{b}, a whole system of equations at once. This page teaches both, and shows you the two most important root-finders in action: the slow-but-certain bisection method and the fast-but-fickle Newton–Raphson method.

Where do these "solve for zero" problems come from?

Once you start looking, physics is full of equations you cannot solve by hand:

Equilibrium points. A ball rolling in a potential U(x) rests where the force vanishes, f(x) = -U'(x) = 0. For any messy potential that is a root-finding problem.
Resonance and eigen-conditions. The allowed frequencies of a vibrating drum, or the bound-state energies of a quantum well, are the values that make some determinant or transcendental condition equal zero.
Implicit time steps. Stable ("implicit") integrators for stiff problems define the next state only implicitly — each step you must solve f(x_{n+1}) = 0 to find where you land.
Fitting and calibration. Matching a model to data means driving the derivative of a cost function to zero.

A root of f is simply an input x^\* where f(x^\*) = 0 — where the graph of f pierces the horizontal axis. Here is the Kepler residual f(E) = E - e\sin E - M for a modest orbit; watch for the single place the curve slices through zero.

Bisection: slow, dull, and unstoppable

The most trustworthy root-finder is also the simplest, and it rests on a fact you already believe: if a continuous function is negative at one point and positive at another, then somewhere in between it must pass through zero. (This is the Intermediate Value Theorem, but you knew it in your bones long before it had a name.) So to trap a root, first find two points a and b that bracket it — points where the function has opposite signs:

f(a)\cdot f(b) < 0.

Now squeeze. Look at the midpoint m = (a+b)/2. Check the sign of f(m). The root must lie in whichever half still shows a sign change — either [a,m] or [m,b] — so throw the other half away and repeat. Every step the bracket gets half as wide, and the root can never escape it.

Requirement. A bracket [a,b] with f(a)\,f(b) < 0 (a genuine sign change).
Step. Set m=(a+b)/2; keep the half whose endpoints still straddle zero.
Convergence is linear. After n steps the bracket width is the starting width divided by 2^n, so the error falls like 2^{-n} — you gain about one binary digit (roughly one decimal digit every 3.3 steps) per step. Steady, and guaranteed.

Let's watch it find \sqrt{2} by solving f(x)=x^2-2=0. Start with the bracket [0,2], since f(0)=-2<0 and f(2)=2>0. Press Run and watch the interval collapse onto 1.41421\ldots

Notice the width halving like clockwork: 2, 1, 0.5, 0.25,\ldots It never speeds up and it never fails. For a rough answer that is wonderful; for many digits it is exasperatingly slow. To reach a tolerance of 10^{-12} from a starting width of 2 takes about 41 steps. We can do far better — if we are willing to take a risk.

Newton–Raphson: ride the tangent

Bisection ignores everything about f except its sign. That is a lot of thrown-away information — the function also has a slope, and the slope points roughly toward the root. Newton's method (developed by Isaac Newton and refined by Joseph Raphson) exploits it. Stand at your current guess x_n, draw the tangent line to the curve there, and slide down it to where it hits the axis. That landing point is your next, better guess.

The tangent at x_n has slope f'(x_n) and passes through \bigl(x_n, f(x_n)\bigr). Setting its height to zero and solving for the crossing gives the whole method in one line:

The update. x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.
Convergence is quadratic. Near a simple root the number of correct digits roughly doubles every step — 3\to 6\to 12 — so a handful of iterations reaches machine precision.
The price. You must know the derivative f', and you must start close enough. Far from the root, or where f'\approx 0, it can overshoot wildly or diverge.

Take the same problem, f(x)=x^2-2, whose derivative is f'(x)=2x. The update becomes x_{n+1} = x_n - \dfrac{x_n^2-2}{2x_n}, which you may recognise as the ancient Babylonian square-root rule "average your guess with 2/x." Starting from x_0=2, watch the error nose-dive:

Count the leading nines and zeros: the error goes roughly 10^{-1}\to 10^{-2}\to 10^{-5}\to 10^{-10}\to essentially zero. Where bisection needed about 41 steps to reach 10^{-12}, Newton gets there in five. That is the difference between linear and quadratic convergence — and it is why, when you can supply a derivative and a decent starting guess, Newton wins by a landslide.

Back to the planets: Newton solves Kepler

Now we can crack the equation we opened with. To find the eccentric anomaly we hunt for the root of f(E) = E - e\sin E - M, whose derivative is a gift: f'(E) = 1 - e\cos E. Because 0 \le e < 1, that derivative is always positive, so the residual rises monotonically and Newton behaves beautifully. A classic first guess is simply E_0 = M. Solve for a body one radian of mean anomaly into an orbit of eccentricity e=0.2:

Four hundred years of astronomers would have wept with joy: a problem with no closed form solved to twelve digits in four lines and a few microseconds. Every planetarium app, every spacecraft navigation routine, runs a loop like this millions of times.

Watch out — faster is not the same as safer. Newton's quadratic speed is real, but it comes with sharp teeth. Feed it a bad starting point and it can hurl your guess to the far side of the number line; land it near a spot where f'(x)\approx 0 (a nearly flat tangent) and the step f/f' blows up, launching you off to infinity. It can even fall into a cycle, bouncing between two wrong values forever. Try f(x)=x^3-2x+2 from x_0=0 and you get 0\to 1\to 0\to 1\to\cdots, trapped for eternity.

Bisection cannot do any of that. Give it a genuine bracket and it is guaranteed to converge, no derivative required, no drama. Its only demand is a real sign change — which is also its one blind spot: it cannot find a double root that merely touches the axis without crossing it (like f(x)=x^2 at the origin), because there is no sign change to bracket. In practice the pros use hybrid methods (see below) that take Newton's fast steps when they're behaving and fall back to a safe bisection step when they're not — the best of both temperaments.

Gloriously true. Push Newton's method into the complex plane — apply it to something like f(z)=z^3-1, which has three roots spaced around a circle — and colour each starting point by which root it eventually tumbles into. The result is a Newton fractal: three interlocking basins whose boundary is infinitely intricate, swirling and self-similar at every scale, with the astonishing property that at any point on the border, all three colours meet. The lesson is humbling: "start close enough" hides a boundary of bottomless complexity. A tiny nudge in your initial guess can send you to a completely different answer.

The other workhorse: solving A\mathbf{x}=\mathbf{b}

Root finding tackles one equation in one unknown. But physics is forever handing us many equations in many unknowns, all bound together. Balance the forces in a truss, apply Kirchhoff's laws to a circuit, or lay a grid over a region and demand that a temperature or potential field satisfy a differential equation at every grid point — in each case you end up with a linear system: a rectangle of coefficients A multiplying a column of unknowns \mathbf{x} to give a column of knowns \mathbf{b},

A\mathbf{x} = \mathbf{b}.

For a concrete taste, take three equations in three unknowns:

\begin{aligned} 2x + \phantom{1}y - \phantom{2}z &= 8,\\ -3x - \phantom{1}y + 2z &= -11,\\ -2x + \phantom{1}y + 2z &= -3. \end{aligned}

The standard hand-method is Gaussian elimination: use the first equation to wipe x out of the other two, then the (new) second to wipe out y, until the system is upper-triangular — a staircase with the last equation involving only z. Solve that bottom line instantly, then back-substitute upward: knowing z gives y, and knowing both gives x. (Storing the multipliers you used turns this into the celebrated LU decomposition, which factors A = LU once and then solves for many different right-hand sides cheaply.) Here is the whole procedure, coded and verified:

// Solve A x = b by Gaussian elimination, then back-substitution. const A: number[][] = [ [ 2, 1, -1], [-3, -1, 2], [-2, 1, 2], ]; const b: number[] = [8, -11, -3]; const n = 3; // Keep a pristine copy so we can check the answer at the end. const A0 = A.map((row) => row.slice()); const b0 = b.slice(); // --- Forward elimination: make A upper-triangular --- for (let k = 0; k < n; k++) { for (let i = k + 1; i < n; i++) { const factor = A[i][k] / A[k][k]; // how much of row k to subtract from row i for (let j = k; j < n; j++) A[i][j] -= factor * A[k][j]; b[i] -= factor * b[k]; } } // --- Back-substitution: solve from the bottom row upward --- const x: number[] = [0, 0, 0]; for (let i = n - 1; i >= 0; i--) { let sum = b[i]; for (let j = i + 1; j < n; j++) sum -= A[i][j] * x[j]; x[i] = sum / A[i][i]; } console.log(`solution: x = ${x[0].toFixed(3)}, y = ${x[1].toFixed(3)}, z = ${x[2].toFixed(3)}`); // --- Verify A x ≈ b using the untouched original matrix --- for (let i = 0; i < n; i++) { const lhs = A0[i][0] * x[0] + A0[i][1] * x[1] + A0[i][2] * x[2]; console.log(`check row ${i}: A·x = ${lhs.toFixed(3)} (b = ${b0[i]})`); }

Out pops x=2,\ y=3,\ z=-1, and the check confirms A\mathbf{x}=\mathbf{b} to the last digit. For a 3\times3 system this is overkill you could do by hand — but the exact same loop, unchanged, solves systems with thousands or millions of unknowns. Its cost grows like n^3, which is why the big leagues of physics live and die by sparse solvers that exploit all the zeros in A (most grid points only talk to their neighbours) to bring that cost crashing down.

Because the universe is local and calculus is linear in the small. Discretise a field — temperature, voltage, a quantum wavefunction — onto a grid, and each point's value gets tied to its immediate neighbours by the differential equation (a Laplacian becomes "this point equals the average of its neighbours," roughly). Every grid point contributes one equation; every unknown value is one component of \mathbf{x}. A million-cell simulation is a million-by-million matrix A — almost entirely zeros, since a point only couples to a few neighbours. Solving A\mathbf{x}=\mathbf{b} quickly is therefore the inner loop of computational physics, from weather models to stress analysis to electronic-structure calculations. And when the physics is non-linear? Newton's method turns it back into a sequence of linear solves — the two workhorses of this page, harnessed together.