Root Finding: Bisection and Newton

Solve x^2 - 2 = 0 and you may write down x = \sqrt{2} with a flourish — but that symbol is a promise, not a number. Ask what \sqrt2 actually is, to a dozen decimal places, and the flourish evaporates: 1.41421356\ldots comes from somewhere, and that somewhere is an algorithm. Now try x - \cos x = 0, or x\,e^{x} = 1, or a fifth-degree polynomial. Here there is not even a symbol to hide behind: Abel and Galois proved in the 1820s that no formula in radicals can solve the general quintic, and transcendental mixes of polynomials with sines and exponentials are hopeless from the start. The equations of real mathematics are, overwhelmingly, equations you cannot solve in closed form.

So we change the question. Instead of "what is x?", we ask "for which x does some function f hit zero?", and we build a machine that hunts the crossing to as many digits as we please. This is root finding — solving f(x) = 0 by iteration — and it is the beating heart of numerical analysis. This page teaches the two methods every mathematician should know cold: bisection, the tortoise that is slow but never fails, and Newton–Raphson, the hare that is blisteringly fast but occasionally hurls itself off a cliff. You will meet both, watch them race to \sqrt2, and learn exactly when to trust which.

What "solve f(x)=0" really means

A root (or zero) of a function f is a value x^\* with f(x^\*) = 0 — geometrically, a point where the graph of f pierces the horizontal axis. Almost every "solve for x" problem can be pushed into this shape by moving everything to one side: g(x) = h(x) becomes f(x) = g(x) - h(x) = 0. Want a fixed point x = \cos x? Root-find on f(x) = x - \cos x. Want to invert a function, y = \phi(x) for a known y? Root-find on f(x) = \phi(x) - y. The single template "drive a function to zero" swallows an astonishing fraction of computational mathematics.

Here is a concrete target for the rest of the page: the function f(x) = x^2 - 2, whose positive root is exactly \sqrt 2. Watch where the curve slices through the axis — that single crossing between x=1 and x=2 is the number we are hunting.

Bisection: the Intermediate Value Theorem, made mechanical

The safest root-finder rests on a theorem you have believed since before you could name it. If a continuous function is negative somewhere and positive somewhere else, then between those two places it must cross zero — it cannot leap the axis without touching it. That is the Intermediate Value Theorem, and it hands us a way to trap a root. Find two points a and b whose function values have opposite signs — a bracket:

f(a)\cdot f(b) < 0.

The root is somewhere inside [a,b]. Now squeeze. Evaluate the midpoint m = (a+b)/2 and look at the sign of f(m). The sign change must survive in exactly one of the two halves [a,m] or [m,b] — so keep that half and discard the other. Repeat. Each step the interval containing the root gets half as wide, and the root can never slip out.

Requirement. A continuous f and a bracket [a,b] with f(a)\,f(b) < 0 (a genuine sign change).
Step. Set m = (a+b)/2; replace the endpoint whose sign matches f(m), so the surviving half still straddles zero.
Linear convergence. After n steps the bracket has width b_n - a_n = \frac{b_0 - a_0}{2^{\,n}}, so the error falls like 2^{-n} — you win one binary digit (one bit) per step, about one decimal digit every \log_2 10 \approx 3.32 steps. Unspectacular, but guaranteed.

Let us find \sqrt 2 by bisecting f(x) = x^2 - 2 on [0,2], where f(0) = -2 < 0 and f(2) = 2 > 0. Press Run and watch the interval collapse, the error column shedding one bit each line:

The width marches down 2,\,1,\,0.5,\,0.25,\ldots — a perfect geometric halving — and the error trails it, gaining a single bit each row and never once faltering. That reliability has a cost: to squeeze the starting width of 2 down to a tolerance of 10^{-12} you need n \ge \log_2(2/10^{-12}) \approx 41 steps. Forty-one evaluations for twelve digits is dull work. Surely we can use more of what the function is telling us than just its sign?

Newton–Raphson: ride the tangent to the axis

Bisection throws away almost everything about f — it looks only at the sign, never the steepness. But the slope of the curve points, quite literally, toward the root. Newton's method uses it. Stand at your current guess x_n, draw the tangent line to the graph there, and follow that line down to where it meets the axis. Because a smooth curve hugs its tangent nearby, that landing point is usually a far better guess than where you started.

Turning the picture into algebra needs one ingredient: the derivative f'(x_n), which is the slope of that tangent. The tangent line at x_n passes through \bigl(x_n, f(x_n)\bigr) with slope f'(x_n), so its equation is

y = f(x_n) + f'(x_n)\,(x - x_n).

Set y = 0 and solve for the crossing x = x_{n+1}. Rearranging 0 = f(x_n) + f'(x_n)(x_{n+1} - x_n) gives the whole method in a single line:

The update. x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.
Quadratic convergence. Near a simple root the error obeys |e_{n+1}| \approx C\,|e_n|^2, so the number of correct digits roughly doubles every step — 2 \to 4 \to 8 \to 16 — and a handful of iterations reaches machine precision.
The price. You must supply the derivative f', and you must start close enough. Where f'(x_n) \approx 0 the step f/f' blows up, and from a poor start the iteration can overshoot, diverge, or cycle.

Take the same problem, f(x) = x^2 - 2 with f'(x) = 2x. The update becomes x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{1}{2}\!\left(x_n + \frac{2}{x_n}\right), which you may recognise as the ancient Babylonian square-root rule: "average your guess with 2/x_n." Start from x_0 = 2 and watch the error nose-dive — each row roughly squares the last:

Read the error column as a story of doubling precision: it falls roughly 10^{-1} \to 10^{-2} \to 10^{-5} \to 10^{-10} \to essentially the machine's floor. Where bisection needed about 41 steps to reach 10^{-12}, Newton is already past it by step 5. That gulf — linear versus quadratic convergence — is the whole reason Newton dominates whenever a derivative is available and the start is decent.

A one-line Taylor argument. Let e_n = x_n - x^\* be the error. Expand f about the root x^\* (where f(x^\*)=0): f(x_n) = f'(x^\*)e_n + \tfrac12 f''(x^\*)e_n^2 + \cdots. Feed this and f'(x_n) \approx f'(x^\*) into the update and the first-order terms cancel exactly — that is the point of aiming at the tangent's root — leaving e_{n+1} \approx \frac{f''(x^\*)}{2\,f'(x^\*)}\;e_n^{\,2}. The new error is proportional to the square of the old one, which is precisely "the correct digits double." Notice the catch hiding in the denominator: if f'(x^\*) = 0 — a repeated root, where the curve only touches the axis — the cancellation is spoiled and Newton drops back to merely linear speed.

A second worked example: a cubic with no nice root

Square roots are almost too friendly. Try f(x) = x^3 - x - 2 = 0, a cubic whose one real root is an ugly irrational near 1.52 (the closed-form Cardano expression exists but is a horror of nested radicals — nobody sane evaluates it by hand). Its derivative is f'(x) = 3x^2 - 1, so Newton reads x_{n+1} = x_n - \frac{x_n^3 - x_n - 2}{3x_n^2 - 1}. We add a proper stopping rule this time — quit when the residual |f(x)| is below a tolerance, and keep a maximum-iteration guard so a misbehaving run can never loop forever:

Four or five steps, twelve correct digits, and a residual indistinguishable from zero — for a number no formula will politely hand you. This little loop, with its tolerance test and iteration cap, is the skeleton of essentially every production root-finder you will ever use.

Newton's Achilles heel is that it demands f'. When the derivative is expensive, unavailable, or you are simply too lazy, replace the true tangent with the secant through your two most recent points — approximating the slope by a finite difference: x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}. This secant method needs no derivative and converges at rate \varphi \approx 1.618 (yes, the golden ratio) — slower than Newton's 2 but far faster than bisection's 1, and each step costs only one new function evaluation. It is the pragmatic middle child of the family.

Knowing when to stop

An iteration produces an endless stream of guesses; you must decide when one is good enough. There is no single right answer, so a robust solver combines several tests:

Small residual: stop when |f(x_n)| < \tau_f. The catch — a nearly-flat function can have a small residual far from its root, so this alone can lie.
Small step / interval: stop when successive guesses barely move, |x_{n+1} - x_n| < \tau_x, or (for bisection) when the bracket width b - a drops below \tau_x. This directly bounds how far you can be from the root.
Iteration guard: always cap the loop at some N_{\max}. It costs nothing when things go well and is your only defence when they do not — a divergent or cycling Newton run must never become an infinite loop.

Push the tolerances too tight and you smash into the wall from the previous lesson: |f(x)| can never get meaningfully below the floating-point noise floor near the root, so asking for a tolerance smaller than a few times machine epsilon just spins the loop forever. Sensible defaults are a tolerance a modest multiple of \varepsilon_{\text{mach}} and a generous but finite iteration cap.

Quadratic speed comes with sharp teeth. Newton offers no guarantee of convergence, and it fails in more ways than beginners expect:

Flat-tangent overshoot. If a guess lands where f'(x_n) \approx 0, the step f/f' divides by almost nothing and flings the next guess to the far side of the number line.
Divergence from a bad start. Too far from the root, the tangent can point the wrong way entirely — the classic f(x) = \sqrt[3]{x} from any x_0 \ne 0 marches steadily away from its root, the error growing by a factor of two each step.
Cycling. Newton can fall into a closed loop and orbit forever. For f(x) = x^3 - 2x + 2 started at x_0 = 0 the iterates bounce 0 \to 1 \to 0 \to 1 \to \cdots, never nearing the true root. Run it and watch:

And bisection has its own blind spot, the mirror image of these: it cannot find a double root that only touches the axis without crossing it — like f(x) = x^2 at the origin — because there is no sign change to bracket. The professional's answer is a hybrid (Brent's method is the famous one): take Newton's fast steps while they behave, and fall back to a safe bisection step the instant they misbehave — the speed of the hare with the safety net of the tortoise.