The Inverse Function Theorem

A robot arm takes joint angles and returns the position of its hand — that is forward kinematics, a smooth map from angles to points in space. But the useful question runs the other way: I want the hand there; what angles get it there? That is inverse kinematics, and it asks you to undo a nonlinear map. When can a smooth map be undone, and does the undoing stay smooth?

The Inverse Function Theorem gives a stunningly clean answer, and it comes straight from the idea that the derivative is the best linear approximation. Near a point a, a map f looks like f(a) + Df(a)(x - a). A linear map is invertible exactly when its matrix is, that is, when the determinant is nonzero. The theorem says the nonlinear map inherits this: if f is C^1 and its linearisation is invertible at a point, then f itself is invertible in a small neighbourhood of that point. Check one determinant, and you have earned a local inverse.

This is the workhorse behind changes of variable (polar, spherical), the solvability of equations, the shape of solution sets, and the whole toolkit of coordinate systems on manifolds. And the proof is not magic — it is the Contraction Mapping Theorem wearing a disguise.

The statement — one determinant does all the work

Let f : \mathbb{R}^n \to \mathbb{R}^n be continuously differentiable (C^1) on an open set containing a point a, and suppose the Jacobian Df(a) is invertible, i.e. \det Df(a) \ne 0. Then:

There are open neighbourhoods U of a and V of f(a) on which f : U \to V is a bijection — a local inverse f^{-1} : V \to U exists.
The inverse is itself C^1, and its derivative is the matrix inverse of the derivative of f:

D\big(f^{-1}\big)\big(f(a)\big) = \big[\,Df(a)\,\big]^{-1}.

In one dimension this is completely familiar. "Invertible Df(a)" means "f'(a) \ne 0," and the matrix-inverse formula collapses to the reciprocal rule you already know:

\big(f^{-1}\big)'\big(f(a)\big) = \frac{1}{f'(a)}.

A nonzero slope at a, together with the continuity of f', keeps the slope nonzero and of one sign near a, so the graph is strictly monotone there: it passes the horizontal-line test locally and can be read backwards. The theorem is the multivariable upgrade of that single, visual fact.

Reflection, reciprocal slope — and the one point where it fails

The picture below shows f(x) = x^3 and its inverse, the cube root, as mirror images across the dashed diagonal y = x, because reflecting a graph swaps the roles of input and output, which is exactly what inversion does. Slide the base point a: for a \ne 0, the tangent to f at (a, a^3) and the tangent to f^{-1} at the mirrored point (a^3, a) have reciprocal slopes, 3a^2 and 1/(3a^2).

Now drive a to 0 and watch the hypothesis bite. There f'(0) = 0, so the Jacobian is not invertible — and the reciprocal 1/(3a^2) blows up. Geometrically the inverse's tangent stands bolt upright: the cube root has a vertical tangent at the origin and is not differentiable there. The map x^3 is still invertible at 0 (it is one-to-one), but the inverse is not smooth — precisely the guarantee the theorem withholds when \det Df = 0.
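The reciprocal-slope claim can be checked numerically. The sketch below is only an illustration: the helper names (slope, reciprocal_check) and the step size h are assumptions made for this example, and the slopes are central finite-difference estimates rather than exact derivatives.

```python
import math

def f(x):
    return x ** 3

def f_inv(y):
    # Real cube root, valid for negative y as well.
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def slope(g, t, h=1e-6):
    # Central finite-difference estimate of g'(t); h is an illustrative choice.
    return (g(t + h) - g(t - h)) / (2 * h)

def reciprocal_check(a):
    """Return (slope of f at a, slope of f_inv at a**3, their product)."""
    s_f = slope(f, a)
    s_inv = slope(f_inv, f(a))
    return s_f, s_inv, s_f * s_inv

# Away from 0 the two slopes multiply to (approximately) 1.
for a in (2.0, 1.0, 0.5):
    s_f, s_inv, prod = reciprocal_check(a)
    print(f"a={a}: f'={s_f:.4f}, (f^-1)'={s_inv:.4f}, product={prod:.6f}")

# Near a = 0 the predicted inverse slope 1/(3a^2) grows without bound.
for a in (0.1, 0.01, 0.001):
    print(f"a={a}: 1/(3a^2) = {1 / (3 * a * a):.1f}")
```

The second loop prints the predicted value rather than a finite-difference estimate, because the estimate itself becomes unreliable as the tangent of the cube root steepens.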

Why it is true — a fixed-point hunt

To solve f(x) = y for x near a, turn it into a fixed-point problem. Write A = Df(a) and define, for each target y, the map

\varphi_y(x) = x + A^{-1}\big(y - f(x)\big).

A point is fixed by \varphi_y exactly when A^{-1}(y - f(x)) = 0, i.e. when f(x) = y — so solving the equation is finding a fixed point. Now compute the derivative: D\varphi_y(x) = I - A^{-1}Df(x) = A^{-1}\big(Df(a) - Df(x)\big). Because f is C^1, Df(x) is close to Df(a) = A for x near a, so D\varphi_y is small — smaller than \tfrac12 in norm on a little ball.

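On a convex set such as a ball, a map whose derivative has norm at most \tfrac12 is a contraction with k = \tfrac12, by the mean value inequality. Take the closed ball \bar B of radius r about a on which that bound holds: it is a complete space, and \varphi_y maps it into itself whenever \|A^{-1}(y - f(a))\| \le r/2, since then \|\varphi_y(x) - a\| \le \|\varphi_y(x) - \varphi_y(a)\| + \|\varphi_y(a) - a\| \le r/2 + r/2. So the Banach fixed point theorem (the Contraction Mapping Theorem) delivers a unique x in \bar B solving f(x) = y, for every y near f(a). That is the local bijection. A little more care (the same C^1 control) shows the resulting f^{-1} is differentiable; its derivative [Df]^{-1} then follows by differentiating the identity f\big(f^{-1}(y)\big) = y with the chain rule and inverting the matrix.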
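The fixed-point map from the proof also works as a small numerical method: iterate \varphi_y from the base point a with the Jacobian frozen at a (sometimes called a chord or simplified Newton iteration). The sketch below is a minimal illustration under assumptions that are not in the text: the sample map f, the base point (1, 1), the target, and the tolerance are all made up for the example.

```python
def f(p):
    # Hypothetical sample map R^2 -> R^2, chosen only for illustration.
    x, y = p
    return (2 * x + y * y, x + 3 * y + x * y)

def jacobian(p):
    # Df(x, y) for the sample map above, computed by hand.
    x, y = p
    return ((2.0, 2 * y), (1 + y, 3 + x))

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("Jacobian is singular; the theorem does not apply here")
    return ((d / det, -b / det), (-c / det, a / det))

def local_inverse(target, a, tol=1e-12, max_iter=200):
    """Solve f(x) = target near a by iterating phi_y(x) = x + A^{-1}(y - f(x))."""
    A_inv = inverse_2x2(jacobian(a))  # A = Df(a), frozen for the whole iteration
    x = a
    for _ in range(max_iter):
        fx = f(x)
        r0, r1 = target[0] - fx[0], target[1] - fx[1]
        step = (A_inv[0][0] * r0 + A_inv[0][1] * r1,
                A_inv[1][0] * r0 + A_inv[1][1] * r1)
        x = (x[0] + step[0], x[1] + step[1])
        if max(abs(step[0]), abs(step[1])) < tol:
            return x
    raise RuntimeError("no convergence; target may be too far from f(a)")

a = (1.0, 1.0)            # f(a) = (3, 5), det Df(a) = 4
target = (3.1, 4.9)       # a point close to f(a)
solution = local_inverse(target, a)
print(solution, f(solution))
```

Freezing A = Df(a) mirrors the proof rather than aiming for speed; full Newton iteration would recompute the Jacobian at every step. The RuntimeError branch reflects the local nature of the theorem: for targets far from f(a) nothing guarantees the iteration contracts.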

The theorem promises a neighbourhood, and not one inch more. It is easy to over-read "\det Df \ne 0 everywhere" as "globally invertible." It is not.

The cleanest cautionary tale lives in the plane, with complex squaring f(x, y) = (x^2 - y^2,\ 2xy). Its Jacobian determinant is 4(x^2 + y^2), which is nonzero everywhere except the origin. So by the theorem f is locally invertible at every nonzero point — yet globally it is two-to-one: (x, y) and its antipode (-x, -y) both map to the same place (squaring sends z and -z to z^2). Every point has a private neighbourhood on which f is one-to-one, but the neighbourhoods cannot be stitched into a single global inverse.
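Both halves of that statement are easy to confirm with a few lines of code. In this sketch the function names and the sample point are illustrative choices, not part of the text.

```python
def square_map(x, y):
    # Complex squaring written in real coordinates: (x + iy)^2.
    return (x * x - y * y, 2 * x * y)

def jacobian_det(x, y):
    # Df = [[2x, -2y], [2y, 2x]], so det Df = 4(x^2 + y^2).
    return (2 * x) * (2 * x) - (-2 * y) * (2 * y)

p = (1.5, -0.5)
q = (-p[0], -p[1])  # the antipode of p

print(jacobian_det(*p), jacobian_det(*q))   # both nonzero: locally invertible
print(square_map(*p), square_map(*q))       # same image: not globally one-to-one
```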

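The one-variable shadow of this: f(x) = x^3 - x is C^1 on all of \mathbb{R}, but it is not monotone (it wiggles), so it is not globally invertible, even though it is locally invertible at every point where f'(x) = 3x^2 - 1 \ne 0. For instance f(-1) = f(0) = f(1) = 0. The analogy is imperfect in one respect: on an interval, a derivative that never vanishes forces strict monotonicity and hence global injectivity, so a one-variable example must pass through critical points (here x = \pm 1/\sqrt{3}), whereas complex squaring fails globally with no critical point anywhere on the punctured plane. Local is a promise; global is a separate, harder question.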

A worked change of variables

Take the polar-coordinate map f(r, \theta) = (r\cos\theta,\ r\sin\theta), the workhorse behind every polar integral. Its Jacobian is

Df(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det Df = r\cos^2\theta + r\sin^2\theta = r.

The determinant is r, so it is nonzero exactly when r \ne 0. Away from the origin the map is therefore locally invertible: you can recover (r, \theta) from (x, y), and the local inverse is smooth. The recovery is only local, since (r, \theta) and (r, \theta + 2\pi) land on the same point. At r = 0 the determinant vanishes and invertibility genuinely fails: the entire \theta-axis is crushed to the single point (0, 0), so no angle can be recovered there. That collapsing determinant is exactly the r that reappears as the area factor dA = r\,dr\,d\theta in polar integration; the theorem and the change-of-variables formula are two faces of the same Jacobian.
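The polar map is also a convenient place to test the theorem's derivative formula, D(f^{-1})(f(a)) = [Df(a)]^{-1}, since one local inverse is available in closed form. The sketch below assumes the branch of the inverse given by math.hypot and math.atan2 (so \theta lies in (-\pi, \pi]); the helper names, the sample point, and the finite-difference step are illustrative choices.

```python
import math

def polar_to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def cartesian_to_polar(x, y):
    # One local inverse: valid for (x, y) != (0, 0), with theta in (-pi, pi].
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian(r, theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -r * s), (s, r * c))

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = det2(m)
    return ((d / det, -b / det), (-c / det, a / det))

def numeric_jacobian_of_inverse(x, y, h=1e-6):
    # Central differences applied to cartesian_to_polar at (x, y).
    rx1, tx1 = cartesian_to_polar(x + h, y)
    rx0, tx0 = cartesian_to_polar(x - h, y)
    ry1, ty1 = cartesian_to_polar(x, y + h)
    ry0, ty0 = cartesian_to_polar(x, y - h)
    return (((rx1 - rx0) / (2 * h), (ry1 - ry0) / (2 * h)),
            ((tx1 - tx0) / (2 * h), (ty1 - ty0) / (2 * h)))

r, theta = 2.0, 0.7
x, y = polar_to_cartesian(r, theta)
predicted = inverse_2x2(jacobian(r, theta))     # [Df(a)]^{-1}
measured = numeric_jacobian_of_inverse(x, y)    # D(f^{-1}) at f(a), estimated
max_gap = max(abs(predicted[i][j] - measured[i][j])
              for i in range(2) for j in range(2))
print("det Df =", det2(jacobian(r, theta)), " max entry gap =", max_gap)
```

The sample point is kept away from the negative x-axis, where atan2 jumps by 2\pi and this particular branch of the inverse stops being continuous.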