The Inverse Function Theorem
A robot arm takes joint angles and returns the position of its hand — that is forward
kinematics, a smooth map from angles to points in space. But the useful question runs the other
way: I want the hand there; what angles get it there? That is inverse kinematics,
and it asks you to undo a nonlinear map. When can a smooth map be undone, and does
the undoing stay smooth?
The Inverse Function Theorem gives a stunningly clean answer, and it comes straight
from the idea that a derivative is the best linear map.
Near a point a, a map f looks like its linear approximation
f(a) + Df(a)(x - a). Linear maps are invertible exactly when their matrix is invertible
— when the determinant is nonzero. The theorem says the nonlinear map inherits this: if its
linearisation is invertible at a point, then f itself is
invertible in a small neighbourhood of that point. Check one determinant, and you have earned a local
inverse.
This is the workhorse behind changes of variable (polar, spherical), the solvability of equations,
the shape of solution sets, and the whole toolkit of coordinate systems on manifolds. And the proof
is not magic — it is the
Contraction Mapping Theorem
wearing a disguise.
The statement — one determinant does all the work
Let f : \mathbb{R}^n \to \mathbb{R}^n be
continuously differentiable (C^1) on an open set containing a point
a, and suppose the Jacobian Df(a) is invertible, i.e.
\det Df(a) \ne 0. Then:
- There are open neighbourhoods U of a and
  V of f(a) on which
  f : U \to V is a bijection, so a local inverse
  f^{-1} : V \to U exists.
- The inverse is itself C^1, and its derivative is the
  matrix inverse of the derivative of f:
  D\big(f^{-1}\big)\big(f(a)\big) = \big[\,Df(a)\,\big]^{-1}.
In one dimension this is completely familiar. "Invertible Df(a)" means
"f'(a) \ne 0," and the matrix-inverse formula collapses to the reciprocal
rule you already know:
\big(f^{-1}\big)'\big(f(a)\big) = \frac{1}{f'(a)}.
Because f' is continuous, a nonzero slope at a persists nearby, so the graph is strictly monotone through a and
passes the horizontal-line test locally and can be read backwards. The theorem is the
multivariable upgrade of that single, visual fact.
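The matrix form of the reciprocal rule can be sanity-checked numerically. The sketch below is illustrative only: the sample map f(x, y) = (e^x cos y, e^x sin y), the base point, and every helper name are assumptions chosen for the example, not part of the theorem. It compares the matrix inverse of Df(a) with a finite-difference Jacobian of a local inverse that happens to be known in closed form.

```python
import math

def f(x, y):
    """Sample map (an assumption for this sketch): the complex exponential in real coordinates."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def f_inverse(u, v):
    """Local inverse of f, valid while the angle stays inside (-pi, pi)."""
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian_f(x, y):
    """Exact Jacobian matrix Df(x, y) of the sample map."""
    e = math.exp(x)
    return [[e * math.cos(y), -e * math.sin(y)],
            [e * math.sin(y),  e * math.cos(y)]]

def inverse_2x2(m):
    """Invert a 2x2 matrix, refusing to proceed when the determinant is zero."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("Jacobian is singular: the theorem does not apply")
    return [[d / det, -b / det], [-c / det, a / det]]

def numerical_jacobian(g, p, h=1e-6):
    """Central-difference Jacobian of a map g : R^2 -> R^2 at the point p."""
    x, y = p
    gxp, gxm = g(x + h, y), g(x - h, y)
    gyp, gym = g(x, y + h), g(x, y - h)
    return [[(gxp[0] - gxm[0]) / (2 * h), (gyp[0] - gym[0]) / (2 * h)],
            [(gxp[1] - gxm[1]) / (2 * h), (gyp[1] - gym[1]) / (2 * h)]]

a = (0.3, 0.7)                                   # base point (arbitrary choice)
b = f(*a)                                        # its image f(a)
predicted = inverse_2x2(jacobian_f(*a))          # [Df(a)]^{-1}
measured = numerical_jacobian(f_inverse, b)      # D(f^{-1})(f(a)), estimated
max_gap = max(abs(predicted[i][j] - measured[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

The two matrices should agree up to finite-difference error. The check is only as good as the step size h, so it illustrates the formula rather than proving it.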
Reflection, reciprocal slope — and the one point where it fails
The picture below shows f(x) = x^3 and its inverse, the cube root, as
mirror images across the dashed diagonal y = x — because reflecting a
graph swaps the roles of input and output, which is exactly what inversion does. Slide the base point
a: the tangent to f at
(a, a^3) and the tangent to f^{-1} at the
mirrored point (a^3, a) always have reciprocal slopes,
3a^2 and 1/(3a^2).
Now drive a to 0 and watch the hypothesis bite.
There f'(0) = 0, so the Jacobian is not invertible — and the
reciprocal 1/(3a^2) blows up. Geometrically the inverse's tangent stands
bolt upright: the cube root has a vertical tangent at the origin and is not
differentiable there. The map x^3 is still invertible at
0 (it is one-to-one), but the inverse is not smooth — precisely
the guarantee the theorem withholds when \det Df = 0.
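The reciprocal-slope relationship, and its breakdown at the origin, can be reproduced in a few lines. This is a rough numerical sketch: the helper names, the sample base points, and the step sizes are assumptions made for illustration.

```python
import math

def cube_slope(a):
    """Slope of f(x) = x^3 at x = a, namely 3a^2."""
    return 3 * a * a

def cbrt(y):
    """Real cube root, defined for negative inputs as well."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def cbrt_slope_numeric(y, h=1e-7):
    """Central-difference estimate of the slope of the cube root at y."""
    return (cbrt(y + h) - cbrt(y - h)) / (2 * h)

# Away from the origin the two slopes multiply to (approximately) 1.
for a in (1.0, 0.5, 0.1):
    product = cube_slope(a) * cbrt_slope_numeric(a ** 3)
    print(f"a = {a}: slope product = {product:.8f}")

# At the origin the difference quotient of the cube root, cbrt(h)/h = h^(-2/3),
# grows without bound as h shrinks: the tangent is vertical.
quotients = [cbrt(h) / h for h in (1e-2, 1e-4, 1e-6)]
print("difference quotients at 0:", quotients)
```

The growing quotients in the last line are the numerical face of the vertical tangent: there is no finite derivative for the inverse at 0.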
Why it is true — a fixed-point hunt
To solve f(x) = y for x near
a, turn it into a fixed-point problem. Write
A = Df(a) and define, for each target y, the map
\varphi_y(x) = x + A^{-1}\big(y - f(x)\big).
A point is fixed by \varphi_y exactly when
A^{-1}(y - f(x)) = 0, i.e. when f(x) = y — so
solving the equation is finding a fixed point. Now compute the derivative:
D\varphi_y(x) = I - A^{-1}Df(x) = A^{-1}\big(Df(a) - Df(x)\big). Because
f is C^1, Df(x) is
close to Df(a) = A for x near
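a, so D\varphi_y is small: its operator norm is below
\tfrac12 on a small closed ball B of radius \rho around a.
By the mean value inequality, a map whose derivative has norm at most \tfrac12 on a convex set is a
contraction there (with k = \tfrac12). For y close enough to
f(a), \varphi_y also maps B into itself, because
|\varphi_y(x) - a| \le |\varphi_y(x) - \varphi_y(a)| + |A^{-1}(y - f(a))| \le \tfrac12|x - a| + |A^{-1}(y - f(a))|,
which is at most \rho once |A^{-1}(y - f(a))| \le \tfrac12\rho. A closed ball is a
complete space, so the
Banach fixed point theorem
delivers a unique x in B solving
f(x) = y, for every y near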
f(a). That is the local bijection. A little more care (the same
C^1 control) shows the resulting f^{-1} is
differentiable with derivative [Df]^{-1}, obtained by differentiating the
identity f\big(f^{-1}(y)\big) = y with the chain rule and inverting the
matrix.
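The fixed-point hunt can be run directly as an iteration. The following is a minimal sketch under stated assumptions: the sample map (complex squaring written in real coordinates), the base point a = (1, 0), the target, and the fixed iteration count are all choices made for illustration, and the function names are hypothetical. It freezes A = Df(a) once and repeatedly applies \varphi_y.

```python
def f(p):
    """Sample map (an assumption for this sketch): (x, y) -> (x^2 - y^2, 2xy)."""
    x, y = p
    return (x * x - y * y, 2 * x * y)

def jacobian_f(p):
    """Exact Jacobian Df(p) of the sample map."""
    x, y = p
    return [[2 * x, -2 * y], [2 * y, 2 * x]]

def inverse_2x2(m):
    """Invert a 2x2 matrix; a zero determinant means the theorem's hypothesis fails."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("Df(a) is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def local_inverse(target, a, steps=40):
    """Iterate phi_y(x) = x + A^{-1}(y - f(x)) starting from x = a, with A = Df(a) fixed."""
    a_inv = inverse_2x2(jacobian_f(a))
    x = a
    for _ in range(steps):
        fx = f(x)
        residual = (target[0] - fx[0], target[1] - fx[1])
        step = mat_vec(a_inv, residual)
        x = (x[0] + step[0], x[1] + step[1])
    return x

a = (1.0, 0.0)               # base point, where Df(a) = 2I is invertible
target = (1.1, 0.2)          # a point y near f(a) = (1, 0)
solution = local_inverse(target, a)
fx = f(solution)
error = max(abs(fx[0] - target[0]), abs(fx[1] - target[1]))
print("x =", solution, " |f(x) - y| =", error)
```

Unlike Newton's method, the matrix A is never updated, which is exactly the map used in the proof. Convergence is only guaranteed for targets close enough to f(a); a distant target may make the iteration wander or diverge.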
The theorem promises a neighbourhood, and not one inch more. It is easy to over-read
"\det Df \ne 0 everywhere" as "globally invertible." It is not.
The cleanest cautionary tale lives in the plane, with complex squaring
f(x, y) = (x^2 - y^2,\ 2xy). Its Jacobian determinant is
4(x^2 + y^2), which is nonzero everywhere except the origin. So
by the theorem f is locally invertible at every nonzero point — yet
globally it is two-to-one: (x, y) and its antipode
(-x, -y) both map to the same place (squaring sends
z and -z to z^2).
Every point has a private neighbourhood on which f is one-to-one, but the
neighbourhoods cannot be stitched into a single global inverse.
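A short check makes the two-to-one behaviour concrete. The sample points below are arbitrary choices for illustration, and the function names are hypothetical; the code only confirms that each nonzero point has a nonzero Jacobian determinant yet shares its image with its antipode.

```python
def f(x, y):
    """Complex squaring in real coordinates: (x, y) -> (x^2 - y^2, 2xy)."""
    return (x * x - y * y, 2 * x * y)

def jacobian_det(x, y):
    """Determinant of Df(x, y), which works out to 4(x^2 + y^2)."""
    return 4 * (x * x + y * y)

sample_points = [(1.0, 0.0), (0.5, -2.0), (-0.3, 0.4)]   # arbitrary nonzero points

collisions = []
for (x, y) in sample_points:
    same_image = f(x, y) == f(-x, -y)      # antipodal points collide
    locally_invertible = jacobian_det(x, y) != 0
    collisions.append((same_image, locally_invertible))
    print((x, y), "det =", jacobian_det(x, y), "f(p) == f(-p):", same_image)
```

Every sample point passes the determinant test and still fails to have a unique preimage globally, which is the gap between the local guarantee and a global inverse.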
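A one-variable analogue: f(x) = x^3 - x is
C^1 on all of \mathbb{R}, but it is not
monotone (it wiggles), so it is not globally invertible, even though it is locally invertible at
every point where f'(x) = 3x^2 - 1 \ne 0. The analogy is imperfect: this
f has critical points at x = \pm 1/\sqrt{3}, and on an interval a derivative
that never vanishes does force global injectivity, so the purely local-versus-global failure seen
with complex squaring needs at least two dimensions. Local is a promise;
global is a separate, harder question.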
A worked change of variables
Take the polar-coordinate map f(r, \theta) = (r\cos\theta,\ r\sin\theta),
the workhorse behind
every polar
integral. Its Jacobian is
Df(r, \theta) = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det Df = r\cos^2\theta + r\sin^2\theta = r.
The determinant is r, nonzero as long as r \ne 0.
So away from the origin the map is locally invertible — you can recover
(r, \theta) from (x, y) — and the local inverse
is smooth. At r = 0 the determinant vanishes and invertibility genuinely
fails: the entire \theta-axis is crushed to the single point
(0, 0), so no angle can be recovered there. That collapsing determinant is
exactly the r that reappears as the area factor
dA = r\,dr\,d\theta in polar integration — the theorem and the
change-of-variables formula are two faces of the same Jacobian.
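The polar computation can be checked numerically as well. This sketch assumes a particular sample point and step size, and the helper names are hypothetical; it estimates the Jacobian determinant by central differences, recovers (r, \theta) through an explicit local inverse, and shows the collapse at r = 0.

```python
import math

def polar_to_cartesian(r, theta):
    """The polar map f(r, theta) = (r cos theta, r sin theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

def cartesian_to_polar(x, y):
    """A local inverse, valid for r > 0 and theta inside (-pi, pi)."""
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian_det_numeric(r, theta, h=1e-6):
    """Central-difference estimate of det Df(r, theta)."""
    xr_p, xr_m = polar_to_cartesian(r + h, theta), polar_to_cartesian(r - h, theta)
    xt_p, xt_m = polar_to_cartesian(r, theta + h), polar_to_cartesian(r, theta - h)
    dx_dr, dy_dr = (xr_p[0] - xr_m[0]) / (2 * h), (xr_p[1] - xr_m[1]) / (2 * h)
    dx_dt, dy_dt = (xt_p[0] - xt_m[0]) / (2 * h), (xt_p[1] - xt_m[1]) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr

r, theta = 2.0, 0.9                         # sample point away from the origin
det_estimate = jacobian_det_numeric(r, theta)
recovered = cartesian_to_polar(*polar_to_cartesian(r, theta))
print("det Df is about", det_estimate, "(expected r =", r, ")")
print("recovered (r, theta):", recovered)

# At r = 0 every angle lands on the origin, so theta cannot be recovered.
images_at_zero = [polar_to_cartesian(0.0, t) for t in (0.0, 1.0, 2.5)]
print("images of r = 0:", images_at_zero)
```

The explicit inverse uses the principal branch of atan2, so it is one local inverse among many; a different range of angles would need a different branch.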