The Hahn–Banach Theorem

Suppose you know the fair price of a few goods, and only those few. You know a loaf costs 2 and a litre of milk costs 1, and you know the rule you are using is linear — the price of a mixed basket is the sum of the parts, and buying twice as much costs twice as much. Now someone hands you a good you have never priced. Can you extend your price list to the new item consistently — keeping it linear, and without ever quoting a price so wild that it violates the market ceiling you have promised never to exceed?

This is exactly the situation of functional analysis, phrased in money. Your partial price list is a bounded linear functional f, defined only on a subspace M of a big space X. The "market ceiling" is a bound on how large f is allowed to be — its norm. The question is whether f can be extended to a functional F on all of X that agrees with f where it was already defined and is no bigger — same norm, not one unit larger.

The astonishing answer is yes, always. This is the Hahn–Banach theorem, and it is the single most-used existence theorem in the subject. It manufactures functionals out of thin air, and every time it does so it certifies that the extension is as small as it possibly could be. From this one guarantee flows almost everything that makes the dual space useful: that there are enough functionals to tell points apart, to measure lengths exactly, and to separate a point from a convex set by a flat wall.

The extension problem, precisely

Let X be a normed vector space over \mathbb{K} (read \mathbb{R} throughout, with the complex case noted where it differs), and let M \subseteq X be a linear subspace. A bounded linear functional on the subspace,

f : M \to \mathbb{K}, \qquad \|f\|_M = \sup_{\substack{m \in M \\ \|m\| = 1}} |f(m)|,

is a perfectly good measuring device — but it can only read vectors that happen to lie in M. We want to promote it to a device that reads every vector of X, without changing any reading it already gives and without amplifying it. Formally we seek an extension F : X \to \mathbb{K} that is

linear on all of X;
agreeing: F(m) = f(m) for every m \in M (written F|_M = f); and
norm-preserving: \|F\|_X = \|f\|_M.

The first two are easy to arrange and cheap: pick any complement, define F however you like off M, and you have a linear extension. The whole difficulty — and the whole content of Hahn–Banach — is the third bullet. Any old extension will generally have a larger norm; making the extension no bigger than the original is the miracle. Note also that \|F\| \ge \|f\| is automatic (a supremum over a bigger set of unit vectors is at least as large), so "norm-preserving" is really the one-sided claim \|F\| \le \|f\|.

The analytic form: domination by a sublinear functional

The cleanest and most general statement does not mention a norm at all. It replaces "the norm ceiling" by an arbitrary sublinear functional — a gauge that behaves like a norm in the two respects the proof actually needs.

A map p : X \to \mathbb{R} is sublinear if it is

subadditive: p(x + y) \le p(x) + p(y) for all x, y; and
positively homogeneous: p(\lambda x) = \lambda\, p(x) for all \lambda \ge 0.

A norm, a seminorm, and the support function p(x) = \sup_{a \in C} \langle a, x\rangle of a bounded convex set are all sublinear. Sublinear functionals need not be symmetric — p(-x) may differ from p(x) — which is exactly what lets the theorem reach into convex geometry.
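
As a quick numerical sanity check of these two axioms, the sketch below evaluates the support function of a small point set in the plane and tests subadditivity and positive homogeneity on random vectors. The set POINTS and the helper names are illustrative assumptions, not part of the text; the support function of a finite set coincides with that of its convex hull.

```python
import random

# Hypothetical finite set; its convex hull plays the role of C.
POINTS = [(2.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

def support(x):
    """Support function p(x) = max over a in POINTS of <a, x>."""
    return max(a[0] * x[0] + a[1] * x[1] for a in POINTS)

def check_sublinear(trials=1000, seed=0, tol=1e-9):
    """Test both sublinearity axioms on random vectors (a check, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        lam = rng.uniform(0, 3)
        # Subadditivity: p(x + y) <= p(x) + p(y)
        if support((x[0] + y[0], x[1] + y[1])) > support(x) + support(y) + tol:
            return False
        # Positive homogeneity: p(lam * x) = lam * p(x) for lam >= 0
        if abs(support((lam * x[0], lam * x[1])) - lam * support(x)) > tol:
            return False
    return True

print(check_sublinear())
# Asymmetry: p(e_1) and p(-e_1) differ for this set.
print(support((1.0, 0.0)), support((-1.0, 0.0)))
```

The last line illustrates the asymmetry mentioned above: the two values differ because the point set is not symmetric about the origin.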

Let X be a real vector space, p a sublinear functional on X, and M \subseteq X a subspace on which a linear functional f is dominated by p:

f(m) \le p(m) \qquad \text{for all } m \in M.

Then f extends to a linear functional F on all of X that is still dominated everywhere:

F|_M = f \qquad \text{and} \qquad F(x) \le p(x) \ \text{ for all } x \in X.

Notice there is no completeness, no topology, not even a norm in the hypotheses — just a bare real vector space and a sublinear gauge. Over \mathbb{C} the correct statement (Bohnenblust–Sobczyk) uses a seminorm p and controls the modulus, |F(x)| \le p(x); it is proved by applying the real theorem to \operatorname{Re} f and then recovering the imaginary part from f(x) = \operatorname{Re} f(x) - i\operatorname{Re} f(ix).
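
The reduction from the complex to the real case can be written out in a few lines. The following is a sketch of the standard argument, assuming the usual complex hypothesis |f(m)| \le p(m) on M for a seminorm p; the symbols u, U and \theta are introduced here for illustration.

```latex
\begin{align*}
&\text{Set } u = \operatorname{Re} f \text{ on } M. \text{ Then } u \text{ is real-linear and } u \le |f| \le p, \\
&\text{so the real theorem gives a real-linear } U \text{ on } X \text{ with } U|_M = u,\ U \le p. \\[4pt]
&\text{Define } F(x) = U(x) - i\,U(ix). \text{ On } M \text{ this is } \operatorname{Re} f(m) - i \operatorname{Re} f(im) = f(m). \\[4pt]
&F(ix) = U(ix) - i\,U(-x) = U(ix) + i\,U(x) = i\,\bigl(U(x) - i\,U(ix)\bigr) = i\,F(x), \\
&\text{so } F \text{ is complex-linear.} \\[4pt]
&\text{Given } x, \text{ pick } \theta \text{ with } e^{i\theta} F(x) = |F(x)|. \text{ Then } F(e^{i\theta}x) \text{ is real, and} \\
&|F(x)| = F(e^{i\theta}x) = U(e^{i\theta}x) \le p(e^{i\theta}x) = p(x).
\end{align*}
```

The last equality is where the seminorm property p(\lambda x) = |\lambda|\,p(x) is used; a merely sublinear p would not suffice for the modulus bound.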

The normed-space form: norm-preserving extension

The version you meet most often is a one-line corollary of the analytic form: feed it the right gauge. Take p(x) = \|f\|_M\,\|x\|, a genuine sublinear functional (a nonnegative multiple of the norm, hence a seminorm). On M the defining inequality |f(m)| \le \|f\|_M\,\|m\| says precisely that f \le p there (and -f \le p too), so the hypothesis of the analytic theorem is met.

Let M be a subspace of a normed space X and let f \in M^* be a bounded linear functional. Then there is a bounded linear functional F \in X^* with

F|_M = f \qquad \text{and} \qquad \|F\|_X = \|f\|_M.

Every bounded functional on a subspace extends to the whole space with the same norm.

The domination F(x) \le \|f\|_M\,\|x\| handed back by the analytic theorem, applied to both x and -x, gives |F(x)| \le \|f\|_M\,\|x\|, i.e. \|F\| \le \|f\|_M; and since restricting can only shrink a norm, \|F\| \ge \|f\|_M as well. So \|F\| = \|f\|_M exactly — the ceiling is met with equality.

How the proof works: one dimension at a time, then Zorn

The proof has two movements. The first is a concrete, hands-on lemma; the second is the transfinite bookkeeping that iterates it across the whole space.

Movement 1: extend by a single new direction. Suppose f \le p on M and pick a vector x_0 \notin M. Every element of the enlarged subspace M' = M \oplus \mathbb{R}x_0 can be written in exactly one way as m + t x_0 with m \in M and t \in \mathbb{R}, so a linear extension is forced to have the shape

F(m + t x_0) = f(m) + t\,\alpha,

and the only freedom is the single number \alpha = F(x_0). We must choose \alpha so that domination survives: f(m) + t\alpha \le p(m + t x_0) for all m and all t. Splitting into t > 0 and t < 0 and using positive homogeneity, this collapses to the two-sided requirement

\sup_{m' \in M}\big[\, f(m') - p(m' - x_0)\,\big] \;\le\; \alpha \;\le\; \inf_{m \in M}\big[\, p(m + x_0) - f(m)\,\big].

Such an \alpha exists iff the left supremum does not exceed the right infimum — and it does not, thanks to subadditivity: for any m, m' \in M,

f(m') + f(m) = f(m' + m) \le p(m' + m) = p\big((m' - x_0) + (m + x_0)\big) \le p(m' - x_0) + p(m + x_0),

which rearranges to f(m') - p(m' - x_0) \le p(m + x_0) - f(m). Taking the sup on the left and the inf on the right leaves a nonempty gap, and any \alpha in that gap gives a valid one-dimension-larger extension. (That the gap can have positive width, with room to spare, is the seed of the non-uniqueness we flag below.)
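
To make the gap concrete, here is a minimal sketch that evaluates the two bounds on a grid for one specific setup, chosen for illustration: X is the plane with the taxicab norm as gauge p, M is the horizontal axis with f(x, 0) = x, and the new direction is x_0 = (0, 1). A finite grid only approximates the supremum and infimum in general; in this instance both are attained at grid points, so the computed interval is the true one.

```python
def f(x):
    """The functional on M = {(x, 0)}: f(x, 0) = x."""
    return x

def p(v):
    """Gauge: the taxicab norm |v_1| + |v_2| (here ||f||_M = 1)."""
    return abs(v[0]) + abs(v[1])

X0 = (0.0, 1.0)  # the new direction, which does not lie in M

def alpha_interval(grid):
    """Evaluate the Movement 1 bounds over a finite grid of points (x, 0) in M."""
    lower = max(f(x) - p((x - X0[0], 0.0 - X0[1])) for x in grid)
    upper = min(p((x + X0[0], 0.0 + X0[1])) - f(x) for x in grid)
    return lower, upper

# Multiples of 1/4 in [-10, 10]; dyadic values keep the arithmetic exact.
GRID = [k / 4 for k in range(-40, 41)]
lower, upper = alpha_interval(GRID)
print(lower, upper)
```

Every \alpha between the two printed bounds yields a dominated extension F(x, y) = x + \alpha y, so in this example the gap has positive width.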

Movement 2: reach all of X with Zorn's lemma. If M has finite codimension in X (in particular, if X is finite-dimensional), you would just repeat Movement 1 a finite number of times. In general, consider the set of all dominated extensions (N, g), meaning subspaces N \supseteq M carrying a linear extension g of f with g \le p on N, and order it by "extends": (N_1, g_1) \preceq (N_2, g_2) when N_1 \subseteq N_2 and g_2|_{N_1} = g_1. The set is nonempty because it contains (M, f), and every chain has an upper bound (take the union of the domains and the union of the maps, which is well defined, linear, and still dominated). Zorn's lemma then supplies a maximal dominated extension (N^\ast, F). If N^\ast were not all of X, Movement 1 would enlarge it by one dimension, contradicting maximality. Hence N^\ast = X, and F is the extension we wanted.
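
In the finite case the iteration can be carried out literally. The sketch below does so for a hypothetical example: \mathbb{R}^3 with the taxicab norm as gauge, starting from f(x e_1) = x on the first axis and adding e_2, then e_3. The function names, the grid, and the rule for picking \alpha inside the admissible interval are all illustrative assumptions; the grid search is exact here only because the extreme values are attained at grid points.

```python
from itertools import product

def l1(v):
    """Taxicab norm, used as the gauge p."""
    return sum(abs(t) for t in v)

def alpha_range(coeffs, grid):
    """Movement 1 bounds for extending F(m) = sum(c_i * m_i) from
    span{e_1, ..., e_k} to the next basis vector e_{k+1}."""
    lower, upper = float("-inf"), float("inf")
    for m in product(grid, repeat=len(coeffs)):
        value = sum(c * t for c, t in zip(coeffs, m))
        # Both m - e_{k+1} and m + e_{k+1} have taxicab norm l1(m) + 1.
        lower = max(lower, value - (l1(m) + 1.0))
        upper = min(upper, (l1(m) + 1.0) - value)
    return lower, upper

def extend_to_dimension(coeffs, n, choose, grid):
    """Repeat Movement 1 until the functional is defined on all of R^n."""
    coeffs = list(coeffs)
    while len(coeffs) < n:
        lower, upper = alpha_range(coeffs, grid)
        assert lower <= upper  # the gap is never empty
        coeffs.append(choose(lower, upper))
    return coeffs

GRID = [k / 2 for k in range(-6, 7)]
# Arbitrary choice rule: a point three quarters of the way up the interval.
F = extend_to_dimension([1.0], 3, lambda lo, hi: (lo + 3 * hi) / 4, GRID)
print(F)
```

The resulting coefficients all have absolute value at most 1, which is exactly the condition for F to be dominated by the taxicab norm. A different choice rule would give a different, equally valid extension.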

Movement 2 leaned on Zorn's lemma, which is equivalent to the axiom of choice. Is that essential, or a convenience? For separable or otherwise countably-generated spaces you can often iterate Movement 1 along a countable dense skeleton and avoid heavy choice. But in full generality Hahn–Banach genuinely transcends the constructive: it produces functionals no formula could write down (the "Banach limits" and finitely-additive measures are built exactly this way).

Curiously, Hahn–Banach does not require the full strength of choice. It follows from the strictly weaker ultrafilter lemma (the Boolean prime ideal theorem), and there are models of set theory where Hahn–Banach holds yet the axiom of choice fails. And note what the hypotheses never mention: completeness plays no role. The theorem works on any normed space — indeed any real vector space with a sublinear gauge — whether or not it is a Banach space. That is unusual and valuable: most of functional analysis needs completeness, and Hahn–Banach is the great exception that does not.

Consequence 1: the dual space is rich — norming functionals

A one-dimensional subspace is the smallest interesting place to start a functional, and Hahn–Banach turns it into a functional on the whole space. This single move is responsible for the fact that X^* is large enough to be useful at all.

For every nonzero x \in X there is a functional f \in X^* with

\|f\| = 1 \qquad \text{and} \qquad f(x) = \|x\|.

Proof. On the line M = \operatorname{span}\{x\} define f_0(\lambda x) = \lambda \|x\|. It is linear, and |f_0(\lambda x)| = |\lambda|\,\|x\| = \|\lambda x\|, so \|f_0\|_M = 1 and f_0(x) = \|x\|. Hahn–Banach extends f_0 to f \in X^* with the same norm 1, still hitting f(x) = \|x\|. □
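
In finite dimensions a norming functional can often be written down explicitly, with no appeal to the theorem. The sketch below does this for (\mathbb{R}^n, \|\cdot\|_p) with 1 < p < \infty, assuming the standard identification of its dual with (\mathbb{R}^n, \|\cdot\|_q), 1/p + 1/q = 1. The function names and the sample vector are illustrative.

```python
def p_norm(v, p):
    """The l^p norm of a finite vector."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sign(t):
    return (t > 0) - (t < 0)

def norming_functional(x, p):
    """Coefficients of f with ||f||_q = 1 and f(x) = ||x||_p.
    Assumes x is nonzero and 1 < p < infinity."""
    n = p_norm(x, p)
    return [sign(t) * abs(t) ** (p - 1) / n ** (p - 1) for t in x]

def apply_functional(f, x):
    return sum(a * b for a, b in zip(f, x))

x = [3.0, -4.0, 1.0]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent
f = norming_functional(x, p)

# Dual norm of f, the value f(x), and ||x||_p for comparison.
print(p_norm(f, q), apply_functional(f, x), p_norm(x, p))
```

Up to rounding, the first printed number is 1 and the last two agree, which is the statement \|f\| = 1, f(x) = \|x\| for this particular x.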

Three headline corollaries follow immediately, and each is used constantly:

Functionals separate points. If x \ne y, apply the theorem to x - y \ne 0: some f \in X^* has f(x - y) = \|x - y\| \ne 0, so f(x) \ne f(y). The dual is big enough to tell any two vectors apart — without Hahn–Banach, X^* could in principle be tiny or even trivial.
The norm is recovered by testing functionals. The norming functional makes the inequality |f(x)| \le \|f\|\,\|x\| tight, giving the elegant duality formula \|x\| = \sup_{\substack{f \in X^* \\ \|f\| \le 1}} |f(x)| = \max_{\|f\|\le 1} f(x). The supremum is attained (it is a genuine \max), mirroring \|f\| = \sup_{\|x\|\le 1} |f(x)| from the other side.
The canonical embedding is an isometry. Recall J : X \to X^{**}, J(x) = \hat{x} with \hat{x}(f) = f(x). The bound |f(x)| \le \|f\|\,\|x\| gives the easy half \|\hat{x}\| \le \|x\|; the norming functional supplies the reverse, because with \|f\| = 1 and f(x) = \|x\| we get \|\hat{x}\| \ge |\hat{x}(f)| = |f(x)| = \|x\|. Hence \|\hat{x}\| = \|x\|: the map into the double dual is a norm-preserving embedding, and this is where the isometry claimed when J was introduced is actually earned.
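
The duality formula for the norm can be watched in action in a small case. This sketch uses (\mathbb{R}^3, \|\cdot\|_1), assuming the standard identification of its dual with (\mathbb{R}^3, \|\cdot\|_\infty): random functionals in the dual unit ball never exceed \|x\|_1, and the sign vector attains it. All names and the sample vector are illustrative.

```python
import random

def norm1(x):
    return sum(abs(t) for t in x)

def pair(f, x):
    """Evaluate the functional with coefficients f at x."""
    return sum(a * b for a, b in zip(f, x))

def sign_functional(x):
    """f = sign(x) has sup-norm at most 1 and f(x) = ||x||_1."""
    return [1.0 if t >= 0 else -1.0 for t in x]

def best_sampled_value(x, trials=2000, seed=1):
    """Largest f(x) found among random f with ||f||_inf <= 1."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(trials):
        f = [rng.uniform(-1.0, 1.0) for _ in x]
        best = max(best, pair(f, x))
    return best

x = [2.0, -0.5, 1.5]
print(norm1(x), pair(sign_functional(x), x), best_sampled_value(x))
```

Random sampling only approaches the supremum from below; the sign functional is the norming functional that turns the supremum into a maximum.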

Consequence 2: the geometric form — separating convex sets

Turn the picture ninety degrees and Hahn–Banach becomes a statement about walls. A functional's level set \{x : f(x) = c\} is a hyperplane — a flat wall of codimension one — and it splits the space into two half-spaces \{f \le c\} and \{f \ge c\}. The geometric Hahn–Banach theorem says convex sets can be told apart by such a wall.

Point / convex set. If C \subseteq X is open and convex and x_0 \notin C, there is f \in X^* and c \in \mathbb{R} with f(x) < c \le f(x_0) for all x \in C.
Two convex sets. If A, B are disjoint convex sets with A open, a hyperplane separates them: f(a) < c \le f(b) for all a \in A, b \in B.
Strict (compact / closed) form. If A is compact, B closed, both convex and disjoint (in a locally convex space), they can be strictly separated: f(a) \le c_1 < c_2 \le f(b).

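The bridge from the extension theorem to this geometry is the Minkowski gauge of the convex set. Translate so that C contains the origin (moving x_0 along with it) and define p(x) = \inf\{\, t > 0 : x/t \in C \,\}. Convexity makes p sublinear, and because C is open, C = \{p < 1\}. On the line through x_0 set f_0(\lambda x_0) = \lambda\, p(x_0). Then f_0 \le p on that line: for \lambda \ge 0 the two sides agree by positive homogeneity, and for \lambda < 0 the left side is \le 0 \le p. The analytic Hahn–Banach extends f_0 to a linear f \le p on all of X, and f is bounded because p is bounded on a ball around the origin contained in C. Finally, x_0 \notin C gives f(x_0) = p(x_0) \ge 1, while f(x) \le p(x) < 1 for x \in C, so the wall \{f = 1\} slides between C and x_0. Convex geometry and functional extension are, at bottom, the same theorem.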
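
For a polygonal set the gauge and a separating functional can be computed by hand, which makes the construction easy to test. The sketch below assumes a hypothetical open triangle C = \{x : \langle a_i, x\rangle < 1 \text{ for all } i\} containing the origin, whose gauge is p(x) = \max(0, \max_i \langle a_i, x\rangle). The normals, the point x_0 and the function names are illustrative choices.

```python
import random

# Hypothetical open triangle C = {x : <a_i, x> < 1 for every normal a_i}.
NORMALS = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

def dot(a, x):
    return a[0] * x[0] + a[1] * x[1]

def gauge(x):
    """Minkowski gauge of C: inf{t > 0 : x/t in C} = max(0, max_i <a_i, x>)."""
    return max(0.0, max(dot(a, x) for a in NORMALS))

def separating_functional(x0):
    """For x0 outside C, the normal a_i maximising <a_i, x0>.
    It is dominated by the gauge and satisfies f(x0) = gauge(x0) >= 1."""
    return max(NORMALS, key=lambda a: dot(a, x0))

def sample_C(n, seed=0):
    """Rejection-sample n points of C from a bounding box."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n:
        x = (rng.uniform(-3.0, 1.0), rng.uniform(-3.0, 1.0))
        if gauge(x) < 1.0:
            pts.append(x)
    return pts

x0 = (2.0, 0.5)  # a point outside C
a = separating_functional(x0)
inside = sample_C(500)
print(a, dot(a, x0), max(dot(a, x) for x in inside))
```

Here the dominated extension is found by inspection rather than by the theorem, but it has exactly the properties the argument requires: f \le p everywhere, f < 1 on C, and f(x_0) \ge 1.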

The special case where the wall touches the set is a supporting hyperplane: at every boundary point of a convex body there is a functional whose level set brushes the boundary there and keeps the whole body on one side. Rotate the outward direction all the way around and the supporting line rolls around the boundary.

A worked example you can hold in your hand

Take the plane with the taxicab norm \|(x, y)\|_1 = |x| + |y|, and let M = \{(x, 0) : x \in \mathbb{R}\} be the horizontal axis. Define f(x, 0) = x. On M the norm is \|(x,0)\|_1 = |x|, so \|f\|_M = \sup_{|x| = 1} |x| = 1.

Any linear extension has the form F(x, y) = x + \beta y for some \beta. Its norm on (\mathbb{R}^2, \|\cdot\|_1) is

\|F\| = \sup_{|x| + |y| \le 1} |x + \beta y| = \max(1,\, |\beta|).

So F is a norm-preserving extension precisely when |\beta| \le 1. Hahn–Banach promised at least one such extension exists — and here we see it delivers a whole interval of them, \beta \in [-1, 1]. The endpoints \beta = \pm 1 are the "corner" functionals; \beta = 0 is the flat one. Every one of them restricts to f on the axis and has norm exactly 1.
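
The formula \|F\| = \max(1, |\beta|) is easy to confirm numerically. The sketch below computes the norm of F(x, y) = x + \beta y in two ways: at the four vertices of the taxicab unit ball (enough, since |F| is convex and so attains its maximum over the diamond at a vertex) and by sampling the boundary as a cross-check. The function names and the list of \beta values are illustrative.

```python
import math

# Vertices of the taxicab unit ball (the "diamond") in the plane.
VERTICES = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

def norm_by_vertices(beta):
    """Norm of F(x, y) = x + beta*y, evaluated at the vertices only."""
    return max(abs(x + beta * y) for x, y in VERTICES)

def norm_by_sampling(beta, samples=4000):
    """Cross-check: sample points on the boundary |x| + |y| = 1."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        c, s = math.cos(t), math.sin(t)
        scale = abs(c) + abs(s)  # rescale the direction onto the diamond
        best = max(best, abs(c / scale + beta * s / scale))
    return best

for beta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(beta, norm_by_vertices(beta), norm_by_sampling(beta))
```

For every \beta with |\beta| \le 1 both columns stay at 1, and they grow with |\beta| beyond that, matching the interval of norm-preserving extensions described above.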

The norm-preserving extension is not unique. It is tempting to say "the Hahn–Banach extension", but the theorem is an existence statement, not a construction, and the worked example above exhibits a continuum of equally valid extensions. Geometrically this is the same phenomenon as a supporting line at a corner of a convex body: at a smooth boundary point exactly one supporting line touches, but at a sharp vertex a whole fan of lines does. The taxicab unit ball is a diamond, all corners, so its functionals branch.

When is the extension unique? Precisely when the geometry is smooth: every bounded functional on every subspace of X has a unique norm-preserving extension exactly when the dual unit ball is strictly convex, which is the case for a Hilbert space and for \ell^p, L^p with 1 < p < \infty. So do not overload the theorem: it guarantees a norm-preserving extension exists, never that it is the only one. Two further traps to retire while you are here: Hahn–Banach needs no inner product and no completeness. It is a theorem about bare vector spaces, and reaching for "Hilbert" or "Banach" as a hypothesis is a reflex to unlearn.
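
To see the smooth case next to the taxicab one, repeat the worked example with the Euclidean norm on the plane in place of \|\cdot\|_1. The computation below is a short sketch under that assumption, with M the horizontal axis and f(x, 0) = x as before; F_\beta denotes the candidate extension with parameter \beta.

```latex
\begin{align*}
F_\beta(x, y) &= x + \beta y, \qquad F_\beta|_M = f, \qquad \|f\|_M = 1, \\
\|F_\beta\| &= \sup_{x^2 + y^2 \le 1} |x + \beta y| = \sqrt{1 + \beta^2}
  \quad \text{(Cauchy--Schwarz, with equality at } (1, \beta)/\sqrt{1 + \beta^2}\,\text{)}, \\
\|F_\beta\| &= \|f\|_M \iff \sqrt{1 + \beta^2} = 1 \iff \beta = 0.
\end{align*}
```

So in the Euclidean plane the interval [-1, 1] of admissible \beta collapses to the single point \beta = 0: the round unit ball has no corners, and the extension is unique.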

Why it matters — the shape of what follows

Hahn–Banach is the reason the dual space is worth studying: it certifies that X^* is populous enough to see everything about X. From the norming functional you get that functionals separate points, that the norm can be computed dually, and that X \hookrightarrow X^{**} isometrically — the launch pad for reflexivity and the weak topologies. From the geometric form you get separation of convex sets, which underwrites the whole of convex analysis: supporting hyperplanes, the bipolar theorem, duality in optimisation, and the existence of Lagrange multipliers and subgradients. It sits beside the other three pillars of the subject — the uniform boundedness, open mapping, and closed graph theorems — but unlike those it needs no Baire category and no completeness, only the willingness to extend, one direction at a time, all the way out.