Hilbert Spaces Revisited

You have already met Hilbert spaces as inner-product spaces where lengths and angles make sense. Now we return to them from the functional-analysis viewpoint, sitting one rung above Banach spaces in the hierarchy of spaces. The question that organises this whole page is sharp and worth holding in mind:

Of all the complete normed spaces, what makes a Hilbert space so much better behaved?

The one-word answer is geometry. A Banach space knows how long its vectors are; a Hilbert space also knows the angle between any two of them, and in particular when two vectors are at right angles. That single extra structure — an inner product — buys three theorems that have no counterpart in a general Banach space, and those three theorems are the backbone of least-squares fitting, Fourier analysis, quantum mechanics and signal processing.

A real-world hook to keep in view. A noisy signal or a data cloud is a vector x in some enormous space; the "clean" signals you are willing to accept form a subspace M. The engineer's daily task, finding the best approximation of x inside M, is word for word the problem of dropping a perpendicular from a point to a plane. In a Hilbert space, provided M is closed, that perpendicular always exists, is unique, and is computable coefficient by coefficient. That is the payoff, and the rest of the page earns it.

The definition, and the norm behind it

Let H be a vector space over \mathbb{K} (read \mathbb{R} or \mathbb{C}) carrying an inner product \langle \cdot, \cdot \rangle — linear in the first slot, conjugate-symmetric, and positive-definite. Every inner product manufactures a norm,

\|x\| = \sqrt{\langle x, x\rangle},

and a norm manufactures the distance d(x, y) = \|x - y\|. So an inner-product space is automatically a normed space, hence a metric space. The last ingredient is completeness: no Cauchy sequence is allowed to escape.

A Hilbert space is a complete inner-product space — an inner-product space that is complete in the norm \|x\| = \sqrt{\langle x, x\rangle}.

Equivalently: a Banach space whose norm comes from an inner product.
Every Hilbert space is a Banach space; the converse fails (that is the point of this page).
Model examples: \mathbb{R}^n and \mathbb{C}^n with the dot product, the sequence space \ell^2, and the function space L^2[a, b] with \langle f, g\rangle = \int_a^b f\,\overline{g}\,dx.
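The chain inner product, then norm, then distance can be made concrete in a few lines. What follows is a minimal sketch, not a library: the helper names inner, norm, dist and l2_inner are illustrative, and the L^2 inner product of two real functions is only approximated by a midpoint rule.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x).real)

def dist(x, y):
    """Metric induced by the norm: ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])

def l2_inner(f, g, a, b, steps=20000):
    """Midpoint-rule approximation of the L^2[a, b] inner product of real f and g."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(steps))

x = [1 + 2j, 3j]
y = [2, 1 - 1j]

print(inner(x, y), inner(y, x))   # complex conjugates of one another
print(norm(x), dist(x, y))
print(l2_inner(math.sin, math.cos, -math.pi, math.pi))   # approximately 0
print(l2_inner(math.sin, math.sin, -math.pi, math.pi))   # approximately pi
```

The last two lines say that sin and cos are perpendicular in L^2[-\pi, \pi] and that sin has squared length \pi there, the same "angle and length" vocabulary as in \mathbb{R}^n.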

This raises an obvious question: given a Banach space, can you tell whether its norm secretly came from an inner product? Remarkably, yes — and by a single identity you can check with two vectors.

Expand \|x \pm y\|^2 = \langle x \pm y, x \pm y\rangle = \|x\|^2 \pm 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2 and add the two versions. The cross terms cancel, leaving the parallelogram law:

\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.

Geometrically it says that the squares of the two diagonals of a parallelogram add up to the squares of its four sides, a fact that holds for genuine Euclidean lengths and, it turns out, only for them. The Jordan–von Neumann theorem makes this a perfect characterisation: a norm satisfies the parallelogram law if and only if it arises from an inner product, in which case the product is recovered from the norm by the polarisation identity (real case)

\langle x, y\rangle = \tfrac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big).

So "is this Banach space a Hilbert space?" is settled by a one-line test on the norm. Fail the parallelogram law at even a single pair (x, y), and no inner product can exist.
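Both identities are easy to check numerically. The sketch below works in \mathbb{R}^3 with the Euclidean norm; the function names and the two test vectors are illustrative choices. The point is that polarisation sees only the norm, yet returns the dot product.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    """Euclidean norm on R^n."""
    return math.sqrt(dot(x, x))

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2; zero when the law holds for this pair."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

def polarisation(norm, x, y):
    """Real polarisation identity: recover <x, y> from the norm alone."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return 0.25 * (norm(s) ** 2 - norm(d) ** 2)

x = [3.0, -1.0, 2.0]
y = [0.5, 4.0, -2.0]

print(parallelogram_defect(norm2, x, y))      # zero up to rounding
print(polarisation(norm2, x, y), dot(x, y))   # the two values agree up to rounding
```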

It is tempting to imagine that once a space is complete and normed, all the familiar geometry comes for free. It does not. The parallelogram law is a genuine restriction, and most of the everyday Banach spaces flunk it.

The sup norm fails. Take \mathbb{R}^2 with \|x\|_\infty = \max(|x_1|, |x_2|), and the vectors x = (1, 0), y = (0, 1). Then \|x + y\|_\infty = 1 and \|x - y\|_\infty = 1, so the left side is 1 + 1 = 2, while the right side is 2(1) + 2(1) = 4. 2 \ne 4: no inner product can reproduce the sup norm. The same collapse happens in C[0, 1] with its uniform norm.

Only p = 2 survives among the \ell^p. Repeat the calculation in \|\cdot\|_p with the same x, y: the left side is 2^{2/p} + 2^{2/p} and the right side is 4. These agree only when 2^{2/p} = 2, i.e. p = 2. So of the entire family \ell^1, \ell^2, \ell^3, \dots, \ell^\infty, exactly one — \ell^2 — is a Hilbert space. \ell^2 and L^2 are geometric islands in an ocean of merely-Banach spaces.
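The two calculations above can be repeated for any p at once. This is a small sketch with illustrative helper names p_norm and defect; it evaluates the left side minus the right side of the parallelogram law for x = (1, 0), y = (0, 1), which by the computation above equals 2 \cdot 2^{2/p} - 4.

```python
def p_norm(x, p):
    """l^p norm on R^n; p = float('inf') gives the sup norm."""
    if p == float("inf"):
        return max(abs(a) for a in x)
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def defect(p, x=(1.0, 0.0), y=(0.0, 1.0)):
    """Left side minus right side of the parallelogram law in the l^p norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = p_norm(s, p) ** 2 + p_norm(d, p) ** 2
    rhs = 2 * p_norm(x, p) ** 2 + 2 * p_norm(y, p) ** 2
    return lhs - rhs

for p in (1, 1.5, 2, 3, 10, float("inf")):
    print(p, defect(p))   # vanishes (up to rounding) only for p = 2
```

The defect is positive for p < 2, negative for p > 2, and reaches -2 in the sup norm, matching the hand calculation.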

Theorem 1 — orthogonal projection: drop a perpendicular

This is the theorem the real-world hook was pointing at. In a general metric space there is no reason a set should contain a nearest point to a given x: the infimum of the distances need not be attained, and even if it is, it might be attained at more than one point. In a Hilbert space, convexity plus completeness rescue both existence and uniqueness.

Let H be a Hilbert space.

Nearest point in a convex set. If K \subseteq H is nonempty, closed and convex, then every x \in H has a unique nearest point p \in K: \|x - p\| = \inf_{k \in K}\|x - k\|.
Projection onto a closed subspace. If M \subseteq H is a closed subspace, the nearest point p = P_M x is characterised by the orthogonality condition x - p \perp M: the residual is perpendicular to every vector of M.
Orthogonal decomposition. Consequently H = M \oplus M^{\perp}: every x splits uniquely as x = p + r with p \in M and r \in M^{\perp}, and (M^{\perp})^{\perp} = M.

Why is the nearest point unique? Suppose p and q both realise the minimal distance \delta. Their midpoint \tfrac{1}{2}(p + q) lies in K by convexity, so it is no closer than \delta either. Now feed x - p and x - q into the parallelogram law:

\|p - q\|^2 = 2\|x - p\|^2 + 2\|x - q\|^2 - 4\Big\|x - \tfrac{p + q}{2}\Big\|^2 \le 2\delta^2 + 2\delta^2 - 4\delta^2 = 0.

So \|p - q\| = 0, i.e. p = q. Notice what did the work: the parallelogram law, the very identity that separates Hilbert spaces from Banach spaces. In a general Banach space a best approximation need not be unique, and for a closed subspace M the characterising condition x - p \perp M is exactly the least-squares "normal equations" in disguise.
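To see that uniqueness really does depend on the geometry, here is a hedged illustration that is not part of the argument above: in \mathbb{R}^2, approximate x = (0, 1) by points (t, 0) of the horizontal axis, once in the Euclidean norm and once in the sup norm. The grid of t values and the helper names are assumptions made for the example.

```python
import math

def sup_norm(v):
    return max(abs(a) for a in v)

def euclid_norm(v):
    return math.sqrt(sum(a * a for a in v))

def minimisers(norm, x, ts, tol=1e-12):
    """Grid values t for which (t, 0) is nearest to x in the given norm."""
    dists = [norm((x[0] - t, x[1])) for t in ts]
    best = min(dists)
    return [t for t, d in zip(ts, dists) if d - best <= tol]

x = (0.0, 1.0)
ts = [k / 10 for k in range(-20, 21)]   # grid on [-2, 2]

print(minimisers(euclid_norm, x, ts))        # [0.0]
print(len(minimisers(sup_norm, x, ts)))      # 21: every grid point with |t| <= 1
```

In the sup norm the distance is max(|t|, 1), so a whole segment of the axis ties for nearest point; the Euclidean norm singles out the foot of the perpendicular.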

Picture the planar case. The subspace M is a line through the origin; x is the point being approximated; p = P_M x is its shadow on M; and the residual x - p always meets M at a right angle. That right angle is the reason p is the closest point.
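The orthogonality condition x - p \perp M turns projection into a small linear system. The sketch below projects onto a plane in \mathbb{R}^3 spanned by two vectors; the vectors a, b, x and the helper project are illustrative, and the 2x2 normal equations are solved by Cramer's rule only to keep the example dependency-free.

```python
def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

def project(x, a, b):
    """Orthogonal projection of x onto span{a, b} in R^n (a, b linearly independent).

    Solves the normal equations G c = (<x, a>, <x, b>) by Cramer's rule,
    where G is the 2x2 Gram matrix of a and b.
    """
    gaa, gab, gbb = dot(a, a), dot(a, b), dot(b, b)
    ra, rb = dot(x, a), dot(x, b)
    det = gaa * gbb - gab * gab
    ca = (ra * gbb - rb * gab) / det
    cb = (gaa * rb - gab * ra) / det
    return [ca * s + cb * t for s, t in zip(a, b)]

a = [1.0, 1.0, 0.0]
b = [0.0, 1.0, 1.0]
x = [1.0, 2.0, 4.0]

p = project(x, a, b)                     # component in M
r = [u - v for u, v in zip(x, p)]        # residual, the component in M-perp

print(p, r)
print(dot(r, a), dot(r, b))              # both zero up to rounding
print(dot(x, x), dot(p, p) + dot(r, r))  # Pythagoras: ||x||^2 = ||p||^2 + ||r||^2
```

For these inputs p works out to (0, 3, 3) and r to (1, -1, 1), which is the decomposition x = p + r with p in M and r perpendicular to M.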

All three hypotheses (closed, convex, complete) are load-bearing, and dropping any one of them breaks the theorem.

Drop closed. In \mathbb{R} take the open interval K = (0, 1) and the point x = 2. The distances to points of K creep down toward 1 but never reach it — the would-be nearest point 1 is missing from K. No minimiser exists.
Drop convex. Take K to be a circle (not the disc) and x its centre. Every point of the circle is equidistant, so the nearest point is wildly non-unique. Convexity is what forbids these ties.
Drop completeness (work in an incomplete inner-product space) and the minimising sequence can be Cauchy yet converge to nothing in the space — existence fails again. All three Hilbert-space hypotheses pull their weight.

Theorem 2 — orthonormal bases and generalised Fourier series

Projection onto a line spanned by a unit vector e is a one-liner: the shadow is \langle x, e\rangle\,e, and the scalar \langle x, e\rangle is the coordinate of x along e. An orthonormal system \{e_n\} — mutually perpendicular unit vectors, \langle e_m, e_n\rangle = \delta_{mn} — lets us do this in every direction at once and assemble the projections into a series.

The numbers \hat{x}_n = \langle x, e_n\rangle are the generalised Fourier coefficients of x. The partial sum s_N = \sum_{n=1}^{N}\langle x, e_n\rangle\,e_n is precisely the orthogonal projection of x onto \mathrm{span}\{e_1, \dots, e_N\} — the best approximation of x by those N directions, by Theorem 1. Because x - s_N \perp s_N, Pythagoras gives \|x\|^2 = \|s_N\|^2 + \|x - s_N\|^2 \ge \|s_N\|^2 = \sum_{n=1}^N |\langle x, e_n\rangle|^2, and letting N \to \infty yields Bessel's inequality.

Let \{e_n\} be an orthonormal system in a Hilbert space H.

Bessel's inequality holds for every orthonormal system: \displaystyle\sum_{n} |\langle x, e_n\rangle|^2 \le \|x\|^2. In particular the coefficients are square-summable and \langle x, e_n\rangle \to 0.
Orthonormal basis. The system is complete (an orthonormal basis) when its closed span is all of H. Then \displaystyle x = \sum_n \langle x, e_n\rangle\,e_n for every x — the generalised Fourier series, convergent in norm.
Parseval's identity. For a basis, Bessel's inequality becomes an equality: \displaystyle \|x\|^2 = \sum_n |\langle x, e_n\rangle|^2. "Energy in the signal equals energy in the coefficients."
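A finite-dimensional model shows all three statements at work. The sketch assumes H = \mathbb{C}^8 with the unitary discrete Fourier vectors as orthonormal basis; the test vector x and the choice of "first three basis vectors" for Bessel are arbitrary illustrations.

```python
import cmath
import math

def inner(x, y):
    """Inner product on C^N: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def fourier_vector(k, N):
    """k-th vector of the unitary discrete Fourier basis of C^N."""
    return [cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for j in range(N)]

N = 8
basis = [fourier_vector(k, N) for k in range(N)]
x = [complex(j * j - 3, (-1) ** j) for j in range(N)]   # arbitrary test vector

coeffs = [inner(x, e) for e in basis]   # generalised Fourier coefficients
energy = inner(x, x).real               # ||x||^2

# Bessel: an orthonormal system that is not a basis captures at most the full energy.
partial = sum(abs(c) ** 2 for c in coeffs[:3])
# Parseval: the full basis captures all of it.
total = sum(abs(c) ** 2 for c in coeffs)
# Expansion: x = sum_n <x, e_n> e_n.
recon = [sum(c * e[j] for c, e in zip(coeffs, basis)) for j in range(N)]

print(partial <= energy)
print(abs(total - energy))                           # zero up to rounding
print(max(abs(u - v) for u, v in zip(x, recon)))     # zero up to rounding
```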

The classical Fourier series is the special case H = L^2[-\pi, \pi] with the orthonormal system \big\{\tfrac{1}{\sqrt{2\pi}}e^{inx}\big\}_{n \in \mathbb{Z}}. Writing c_n = \tfrac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-inx}\,dx for the usual Fourier coefficients, so that \langle f, e_n\rangle = \sqrt{2\pi}\,c_n, Parseval's identity reads \int_{-\pi}^{\pi} |f|^2 = 2\pi\sum_n |c_n|^2: a function and its spectrum carry the same total energy, the mathematical heart of every equaliser, MP3 encoder and spectrum analyser. Abstractly, Parseval says an infinite-dimensional separable Hilbert space is isometrically the coordinate space \ell^2: choose a basis and every vector becomes its sequence of coefficients.
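As a rough numerical check of the classical identity, take the illustrative signal f(x) = x on [-\pi, \pi], whose energy \int x^2 is 2\pi^3/3. The sketch assumes the usual normalisation c_n = \tfrac{1}{2\pi}\int f(x)\,e^{-inx}\,dx and approximates each integral by a midpoint rule, so the printed figures are approximate.

```python
import cmath
import math

def fourier_coefficient(f, n, steps=2000):
    """Midpoint-rule approximation of c_n = (1/2pi) * integral of f(x) e^{-inx} over [-pi, pi]."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

def f(x):
    return x                         # illustrative choice of signal

energy = 2 * math.pi ** 3 / 3        # exact integral of x^2 over [-pi, pi]

N = 50                               # keep the modes with |n| <= N
spectrum = 2 * math.pi * sum(abs(fourier_coefficient(f, n)) ** 2 for n in range(-N, N + 1))

print(energy, spectrum, spectrum / energy)
```

With finitely many modes this is Bessel's inequality rather than Parseval: the truncated spectrum stays below the energy and creeps up toward it as N grows.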

Theorem 3 — Riesz representation: a Hilbert space is its own dual

A bounded linear functional is a continuous linear "measurement" f : H \to \mathbb{K}; together they form the dual space H^{*}. In a Hilbert space every such measurement has a startlingly concrete form: it is just taking the inner product with one fixed vector.

Let H be a Hilbert space and f \in H^{*} a bounded linear functional. Then:

there is a unique y \in H with f(x) = \langle x, y\rangle for all x;
the norms match exactly: \|f\|_{H^{*}} = \|y\|_{H};
the map f \mapsto y is a (conjugate-linear) isometric isomorphism H^{*} \cong H — a Hilbert space is canonically its own dual.

The vector y is built by Theorem 1. If f = 0 take y = 0. Otherwise the kernel M = \ker f is a closed subspace with M \ne H, so M^{\perp} contains a unit vector e — that is where completeness and the projection theorem enter — and one checks that y = \overline{f(e)}\,e does the job. Uniqueness is immediate: if \langle x, y_1\rangle = \langle x, y_2\rangle for all x, put x = y_1 - y_2 to get \|y_1 - y_2\|^2 = 0.
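In \mathbb{C}^n the construction can be followed by hand. The sketch below assumes a functional f(x) = \sum_i a_i x_i with illustrative weights a_i, and the inner product that is linear in the first slot; it checks the representer, the recipe y = \overline{f(e)}\,e from the proof, and the equality of norms on random samples.

```python
import math
import random

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(s * t.conjugate() for s, t in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

a = [2 - 1j, 0.5j, -3 + 0j]          # illustrative weights defining the functional

def f(x):
    """Bounded linear functional f(x) = sum_i a_i x_i on C^3."""
    return sum(ai * xi for ai, xi in zip(a, x))

# Candidate representer: f(x) = sum_i a_i x_i = <x, conj(a)>.
y = [ai.conjugate() for ai in a]

# The proof's recipe: a unit vector e orthogonal to ker f, then y = conj(f(e)) * e.
e = [yi / norm(y) for yi in y]
y_from_proof = [f(e).conjugate() * ei for ei in e]

random.seed(0)
samples = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in a] for _ in range(200)]

print(max(abs(f(x) - inner(x, y)) for x in samples))      # representation error, about 0
print(max(abs(u - v) for u, v in zip(y, y_from_proof)))   # the two constructions agree
print(max(abs(f(x)) / norm(x) for x in samples), norm(y)) # |f(x)| / ||x|| never exceeds ||y||
print(abs(f(e)), norm(y))                                 # and the bound is attained at e
```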

This is a striking contrast with the wider world of Banach spaces, where the dual is usually a different space (the dual of \ell^1 is \ell^\infty, and the dual of \ell^\infty is strictly larger than \ell^1). Only in a Hilbert space does H^{*} fold back onto H itself. Self-duality is what lets us define the adjoint T^{*} of an operator T by \langle Tx, y\rangle = \langle x, T^{*}y\rangle, and it is the reason the bra \langle \psi| and the ket |\psi\rangle of quantum mechanics are two faces of the same state.
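For matrices acting between \mathbb{C}^n spaces with the standard inner product, the adjoint is the conjugate transpose, and the defining identity can be checked directly. The matrix T and the vectors x, y below are arbitrary illustrations.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(s * t.conjugate() for s, t in zip(x, y))

def apply(T, x):
    """Matrix-vector product T x, with T given as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def adjoint(T):
    """Conjugate transpose of T."""
    return [[T[i][j].conjugate() for i in range(len(T))] for j in range(len(T[0]))]

T = [[1 + 1j, 2 + 0j], [0 - 3j, 4 + 2j], [5 + 0j, 0 + 1j]]   # maps C^2 to C^3
x = [1 - 2j, 3 + 1j]
y = [2 + 0j, 0 - 1j, 1 + 1j]

lhs = inner(apply(T, x), y)              # <Tx, y>, computed in C^3
rhs = inner(x, apply(adjoint(T), y))     # <x, T*y>, computed in C^2
print(lhs, rhs)                          # equal up to rounding
```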

Frigyes Riesz (1880–1956) was one of the founders of functional analysis, and this 1907 result — proved independently the same year by Maurice Fréchet — was among the first to show that abstract Hilbert space had teeth. The magic is that an arbitrary continuous linear rule, however it is described, must secretly be an inner product with a single hidden vector. Infinitely many possible "measurements" collapse to a choice of one direction to measure along.

It also quietly powers the rest of analysis. The existence of weak solutions to partial differential equations (the Lax–Milgram theorem) is Riesz representation wearing a coat; so is the existence of conditional expectation in probability, which is literally an orthogonal projection in L^2. Once you see a Hilbert space, you start seeing "drop a perpendicular" everywhere.
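The conditional-expectation remark can be seen on a toy example. The sketch assumes a six-point probability space with made-up probabilities, a random variable X, and coarse information given by a partition into blocks; here L^2 carries the inner product E[UV], and the block-constant variables form the subspace M.

```python
probs = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]   # illustrative probabilities on six outcomes
X = [4.0, 1.0, -2.0, 3.0, 0.0, 5.0]      # a random variable
blocks = [[0, 1], [2, 3, 4], [5]]        # partition describing the coarse information

def inner(U, V):
    """L^2 inner product E[U V] on the finite probability space."""
    return sum(p * u * v for p, u, v in zip(probs, U, V))

def conditional_expectation(X):
    """Replace X on each block by its probability-weighted average over that block."""
    out = [0.0] * len(X)
    for block in blocks:
        mass = sum(probs[i] for i in block)
        mean = sum(probs[i] * X[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out

P = conditional_expectation(X)
residual = [u - v for u, v in zip(X, P)]
indicators = [[1.0 if i in block else 0.0 for i in range(len(X))] for block in blocks]

print(P)
print([inner(residual, ind) for ind in indicators])   # zero up to rounding
```

The residual X - E[X | blocks] is perpendicular to every block indicator, hence to every block-constant variable: the same right angle as in the projection theorem.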

The three theorems, in one breath

Everything above rests on the single fact that a Hilbert-space norm knows angles. From it:

Projection — closed convex sets have unique nearest points, and H = M \oplus M^{\perp}. (Best approximation, least squares.)
Fourier series — x = \sum_n \langle x, e_n\rangle e_n with Bessel's inequality \sum|\hat{x}_n|^2 \le \|x\|^2 and, for a basis, Parseval \|x\|^2 = \sum|\hat{x}_n|^2. (Signal decomposition.)
Riesz — f(x) = \langle x, y\rangle, \|f\| = \|y\|, so H^{*} \cong H. (Self-duality, adjoints, quantum bra–ket.)

A Banach space gives you none of these for free. That gap — closed by the parallelogram law — is the whole reason Hilbert spaces sit in a class of their own.