Normed Vector Spaces

How big is a signal? An engineer cleaning noise from an audio track needs a single number that says "this recording is loud" or "this error is small". A statistician fitting a model wants one number for "how far off" a whole vector of predictions is. A numerical analyst iterating toward the solution of a huge linear system needs to know the residual is shrinking — again, one number for the size of a vector. In every case we are asking for a length: a way to measure the magnitude of an element of a vector space, not the gap between two points but the size of one.

You already know one such rule intimately — the Euclidean length \|x\| = \sqrt{x_1^2 + \dots + x_n^2} in \mathbb{R}^n. But there are many others, each suited to a different job, and the audio engineer's "loudness" is not the statistician's "error". What all honest notions of length share is captured by three axioms. A vector space carrying such a length function is a normed vector space, and it is the central object of functional analysis — the arena where linear algebra and metric-space analysis fuse, so that limits and continuity coexist with addition and scaling.

Throughout, V is a vector space over \mathbb{R} (everything works verbatim over \mathbb{C}, replacing |\lambda| by the complex modulus). A norm is a function \|\cdot\| : V \to [0, \infty) assigning a length to each vector, subject to three axioms. For all x, y \in V and every scalar \lambda \in \mathbb{R}:

\textbf{(N1)} \;\; \|x\| \ge 0, \quad \text{and} \quad \|x\| = 0 \iff x = 0 \qquad (\text{positive definiteness}),

\textbf{(N2)} \;\; \|\lambda x\| = |\lambda|\,\|x\| \qquad (\text{absolute homogeneity}),

\textbf{(N3)} \;\; \|x + y\| \le \|x\| + \|y\| \qquad (\text{the triangle inequality}).

The pair (V, \|\cdot\|) is a normed vector space. Read the axioms as three demands on any reasonable "length": only the zero vector has zero length (N1); scaling a vector by \lambda scales its length by |\lambda| — doubling a vector doubles its length, and reversing it (\lambda = -1) leaves the length unchanged (N2); and the length of a sum never exceeds the sum of the lengths (N3).
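The three axioms can be spot-checked numerically. The sketch below uses a hypothetical helper, `check_norm_axioms`, that tests (N1) to (N3) on random vectors for the Euclidean length. The function names, sample sizes, and tolerance are assumptions chosen for illustration, and a passing check is evidence, not a proof.

```python
import math
import random

def euclidean_norm(x):
    """Euclidean length sqrt(x_1^2 + ... + x_n^2) of a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))

def check_norm_axioms(norm, dim=5, trials=1000, tol=1e-9, seed=0):
    """Spot-check (N1)-(N3) on random vectors; True if no violation is found."""
    rng = random.Random(seed)
    # (N1), first half: the zero vector must have length zero.
    if norm([0.0] * dim) != 0.0:
        return False
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(dim)]
        y = [rng.uniform(-10, 10) for _ in range(dim)]
        lam = rng.uniform(-5, 5)
        # (N1), second half: a non-zero vector must have positive length.
        if any(xi != 0.0 for xi in x) and not norm(x) > 0.0:
            return False
        # (N2): scaling by lam scales the length by |lam| (up to rounding).
        scaled = norm([lam * xi for xi in x])
        if abs(scaled - abs(lam) * norm(x)) > tol * (1.0 + norm(x)):
            return False
        # (N3): the length of a sum never exceeds the sum of the lengths.
        if norm([xi + yi for xi, yi in zip(x, y)]) > norm(x) + norm(y) + tol:
            return False
    return True

print(check_norm_axioms(euclidean_norm))
```

Passing a function that is not a norm, such as the plain coordinate sum, should make the same helper report a violation, because that "length" can be negative and ignores the sign of the scalar.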

What each axiom buys — and what homogeneity forces on you

The difference between a norm and a mere metric is that a norm must respect the linear structure of the space. (N2) and (N3) are not just about "closeness" — they tie length to scaling and addition. Two immediate consequences fall out of the axioms and are worth internalising.

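First, (N2) with \lambda = 0 gives \|0\| = \|0 \cdot x\| = 0 \cdot \|x\| = 0. Second, combining this with (N3) and then (N2) at \lambda = -1,

0 = \|0\| = \|x + (-x)\| \le \|x\| + \|-x\| = \|x\| + |-1|\,\|x\| = 2\,\|x\|,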

so \|x\| \ge 0 automatically. (This is the exact analogue of the "free non-negativity" you met for metrics — and it uses both linear axioms, which a bare metric does not have.) A third handy consequence, the reverse triangle inequality, drops out of (N3) the same way it did for the absolute value:

\big|\, \|x\| - \|y\| \,\big| \le \|x - y\|,

obtained by writing \|x\| = \|(x - y) + y\| \le \|x - y\| + \|y\| and its mirror image. It says the length function itself is continuous — a small change in a vector makes a small change in its length.

Every norm is a metric in disguise — with extra symmetry

A norm measures the size of one vector; a metric measures the gap between two. The bridge is to measure the gap between x and y as the length of their difference:

d(x, y) := \|x - y\|.

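Claim: this d is always a metric. The three metric axioms drop straight out of the three norm axioms:

Positivity: d(x, y) = \|x - y\| \ge 0, with d(x, y) = 0 \iff x - y = 0 \iff x = y, by (N1).

Symmetry: d(y, x) = \|y - x\| = \|(-1)(x - y)\| = |-1|\,\|x - y\| = d(x, y), by (N2).

Triangle inequality: d(x, z) = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z), by (N3).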

So every normed space is automatically a metric space, and all the machinery of metric spaces — open balls, convergence, continuity, completeness — is instantly available. Convergence x_n \to x just means \|x_n - x\| \to 0.

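But a norm-metric is special: it is compatible with the linear structure in two ways an arbitrary metric need not be.

Translation invariance: d(x + z, y + z) = \|(x + z) - (y + z)\| = \|x - y\| = d(x, y) for every z \in V.

Homogeneity: d(\lambda x, \lambda y) = \|\lambda (x - y)\| = |\lambda|\,\|x - y\| = |\lambda|\,d(x, y) for every scalar \lambda.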

These two properties are exactly the fingerprint of a norm. A metric that has them does come from a norm (define \|x\| := d(x, 0) and check the axioms); a metric that lacks them cannot — which is the trap the "Watch out!" box below exposes.
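As a small illustration of the passage from norm to metric, the sketch below builds d(x, y) = \|x - y\| from the 1-norm on \mathbb{R}^2 and evaluates it before and after a shift and a scaling. The helper names (`induced_metric`, `shift`, `scale`) and the sample vectors are hypothetical, picked so that the arithmetic is exact in floating point.

```python
def norm1(x):
    """1-norm: sum of absolute values of the coordinates."""
    return sum(abs(xi) for xi in x)

def induced_metric(norm):
    """Return the metric d(x, y) = norm(x - y) induced by the given norm."""
    def d(x, y):
        return norm([xi - yi for xi, yi in zip(x, y)])
    return d

d = induced_metric(norm1)

x, y = [1.0, -2.0], [4.0, 2.0]
z = [0.5, 0.25]   # translation vector
lam = -3.0        # scaling factor

def shift(v):
    """Translate v by z."""
    return [vi + zi for vi, zi in zip(v, z)]

def scale(v):
    """Multiply v by lam."""
    return [lam * vi for vi in v]

print(d(x, y))                 # distance between x and y
print(d(shift(x), shift(y)))   # unchanged by translation
print(d(scale(x), scale(y)))   # multiplied by |lam|
```

With these inputs the first two printed distances agree and the third is three times as large, which is what translation invariance and homogeneity predict.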

It is tempting to think "metric" and "norm" are two words for the same idea. They are not: a norm always yields a metric, but many honest metrics can never be written as \|x - y\| for any norm.

Take the discrete metric on \mathbb{R}: d(x, y) = 1 when x \ne y and 0 otherwise. It is a perfectly good metric. But suppose it came from a norm, d(x, y) = \|x - y\|. Homogeneity would force \|2 \cdot 1\| = |2|\,\|1\| = 2\,\|1\|, i.e. d(2, 0) = 2\,d(1, 0). Yet in the discrete metric d(2, 0) = 1 while 2\,d(1, 0) = 2, a contradiction. The discrete metric is translation-invariant but not homogeneous, so no norm induces it. "Getting twice as far" has no meaning in it, and that is precisely the linear structure a norm insists on.

The same reasoning shows the bounded metric \rho(x, y) = \min(1, |x - y|) on \mathbb{R} is not from a norm either: distances can never exceed 1, but a norm's \|\lambda x\| = |\lambda|\,\|x\| is unbounded. Moral: a norm is strictly more than a metric — it is a metric that plays nicely with the algebra.
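The two counterexamples can be checked directly. The sketch below tests the homogeneity identity d(\lambda x, \lambda y) = |\lambda|\,d(x, y) at the single point x = 1, y = 0, \lambda = 2 for the discrete metric, the bounded metric, and the usual metric |x - y|. The function names are hypothetical, and one failing point is enough to rule a metric out, while one passing point proves nothing.

```python
def discrete_metric(x, y):
    """d(x, y) = 0 if x == y, else 1."""
    return 0.0 if x == y else 1.0

def bounded_metric(x, y):
    """rho(x, y) = min(1, |x - y|)."""
    return min(1.0, abs(x - y))

def usual_metric(x, y):
    """The metric induced by the absolute value, d(x, y) = |x - y|."""
    return abs(x - y)

def homogeneous_at(d, x, y, lam):
    """True if d(lam*x, lam*y) equals |lam| * d(x, y) for this one choice."""
    return d(lam * x, lam * y) == abs(lam) * d(x, y)

for name, d in [("discrete", discrete_metric),
                ("bounded", bounded_metric),
                ("usual", usual_metric)]:
    print(name, homogeneous_at(d, 1.0, 0.0, 2.0))
```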

The family of p-norms on \mathbb{R}^n

The single most important family of examples lives on \mathbb{R}^n. For a real number p \ge 1 and a vector x = (x_1, \dots, x_n), define the p-norm

\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.

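Three members earn their own names and appear constantly:

p = 1, the taxicab (Manhattan) norm: \|x\|_1 = |x_1| + \dots + |x_n|.

p = 2, the Euclidean norm: \|x\|_2 = \sqrt{x_1^2 + \dots + x_n^2}.

p = \infty, the maximum norm: \|x\|_\infty = \max_i |x_i|, which is the limit of \|x\|_p as p \to \infty.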

For any fixed vector these three are ordered \|x\|_\infty \le \|x\|_2 \le \|x\|_1: a larger exponent flattens the contribution of the smaller coordinates. Verifying (N1) and (N2) for \|\cdot\|_p is routine — a sum of non-negative p-th powers is 0 only when every coordinate is 0, and pulling |\lambda| out of every |\lambda x_i|^p = |\lambda|^p |x_i|^p and then taking the p-th root gives homogeneity. The triangle inequality (N3) is the deep part: for \|\cdot\|_p it is Minkowski's inequality, and for the special case p = 2 it follows from Cauchy–Schwarz, worked out next.
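A minimal sketch of the p-norm formula, assuming vectors are plain Python lists and that `math.inf` stands for the limiting case p = \infty (both are choices made here for illustration). It evaluates the 1-, 2- and \infty-norms of one sample vector so the ordering can be read off.

```python
import math

def p_norm(x, p):
    """p-norm of a list of floats for p >= 1; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 0.0]
for p in (1, 2, 10, 50, math.inf):
    print(p, p_norm(x, p))
```

For this vector the values decrease as p grows and settle toward the largest coordinate magnitude, 4, consistent with the ordering stated above.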

Worked example 1: the Euclidean 2-norm satisfies the triangle inequality

We prove \|x + y\|_2 \le \|x\|_2 + \|y\|_2 on \mathbb{R}^n, the one axiom that is not immediate. The single tool we borrow is the Cauchy–Schwarz inequality \big|\langle x, y \rangle\big| \le \|x\|_2\,\|y\|_2, where \langle x, y \rangle = \sum_i x_i y_i.

Step 1 — square the target. Both sides are non-negative, so it suffices to compare their squares. Expand using the inner product:

\|x + y\|_2^2 = \langle x + y,\, x + y \rangle = \|x\|_2^2 + 2\,\langle x, y \rangle + \|y\|_2^2.

Step 2 — bound the cross term. By Cauchy–Schwarz, \langle x, y \rangle \le |\langle x, y \rangle| \le \|x\|_2\,\|y\|_2, so

\|x + y\|_2^2 \le \|x\|_2^2 + 2\,\|x\|_2\,\|y\|_2 + \|y\|_2^2 = \big(\|x\|_2 + \|y\|_2\big)^2.

Step 3 — take square roots. Both sides are non-negative and the square root is increasing, so \|x + y\|_2 \le \|x\|_2 + \|y\|_2. Done. Notice the shape of the argument — "square, apply an inner-product inequality, un-square" — recurs for every norm built from an inner product, and it is the engine behind Hilbert-space geometry.
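The three steps can be traced on concrete numbers. The sketch below uses two hypothetical vectors in \mathbb{R}^3, chosen so that every intermediate quantity is a whole number, and computes each side of the inequalities in Steps 1 to 3.

```python
import math

def dot(x, y):
    """Standard inner product <x, y> = sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    """Euclidean norm, defined through the inner product."""
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, -2.0]   # ||x||_2 = 3
y = [4.0, 0.0, 3.0]    # ||y||_2 = 5
s = [a + b for a, b in zip(x, y)]

# Step 1: expand the squared norm of the sum.
lhs_sq = dot(s, s)
expanded = dot(x, x) + 2 * dot(x, y) + dot(y, y)

# Step 2: Cauchy-Schwarz bounds the cross term, giving a perfect square.
cauchy_schwarz_ok = abs(dot(x, y)) <= norm2(x) * norm2(y)
bound_sq = (norm2(x) + norm2(y)) ** 2

# Step 3: take square roots of both sides.
triangle_ok = math.sqrt(lhs_sq) <= math.sqrt(bound_sq)

print(lhs_sq, expanded, bound_sq, cauchy_schwarz_ok, triangle_ok)
```

Here the cross term is negative, so the bound in Step 2 is far from tight; equality in the triangle inequality would need y to be a non-negative multiple of x.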

The general p case (Minkowski) has no inner product to lean on, so it instead runs through Hölder's inequality, the p-generalisation of Cauchy–Schwarz. The strategy — reduce (N3) to a deeper "product" inequality — is the same.

From finite to infinite: the sequence spaces \ell^p

The p-norm has an obvious infinite-dimensional sibling. Instead of an n-tuple, take an infinite sequence x = (x_1, x_2, x_3, \dots) of reals and let the sum run to infinity. For 1 \le p < \infty, the space

\ell^p = \left\{\, x = (x_n)_{n\ge1} : \sum_{n=1}^{\infty} |x_n|^p < \infty \,\right\}, \qquad \|x\|_p = \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p},

consists of exactly those sequences whose p-norm is finite — the ones that are "p-summable". The companion \ell^\infty is the space of bounded sequences with \|x\|_\infty = \sup_n |x_n|. These are genuine vector spaces (Minkowski's inequality is what guarantees that the sum of two p-summable sequences is again p-summable, so the set is closed under addition), and they are the first infinite-dimensional Banach spaces most students meet.

A striking new phenomenon appears here that has no analogue in \mathbb{R}^n: the choice of p now changes which sequences belong at all, not merely their measured length. The harmonic-type sequence x_n = 1/n lies in \ell^2 (since \sum 1/n^2 = \pi^2/6 < \infty) but not in \ell^1 (the harmonic series \sum 1/n diverges). In fact \ell^p \subsetneq \ell^q whenever p < q — the spaces genuinely differ. This is the first sign that in infinite dimensions the different p-norms are not interchangeable, a theme that culminates in the equivalent-norms discussion below.
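The contrast between the two series for x_n = 1/n can be seen in their partial sums. The sketch below is only suggestive, since no finite computation can prove divergence; the helper name `partial_sum` and the cut-off values are assumptions made for illustration.

```python
import math

def partial_sum(p, terms):
    """Sum of (1/n)^p for n = 1, ..., terms."""
    return math.fsum((1.0 / n) ** p for n in range(1, terms + 1))

for terms in (10, 1000, 100000):
    print(terms, partial_sum(1, terms), partial_sum(2, terms))

print("pi^2 / 6 =", math.pi ** 2 / 6)
```

The p = 2 column stabilises just below \pi^2/6, while the p = 1 column keeps growing, roughly like the logarithm of the number of terms.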

Worked example 2: the sup norm on C[a,b] is a norm

The most important function space in a first course is C[a, b], the vector space of continuous real-valued functions on a closed interval, with pointwise addition and scaling. Its natural length is the supremum (uniform) norm

\|f\|_\infty = \sup_{x \in [a, b]} |f(x)| \;=\; \max_{x \in [a, b]} |f(x)|,

where the supremum is actually attained (a continuous function on a compact interval is bounded and reaches its extreme values, by the extreme value theorem — so the \sup is a genuine \max and is finite). Let us verify all three axioms; this is the archetype for checking that a proposed length on a function space is a norm.

(N1) Positive definiteness. Each |f(x)| \ge 0, so its supremum is \ge 0. If \|f\|_\infty = 0 then \sup_x |f(x)| = 0, which forces |f(x)| = 0 for every x, i.e. f is the zero function. Conversely the zero function plainly has \|f\|_\infty = 0. (This step does not actually need continuity; it holds for any bounded function. It is, however, where "the zero vector" means "the zero function", the whole graph flat on the axis.)

(N2) Absolute homogeneity. Pull the constant out of the supremum:

\|\lambda f\|_\infty = \sup_x |\lambda f(x)| = \sup_x |\lambda|\,|f(x)| = |\lambda| \sup_x |f(x)| = |\lambda|\,\|f\|_\infty,

valid because |\lambda| \ge 0 is a constant and scaling a set of non-negative numbers by a non-negative constant scales its supremum.

(N3) Triangle inequality. Fix any x \in [a, b]. The pointwise (scalar) triangle inequality gives

|f(x) + g(x)| \le |f(x)| + |g(x)| \le \sup_t |f(t)| + \sup_t |g(t)| = \|f\|_\infty + \|g\|_\infty.

The right-hand side is a single constant that bounds |f(x) + g(x)| for every x. A number that bounds a set is at least its supremum, so taking the supremum over x on the left preserves the inequality:

\|f + g\|_\infty = \sup_x |f(x) + g(x)| \le \|f\|_\infty + \|g\|_\infty.

All three axioms hold, so (C[a, b], \|\cdot\|_\infty) is a normed space — and, crucially, it is complete (a uniform limit of continuous functions is continuous), making it a Banach space. The induced metric d(f, g) = \|f - g\|_\infty = \sup_x |f(x) - g(x)| is exactly the uniform distance: two functions are close when their graphs are close everywhere, and convergence in this norm is uniform convergence.
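A rough numerical companion to the verification above: the sketch below approximates the sup norm by sampling on an even grid, which in general only gives a lower estimate of the true supremum. The helper `sup_norm`, the grid size, and the test functions sin and cos on [0, \pi] are all assumptions made for illustration.

```python
import math

def sup_norm(f, a, b, samples=10001):
    """Approximate sup |f| on [a, b] by sampling an even grid of points."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step)) for i in range(samples))

a, b = 0.0, math.pi
f, g = math.sin, math.cos

def f_plus_g(x):
    """Pointwise sum of f and g."""
    return f(x) + g(x)

def scaled_f(x):
    """The function -3 * f."""
    return -3.0 * f(x)

print(sup_norm(f, a, b), sup_norm(g, a, b))   # both close to 1
print(sup_norm(f_plus_g, a, b))               # close to sqrt(2), below 1 + 1
print(sup_norm(scaled_f, a, b))               # close to 3 * sup_norm(f)
```

The sum sin + cos peaks at \sqrt{2}, strictly below \|f\|_\infty + \|g\|_\infty = 2, because the two functions reach their maxima at different points.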

You could put a different norm on the same functions: \|f\|_1 = \int_a^b |f(x)|\,dx, the "area under |f|". It is a norm on C[a, b] (definiteness needs continuity: a continuous non-zero function has positive area). But it measures something genuinely different: a tall, thin spike has a huge sup norm yet a tiny L^1 norm. A sequence of ever-thinner spikes of fixed height converges to 0 in \|\cdot\|_1 while its \|\cdot\|_\infty stays at the peak height. On this infinite-dimensional space the two norms are not equivalent, because they disagree about which sequences converge, and that is impossible in \mathbb{R}^n. That contrast is the punchline of the next section.
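To put numbers on the spike picture, the sketch below uses a hypothetical triangular spike on [0, 1] with height h at x = 0 and base width 1/h^2, and approximates both norms on a grid. The shape, the interval, and the grid resolution are assumptions chosen for illustration; for this shape the exact values are \|f\|_\infty = h and \|f\|_1 = 1/(2h).

```python
import math

def spike(h):
    """Triangular spike on [0, 1]: value h at x = 0, falling linearly to 0 at x = 1/h**2."""
    width = 1.0 / h ** 2
    def f(x):
        return h * (1.0 - x / width) if x < width else 0.0
    return f

def sup_norm(f, cells=100_000):
    """Approximate sup |f| on [0, 1] from the endpoints of an even grid."""
    return max(abs(f(i / cells)) for i in range(cells + 1))

def l1_norm(f, cells=100_000):
    """Approximate the integral of |f| over [0, 1] with the midpoint rule."""
    return math.fsum(abs(f((i + 0.5) / cells)) for i in range(cells)) / cells

for h in (2.0, 5.0, 10.0):
    f = spike(h)
    print(h, sup_norm(f), l1_norm(f))
```

As h grows the sup norm grows like h while the integral shrinks like 1/(2h), so their ratio 2h^2 is unbounded and no single constant C can give \|f\|_\infty \le C\,\|f\|_1.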

Equivalent norms: when do two lengths agree on "convergence"?

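A space can carry many norms. When do two of them describe the same analysis, with the same convergent sequences, the same open sets, the same continuous maps? The answer is equivalence: two norms \|\cdot\|_a and \|\cdot\|_b on V are equivalent if there are constants 0 < c \le C such that c\,\|x\|_a \le \|x\|_b \le C\,\|x\|_a for every x \in V.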

On \mathbb{R}^n the three p-norms are all equivalent — concretely,

\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\,\|x\|_\infty, \qquad \|x\|_2 \le \sqrt{n}\,\|x\|_\infty,

so no matter which you pick, a sequence of vectors converges in one iff it converges in all. This is no accident of the p-norms; it is a theorem of real power: on a finite-dimensional vector space, any two norms are equivalent.
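The chain of inequalities between the 1-, 2- and \infty-norms on \mathbb{R}^n can be spot-checked on random vectors. The sketch below is a sanity check under assumed sample sizes and a small rounding tolerance, not a proof; the helper name `chain_holds` is hypothetical.

```python
import math
import random

def norm1(x):
    return sum(abs(v) for v in x)

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

def norm_inf(x):
    return max(abs(v) for v in x)

def chain_holds(x, tol=1e-12):
    """Check inf <= 2 <= 1 <= n*inf and 2 <= sqrt(n)*inf, allowing rounding slack."""
    n = len(x)
    a, b, c = norm_inf(x), norm2(x), norm1(x)
    slack = tol * (1.0 + c)
    return (a <= b + slack
            and b <= c + slack
            and c <= n * a + slack
            and b <= math.sqrt(n) * a + slack)

rng = random.Random(0)
samples = [[rng.uniform(-100, 100) for _ in range(rng.randint(1, 8))]
           for _ in range(1000)]
print(all(chain_holds(x) for x in samples))
```

The constants n and \sqrt{n} are attained by the all-ones vector, so they cannot be improved, and they grow with the dimension.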

In infinite dimensions this collapses. As the spike example showed, \|\cdot\|_\infty and \|\cdot\|_1 on C[a, b] are not equivalent: no constant C can satisfy \|f\|_\infty \le C\,\|f\|_1 for all f, because a spike of height h and width 1/h^2 has \|f\|_\infty = h \to \infty while \|f\|_1 \to 0. This is the reason functional analysis is harder and richer than linear algebra: in infinite dimensions the norm you choose genuinely matters, and much of the subject is about which norm makes a given problem tractable.

Seeing the norm: unit balls

The cleanest way to see a norm is to draw its unit ball B = \{\, x : \|x\| \le 1 \,\}, the set of vectors of length at most one. Because a norm-metric is translation-invariant and homogeneous, this single shape encodes the whole geometry: every other ball is a scaled, shifted copy. In \mathbb{R}^2 the boundary \{\,|x|^p + |y|^p = 1\,\} changes shape with p: a diamond for p = 1, a circle for p = 2, and in the limit p \to \infty a square.

As p grows from 1 the diamond swells through the circle and puffs out toward the square. Two features are worth naming. First, a bigger unit ball means a smaller norm: the square (largest ball) is \|\cdot\|_\infty, the smallest of the three, matching \|x\|_\infty \le \|x\|_2 \le \|x\|_1. Second, for every p \ge 1 the ball is convex, and that is no coincidence. Given (N1) and (N2), the triangle inequality is precisely the statement that the unit ball is convex. If \|x\| \le 1 and \|y\| \le 1, then for t \in [0, 1], \|t x + (1-t) y\| \le t\|x\| + (1-t)\|y\| \le 1, so the segment between any two ball points stays in the ball.
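The convexity argument can be probed by sampling. The sketch below picks random pairs of points on the boundary of the unit ball in \mathbb{R}^2, forms a random convex combination, and checks that its length is at most 1. The helper `ball_looks_convex`, the sample count, and the tolerance are assumptions for illustration; a passing run suggests convexity but does not prove it.

```python
import random

def p_norm(x, p):
    """(sum |x_i|^p)^(1/p) for any p > 0; this is a norm only when p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def ball_looks_convex(p, trials=2000, seed=1, tol=1e-9):
    """Sample convex combinations of boundary points; False if one leaves the ball."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        nx, ny = p_norm(x, p), p_norm(y, p)
        if nx == 0.0 or ny == 0.0:
            continue
        # Rescale both points so they sit (up to rounding) on the unit sphere.
        x = [v / nx for v in x]
        y = [v / ny for v in y]
        t = rng.random()
        mix = [t * a + (1.0 - t) * b for a, b in zip(x, y)]
        if p_norm(mix, p) > 1.0 + tol:
            return False
    return True

for p in (1, 1.5, 2, 4):
    print(p, ball_looks_convex(p))
```

The formula in `p_norm` accepts any positive exponent, so the same helper can be pointed at other values of p as an experiment.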

Now take p below 1. The shape pinches inward and becomes a four-pointed star with concave sides; its "ball" is no longer convex. Since the triangle inequality forces a convex unit ball, it must fail here, so \|\cdot\|_p is not a norm for 0 < p < 1. The example below makes this numerical.

The formula \|x\|_p = \left(\sum |x_i|^p\right)^{1/p} makes sense for any p > 0, and it satisfies positive definiteness (N1) and homogeneity (N2) for every such p. So it is easy to assume it is always a norm. It is not: for 0 < p < 1 the triangle inequality (N3) fails.

Take p = \tfrac12 in \mathbb{R}^2 with x = (1, 0) and y = (0, 1). Then \|x\|_{1/2} = \|y\|_{1/2} = 1, but

\|x + y\|_{1/2} = \big(|1|^{1/2} + |1|^{1/2}\big)^{2} = (1 + 1)^2 = 4 \;>\; 2 = \|x\|_{1/2} + \|y\|_{1/2}.

The length of the sum exceeds the sum of the lengths, so (N3) is violated, badly. The geometric shadow of this failure is exactly the non-convex "pinched star" shape of the unit ball for p < 1: the segment from (1,0) to (0,1) leaves the unit ball, since its midpoint (\tfrac12, \tfrac12) has \|\cdot\|_{1/2}-length 2. So while the formula for 0 < p < 1 still gives a metric (via \sum |x_i - y_i|^p, with no outer root; that one does obey the triangle inequality), \|\cdot\|_p itself is emphatically not a norm.
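The counterexample is easy to reproduce. The sketch below evaluates the p = 1/2 formula on x = (1, 0), y = (0, 1) and their sum, and then spot-checks the triangle inequality for the rootless expression \sum |x_i - y_i|^p on random points. The helper names and the sampling parameters are assumptions made for illustration.

```python
import random

def p_norm(x, p):
    """(sum |x_i|^p)^(1/p) for any p > 0; not a norm when p < 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def p_metric(x, y, p):
    """sum |x_i - y_i|^p with no outer root."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))

def metric_triangle_holds(p, trials=1000, seed=2, tol=1e-12):
    """Spot-check d(x, z) <= d(x, y) + d(y, z) for p_metric on random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = ([rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(3))
        if p_metric(x, z, p) > p_metric(x, y, p) + p_metric(y, z, p) + tol:
            return False
    return True

x, y = [1.0, 0.0], [0.0, 1.0]
total = [a + b for a, b in zip(x, y)]

print(p_norm(x, 0.5), p_norm(y, 0.5), p_norm(total, 0.5))  # (N3) fails: 4 > 1 + 1
print(p_norm([0.5, 0.5], 0.5))                             # midpoint lies outside the ball
print(metric_triangle_holds(0.5))                          # rootless version passes the check
```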