Normed Vector Spaces
How big is a signal? An engineer cleaning noise from an audio track needs a single number that
says "this recording is loud" or "this error is small". A statistician fitting a model wants one
number for "how far off" a whole vector of predictions is. A numerical analyst iterating toward the
solution of a huge linear system needs to know the residual is shrinking — again, one number for
the size of a vector. In every case we are asking for a length: a way to measure
the magnitude of a single element of a vector space, rather than the gap between two points.
You already know one such rule intimately — the Euclidean length
\|x\| = \sqrt{x_1^2 + \dots + x_n^2} in
\mathbb{R}^n. But there are many others, each suited to a different job,
and the audio engineer's "loudness" is not the statistician's "error". What all honest notions of
length share is captured by three axioms. A vector space carrying such a length function is a
normed vector space, and it is the central object of
functional analysis — the arena where linear algebra and
metric-space analysis
fuse, so that limits and continuity coexist with addition and scaling.
Throughout, V is a vector space over
\mathbb{R} (everything works verbatim over
\mathbb{C}, replacing |\lambda| by the complex
modulus). A norm is a function
\|\cdot\| : V \to [0, \infty) assigning a length to each vector, subject
to three axioms. For all x, y \in V and every scalar
\lambda \in \mathbb{R}:
\textbf{(N1)} \;\; \|x\| \ge 0, \quad \text{and} \quad \|x\| = 0 \iff x = 0 \qquad (\text{positive definiteness}),
\textbf{(N2)} \;\; \|\lambda x\| = |\lambda|\,\|x\| \qquad (\text{absolute homogeneity}),
\textbf{(N3)} \;\; \|x + y\| \le \|x\| + \|y\| \qquad (\text{the triangle inequality}).
The pair (V, \|\cdot\|) is a normed vector space. Read
the axioms as three demands on any reasonable "length": only the zero vector has zero length (N1);
scaling a vector by \lambda scales its length by
|\lambda| — doubling a vector doubles its length, and reversing it
(\lambda = -1) leaves the length unchanged (N2); and the length of a sum
never exceeds the sum of the lengths (N3).
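The three demands can be spot-checked numerically. The sketch below is only an illustration, not a proof: the helper names (`euclidean_norm`, `check_norm_axioms`) and the random-sampling strategy are assumptions introduced here, and passing the check merely means no violation was found on the sampled vectors.

```python
import math
import random

def euclidean_norm(x):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))

def add(x, y):
    return [xi + yi for xi, yi in zip(x, y)]

def scale(lam, x):
    return [lam * xi for xi in x]

def check_norm_axioms(norm, dim=4, trials=1000, tol=1e-9, seed=0):
    """Spot-check (N1)-(N3) on random vectors.

    Returns True if no violation is found, False at the first violation.
    """
    rng = random.Random(seed)
    if norm([0.0] * dim) != 0.0:          # the zero vector must have length 0
        return False
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(dim)]
        y = [rng.uniform(-10, 10) for _ in range(dim)]
        lam = rng.uniform(-5, 5)
        if not any(x):                    # skip the (very unlikely) zero sample
            continue
        if norm(x) <= 0.0:                # (N1): nonzero vectors have positive length
            return False
        if abs(norm(scale(lam, x)) - abs(lam) * norm(x)) > tol:   # (N2)
            return False
        if norm(add(x, y)) > norm(x) + norm(y) + tol:             # (N3)
            return False
    return True

def squared_length(x):
    """Not a norm: scaling by lam scales it by lam**2, so (N2) fails."""
    return sum(xi * xi for xi in x)

euclidean_ok = check_norm_axioms(euclidean_norm)
squared_ok = check_norm_axioms(squared_length)
```

Under these assumptions `euclidean_ok` comes out true while `squared_ok` comes out false, since the squared length breaks absolute homogeneity.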
What each axiom buys — and what homogeneity forces on you
The difference between a norm and a mere
metric is that a
norm must respect the linear structure of the space. (N2) and (N3) are not just about
"closeness" — they tie length to scaling and addition. Two immediate consequences fall out of the
axioms and are worth internalising.
-
The zero vector, and only it, has length zero. Putting
\lambda = 0 in (N2) gives
\|0\| = \|0 \cdot x\| = |0|\,\|x\| = 0, so
0 always has length 0. The
definiteness half of (N1) is the converse — the extra promise that nothing else
does. Drop it and you have a seminorm: still useful (it appears everywhere in
the theory of function spaces), but now distinct vectors can have length zero, so "size zero" no
longer means "the zero vector".
-
Non-negativity is not an extra assumption: it is free. The clause
\|x\| \ge 0 follows from (N2) and (N3). Apply (N3) to
x and -x, then (N2) with
\lambda = -1:
0 = \|0\| = \|x + (-x)\| \le \|x\| + \|-x\| = \|x\| + |-1|\,\|x\| = 2\,\|x\|,
so \|x\| \ge 0 automatically. (This is the exact analogue of the "free
non-negativity" you met for metrics — and it uses both linear axioms, which a bare metric does not
have.) A third handy consequence, the reverse triangle inequality, drops out of
(N3) the same way it did for the absolute value:
\big|\, \|x\| - \|y\| \,\big| \le \|x - y\|,
obtained by writing \|x\| = \|(x - y) + y\| \le \|x - y\| + \|y\| and its
mirror image with x and y swapped. It says the length function itself is continuous (indeed 1-Lipschitz): a small change in a vector makes a
small change in its length.
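As a rough numerical illustration of the reverse triangle inequality, the sketch below measures the slack \|x - y\| - |\|x\| - \|y\|| for random vectors. The 1-norm is used purely as an example, and `norm1` and `reverse_triangle_gap` are names assumed here for illustration.

```python
import random

def norm1(x):
    """Sum of absolute coordinates (used here only as a sample norm)."""
    return sum(abs(xi) for xi in x)

def reverse_triangle_gap(norm, x, y):
    """Return ||x - y|| - | ||x|| - ||y|| |, which is >= 0 for any norm."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return norm(diff) - abs(norm(x) - norm(y))

rng = random.Random(1)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    y = [rng.uniform(-1, 1) for _ in range(3)]
    gaps.append(reverse_triangle_gap(norm1, x, y))

min_gap = min(gaps)   # should never be meaningfully negative

# Equality case: y is a positive multiple of x, so the gap vanishes.
tight_gap = reverse_triangle_gap(norm1, [3.0, 0.0], [1.0, 0.0])
```

The gap is zero when one vector is a non-negative multiple of the other, as in the last line, and positive otherwise in this sample.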
A norm on a real vector space V is a map
\|\cdot\| : V \to [0, \infty) such that, for all
x, y \in V and \lambda \in \mathbb{R}:
-
(N1) Positive definiteness: \|x\| \ge 0, and
\|x\| = 0 \iff x = 0.
-
(N2) Absolute homogeneity:
\|\lambda x\| = |\lambda|\,\|x\|.
-
(N3) Triangle inequality:
\|x + y\| \le \|x\| + \|y\|.
-
A vector space equipped with a norm is a normed (vector) space. A normed space
that is complete in the induced metric (below) is a Banach space — the
setting for most of functional analysis.
Every norm is a metric in disguise — with extra symmetry
A norm measures the size of one vector; a metric measures the gap between two. The bridge is to
measure the gap between x and y as the length
of their difference:
d(x, y) := \|x - y\|.
Claim: this d is always a metric. The three metric
axioms drop straight out of the three norm axioms:
-
(M1) d(x, y) = \|x - y\| \ge 0, and it is
0 iff x - y = 0 (by definiteness), i.e.
x = y.
-
(M2) Symmetry is homogeneity with \lambda = -1:
d(y, x) = \|y - x\| = \|-(x - y)\| = |-1|\,\|x - y\| = d(x, y).
-
(M3) Writing x - z = (x - y) + (y - z) and applying
the norm's triangle inequality,
d(x, z) = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z).
So every normed space is automatically a metric space, and all the machinery of
metric spaces —
open balls, convergence, continuity, completeness — is instantly available. Convergence
x_n \to x just means \|x_n - x\| \to 0.
But a norm-metric is special: it is compatible with the linear structure in two
ways an arbitrary metric need not be.
-
Translation invariance:
d(x + a, y + a) = \|(x + a) - (y + a)\| = \|x - y\| = d(x, y).
Shifting both points by the same vector leaves the distance unchanged — the geometry looks the
same everywhere, so the ball of radius r about any point is a
translate of the ball about 0.
-
Absolute homogeneity of distance:
d(\lambda x, \lambda y) = |\lambda|\,d(x, y). Scaling the whole space
by \lambda scales all distances by |\lambda|.
These two properties are exactly the fingerprint of a norm. A metric that has them does
come from a norm (define \|x\| := d(x, 0) and check the axioms); a metric
that lacks them cannot — which is the trap the "Watch out!" box below exposes.
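The passage from a norm to its metric, and the two compatibility properties, can be sketched in a few lines. This is a minimal illustration with the Euclidean norm; `induced_metric` and the sample vectors are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import math

def norm2(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))

def induced_metric(norm):
    """Build d(x, y) = ||x - y|| from a norm."""
    def d(x, y):
        return norm([xi - yi for xi, yi in zip(x, y)])
    return d

d = induced_metric(norm2)

x, y = [1.0, 2.0], [4.0, 6.0]
a = [10.0, -3.0]     # translation vector
lam = -3.0           # scaling factor

dist = d(x, y)                                      # length of (-3, -4)
shifted = d([xi + ai for xi, ai in zip(x, a)],
            [yi + ai for yi, ai in zip(y, a)])      # translation invariance
scaled = d([lam * xi for xi in x],
           [lam * yi for yi in y])                  # homogeneity: |lam| * dist
```

Here `shifted` matches `dist`, and `scaled` equals `abs(lam) * dist`, mirroring the two displayed identities.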
It is tempting to think "metric" and "norm" are two words for the same idea. They are not: a norm
always yields a metric, but many honest metrics can never be written as
\|x - y\| for any norm.
Take the discrete metric on \mathbb{R}:
d(x, y) = 1 when x \ne y and
0 otherwise. It is a perfectly good metric. But suppose it came from a
norm, d(x, y) = \|x - y\|. Homogeneity would force
\|2 \cdot 1\| = |2|\,\|1\| = 2\,\|1\|, i.e.
d(2, 0) = 2\,d(1, 0). Yet in the discrete metric
d(2, 0) = 1 and 2\,d(1, 0) = 2 —
contradiction. The discrete metric is translation-invariant but not homogeneous, so no norm
induces it. "Getting twice as far" has no meaning in it, and that is precisely the linear
structure a norm insists on.
The same reasoning shows the bounded metric
\rho(x, y) = \min(1, |x - y|) on
\mathbb{R} is not from a norm either: its distances can never exceed
1, whereas under a norm \|\lambda x\| = |\lambda|\,\|x\| grows without bound
as |\lambda| \to \infty for any x \ne 0. Moral: a norm is strictly more than a metric; it is a metric that plays
nicely with the algebra.
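The homogeneity test used in this box is easy to run by machine. The following sketch assumes illustrative helper names (`discrete_metric`, `bounded_metric`, `is_homogeneous_at`) and checks the single instance x = 1, y = 0, \lambda = 2 from the text.

```python
def discrete_metric(x, y):
    """1 if the points differ, 0 if they coincide."""
    return 0.0 if x == y else 1.0

def bounded_metric(x, y):
    """rho(x, y) = min(1, |x - y|) on the real line."""
    return min(1.0, abs(x - y))

def usual_metric(x, y):
    """d(x, y) = |x - y|, which does come from a norm."""
    return abs(x - y)

def is_homogeneous_at(d, x, y, lam):
    """Check d(lam*x, lam*y) == |lam| * d(x, y) for one choice of x, y, lam."""
    return d(lam * x, lam * y) == abs(lam) * d(x, y)

discrete_ok = is_homogeneous_at(discrete_metric, 1.0, 0.0, 2.0)  # d(2,0)=1 vs 2*d(1,0)=2
bounded_ok = is_homogeneous_at(bounded_metric, 1.0, 0.0, 2.0)    # min(1,2)=1 vs 2*1=2
usual_ok = is_homogeneous_at(usual_metric, 1.0, 0.0, 2.0)        # 2 vs 2

# The discrete metric does survive translation, so scaling is what breaks it.
discrete_translation_ok = (
    discrete_metric(1.0 + 5.0, 0.0 + 5.0) == discrete_metric(1.0, 0.0)
)
```

A single failing instance is enough to rule out a norm, which is why one choice of x, y, and \lambda suffices here.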
The family of p-norms on \mathbb{R}^n
The single most important family of examples lives on
\mathbb{R}^n. For a real number
p \ge 1 and a vector
x = (x_1, \dots, x_n), define the p-norm
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.
Three members earn their own names and appear constantly:
-
The 1-norm (Manhattan / taxicab length):
\|x\|_1 = \sum_{i=1}^{n} |x_i| = |x_1| + \dots + |x_n|. Add up the
coordinate magnitudes — total blocks walked.
-
The 2-norm (Euclidean length):
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}. The one from Pythagoras; the only
p-norm that comes from an inner product
\langle x, y \rangle = \sum x_i y_i, which is what makes
\|\cdot\|_2 so special (angles, orthogonality, projections all need it).
-
The \infty-norm (maximum / supremum length):
\|x\|_\infty = \max_{1 \le i \le n} |x_i|. It is the limit of
\|x\|_p as p \to \infty — as the exponent
grows, the largest coordinate dominates the sum, and
\left(\sum |x_i|^p\right)^{1/p} \to \max_i |x_i|.
For any fixed vector these three are ordered
\|x\|_\infty \le \|x\|_2 \le \|x\|_1: a larger exponent shrinks the relative
contribution of the smaller coordinates. Verifying (N1) and (N2) for
\|\cdot\|_p is routine — a sum of non-negative p-th
powers is 0 only when every coordinate is
0, and pulling |\lambda| out of every
|\lambda x_i|^p = |\lambda|^p |x_i|^p and then taking the
p-th root gives homogeneity. The triangle inequality (N3) is the deep
part: for \|\cdot\|_p it is Minkowski's inequality, and
for the special case p = 2 it follows from Cauchy–Schwarz, as Worked example 1
shows.
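To make the definition concrete, here is a small sketch of a p-norm function on lists of floats. The name `p_norm` and the sample vector are assumptions for illustration; `math.inf` is used to select the maximum norm.

```python
import math

def p_norm(x, p):
    """p-norm of a list of floats; pass p = math.inf for the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]

n1 = p_norm(x, 1)            # |3| + |-4| + |1|
n2 = p_norm(x, 2)            # sqrt(9 + 16 + 1)
ninf = p_norm(x, math.inf)   # largest coordinate magnitude

# As p grows, the value decreases toward the max norm.
approach = [p_norm(x, p) for p in (1, 2, 4, 16, 64)]
```

For this vector the values fall from `n1` through `n2` toward `ninf`, which illustrates both the ordering of the three named norms and the limit as p grows.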
Worked example 1: the Euclidean 2-norm satisfies the triangle inequality
We prove \|x + y\|_2 \le \|x\|_2 + \|y\|_2 on
\mathbb{R}^n, the one axiom that is not immediate. The single tool we
borrow is the Cauchy–Schwarz inequality
\big|\langle x, y \rangle\big| \le \|x\|_2\,\|y\|_2, where
\langle x, y \rangle = \sum_i x_i y_i.
Step 1 — square the target. Both sides are non-negative, so it suffices to compare
their squares. Expand using the inner product:
\|x + y\|_2^2 = \langle x + y,\, x + y \rangle = \|x\|_2^2 + 2\,\langle x, y \rangle + \|y\|_2^2.
Step 2 — bound the cross term. By Cauchy–Schwarz,
\langle x, y \rangle \le |\langle x, y \rangle| \le \|x\|_2\,\|y\|_2, so
\|x + y\|_2^2 \le \|x\|_2^2 + 2\,\|x\|_2\,\|y\|_2 + \|y\|_2^2 = \big(\|x\|_2 + \|y\|_2\big)^2.
Step 3 — take square roots. Both sides are non-negative and the square root is
increasing, so \|x + y\|_2 \le \|x\|_2 + \|y\|_2. Done. Notice the shape
of the argument — "square, apply an inner-product inequality, un-square" — recurs for every norm
built from an inner product, and it is the engine behind Hilbert-space geometry.
The general p case (Minkowski) has no inner product to lean on, so it
instead runs through Hölder's inequality, the
p-generalisation of Cauchy–Schwarz. The strategy — reduce (N3) to a
deeper "product" inequality — is the same.
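The three steps of the worked example can be traced on concrete numbers. The vectors below are an arbitrary choice made here so that every quantity is a whole number; `inner` and `norm2` are illustrative helper names.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm2(x):
    """Euclidean norm, defined through the inner product."""
    return math.sqrt(inner(x, x))

x = [1.0, 2.0, 2.0]   # norm 3
y = [4.0, 0.0, 3.0]   # norm 5
s = [xi + yi for xi, yi in zip(x, y)]

# Step 1: square the target and expand.
lhs_sq = inner(s, s)
expanded = norm2(x) ** 2 + 2 * inner(x, y) + norm2(y) ** 2

# Step 2: Cauchy-Schwarz bounds the cross term.
cross = inner(x, y)
cs_bound = norm2(x) * norm2(y)
bound_sq = (norm2(x) + norm2(y)) ** 2

# Step 3: take square roots.
lhs = norm2(s)
rhs = norm2(x) + norm2(y)
```

With these vectors the squared left side is 54 and the squared bound is 64, so the inequality holds with room to spare because x and y are not parallel.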
From finite to infinite: the sequence spaces \ell^p
The p-norm has an obvious infinite-dimensional sibling. Instead of an
n-tuple, take an infinite sequence
x = (x_1, x_2, x_3, \dots) of reals and let the sum run to infinity. For
1 \le p < \infty, the space
\ell^p = \left\{\, x = (x_n)_{n\ge1} : \sum_{n=1}^{\infty} |x_n|^p < \infty \,\right\}, \qquad \|x\|_p = \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p},
consists of exactly those sequences whose p-norm is finite — the ones
that are "p-summable". The companion
\ell^\infty is the space of bounded sequences with
\|x\|_\infty = \sup_n |x_n|. These are genuine
vector spaces (Minkowski's inequality is what guarantees that the sum of two
p-summable sequences is again p-summable, so
the set is closed under addition), and they are the first infinite-dimensional Banach spaces most
students meet.
A striking new phenomenon appears here that has no analogue in
\mathbb{R}^n: the choice of p now
changes which sequences belong at all, not merely their measured length. The
harmonic sequence x_n = 1/n lies in
\ell^2 (since \sum 1/n^2 = \pi^2/6 < \infty)
but not in \ell^1 (the harmonic series
\sum 1/n diverges). In fact
\ell^p \subsetneq \ell^q whenever
p < q — the spaces genuinely differ. This is the first sign that in
infinite dimensions the different p-norms are not interchangeable, a
theme that culminates in the equivalent-norms discussion below.
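Partial sums give a feel for why x_n = 1/n sits in \ell^2 but not in \ell^1. The sketch below is only suggestive, since no finite computation can prove divergence; `partial_sum` and the checkpoints are assumptions chosen for illustration.

```python
import math

def partial_sum(p, terms):
    """Sum of (1/n)**p for n = 1, ..., terms."""
    return sum((1.0 / n) ** p for n in range(1, terms + 1))

checkpoints = (10, 1000, 100000)

# p = 1: partial sums of the harmonic series keep growing (roughly like log N).
l1_partial = [partial_sum(1, N) for N in checkpoints]

# p = 2: partial sums level off below pi^2 / 6.
l2_partial = [partial_sum(2, N) for N in checkpoints]
l2_limit = math.pi ** 2 / 6
```

The `l1_partial` values keep climbing between checkpoints, while the `l2_partial` values settle just under `l2_limit`.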
Worked example 2: the sup norm on C[a,b] is a norm
The most important function space in a first course is C[a, b], the
vector space of continuous real-valued functions on a closed interval, with pointwise addition and
scaling. Its natural length is the supremum (uniform) norm
\|f\|_\infty = \sup_{x \in [a, b]} |f(x)| \;=\; \max_{x \in [a, b]} |f(x)|,
where the supremum is actually attained (a continuous function on a compact interval is bounded and
reaches its extreme values, by the extreme value theorem — so the \sup
is a genuine \max and is finite). Let us verify all three axioms; this
is the archetype for checking that a proposed length on a function space is a norm.
(N1) Positive definiteness. Each |f(x)| \ge 0, so its
supremum is \ge 0. If \|f\|_\infty = 0 then
\sup_x |f(x)| = 0, which forces |f(x)| = 0 for
every x, i.e. f is the zero function.
Conversely the zero function plainly has \|f\|_\infty = 0. (This step
does not actually use continuity: it holds for any bounded function. Definiteness is simply where
"the zero vector" means "the zero function", the whole graph flat on the axis.)
(N2) Absolute homogeneity. Pull the constant out of the supremum:
\|\lambda f\|_\infty = \sup_x |\lambda f(x)| = \sup_x |\lambda|\,|f(x)| = |\lambda| \sup_x |f(x)| = |\lambda|\,\|f\|_\infty,
valid because |\lambda| \ge 0 is a constant and scaling a set of
non-negative numbers by a non-negative constant scales its supremum.
(N3) Triangle inequality. Fix any x \in [a, b]. The
pointwise (scalar) triangle inequality gives
|f(x) + g(x)| \le |f(x)| + |g(x)| \le \sup_t |f(t)| + \sup_t |g(t)| = \|f\|_\infty + \|g\|_\infty.
The right-hand side is a single constant that bounds |f(x) + g(x)| for
every x. A number that bounds a set is at least its supremum,
so taking the supremum over x on the left preserves the inequality:
\|f + g\|_\infty = \sup_x |f(x) + g(x)| \le \|f\|_\infty + \|g\|_\infty.
All three axioms hold, so (C[a, b], \|\cdot\|_\infty) is a normed space
— and, crucially, it is complete (a uniform limit of continuous functions is continuous),
making it a Banach space. The induced metric
d(f, g) = \|f - g\|_\infty = \sup_x |f(x) - g(x)| is exactly the
uniform distance:
two functions are close when their graphs are close everywhere, and convergence in this
norm is uniform convergence.
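On a computer the supremum can only be approximated. The sketch below samples a function on an evenly spaced grid, which gives a lower estimate of the true sup norm; the function `sup_norm`, the grid size, and the choice of sine and cosine on [0, \pi] are all assumptions made for illustration.

```python
import math

def sup_norm(f, a, b, samples=10001):
    """Approximate sup |f| on [a, b] by the max over an evenly spaced grid.

    This is a lower estimate: the true supremum may fall between grid points.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step)) for i in range(samples))

f, g = math.sin, math.cos
a, b = 0.0, math.pi

norm_f = sup_norm(f, a, b)                                  # close to 1
norm_g = sup_norm(g, a, b)                                  # attained at the endpoints
norm_sum = sup_norm(lambda t: f(t) + g(t), a, b)            # (N3): at most norm_f + norm_g
norm_scaled = sup_norm(lambda t: -3.0 * f(t), a, b)         # (N2): 3 * norm_f
uniform_dist = sup_norm(lambda t: f(t) - g(t), a, b)        # induced metric d(f, g)
```

Note that `norm_sum` comes out near \sqrt{2}, strictly below `norm_f + norm_g`, because sine and cosine do not peak at the same point.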
You could put a different norm on the same functions:
\|f\|_1 = \int_a^b |f(x)|\,dx, the "area under
|f|". It is a norm on C[a, b]
(definiteness needs continuity: a continuous non-zero function has positive area). But it measures
something genuinely different — a tall, thin spike has huge sup norm yet tiny
L^1 norm. A sequence of ever-thinner spikes converges to
0 in \|\cdot\|_1 while its
\|\cdot\|_\infty stays at its peak height. On an
infinite-dimensional space the two norms are not equivalent — they disagree
about which sequences converge — which is impossible in
\mathbb{R}^n. That contrast is the punchline of the next section.
Equivalent norms: when do two lengths agree on "convergence"?
A space can carry many norms. When do two of them describe the same analysis — the same
convergent sequences, the same open sets, the same continuous maps? The answer is
equivalence.
-
Two norms \|\cdot\|_a and \|\cdot\|_b on
V are equivalent if there exist constants
0 < c \le C < \infty with
c\,\|x\|_a \le \|x\|_b \le C\,\|x\|_a for all
x \in V.
-
Equivalent norms induce the same topology: identical convergent sequences,
identical open/closed sets, identical continuous functions. (If
\|x_n\|_a \to 0 then the sandwich forces
\|x_n\|_b \to 0, and vice versa.)
On \mathbb{R}^n the three named p-norms (p = 1, 2, \infty) are all
equivalent — concretely,
\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\,\|x\|_\infty, \qquad \|x\|_2 \le \sqrt{n}\,\|x\|_\infty,
so no matter which you pick, a sequence of vectors converges in one iff it converges in all. This is
no accident of the p-norms; it is a theorem of real power:
-
On a finite-dimensional vector space, any two norms are equivalent.
-
Consequently there is only one sensible notion of convergence, continuity, and completeness on
\mathbb{R}^n — the choice of norm is a matter of convenience, never
of substance. (The proof compares an arbitrary norm to
\|\cdot\|_2 using continuity of the norm on the compact unit sphere.)
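The explicit comparison constants for the three named norms can be checked on sample vectors. This is a sketch under assumed helper names (`norm1`, `norm2`, `norm_inf`, `chain_holds`); it tests the displayed chain on random data and then shows two vectors where the bounds are attained.

```python
import math
import random

def norm1(x):
    return sum(abs(xi) for xi in x)

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    return max(abs(xi) for xi in x)

def chain_holds(x, rel=1e-12):
    """Check inf <= 2 <= 1 <= n*inf and 2 <= sqrt(n)*inf, with a small float slack."""
    n = len(x)
    a, b, c = norm_inf(x), norm2(x), norm1(x)
    slack = rel * max(1.0, c)
    return (a <= b + slack and b <= c + slack
            and c <= n * a + slack and b <= math.sqrt(n) * a + slack)

rng = random.Random(2)
samples = [[rng.uniform(-100, 100) for _ in range(5)] for _ in range(2000)]
all_hold = all(chain_holds(x) for x in samples)

# Extremal cases: equal coordinates attain the upper constants,
# a single nonzero coordinate makes all three norms coincide.
ones = [1.0, 1.0, 1.0, 1.0]
e1 = [1.0, 0.0, 0.0, 0.0]
```

For `ones` the 1-norm equals n times the max norm and the 2-norm equals \sqrt{n} times it, so the constants n and \sqrt{n} cannot be improved.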
In infinite dimensions this collapses. As the spike example showed,
\|\cdot\|_\infty and \|\cdot\|_1 on
C[a, b] are not equivalent: no constant
C can satisfy
\|f\|_\infty \le C\,\|f\|_1 for all
f, because a spike of height h and width
1/h^2 has \|f\|_\infty = h \to \infty while
\|f\|_1 \to 0. This is the reason functional analysis is harder
and richer than linear algebra: in infinite dimensions the norm you choose genuinely
matters, and much of the subject is about which norm makes a given problem tractable.
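The spike family can be tabulated numerically. The sketch below assumes a triangular spike centred in [0, 1] (the text fixes only the height h and width 1/h^2, so the shape and the interval are choices made here), and approximates the two norms on a grid with the trapezoid rule.

```python
def spike(h, center=0.5):
    """Triangular spike of height h and base width 1/h**2, centred in [0, 1]."""
    half_width = 0.5 / h ** 2
    def f(t):
        return max(0.0, h * (1.0 - abs(t - center) / half_width))
    return f

def sup_norm(f, samples=100001):
    """Grid approximation of sup |f| on [0, 1]."""
    n = samples - 1
    return max(abs(f(i / n)) for i in range(samples))

def l1_norm(f, samples=100001):
    """Trapezoid-rule approximation of the integral of |f| over [0, 1]."""
    n = samples - 1
    values = [abs(f(i / n)) for i in range(samples)]
    return (sum(values) - 0.5 * (values[0] + values[-1])) / n

heights = (2.0, 5.0, 10.0)
table = [(h, sup_norm(spike(h)), l1_norm(spike(h))) for h in heights]

# Ratio sup / L1 for each spike; exact value for this shape is 2 * h**2.
ratios = [sup / l1 for _, sup, l1 in table]
```

For this triangular shape the exact area is 1/(2h), so the ratio of sup norm to \|\cdot\|_1 is 2h^2, which no fixed constant C can bound.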
Seeing the norm: unit balls
The cleanest way to see a norm is to draw its unit ball
B = \{\, x : \|x\| \le 1 \,\} — the set of vectors of length at most one.
Because a norm-metric is translation-invariant and homogeneous, this single shape encodes the whole
geometry: every other ball is a scaled, shifted copy. In \mathbb{R}^2 the
boundary \{\,|x|^p + |y|^p = 1\,\} morphs beautifully with
p:
- p = 1: a diamond (rotated square), corners on the axes;
- p = 2: the familiar round disc;
- p \to \infty: an axis-aligned square, the set
\max(|x|, |y|) \le 1.
As p grows from 1 the diamond
swells through the circle and puffs out toward the square. Two features are worth naming. First,
a bigger unit ball means a smaller norm — the square (largest ball) is
\|\cdot\|_\infty, the smallest of the three, matching
\|x\|_\infty \le \|x\|_2 \le \|x\|_1. Second, for every
p \ge 1 the ball is convex — and that is no coincidence.
Given homogeneity, the triangle inequality is precisely the statement that the unit ball is convex. If
\|x\| \le 1 and \|y\| \le 1, then for
t \in [0, 1], applying (N3) and then (N2) gives
\|t x + (1-t) y\| \le t\|x\| + (1-t)\|y\| \le 1, so the segment between
any two ball points stays in the ball.
Now let p drop below 1. The shape pinches inward and becomes
a star with concave sides: its "ball" is no longer convex. By the contrapositive of the argument just given, the triangle
inequality must fail, so \|\cdot\|_p is not a norm for
0 < p < 1. This is an axiom visibly breaking. The next box
makes it numerical.
The formula \|x\|_p = \left(\sum |x_i|^p\right)^{1/p} makes sense for
any p > 0, and it satisfies positive definiteness (N1) and
homogeneity (N2) for every such p. So it is easy to assume it is always
a norm. It is not: for 0 < p < 1 the
triangle inequality (N3) fails.
Take p = \tfrac12 in \mathbb{R}^2 with
x = (1, 0) and y = (0, 1). Then
\|x\|_{1/2} = \|y\|_{1/2} = 1, but
\|x + y\|_{1/2} = \big(|1|^{1/2} + |1|^{1/2}\big)^{2} = (1 + 1)^2 = 4 \;>\; 2 = \|x\|_{1/2} + \|y\|_{1/2}.
The length of the sum exceeds the sum of the lengths: (N3) is violated, badly. The
geometric shadow of this failure is exactly the non-convex "pinched star" unit ball for
p below 1: the midpoint (\tfrac12, \tfrac12) of the segment from (1,0) to
(0,1) has \|(\tfrac12, \tfrac12)\|_{1/2} = 2 > 1, so the segment leaves the unit ball. So while
\|\cdot\|_p for 0 < p < 1 still gives a
metric (via \sum |x_i - y_i|^p, no outer root; that one
does obey the triangle inequality), it is emphatically not a norm.
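The counterexample can be reproduced directly. In the sketch below, `p_quasi_norm` and `p_power_metric` are names assumed for illustration; the first evaluates the formula with the outer root, the second is the rootless sum mentioned in the closing sentence, spot-checked on random triples for p = 1/2.

```python
import math
import random

def p_quasi_norm(x, p):
    """(sum |x_i|^p)^(1/p); a norm only when p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def p_power_metric(x, y, p):
    """sum |x_i - y_i|^p with no outer root; a metric for 0 < p <= 1."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y))

p = 0.5
x, y = [1.0, 0.0], [0.0, 1.0]

lhs = p_quasi_norm([1.0, 1.0], p)                  # length of x + y
rhs = p_quasi_norm(x, p) + p_quasi_norm(y, p)      # sum of the lengths
midpoint_len = p_quasi_norm([0.5, 0.5], p)         # midpoint of the segment from x to y

# Spot-check the triangle inequality for the rootless version.
rng = random.Random(3)
metric_ok = True
for _ in range(1000):
    u, v, w = ([rng.uniform(-5, 5) for _ in range(2)] for _ in range(3))
    if p_power_metric(u, w, p) > p_power_metric(u, v, p) + p_power_metric(v, w, p) + 1e-12:
        metric_ok = False
```

Here `lhs` exceeds `rhs`, the midpoint has length greater than 1 and so lies outside the "ball", and the rootless sum shows no triangle-inequality violation on the sampled triples.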