Normed Vector Spaces

How big is a signal? An engineer cleaning noise from an audio track needs a single number that says "this recording is loud" or "this error is small". A statistician fitting a model wants one number for "how far off" a whole vector of predictions is. A numerical analyst iterating toward the solution of a huge linear system needs to know the residual is shrinking — again, one number for the size of a vector. In every case we are asking for a length: a way to measure the magnitude of an element of a vector space, not the gap between two points but the size of one.

You already know one such rule intimately — the Euclidean length \|x\| = \sqrt{x_1^2 + \dots + x_n^2} in \mathbb{R}^n. But there are many others, each suited to a different job, and the audio engineer's "loudness" is not the statistician's "error". What all honest notions of length share is captured by three axioms. A vector space carrying such a length function is a normed vector space, and it is the central object of functional analysis — the arena where linear algebra and metric-space analysis fuse, so that limits and continuity coexist with addition and scaling.

Throughout, V is a vector space over \mathbb{R} (everything works verbatim over \mathbb{C}, replacing |\lambda| by the complex modulus). A norm is a function \|\cdot\| : V \to [0, \infty) assigning a length to each vector, subject to three axioms. For all x, y \in V and every scalar \lambda \in \mathbb{R}:

\textbf{(N1)} \;\; \|x\| \ge 0, \quad \text{and} \quad \|x\| = 0 \iff x = 0 \qquad (\text{positive definiteness}),

\textbf{(N2)} \;\; \|\lambda x\| = |\lambda|\,\|x\| \qquad (\text{absolute homogeneity}),

\textbf{(N3)} \;\; \|x + y\| \le \|x\| + \|y\| \qquad (\text{the triangle inequality}).

The pair (V, \|\cdot\|) is a normed vector space. Read the axioms as three demands on any reasonable "length": only the zero vector has zero length (N1); scaling a vector by \lambda scales its length by |\lambda| — doubling a vector doubles its length, and reversing it (\lambda = -1) leaves the length unchanged (N2); and the length of a sum never exceeds the sum of the lengths (N3).
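One way to build intuition for the axioms is to test a candidate length function on random vectors. The sketch below assumes NumPy. `check_norm_axioms` is a hypothetical helper written for illustration. Random sampling can expose a violation, but it can never prove that the axioms hold.

```python
import numpy as np

def check_norm_axioms(norm, dim=3, trials=1000, tol=1e-9, seed=0):
    """Spot-check (N1)-(N3) for `norm` on random vectors in R^dim.

    Hypothetical helper: returns False as soon as a sampled vector or pair
    violates an axiom, True otherwise. Passing is evidence, not a proof.
    """
    rng = np.random.default_rng(seed)
    if norm(np.zeros(dim)) != 0:                     # (N1): ||0|| = 0
        return False
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.normal()
        if norm(x) <= 0:                             # (N1): x != 0 almost surely
            return False
        if abs(norm(lam * x) - abs(lam) * norm(x)) > tol * (1 + abs(lam) * norm(x)):
            return False                             # (N2): absolute homogeneity
        if norm(x + y) > norm(x) + norm(y) + tol:    # (N3): triangle inequality
            return False
    return True

print(check_norm_axioms(np.linalg.norm))                           # True  (Euclidean length)
print(check_norm_axioms(lambda v: np.linalg.norm(v, 1)))           # True  (sum of |x_i|)
print(check_norm_axioms(lambda v: np.sum(np.abs(v) ** 0.5) ** 2))  # False (fails N3)
```

The last candidate is the p = 1/2 version of the p-norm formula. It passes (N1) and (N2) but fails the triangle inequality, which this section returns to later.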

What each axiom buys — and what homogeneity forces on you

The difference between a norm and a mere metric is that a norm must respect the linear structure of the space. (N2) and (N3) are not just about "closeness" — they tie length to scaling and addition. Two immediate consequences fall out of the axioms and are worth internalising.

The zero vector, and only it, has length zero. Putting \lambda = 0 in (N2) gives \|0\| = \|0 \cdot x\| = 0 \cdot \|x\| = 0, so 0 always has length 0. The definiteness half of (N1) is the converse — the extra promise that nothing else does. Drop it and you have a seminorm: still useful (it appears everywhere in the theory of function spaces), but now distinct vectors can have length zero, so "size zero" no longer means "the zero vector".

Non-negativity is not an extra assumption. It comes for free, because the clause \|x\| \ge 0 follows from (N2) and (N3). Apply (N3) to x and -x, then (N2) with \lambda = -1:

0 = \|0\| = \|x + (-x)\| \le \|x\| + \|-x\| = \|x\| + |-1|\,\|x\| = 2\,\|x\|,

so \|x\| \ge 0 automatically. (This is the exact analogue of the "free non-negativity" you met for metrics — and it uses both linear axioms, which a bare metric does not have.) A third handy consequence, the reverse triangle inequality, drops out of (N3) the same way it did for the absolute value:

\big|\, \|x\| - \|y\| \,\big| \le \|x - y\|,

This follows by writing \|x\| = \|(x - y) + y\| \le \|x - y\| + \|y\| and the mirror image \|y\| = \|(y - x) + x\| \le \|x - y\| + \|x\|. So the length function is continuous, and in fact 1-Lipschitz: changing a vector by an amount \|x - y\| changes its length by at most \|x - y\|.
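The reverse triangle inequality can be probed numerically. Below is a minimal sketch, assuming NumPy and, by default, the Euclidean norm. The helper `reverse_triangle_gap` is hypothetical. It returns the largest sampled value of |\|x\| - \|y\|| - \|x - y\|, which the inequality says can never be positive (up to rounding).

```python
import numpy as np

def reverse_triangle_gap(norm=np.linalg.norm, dim=4, trials=10_000, seed=1):
    """Largest sampled value of | ||x|| - ||y|| | - ||x - y||; should be <= 0."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        worst = max(worst, abs(norm(x) - norm(y)) - norm(x - y))
    return worst

print(reverse_triangle_gap() <= 1e-12)                              # True
print(reverse_triangle_gap(lambda v: np.linalg.norm(v, 1)) <= 1e-12)  # True for the 1-norm too
```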

A norm on a real vector space V is a map \|\cdot\| : V \to [0, \infty) such that, for all x, y \in V and \lambda \in \mathbb{R}:

(N1) Positive definiteness: \|x\| \ge 0, and \|x\| = 0 \iff x = 0.
(N2) Absolute homogeneity: \|\lambda x\| = |\lambda|\,\|x\|.
(N3) Triangle inequality: \|x + y\| \le \|x\| + \|y\|.
A vector space equipped with a norm is a normed (vector) space. A normed space that is complete in the induced metric (below) is a Banach space — the setting for most of functional analysis.

Every norm is a metric in disguise — with extra symmetry

A norm measures the size of one vector; a metric measures the gap between two. The bridge is to measure the gap between x and y as the length of their difference:

d(x, y) := \|x - y\|.

Claim: this d is always a metric. The three metric axioms drop straight out of the three norm axioms:

(M1) d(x, y) = \|x - y\| \ge 0, and it is 0 iff x - y = 0 (by definiteness), i.e. x = y.
(M2) Symmetry is homogeneity with \lambda = -1: d(y, x) = \|y - x\| = \|-(x - y)\| = |-1|\,\|x - y\| = d(x, y).
(M3) Writing x - z = (x - y) + (y - z) and applying the norm's triangle inequality, d(x, z) = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z).

So every normed space is automatically a metric space, and all the machinery of metric spaces — open balls, convergence, continuity, completeness — is instantly available. Convergence x_n \to x just means \|x_n - x\| \to 0.
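Passing from a norm to its metric takes one line of code. Here is a sketch that assumes NumPy. `induced_metric` is a hypothetical name, and the sample points are arbitrary.

```python
import numpy as np

def induced_metric(norm):
    """Return the metric d(x, y) = ||x - y|| induced by a norm function."""
    return lambda x, y: norm(np.asarray(x) - np.asarray(y))

d1 = induced_metric(lambda v: np.linalg.norm(v, 1))   # taxicab distance
d2 = induced_metric(np.linalg.norm)                   # Euclidean distance

x, y, z = np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([1.0, 1.0])
print(d1(x, y), d2(x, y))                              # 7.0 5.0
# (M2) symmetry and (M3) triangle inequality on this particular triple
print(d2(x, y) == d2(y, x), d2(x, y) <= d2(x, z) + d2(z, y))   # True True
```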

But a norm-metric is special: it is compatible with the linear structure in two ways an arbitrary metric need not be.

Translation invariance: d(x + a, y + a) = \|(x + a) - (y + a)\| = \|x - y\| = d(x, y). Shifting both points by the same vector leaves the distance unchanged — the geometry looks the same everywhere, so the ball of radius r about any point is a translate of the ball about 0.
Absolute homogeneity of distance: d(\lambda x, \lambda y) = |\lambda|\,d(x, y). Scaling the whole space by \lambda scales all distances by |\lambda|.

These two properties are exactly the fingerprint of a norm. A metric on a vector space that has both does come from a norm (define \|x\| := d(x, 0) and check the axioms). A metric that lacks either one cannot, and that is the trap examined next.

It is tempting to think "metric" and "norm" are two words for the same idea. They are not: a norm always yields a metric, but many honest metrics can never be written as \|x - y\| for any norm.

Take the discrete metric on \mathbb{R}: d(x, y) = 1 when x \ne y and 0 otherwise. It is a perfectly good metric. But suppose it came from a norm, d(x, y) = \|x - y\|. Homogeneity would force \|2 \cdot 1\| = |2|\,\|1\| = 2\,\|1\|, i.e. d(2, 0) = 2\,d(1, 0). Yet in the discrete metric d(2, 0) = 1 and 2\,d(1, 0) = 2, a contradiction. The discrete metric is translation-invariant but not homogeneous, so no norm induces it. "Getting twice as far" has no meaning in it, and that is precisely the linear structure a norm insists on.

The same reasoning shows that the bounded metric \rho(x, y) = \min(1, |x - y|) on \mathbb{R} does not come from a norm either. Its distances never exceed 1, whereas for any x \ne 0 a norm's \|\lambda x\| = |\lambda|\,\|x\| grows without bound as |\lambda| \to \infty. Moral: a norm is strictly more than a metric. It is a metric that plays nicely with the algebra.
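The two fingerprint properties can be spot-checked for metrics on \mathbb{R}. This sketch rests on a few assumptions. Metrics are plain Python functions, and `fingerprint` is a hypothetical helper that samples only a handful of points, so it can refute either property but cannot prove it. The sample values are dyadic rationals, which keeps the floating-point equality tests exact.

```python
import itertools

def euclid(x, y):
    return abs(x - y)

def discrete(x, y):
    return 0.0 if x == y else 1.0

def bounded(x, y):
    return min(1.0, abs(x - y))

def fingerprint(d, points=(-2.0, 0.0, 0.5, 3.0), shifts=(1.0, -4.0), scales=(2.0, -3.0)):
    """Return (translation_invariant, homogeneous) as observed on the sample points."""
    pairs = list(itertools.product(points, repeat=2))
    translation = all(d(x + a, y + a) == d(x, y) for x, y in pairs for a in shifts)
    homogeneous = all(d(lam * x, lam * y) == abs(lam) * d(x, y) for x, y in pairs for lam in scales)
    return translation, homogeneous

for name, d in [("|x - y|", euclid), ("discrete", discrete), ("min(1, |x - y|)", bounded)]:
    print(name, fingerprint(d))
# |x - y| (True, True)
# discrete (True, False)
# min(1, |x - y|) (True, False)
```

Both counterexamples are translation-invariant. Homogeneity is the property they lack.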

The family of p-norms on \mathbb{R}^n

The single most important family of examples lives on \mathbb{R}^n. For a real number p \ge 1 and a vector x = (x_1, \dots, x_n), define the p-norm

\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.

Three members earn their own names and appear constantly:

The 1-norm (Manhattan / taxicab length): \|x\|_1 = \sum_{i=1}^{n} |x_i| = |x_1| + \dots + |x_n|. Add up the coordinate magnitudes — total blocks walked.
The 2-norm (Euclidean length): \|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}. The one from Pythagoras; the only p-norm that comes from an inner product \langle x, y \rangle = \sum x_i y_i, which is what makes \|\cdot\|_2 so special (angles, orthogonality, projections all need it).
The \infty-norm (maximum / supremum length): \|x\|_\infty = \max_{1 \le i \le n} |x_i|. It is the limit of \|x\|_p as p \to \infty — as the exponent grows, the largest coordinate dominates the sum, and \left(\sum |x_i|^p\right)^{1/p} \to \max_i |x_i|.
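The p-norm formula, including the p \to \infty limit, can be sketched in plain Python. This is an illustration rather than a library routine, and the helper `p_norm` is hypothetical. For vectors, NumPy's `numpy.linalg.norm(x, ord=p)` covers the same cases.

```python
import math

def p_norm(x, p):
    """||x||_p for p >= 1; pass p = math.inf for the max norm."""
    if p < 1:
        raise ValueError("the p-norm needs p >= 1")
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))   # 8.0, sqrt(26) ~ 5.099, 4.0
for p in (4, 16, 64):
    print(p, p_norm(x, p))   # decreasing toward ||x||_inf = 4.0 as p grows
```

As p grows, the largest coordinate |{-4}| = 4 dominates the sum, exactly as described for \|\cdot\|_\infty.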

Worked example 1: the Euclidean 2-norm satisfies the triangle inequality

We prove \|x + y\|_2 \le \|x\|_2 + \|y\|_2 on \mathbb{R}^n, the one axiom that is not immediate. The single tool we borrow is the Cauchy–Schwarz inequality \big|\langle x, y \rangle\big| \le \|x\|_2\,\|y\|_2, where \langle x, y \rangle = \sum_i x_i y_i.

Step 1 — square the target. Both sides are non-negative, so it suffices to compare their squares. Expand using the inner product:

\|x + y\|_2^2 = \langle x + y,\, x + y \rangle = \|x\|_2^2 + 2\,\langle x, y \rangle + \|y\|_2^2.

Step 2 — bound the cross term. By Cauchy–Schwarz, \langle x, y \rangle \le |\langle x, y \rangle| \le \|x\|_2\,\|y\|_2, so

\|x + y\|_2^2 \le \|x\|_2^2 + 2\,\|x\|_2\,\|y\|_2 + \|y\|_2^2 = \big(\|x\|_2 + \|y\|_2\big)^2.

Step 3 — take square roots. Both sides are non-negative and the square root is increasing, so \|x + y\|_2 \le \|x\|_2 + \|y\|_2. Done. Notice the shape of the argument — "square, apply an inner-product inequality, un-square" — recurs for every norm built from an inner product, and it is the engine behind Hilbert-space geometry.
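Each step of the argument can be checked numerically on random vectors. This sketch assumes NumPy. `triangle_steps` is a hypothetical helper that returns one boolean per step: Step 1's expansion, Step 2's Cauchy–Schwarz bound, and Step 3's conclusion.

```python
import numpy as np

def triangle_steps(x, y, tol=1e-12):
    """Check Steps 1-3 of the proof of ||x + y||_2 <= ||x||_2 + ||y||_2 for one pair."""
    nx, ny, nxy = np.linalg.norm(x), np.linalg.norm(y), np.linalg.norm(x + y)
    cross = np.dot(x, y)                                           # <x, y>
    step1 = np.isclose(nxy ** 2, nx ** 2 + 2 * cross + ny ** 2)    # expansion
    step2 = cross <= abs(cross) <= nx * ny + tol                   # Cauchy-Schwarz
    step3 = nxy <= nx + ny + tol                                   # triangle inequality
    return bool(step1), bool(step2), bool(step3)

rng = np.random.default_rng(42)
results = [triangle_steps(rng.normal(size=5), rng.normal(size=5)) for _ in range(1000)]
print(all(all(r) for r in results))   # True

# Equality holds when y is a non-negative multiple of x; here y = x:
x = np.array([1.0, 2.0, 2.0])
print(np.linalg.norm(2 * x) == np.linalg.norm(x) + np.linalg.norm(x))   # True (6.0 == 3.0 + 3.0)
```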

The general p case (Minkowski) has no inner product to lean on, so it instead runs through Hölder's inequality, the p-generalisation of Cauchy–Schwarz. The strategy — reduce (N3) to a deeper "product" inequality — is the same.
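Hölder's inequality, \sum_i |x_i y_i| \le \|x\|_p\,\|y\|_q with 1/p + 1/q = 1, and Minkowski's inequality can both be spot-checked on random vectors. This sketch assumes NumPy. The helper names are hypothetical, and the exponents tried below are arbitrary. With p = q = 2, Hölder reduces to Cauchy–Schwarz.

```python
import numpy as np

def p_norm(v, p):
    """(sum |v_i|^p)^(1/p) for p >= 1."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def holder_and_minkowski_hold(p, dim=6, trials=1000, tol=1e-12, seed=7):
    """Sample random pairs; report whether Hoelder and Minkowski held on all of them."""
    q = p / (p - 1)                        # conjugate exponent: 1/p + 1/q = 1
    rng = np.random.default_rng(seed)
    holder = minkowski = True
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        holder = holder and np.sum(np.abs(x * y)) <= p_norm(x, p) * p_norm(y, q) + tol
        minkowski = minkowski and p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + tol
    return bool(holder), bool(minkowski)

print(holder_and_minkowski_hold(3.0))   # (True, True)
print(holder_and_minkowski_hold(2.0))   # (True, True); here Hoelder is Cauchy-Schwarz
```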

From finite to infinite: the sequence spaces \ell^p

The p-norm has an obvious infinite-dimensional sibling. Instead of an n-tuple, take an infinite sequence x = (x_1, x_2, x_3, \dots) of reals and let the sum run to infinity. For 1 \le p < \infty, the space

\ell^p = \left\{\, x = (x_n)_{n\ge1} : \sum_{n=1}^{\infty} |x_n|^p < \infty \,\right\}, \qquad \|x\|_p = \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p},

consists of exactly those sequences whose p-norm is finite — the ones that are "p-summable". The companion \ell^\infty is the space of bounded sequences with \|x\|_\infty = \sup_n |x_n|. These are genuine vector spaces (Minkowski's inequality is what guarantees that the sum of two p-summable sequences is again p-summable, so the set is closed under addition), and they are the first infinite-dimensional Banach spaces most students meet.

A striking new phenomenon appears here that has no analogue in \mathbb{R}^n: the choice of p now changes which sequences belong at all, not merely their measured length. The harmonic-type sequence x_n = 1/n lies in \ell^2 (since \sum 1/n^2 = \pi^2/6 < \infty) but not in \ell^1 (the harmonic series \sum 1/n diverges). In fact \ell^p \subsetneq \ell^q whenever p < q — the spaces genuinely differ. This is the first sign that in infinite dimensions the different p-norms are not interchangeable, a theme that culminates in the equivalent-norms discussion below.
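For x_n = 1/n, the difference between \ell^1 and \ell^2 shows up clearly in partial sums. No finite computation proves convergence or divergence, so this only illustrates the behaviour. The plain-Python sketch below uses a hypothetical helper, `partial_p_sum`.

```python
import math

def partial_p_sum(p, N):
    """Partial sum of |x_n|^p for n = 1..N, with x_n = 1/n."""
    return sum((1.0 / n) ** p for n in range(1, N + 1))

for N in (10, 1_000, 100_000):
    # p = 1 column grows without bound (roughly like ln N);
    # p = 2 column levels off near pi^2/6
    print(N, round(partial_p_sum(1, N), 4), round(partial_p_sum(2, N), 4))
print("pi^2/6 =", round(math.pi ** 2 / 6, 4))
```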

Worked example 2: the sup norm on C[a,b] is a norm

The most important function space in a first course is C[a, b], the vector space of continuous real-valued functions on a closed interval, with pointwise addition and scaling. Its natural length is the supremum (uniform) norm

\|f\|_\infty = \sup_{x \in [a, b]} |f(x)| \;=\; \max_{x \in [a, b]} |f(x)|,

where the supremum is actually attained (a continuous function on a compact interval is bounded and reaches its extreme values, by the extreme value theorem — so the \sup is a genuine \max and is finite). Let us verify all three axioms; this is the archetype for checking that a proposed length on a function space is a norm.

(N1) Positive definiteness. Each |f(x)| \ge 0, so its supremum is \ge 0. If \|f\|_\infty = 0 then \sup_x |f(x)| = 0, which forces |f(x)| = 0 for every x, i.e. f is the zero function. Conversely, the zero function plainly has \|f\|_\infty = 0. (This step does not actually use continuity, and the same argument works for any bounded function. Here "the zero vector" means the zero function, whose whole graph lies flat on the axis.)

(N2) Absolute homogeneity. Pull the constant out of the supremum:

\|\lambda f\|_\infty = \sup_x |\lambda f(x)| = \sup_x |\lambda|\,|f(x)| = |\lambda| \sup_x |f(x)| = |\lambda|\,\|f\|_\infty,

valid because |\lambda| \ge 0 is a constant and scaling a set of non-negative numbers by a non-negative constant scales its supremum.

(N3) Triangle inequality. Fix any x \in [a, b]. The pointwise (scalar) triangle inequality gives

|f(x) + g(x)| \le |f(x)| + |g(x)| \le \sup_t |f(t)| + \sup_t |g(t)| = \|f\|_\infty + \|g\|_\infty.

The right-hand side is a single constant that bounds |f(x) + g(x)| for every x. A number that bounds a set is at least its supremum, so taking the supremum over x on the left preserves the inequality:

\|f + g\|_\infty = \sup_x |f(x) + g(x)| \le \|f\|_\infty + \|g\|_\infty.

All three axioms hold, so (C[a, b], \|\cdot\|_\infty) is a normed space — and, crucially, it is complete (a uniform limit of continuous functions is continuous), making it a Banach space. The induced metric d(f, g) = \|f - g\|_\infty = \sup_x |f(x) - g(x)| is exactly the uniform distance: two functions are close when their graphs are close everywhere, and convergence in this norm is uniform convergence.
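On a computer the supremum can only be approximated, for instance by the largest value of |f| on a fine grid. This sketch assumes NumPy, and `sup_norm` is a hypothetical helper. For continuous f, the grid maximum is a lower bound that approaches \|f\|_\infty as the grid is refined. The example functions sin and cos on [0, \pi] are arbitrary choices.

```python
import numpy as np

def sup_norm(f, a, b, samples=10_001):
    """Approximate ||f||_inf on [a, b] by the largest |f| on a uniform grid.

    For continuous f this is a lower bound that approaches the true maximum
    as the grid is refined; it is only a numerical approximation.
    """
    t = np.linspace(a, b, samples)
    return float(np.max(np.abs(f(t))))

a, b = 0.0, np.pi
print(sup_norm(np.sin, a, b), sup_norm(np.cos, a, b))       # both about 1.0
print(sup_norm(lambda t: np.sin(t) + np.cos(t), a, b))      # about sqrt(2), below 1 + 1 (N3)
print(sup_norm(lambda t: -3 * np.sin(t), a, b))             # about 3.0 = |-3| * 1 (N2)
```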

You could put a different norm on the same functions: \|f\|_1 = \int_a^b |f(x)|\,dx, the "area under |f|". It is a norm on C[a, b] (definiteness needs continuity, because a continuous non-zero function has positive area). But it measures something genuinely different: a tall, thin spike has a large sup norm yet a tiny L^1 norm. A sequence of ever-thinner spikes of fixed height converges to 0 in \|\cdot\|_1, while its \|\cdot\|_\infty stays at that peak height. So on C[a, b] the two norms are not equivalent, since they disagree about which sequences converge. That cannot happen in \mathbb{R}^n, and the contrast is the subject of the next section.
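The spike example can be made concrete on [0, 1] with tents f_n(x) = \max(0, 1 - n x), which is an assumed choice of spike. Each tent has peak height 1 at x = 0, and its area is exactly 1/(2n). This sketch assumes NumPy and approximates the integral with a hand-written trapezoid rule. `spike` and `l1_and_sup` are hypothetical helpers.

```python
import numpy as np

def spike(n):
    """Tent of height 1 on [0, 1/n]: f_n(x) = max(0, 1 - n x), zero afterwards."""
    return lambda x: np.maximum(0.0, 1.0 - n * x)

def l1_and_sup(f, samples=200_001):
    """Approximate (||f||_1, ||f||_inf) on [0, 1] from samples on a uniform grid."""
    x = np.linspace(0.0, 1.0, samples)
    y = np.abs(f(x))
    dx = x[1] - x[0]
    l1 = dx * (y.sum() - 0.5 * (y[0] + y[-1]))   # trapezoid rule for the integral of |f|
    return float(l1), float(y.max())

for n in (1, 10, 100, 1000):
    l1, sup = l1_and_sup(spike(n))
    print(f"n={n:5d}  ||f_n||_1 ~ {l1:.6f}  ||f_n||_inf = {sup}")
```

The first column shrinks like 1/(2n), while the second stays at 1. So f_n \to 0 in \|\cdot\|_1 but not in \|\cdot\|_\infty.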

Equivalent norms: when do two lengths agree on "convergence"?

A space can carry many norms. When do two of them describe the same analysis — the same convergent sequences, the same open sets, the same continuous maps? The answer is equivalence.

Two norms \|\cdot\|_a and \|\cdot\|_b on V are equivalent if there exist constants 0 < c \le C < \infty with c\,\|x\|_a \le \|x\|_b \le C\,\|x\|_a for all x \in V.
Equivalent norms induce the same topology: identical convergent sequences, identical open/closed sets, identical continuous functions. (If \|x_n - x\|_a \to 0, the upper bound \|x_n - x\|_b \le C\,\|x_n - x\|_a forces \|x_n - x\|_b \to 0, and the lower bound gives the converse.)

On \mathbb{R}^n the three p-norms are all equivalent — concretely,

\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\,\|x\|_\infty, \qquad \|x\|_2 \le \sqrt{n}\,\|x\|_\infty,

so no matter which you pick, a sequence of vectors converges in one iff it converges in all. This is no accident of the p-norms; it is a theorem of real power:

On a finite-dimensional vector space, any two norms are equivalent.
Consequently there is only one sensible notion of convergence, continuity, and completeness on \mathbb{R}^n — the choice of norm is a matter of convenience, never of substance. (The proof compares an arbitrary norm to \|\cdot\|_2 using continuity of the norm on the compact unit sphere.)
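The explicit constants for the p-norms on \mathbb{R}^n can be spot-checked on random vectors. A few extreme vectors then show that the constants are sharp. This sketch assumes NumPy. The choice n = 10 and the sampling distribution are arbitrary, and `constants_hold` is a hypothetical helper.

```python
import numpy as np

def constants_hold(x, tol=1e-12):
    """Check ||x||_inf <= ||x||_2 <= ||x||_1 <= n ||x||_inf and ||x||_2 <= sqrt(n) ||x||_inf."""
    n = x.size
    n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    slack = tol * (1 + n1)
    return bool(ninf <= n2 + slack and n2 <= n1 + slack
                and n1 <= n * ninf + slack and n2 <= np.sqrt(n) * ninf + slack)

rng = np.random.default_rng(3)
print(all(constants_hold(rng.normal(size=10) * rng.exponential(size=10))
          for _ in range(10_000)))                                    # True

# Sharpness: a basis vector turns the first two inequalities into equalities,
# and the all-ones vector gives ||x||_1 = n ||x||_inf.
e1, ones = np.eye(10)[0], np.ones(10)
print(np.linalg.norm(e1, np.inf), np.linalg.norm(e1), np.linalg.norm(e1, 1))   # 1.0 1.0 1.0
print(np.linalg.norm(ones, 1), 10 * np.linalg.norm(ones, np.inf))               # 10.0 10.0
```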

Seeing the norm: unit balls

The cleanest way to see a norm is to draw its unit ball B = \{\, x : \|x\| \le 1 \,\}, the set of vectors of length at most one. Because a norm-metric is translation-invariant and homogeneous, this single shape encodes the whole geometry: every other ball is a scaled, shifted copy. In \mathbb{R}^2 the boundary \{(x, y) : |x|^p + |y|^p = 1\} morphs beautifully with p:

p = 1: a diamond (rotated square), corners on the axes;
p = 2: the familiar round disc;
p \to \infty: an axis-aligned square, the set \max(|x|, |y|) \le 1.

As p grows from 1, the diamond swells through the circle and puffs out toward the square. Two features are worth naming. First, a bigger unit ball means a smaller norm. The square (the largest ball) belongs to \|\cdot\|_\infty, the smallest of the three norms, matching \|x\|_\infty \le \|x\|_2 \le \|x\|_1. Second, for every p \ge 1 the ball is convex, and that is no coincidence: given (N1) and (N2), the triangle inequality is precisely the statement that the unit ball is convex. The direction used below is easy. If \|x\| \le 1 and \|y\| \le 1, then for t \in [0, 1], \|t x + (1-t) y\| \le t\|x\| + (1-t)\|y\| \le 1, so the segment between any two ball points stays in the ball.

Now let p drop below 1. The shape pinches inward and becomes a concave star, so its "ball" is no longer convex. By the equivalence just noted, the triangle inequality must fail, so \|\cdot\|_p is not a norm for 0 < p < 1. The picture shows an axiom breaking, and the next example makes the failure numerical.
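Convexity of the unit ball can be probed by sampling pairs of boundary points and asking whether their midpoints stay inside. This sketch assumes NumPy, and `unit_ball_looks_convex` is a hypothetical helper. A True result is evidence from samples, not a proof.

```python
import numpy as np

def p_len(v, p):
    """(|v_1|^p + |v_2|^p)^(1/p) along the last axis; a norm only when p >= 1."""
    return np.sum(np.abs(v) ** p, axis=-1) ** (1.0 / p)

def unit_ball_looks_convex(p, trials=20_000, seed=0):
    """Test whether midpoints of random boundary points stay in the unit ball."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(trials, 2))
    v = rng.normal(size=(trials, 2))
    u /= p_len(u, p)[:, None]      # rescale onto the boundary |x|^p + |y|^p = 1
    v /= p_len(v, p)[:, None]
    return bool(np.all(p_len((u + v) / 2, p) <= 1 + 1e-12))

for p in (0.5, 1, 2, 4):
    print(p, unit_ball_looks_convex(p))
# 0.5 False
# 1 True
# 2 True
# 4 True
```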

The formula \|x\|_p = \left(\sum |x_i|^p\right)^{1/p} makes sense for any p > 0, and it satisfies positive definiteness (N1) and homogeneity (N2) for every such p. So it is easy to assume it is always a norm. It is not: for 0 < p < 1 the triangle inequality (N3) fails.

Take p = \tfrac12 in \mathbb{R}^2 with x = (1, 0) and y = (0, 1). Then \|x\|_{1/2} = \|y\|_{1/2} = 1, but

\|x + y\|_{1/2} = \big(|1|^{1/2} + |1|^{1/2}\big)^{2} = (1 + 1)^2 = 4 \;>\; 2 = \|x\|_{1/2} + \|y\|_{1/2}.

The length of the sum exceeds the sum of the lengths, so (N3) is badly violated. The geometric shadow of this failure is exactly the non-convex, pinched star that the unit ball becomes for p < 1. The segment from (1,0) to (0,1) leaves the unit ball, since its midpoint (\tfrac12, \tfrac12) has \|\cdot\|_{1/2}-length \big(2 \cdot (\tfrac12)^{1/2}\big)^2 = 2. For 0 < p < 1 the root-free expression d(x, y) = \sum |x_i - y_i|^p is still a metric: it obeys the triangle inequality because |a + b|^p \le |a|^p + |b|^p. But \|\cdot\|_p itself is emphatically not a norm.
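The counterexample, and the root-free metric that survives, can be reproduced in a few lines of plain Python. This is a sketch, and `p_len` and `root_free` are hypothetical helper names. The last line shows why the root-free metric does not come from a norm: it scales like |\lambda|^p rather than |\lambda|.

```python
def p_len(v, p):
    """(sum |v_i|^p)^(1/p); satisfies (N1) and (N2) for every p > 0."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def root_free(u, w, p=0.5):
    """d(u, w) = sum |u_i - w_i|^p, a metric on R^n when 0 < p < 1."""
    return sum(abs(a - b) ** p for a, b in zip(u, w))

x, y = (1.0, 0.0), (0.0, 1.0)
s = (x[0] + y[0], x[1] + y[1])                        # x + y = (1, 1)
print(p_len(x, 0.5), p_len(y, 0.5), p_len(s, 0.5))    # 1.0 1.0 4.0, so (N3) fails

o = (0.0, 0.0)
print(root_free(s, o) <= root_free(s, x) + root_free(x, o))   # True
# ...but scaling by 4 multiplies the distance by 4**0.5 = 2, not by 4:
print(root_free((4.0, 4.0), o), 4 * root_free(s, o))          # 4.0 8.0
```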