Measures and Measure Spaces
A σ-algebra answers only half
the question. It tells us which subsets of a space X we are
allowed to measure — a family \mathcal{F} closed under complements and
countable unions, rich enough for analysis, small enough to dodge the non-measurable monsters. But a list
of "measurable sets" says nothing about how much each one contains. Naming the sets you
may weigh is not the same as owning a scale.
This page installs the scale. A measure \mu is the map that
assigns to every set in \mathcal{F} a size in
[0, \infty] — a length, an area, a probability, a count — subject to one
decisive rule: the size of a disjoint countable pile is the sum of the sizes of its pieces.
A σ-algebra and a measure on it, riding on a set X, form a
measure space (X, \mathcal{F}, \mu) — the single object on
which the whole of integration, and (when \mu(X) = 1) all of probability, is
built.
The definition
Fix a measurable space (X, \mathcal{F}). A measure on it is a
function
\mu : \mathcal{F} \longrightarrow [0, \infty]
(the value +\infty is genuinely allowed — the whole line has infinite length)
satisfying two axioms:
-
(M1) — the empty set weighs nothing:
\mu(\varnothing) = 0.
-
(M2) — countable additivity (σ-additivity): for any sequence
A_1, A_2, A_3, \dots \in \mathcal{F} that is
pairwise disjoint (A_i \cap A_j = \varnothing for
i \ne j),
\mu\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n).
The triple (X, \mathcal{F}, \mu) is a measure space. Everything
else on this page — every property you would want a "size" to have — is squeezed out of these two lines.
The word that carries all the weight is countable. Axiom (M2) is not merely
finite additivity (split a set in two, the sizes add). It promises the sum survives an
infinite sequence of disjoint pieces — and that is exactly the promise that lets sizes commute
with limits, the property Riemann's theory so badly lacked. A single disjoint union
A \sqcup B is the easy special case; the theory lives in the tail.
A measure on (X, \mathcal{F}) is a set function
\mu : \mathcal{F} \to [0, \infty] such that:
- \mu(\varnothing) = 0;
- for pairwise-disjoint A_1, A_2, \dots \in \mathcal{F},
\ \mu\!\left(\bigcup_n A_n\right) = \sum_n \mu(A_n).
The triple (X, \mathcal{F}, \mu) is called a measure space.
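On a finite set the axioms can be checked by brute force. The sketch below is a minimal illustration under stated assumptions: X is finite, \mathcal{F} = 2^X, and a measure is modelled as a Python function on frozensets. In that setting any pairwise-disjoint sequence has only finitely many non-empty terms, so (M2) reduces to additivity on disjoint pairs. The helper names power_set and is_measure are hypothetical, chosen for this example.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_measure(mu, X):
    """Check (M1), non-negativity and additivity on disjoint pairs.

    On a finite X with F = 2^X, a pairwise-disjoint sequence has only
    finitely many non-empty terms, so these checks already give (M2).
    """
    F = power_set(X)
    if mu(frozenset()) != 0:                      # (M1)
        return False
    if any(mu(A) < 0 for A in F):                 # values in [0, inf]
        return False
    return all(mu(A | B) == mu(A) + mu(B)         # additivity
               for A in F for B in F if not (A & B))


def counting(A):
    return len(A)


def weighted(A):
    # Point mass x/6 at each x in {1, 2, 3}; total mass 1.
    return sum((Fraction(x, 6) for x in A), Fraction(0))


def squared(A):
    # Not a measure: |{1, 2}|^2 = 4, but |{1}|^2 + |{2}|^2 = 2.
    return len(A) ** 2


X = {1, 2, 3}
print(is_measure(counting, X), is_measure(weighted, X), is_measure(squared, X))
# True True False
```

The third candidate passes (M1) and is non-negative, yet it fails additivity. Two axioms that look harmless still rule out plausible-looking "sizes".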
Everything follows from two axioms
The axioms look thin. Watch how much they carry: each property below is forced by (M1) and (M2) alone,
with a short proof.
Finite additivity
Take disjoint A_1, \dots, A_k \in \mathcal{F} and pad the sequence with empty
sets: A_{k+1} = A_{k+2} = \dots = \varnothing. These are still pairwise
disjoint, so (M2) applies, and by (M1) every padded term contributes
0:
\mu(A_1 \cup \dots \cup A_k) = \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{k} \mu(A_n).
So finite additivity is a corollary — which is why we bothered to demand the countable version:
the strong axiom hands us the weak one for free, but not conversely.
Monotonicity
Suppose A \subseteq B, both in \mathcal{F}. Split
B into the disjoint pieces A and
B \setminus A (both measurable — a σ-algebra is closed under differences). By
finite additivity,
\mu(B) = \mu(A) + \mu(B \setminus A) \;\ge\; \mu(A),
because \mu(B \setminus A) \ge 0. Bigger set, bigger (or equal) measure — no
set can shrink by gaining points. As a bonus, when \mu(A) < \infty we may
subtract to get \mu(B \setminus A) = \mu(B) - \mu(A).
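Here is a numerical spot check of monotonicity and the subtraction rule. It uses a hypothetical measure built from three point masses; the names weights and mu are illustrative, not from any library. Exact fractions keep the equalities free of floating-point noise.

```python
from fractions import Fraction

# Hypothetical point masses: mu(A) is the total weight of the points in A.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}


def mu(A):
    return sum((weights[x] for x in A), Fraction(0))


A = {"a"}
B = {"a", "c"}            # A is a subset of B
diff = B - A              # B \ A = {"c"}, disjoint from A

assert A <= B
print(mu(B) == mu(A) + mu(diff))   # finite additivity on the split
print(mu(B) >= mu(A))              # monotonicity
print(mu(diff) == mu(B) - mu(A))   # subtraction, legal since mu(A) is finite
```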
Countable subadditivity
Now drop the disjointness. For any A_1, A_2, \dots \in \mathcal{F}
(overlapping freely),
\mu\!\left(\bigcup_{n} A_n\right) \;\le\; \sum_{n} \mu(A_n).
The trick is disjointification: set
B_1 = A_1 and
B_n = A_n \setminus (A_1 \cup \dots \cup A_{n-1}). The
B_n are disjoint, each B_n \subseteq A_n, and
\bigcup B_n = \bigcup A_n. So by (M2) then monotonicity,
\mu\!\left(\bigcup_n A_n\right) = \sum_n \mu(B_n) \le \sum_n \mu(A_n).
Equality holds when the sets are disjoint (that is (M2)). When two of them overlap in a set of positive measure and the
right-hand side is finite, the inequality is strict: you paid for the overlap more than once on the right.
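Disjointification is a small algorithm in its own right. The following sketch builds the B_n from a list of finite sets and then compares sizes under counting measure; the function name disjointify is hypothetical. The inequality comes out strict because the A_n overlap.

```python
def disjointify(sets):
    """Return B_1 = A_1 and B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1})."""
    seen = set()
    out = []
    for A in sets:
        out.append(set(A) - seen)   # strip everything already covered
        seen |= set(A)
    return out


A = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
B = disjointify(A)
print(B)   # [{1, 2, 3}, {4}, {5}]

# Counting measure: the union has 5 points; the A_n sizes sum to 3 + 3 + 2 = 8.
union = set().union(*A)
print(len(union), sum(len(b) for b in B), sum(len(a) for a in A))   # 5 5 8
```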
Continuity from below
A measure respects increasing limits. If
A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots with union
A = \bigcup_n A_n (written A_n \uparrow A), then
\mu(A_n) \;\uparrow\; \mu(A) \qquad \text{i.e.} \qquad \lim_{n\to\infty}\mu(A_n) = \mu\!\left(\bigcup_n A_n\right).
Peel the tower into disjoint rings C_1 = A_1,
C_n = A_n \setminus A_{n-1}. Then
A_k = \bigcup_{n \le k} C_n and
A = \bigcup_n C_n, so by (M2)
\mu(A) = \sum_n \mu(C_n) = \lim_k \sum_{n\le k}\mu(C_n) = \lim_k \mu(A_k).
Countable additivity is precisely this continuity — the two are equivalent given finite
additivity.
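The ring-peeling argument can be checked with exact arithmetic. This sketch assumes the particular tower A_n = [0, \tfrac{n}{n+1}), which increases to [0, 1), and it represents each set only by its right endpoint. The helpers right_end and ring_length are hypothetical.

```python
from fractions import Fraction


def right_end(n):
    """Right endpoint of A_n = [0, n/(n+1)); A_0 is the empty set, endpoint 0."""
    return Fraction(n, n + 1)


def ring_length(n):
    """Lebesgue measure of the ring C_n = A_n minus A_{n-1}."""
    return right_end(n) - right_end(n - 1)


for k in (1, 10, 1000):
    partial = sum(ring_length(n) for n in range(1, k + 1))
    # The rings telescope: summing them up to k recovers lambda(A_k) exactly.
    assert partial == right_end(k)
    print(k, partial, float(1 - partial))
```

Each ring has length \tfrac{1}{n(n+1)}. Their partial sums telescope to \lambda(A_k) = \tfrac{k}{k+1}, and this tends to 1 = \lambda([0,1)).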
Continuity from above (with a catch)
The decreasing version holds too — but only if you start from finite measure. If
A_1 \supseteq A_2 \supseteq \dots with
A_n \downarrow A = \bigcap_n A_n, and
\mu(A_1) < \infty, then
\mu(A_n) \;\downarrow\; \mu(A).
Apply continuity from below to the growing differences
A_1 \setminus A_n \uparrow A_1 \setminus A, and use
\mu(A_1 \setminus A_n) = \mu(A_1) - \mu(A_n) (legal because
\mu(A_1) < \infty). Then \mu(A_1) - \mu(A_n) \uparrow \mu(A_1) - \mu(A), which is exactly
\mu(A_n) \downarrow \mu(A). The finiteness is not a technicality: the first of the traps at the end of this page shows
the theorem is false without it.
The measures you already know (and some you don't)
Abstract axioms deserve concrete inhabitants. Every one of these is a bona fide measure on some
(X, \mathcal{F}).
-
Lebesgue measure \lambda on \mathbb{R}.
The one we were chasing: it extends length, \lambda([a,b]) = b - a, to the
whole Borel (and Lebesgue) σ-algebra, is translation-invariant, and assigns the rationals measure
0. Building it is the work of the next few pages; that it is a
countably additive measure is its defining achievement.
-
Counting measure. On any X with
\mathcal{F} = 2^X, put
\mu(A) = |A| — the number of elements of A
(with \mu(A) = \infty for infinite A). The empty
set has 0 elements, and cardinalities of disjoint sets add, so both axioms
hold. On \mathbb{N} this is the measure that turns integration into
ordinary summation.
-
The Dirac measure \delta_x (a point mass). Fix a point
x \in X and ask only "is x in the set?":
\delta_x(A) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases}
All of the mass sits on a single point. \delta_x(\varnothing) = 0 since
x \notin \varnothing, and for disjoint sets x
lands in at most one of them, so the sum on the right has at most one non-zero term — additivity holds.
-
A discrete probability measure. Choose points
x_1, x_2, \dots and weights p_n \ge 0 with
\sum_n p_n = 1, and stack point masses:
\mu = \sum_{n} p_n\, \delta_{x_n}, \qquad \mu(A) = \sum_{n \,:\, x_n \in A} p_n.
This is a weighted mixture of Dirac measures — the measure behind a fair die
(x_n = n for n = 1, \dots, 6, each p_n = \tfrac16) or
any discrete random variable.
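The examples above can be written as set functions and tested on a small space. This is a sketch on a hypothetical three-point set X = \{a, b, c\} with \mathcal{F} = 2^X, and the helper names are illustrative. The checks cover (M1) and additivity on disjoint pairs, which is all (M2) asks for on a finite space.

```python
from itertools import combinations

X = ["a", "b", "c"]          # a hypothetical three-point space


def counting(A):
    return len(A)


def dirac(x):
    """Point mass at x: 1 if x is in the set, 0 otherwise."""
    return lambda A: 1 if x in A else 0


def discrete(points_weights):
    """mu = sum_n p_n * delta_{x_n}, returned as a set function."""
    return lambda A: sum(p for x, p in points_weights if x in A)


subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
delta_a = dirac("a")
mix = discrete([("a", 0.25), ("b", 0.75)])

for mu in (counting, delta_a, mix):
    assert mu(frozenset()) == 0                       # (M1)
    assert all(mu(A | B) == mu(A) + mu(B)             # additivity on disjoint pairs
               for A in subsets for B in subsets if not (A & B))

print(counting({"a", "b"}), delta_a({"b", "c"}), mix({"a", "c"}))   # 2 0 0.25
```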
A measure with \mu(X) = 1 is a probability measure: the total
mass is exactly one unit, so \mu(A) reads as "the probability of
A". The axioms of probability (the whole sample space has probability
1; the probabilities of countably many disjoint events add) say exactly that probability is a measure with total mass
1. Probability theory is measure theory with a normalization.
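The fair die from the list above is the simplest probability measure to compute with. Below is a minimal sketch that encodes the die as a dictionary of point masses; the names die and P are hypothetical.

```python
from fractions import Fraction

# Fair die: mu = sum_{n=1}^{6} (1/6) * delta_n.
die = {n: Fraction(1, 6) for n in range(1, 7)}


def P(event):
    """Probability of an event (a set of faces): the total mass it contains."""
    return sum((p for face, p in die.items() if face in event), Fraction(0))


omega = set(die)
even = {2, 4, 6}
print(P(omega), P(even), P({1, 2}), P(set()))   # 1 1/2 1/3 0
print(P(omega - even) == 1 - P(even))           # True: complements via additivity
```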
Null sets, "almost everywhere", and completeness
Sets of measure zero are the dust the theory learns to ignore. A set
N \in \mathcal{F} is a null set (or
μ-null) if \mu(N) = 0. Under Lebesgue measure the rationals,
every finite or countable set, and even the uncountable Cantor set are null — negligible for the purposes
of integration.
A property is said to hold almost everywhere (abbreviated a.e., or
almost surely in probability) if the set of points where it fails is null. Dirichlet's
function is 0 a.e. because it is non-zero only on
\mathbb{Q} — a null set — which is precisely why its Lebesgue integral is
0. "Almost everywhere" is the phrase that lets us change a function on a null
set without changing a single integral.
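Dirichlet's function is awkward to compute with directly, but the same principle is easy to watch for a discrete measure on a finite set. There the integral is the weighted sum \int f\,d\mu = \sum_x f(x)\,\mu(\{x\}). In this hypothetical example the point 3 carries mass 0, so \{3\} is a null set, and redefining f there leaves the integral untouched. The weights and the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical discrete measure on {1, 2, 3}; the point 3 has mass 0.
weights = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}


def integrate(f):
    """Integral of f against mu = sum_x p_x delta_x, i.e. sum_x f(x) p_x."""
    return sum((f(x) * p for x, p in weights.items()), Fraction(0))


def f(x):
    return x


def g(x):
    return 1000 if x == 3 else x     # differs from f only on the null set {3}


null_set = {x for x, p in weights.items() if p == 0}
print(null_set, integrate(f), integrate(g))   # {3} 3/2 3/2
```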
One subtlety earns a name. A measure is complete if every subset of a null set is itself
measurable (and hence null): whenever \mu(N) = 0 and
M \subseteq N, we insist M \in \mathcal{F}. This
seems obvious — a subset of something negligible should be negligible — but a raw σ-algebra need not
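contain all such M. Lebesgue measure restricted to the Borel sets is not complete. The Cantor set is a Borel null
set with more subsets than there are Borel sets, so some of its subsets are not Borel. One completes the Borel
σ-algebra by adjoining every subset of every Borel null set, which gives the (complete) Lebesgue σ-algebra.
Completeness removes annoying "is this scrap even measurable?" caveats from later theorems.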
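Completeness is easy to test on a finite space. The sketch below assumes a hypothetical measure space X = \{a, b, c\} whose σ-algebra contains only \varnothing, \{a, b\}, \{c\} and X, with \{a, b\} null. The measure is stored as a dictionary from sets in \mathcal{F} to their measures, and the helper names are illustrative. The completion step uses the standard description of the completed σ-algebra: all sets E \cup M with E \in \mathcal{F} and M contained in a null set, each measured by \mu(E).

```python
from itertools import chain, combinations


def subsets(S):
    S = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(S, r) for r in range(len(S) + 1))]


# Hypothetical measure space: X = {a, b, c}, F generated by the block {a, b}.
a_b, c, X = frozenset("ab"), frozenset("c"), frozenset("abc")
F = {frozenset(): 0, a_b: 0, c: 1, X: 1}    # the measure of each set in F


def is_complete(F):
    """True if every subset of every null set in F is itself in F."""
    return all(M in F for N, m in F.items() if m == 0 for M in subsets(N))


def complete(F):
    """Completion: sets E | M with E in F and M inside a null set, measure mu(E)."""
    nulls = [N for N, m in F.items() if m == 0]
    out = {}
    for E, m in F.items():
        for N in nulls:
            for M in subsets(N):
                out[E | M] = m
    return out


print(is_complete(F))             # False: {a} lies in the null set {a, b} but not in F
print(is_complete(complete(F)))   # True
```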
See it: the parts add up to the whole
Countable additivity is a statement about an infinite disjoint pile, so let us watch one converge. Split
[0, 1) into disjoint intervals
I_1 = \left[0, \tfrac12\right),\quad I_2 = \left[\tfrac12, \tfrac34\right),\quad I_3 = \left[\tfrac34, \tfrac78\right),\ \dots,\quad I_n = \left[1 - \tfrac{1}{2^{n-1}},\, 1 - \tfrac{1}{2^{n}}\right),
so I_n has Lebesgue measure \lambda(I_n) = 2^{-n}.
The pieces are pairwise disjoint and their union is all of [0,1), so (M2)
demands \sum_n 2^{-n} = 1. As the pieces are laid down one at a time, the running total
\sum_{k=1}^{n} \lambda(I_k) = 1 - 2^{-n} creeps up to the
measure of the whole interval: it never overshoots, and it converges to 1.
This is also continuity from below in miniature: the partial unions
A_n = I_1 \cup \dots \cup I_n = [0, 1 - 2^{-n}) increase up to
[0,1), and their measures 1 - 2^{-n} increase up to
1. Additivity and continuity are the same fact wearing two hats.
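Here is the same computation as an exact-arithmetic sketch. The helper piece (a hypothetical name) returns the endpoints of I_n. The loop checks that consecutive pieces tile [0, 1) without gaps and that the running total equals 1 - 2^{-n}.

```python
from fractions import Fraction


def piece(n):
    """I_n = [1 - 2^-(n-1), 1 - 2^-n) as an exact (left, right) pair."""
    return 1 - Fraction(1, 2 ** (n - 1)), 1 - Fraction(1, 2 ** n)


total = Fraction(0)
for n in range(1, 11):
    left, right = piece(n)
    total += right - left                                 # lambda(I_n) = 2^-n
    assert total == 1 - Fraction(1, 2 ** n)
    assert left == (piece(n - 1)[1] if n > 1 else 0)      # no gaps between pieces
print(total, float(total))   # 1023/1024 0.9990234375
```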
The prerequisite page's four-wish list already showed we cannot have everything. Here the pressure lands
squarely on the word countable. If we asked only for finite additivity, strange
creatures called finitely-additive measures exist — one can, using the axiom of choice,
build a finitely-additive, translation-invariant "measure" defined on every subset of
[0,1) with total mass 1. It dodges Vitali's
contradiction precisely because it refuses to sum over the countably many rational translates. Finite additivity
is not a universal escape hatch, though. In three dimensions the Banach–Tarski paradox cuts a solid ball into
finitely many (non-measurable) pieces and reassembles them, by rotations and translations, into two balls of the
same size. So not even a finitely-additive, isometry-invariant "volume" can be defined on every subset of \mathbb{R}^3.
Countable additivity is the extra strength that outlaws these pathologies and, far more importantly, is
exactly the hypothesis that makes measure continuous along increasing sequences (and along decreasing ones that
start from finite measure).
Without continuity along limits there is no Monotone Convergence Theorem, no Dominated Convergence, no
interchange of limit and integral — no Lebesgue theory at all. We pay by restricting to a σ-algebra
rather than all subsets; we are repaid with limits that behave.
Four traps wait for the moment you start computing with measures:
-
Continuity from above needs finite measure. Take
A_n = [n, \infty) under Lebesgue measure. These decrease,
A_n \downarrow \bigcap_n A_n = \varnothing, so the intersection has measure
0 — yet every \mu(A_n) = \infty, so
\mu(A_n) = \infty \not\to 0 = \mu(\varnothing). The theorem fails without
\mu(A_1) < \infty. Never drop that hypothesis.
-
Additivity is countable, not uncountable. A single point has Lebesgue measure
0, and [0,1] = \bigcup_{x \in [0,1]} \{x\} is a
union of points — but this union is uncountable, so (M2) does not apply, and
"0 + 0 + \dots" over uncountably many points says nothing.
\lambda([0,1]) = 1, no contradiction. The countability restriction is
load-bearing.
-
\mu may equal +\infty. The
codomain is [0,\infty], not [0,\infty). Sums like
\infty + 5 = \infty are fine, but \infty - \infty
is forbidden — which is exactly why the subtraction step
\mu(B \setminus A) = \mu(B) - \mu(A) is only legal when
\mu(A) < \infty.
-
"Measure zero" is not "empty" or "finite". The rationals are infinite; the Cantor set
is uncountable; both are null. Null means negligible for this measure, not small in size or
cardinality.
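Python's IEEE 754 floats follow the same rules as the arithmetic in the third trap, which makes them a convenient sandbox for it. The short sketch below uses only math.inf and math.isnan and is not a measure-theory library. One caveat: measure theory conventionally sets 0 \cdot \infty = 0, whereas 0 * math.inf is nan, so the analogy is only partial.

```python
import math

inf = math.inf

print(inf + 5 == inf)          # True: adding finite mass to infinite mass
print(math.isnan(inf - inf))   # True: infinity minus infinity has no value

# Trap 1: each A_n = [n, inf) has Lebesgue length inf, though the A_n shrink to the empty set.
lengths = [inf - n for n in range(1, 6)]    # length of [n, inf) is inf - n = inf
print(all(L == inf for L in lengths))       # True: the lengths stay at inf and never approach 0
```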