Measures and Measure Spaces

A σ-algebra answers only half the question. It tells us which subsets of a space X we are allowed to measure: a family \mathcal{F} that contains X and is closed under complements and countable unions, rich enough for analysis, small enough to dodge the non-measurable monsters. But a list of "measurable sets" says nothing about how much each one contains. Naming the sets you may weigh is not the same as owning a scale.

This page installs the scale. A measure \mu is the map that assigns to every set in \mathcal{F} a size in [0, \infty] — a length, an area, a probability, a count — subject to one decisive rule: the size of a disjoint countable pile is the sum of the sizes of its pieces. A σ-algebra and a measure on it, riding on a set X, form a measure space (X, \mathcal{F}, \mu) — the single object on which the whole of integration, and (when \mu(X) = 1) all of probability, is built.

The definition

Fix a measurable space (X, \mathcal{F}). A measure on it is a function

\mu : \mathcal{F} \longrightarrow [0, \infty]

(the value +\infty is genuinely allowed — the whole line has infinite length) satisfying two axioms:

(M1) — the empty set weighs nothing: \mu(\varnothing) = 0.
(M2) — countable additivity (σ-additivity): for any sequence A_1, A_2, A_3, \dots \in \mathcal{F} that is pairwise disjoint (A_i \cap A_j = \varnothing for i \ne j), \mu\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n).

The triple (X, \mathcal{F}, \mu) is a measure space. Everything else on this page — every property you would want a "size" to have — is squeezed out of these two lines.
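
To make the two axioms concrete, here is a minimal sketch on a finite set. It assumes \mathcal{F} = 2^X and builds \mu from hypothetical point weights. On a finite X, a pairwise-disjoint sequence has only finitely many non-empty terms, so (M2) reduces to finite additivity. The code checks that by brute force over disjoint pairs. The names `weights`, `mu` and `power_set` are illustrative only.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical point weights on a small finite set X (any non-negative values work).
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(0)}
X = frozenset(weights)

def power_set(s):
    """All subsets of s, as frozensets: here F = 2^X."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def mu(A):
    """Size of A: the total weight of the points it contains."""
    return sum((weights[x] for x in A), Fraction(0))

F = power_set(X)

# (M1): the empty set weighs nothing.
assert mu(frozenset()) == 0

# Additivity on every pair of disjoint measurable sets. On a finite X this,
# applied repeatedly, is all that (M2) can ask for.
for A in F:
    for B in F:
        if not (A & B):
            assert mu(A | B) == mu(A) + mu(B)

print(len(F), mu(X))  # 16 subsets, total mass 1
```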

The word that carries all the weight is countable. Axiom (M2) is not merely finite additivity (split a set in two, the sizes add). It promises the sum survives an infinite sequence of disjoint pieces — and that is exactly the promise that lets sizes commute with limits, the property Riemann's theory so badly lacked. A single disjoint union A \sqcup B is the easy special case; the theory lives in the tail.

Everything follows from two axioms

The axioms look thin. Watch how much they carry — each property below is forced by (M1) and (M2) alone, with a one-line proof.

Finite additivity

Take disjoint A_1, \dots, A_k \in \mathcal{F} and pad the sequence with empty sets: A_{k+1} = A_{k+2} = \dots = \varnothing. These are still pairwise disjoint, so (M2) applies, and by (M1) every padded term contributes 0:

\mu(A_1 \cup \dots \cup A_k) = \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{k} \mu(A_n).

So finite additivity is a corollary — which is why we bothered to demand the countable version: the strong axiom hands us the weak one for free, but not conversely.

Monotonicity

Suppose A \subseteq B, both in \mathcal{F}. Split B into the disjoint pieces A and B \setminus A (both measurable — a σ-algebra is closed under differences). By finite additivity,

\mu(B) = \mu(A) + \mu(B \setminus A) \;\ge\; \mu(A),

because \mu(B \setminus A) \ge 0. Bigger set, bigger (or equal) measure — no set can shrink by gaining points. As a bonus, when \mu(A) < \infty we may subtract to get \mu(B \setminus A) = \mu(B) - \mu(A).
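
A quick numerical check of the split B = A \sqcup (B \setminus A). It uses counting measure on a small finite set, which is an illustrative choice; any measure would do.

```python
# Counting measure on finite sets: mu(A) = |A|.
def mu(A):
    return len(A)

A = {1, 2}
B = {1, 2, 3, 4, 5}
assert A <= B  # A is a subset of B

# Split B into the disjoint pieces A and B minus A.
rest = B - A
assert not (A & rest)
assert mu(B) == mu(A) + mu(rest)  # finite additivity
assert mu(B) >= mu(A)             # monotonicity
# mu(A) is finite, so the subtraction is legal:
assert mu(rest) == mu(B) - mu(A)
```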

Countable subadditivity

Now drop the disjointness. For any A_1, A_2, \dots \in \mathcal{F} (overlapping freely),

\mu\!\left(\bigcup_{n} A_n\right) \;\le\; \sum_{n} \mu(A_n).

The trick is disjointification: set B_1 = A_1 and B_n = A_n \setminus (A_1 \cup \dots \cup A_{n-1}). The B_n are disjoint, each B_n \subseteq A_n, and \bigcup B_n = \bigcup A_n. So by (M2) then monotonicity,

\mu\!\left(\bigcup_n A_n\right) = \sum_n \mu(B_n) \le \sum_n \mu(A_n).

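Equality holds when the sets are disjoint (that is (M2)). When they overlap, the inequality can be strict, because the right side pays for the overlap more than once. It need not be strict, though: the overlap may be null, or both sides may be infinite.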
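
The disjointification trick is a short loop, sketched here on finite sets. Counting measure stands in for \mu, and the helper name `disjointify` is hypothetical. Each new piece keeps only the points that no earlier set has already covered.

```python
def disjointify(sets):
    """Return B_1 = A_1 and B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1})."""
    seen = set()
    pieces = []
    for A in sets:
        pieces.append(set(A) - seen)  # keep only points not covered earlier
        seen |= set(A)
    return pieces

# Overlapping sets; mu is counting measure (an illustrative choice).
A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
B = disjointify(A)
mu = len

union_A = set().union(*A)
union_B = set().union(*B)

assert union_A == union_B                 # same union
assert all(b <= a for a, b in zip(A, B))  # each B_n ⊆ A_n
assert all(not (B[i] & B[j]) for i in range(len(B)) for j in range(i + 1, len(B)))  # pairwise disjoint

# 6 6 11: equality against the B_n, a deficit of 5 against the overlapping A_n
print(mu(union_A), sum(mu(b) for b in B), sum(mu(a) for a in A))
```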

Continuity from below

A measure respects increasing limits. If A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots with union A = \bigcup_n A_n (written A_n \uparrow A), then

\mu(A_n) \;\uparrow\; \mu(A) \qquad \text{i.e.} \qquad \lim_{n\to\infty}\mu(A_n) = \mu\!\left(\bigcup_n A_n\right).

Peel the tower into disjoint rings C_1 = A_1, C_n = A_n \setminus A_{n-1}. Then A_k = \bigcup_{n \le k} C_n and A = \bigcup_n C_n. So (M2) applied to A, together with finite additivity applied to each A_k, gives \mu(A) = \sum_n \mu(C_n) = \lim_k \sum_{n\le k}\mu(C_n) = \lim_k \mu(A_k). Countable additivity is precisely this continuity: given finite additivity, the two are equivalent.
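
The ring-peeling argument can be traced numerically. This sketch assumes a hypothetical discrete measure on the positive integers with weight 2^{-k} on the point k, so the total mass is 1, and uses the chain A_n = \{1, \dots, 2n\}. Only finitely many stages are computed, so this illustrates the limit rather than proving it.

```python
from fractions import Fraction

def mu(A):
    """Hypothetical weights p_k = 2^(-k) on the positive integers, so mu(N) = 1."""
    return sum((Fraction(1, 2**k) for k in A), Fraction(0))

def A(n):
    """Increasing chain A_n = {1, ..., 2n} (A_0 is empty); its union is all positive integers."""
    return set(range(1, 2 * n + 1))

running = Fraction(0)
for n in range(1, 8):
    ring = A(n) - A(n - 1)                      # C_n = A_n minus A_{n-1}
    running += mu(ring)                         # mu(C_1) + ... + mu(C_n)
    assert running == mu(A(n))                  # finite additivity over the rings
    assert mu(A(n)) == 1 - Fraction(1, 4**n)    # closed form for this example
    print(n, float(mu(A(n))))                   # climbs towards mu(N) = 1
```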

Continuity from above (with a catch)

The decreasing version holds too — but only if you start from finite measure. If A_1 \supseteq A_2 \supseteq \dots with A_n \downarrow A = \bigcap_n A_n, and \mu(A_1) < \infty, then

\mu(A_n) \;\downarrow\; \mu(A).

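Apply continuity from below to the growing differences A_1 \setminus A_n \uparrow A_1 \setminus A. Their union is A_1 \setminus \bigcap_n A_n by De Morgan. Then use \mu(A_1 \setminus A_n) = \mu(A_1) - \mu(A_n) and \mu(A_1 \setminus A) = \mu(A_1) - \mu(A). Both subtractions are legal because every set involved sits inside A_1 and \mu(A_1) < \infty. The finiteness is not a technicality: the first of the traps at the end of this page shows the theorem is false without it.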
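
Written out as one chain, the argument reads as follows. Each subtraction is legal because the set being subtracted sits inside A_1, which has finite measure.

```latex
\begin{aligned}
\mu(A_1) - \mu(A) &= \mu(A_1 \setminus A)
  && \text{(subtraction; } \mu(A) \le \mu(A_1) < \infty\text{)}\\
&= \lim_{n\to\infty} \mu(A_1 \setminus A_n)
  && \text{(continuity from below, } A_1 \setminus A_n \uparrow A_1 \setminus A\text{)}\\
&= \lim_{n\to\infty} \bigl(\mu(A_1) - \mu(A_n)\bigr)
  && \text{(subtraction; } \mu(A_n) \le \mu(A_1) < \infty\text{)}\\
&= \mu(A_1) - \lim_{n\to\infty} \mu(A_n).
\end{aligned}
```

Cancelling the finite number \mu(A_1) from both ends leaves \lim_n \mu(A_n) = \mu(A).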

The measures you already know (and some you don't)

Abstract axioms deserve concrete inhabitants. Every one of these is a bona fide measure on some (X, \mathcal{F}).

Lebesgue measure \lambda on \mathbb{R}. The one we were chasing: it extends length, \lambda([a,b]) = b - a, to the whole Borel (and Lebesgue) σ-algebra, is translation-invariant, and assigns the rationals measure 0. Building it is the work of the next few pages; that it is a countably additive measure is its defining achievement.
Counting measure. On any X with \mathcal{F} = 2^X, put \mu(A) = |A| — the number of elements of A (with \mu(A) = \infty for infinite A). The empty set has 0 elements, and cardinalities of disjoint sets add, so both axioms hold. On \mathbb{N} this is the measure that turns integration into ordinary summation.
The Dirac measure \delta_x (a point mass). Fix a point x \in X and ask only "is x in the set?": \delta_x(A) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases} All of the mass sits on a single point. \delta_x(\varnothing) = 0 since x \notin \varnothing, and for disjoint sets x lands in at most one of them, so the sum on the right has at most one non-zero term — additivity holds.
A discrete probability measure. Choose points x_1, x_2, \dots and weights p_n \ge 0 with \sum_n p_n = 1, and stack point masses: \mu = \sum_{n} p_n\, \delta_{x_n}, \qquad \mu(A) = \sum_{n \,:\, x_n \in A} p_n. This is a weighted mixture of Dirac measures — the measure behind a fair die (x_n \in \{1,\dots,6\}, each p_n = \tfrac16) or any discrete random variable.
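
The three concrete measures above can be written as plain Python set functions. This is a sketch that assumes finite sets as inputs, so that counting terminates; the function names `counting`, `dirac` and `discrete` are hypothetical.

```python
from fractions import Fraction

def counting(A):
    """Counting measure on finite sets: the number of elements."""
    return len(A)

def dirac(x):
    """Dirac measure delta_x: 1 if x is in the set, else 0."""
    return lambda A: 1 if x in A else 0

def discrete(points_and_weights):
    """mu = sum_n p_n * delta_{x_n}, i.e. mu(A) = sum of p_n over the x_n in A."""
    return lambda A: sum((p for x, p in points_and_weights if x in A), Fraction(0))

# A fair die: points 1..6, each with weight 1/6.
die = discrete([(k, Fraction(1, 6)) for k in range(1, 7)])
even = {2, 4, 6}

print(counting(even))                      # 3
print(dirac(4)(even), dirac(5)(even))      # 1 0
print(die(even), die(set(range(1, 7))))    # 1/2 1
```

Using `Fraction` keeps the die's probabilities exact, so the total mass comes out as exactly 1 rather than a rounded float.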

A measure with \mu(X) = 1 is a probability measure: the total mass is exactly one unit, so \mu(A) reads as "the probability of A". Kolmogorov's axioms of probability say exactly that a probability is a measure of total mass 1: the whole sample space has probability 1, and the probabilities of countably many disjoint events add. Probability theory is measure theory with a normalization.

Null sets, "almost everywhere", and completeness

Sets of measure zero are the dust the theory learns to ignore. A set N \in \mathcal{F} is a null set (or μ-null) if \mu(N) = 0. Under Lebesgue measure the rationals, every finite or countable set, and even the uncountable Cantor set are null — negligible for the purposes of integration.

A property is said to hold almost everywhere (abbreviated a.e., or almost surely in probability) if the set of points where it fails is null. Dirichlet's function is 0 a.e. because it is non-zero only on \mathbb{Q} — a null set — which is precisely why its Lebesgue integral is 0. "Almost everywhere" is the phrase that lets us change a function on a null set without changing a single integral.
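
A small discrete sketch shows why a change on a null set is invisible to integration. It assumes a hypothetical measure on \{0, 1, 2, 3\} in which the point 3 carries no mass. It also relies on the standard fact that, against a weighted sum of point masses, the integral of a function is the weighted sum of its values.

```python
from fractions import Fraction

# Hypothetical discrete measure on X = {0, 1, 2, 3}; the point 3 carries no mass.
weights = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0)}

def integral(f):
    """Integral of f against mu = sum_x weights[x] * delta_x: a weighted sum."""
    return sum((p * f(x) for x, p in weights.items()), Fraction(0))

f = lambda x: x
g = lambda x: 100 if x == 3 else x   # differs from f only on {3}

null = {x for x in weights if f(x) != g(x)}
assert sum(weights[x] for x in null) == 0   # f = g almost everywhere
print(integral(f), integral(g))             # 3/4 3/4
```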

One subtlety earns a name. A measure is complete if every subset of a null set is itself measurable (and hence null): whenever \mu(N) = 0 and M \subseteq N, we insist M \in \mathcal{F}. This seems obvious, since a subset of something negligible should be negligible, but a raw σ-algebra need not contain all such M. Lebesgue measure restricted to the Borel sets is not complete: the Cantor set is a Borel null set with more subsets than there are Borel sets, so some of its subsets are not Borel. One completes the Borel σ-algebra by adjoining every subset of every Borel null set, that is, by taking all sets B \cup M with B Borel and M contained in a Borel null set. The result is the Lebesgue σ-algebra, on which Lebesgue measure is complete. Completeness removes annoying "is this scrap even measurable?" caveats from later theorems.
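
On a finite space the completion can be computed outright. This sketch uses a hypothetical measure space in which \mathcal{F} cannot separate the points a and b, and the pair \{a, b\} is null. The completion adds every set B \cup M, where B \in \mathcal{F} and M lies inside a null set, and assigns it the value \mu(B). The helper names are illustrative.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# Hypothetical measure space on X = {a, b, c}: F maps each measurable set to its measure.
F = {frozenset(): 0, frozenset("ab"): 0, frozenset("c"): 1, frozenset("abc"): 1}

def is_complete(F):
    """True if every subset of every null set in F is itself in F."""
    nulls = [N for N, m in F.items() if m == 0]
    return all(M in F for N in nulls for M in subsets(N))

def complete(F):
    """Completion: all sets B ∪ M with B in F and M inside a null set; mu(B ∪ M) = mu(B)."""
    nulls = [N for N, m in F.items() if m == 0]
    out = {}
    for B, m in F.items():
        for N in nulls:
            for M in subsets(N):
                out[B | M] = m
    return out

print(is_complete(F))          # False: {a} lies inside the null set {a, b} but is not in F
G = complete(F)
print(len(G), is_complete(G))  # 8 True
```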

See it: the parts add up to the whole

Countable additivity is a statement about an infinite disjoint pile, so let us watch one converge. Split [0, 1) into disjoint intervals

I_1 = \left[0, \tfrac12\right),\quad I_2 = \left[\tfrac12, \tfrac34\right),\quad I_3 = \left[\tfrac34, \tfrac78\right),\ \dots,\quad I_n = \left[1 - \tfrac{1}{2^{n-1}},\, 1 - \tfrac{1}{2^{n}}\right),

so I_n has Lebesgue measure \lambda(I_n) = 2^{-n}. The pieces are pairwise disjoint and their union is all of [0,1), so (M2) demands \sum_n 2^{-n} = 1. As more pieces are laid down, the running total \sum_{k=1}^{n} \lambda(I_k) = 1 - 2^{-n} creeps up to the measure of the whole interval, never overshooting, always converging.

This is also continuity from below in miniature: the partial unions A_n = I_1 \cup \dots \cup I_n = [0, 1 - 2^{-n}) increase up to [0,1), and their measures 1 - 2^{-n} increase up to 1. Additivity and continuity are the same fact wearing two hats.
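
A few lines of exact rational arithmetic reproduce the running total for the first ten pieces. The helper `piece` is hypothetical and simply returns the endpoints of I_n.

```python
from fractions import Fraction

def piece(n):
    """I_n = [1 - 2^-(n-1), 1 - 2^-n), returned as (left, right) endpoints."""
    return 1 - Fraction(1, 2**(n - 1)), 1 - Fraction(1, 2**n)

total = Fraction(0)
for n in range(1, 11):
    left, right = piece(n)
    if n > 1:
        assert left == piece(n - 1)[1]   # consecutive half-open pieces abut without overlapping
    total += right - left                # lambda(I_n) = right - left = 2^-n
    assert total == 1 - Fraction(1, 2**n)
    print(n, total)                      # 1 1/2, 2 3/4, ..., 10 1023/1024
```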

The prerequisite page's four-wish list already showed we cannot have everything. Here the pressure lands squarely on the word countable. If we asked only for finite additivity, strange creatures called finitely-additive measures exist. Using the axiom of choice, one can build a finitely-additive, translation-invariant "measure" defined on every subset of [0,1) with total mass 1. It dodges Vitali's contradiction precisely because it refuses to sum over the countably many rational translates. In three dimensions even this loophole closes. The Banach–Tarski paradox cuts a solid ball into finitely many (non-measurable) pieces and reassembles them, by rotations and translations, into two balls of the same size. So no finitely-additive "measure" that is invariant under rigid motions can be defined on every subset of \mathbb{R}^3 while giving the ball a positive, finite size.

Countable additivity is the extra strength that outlaws these pathologies and, far more importantly, is exactly the hypothesis that makes measure continuous along increasing and decreasing sequences. Without continuity along limits there is no Monotone Convergence Theorem, no Dominated Convergence, no interchange of limit and integral — no Lebesgue theory at all. We pay by restricting to a σ-algebra rather than all subsets; we are repaid with limits that behave.

Four traps wait for you the moment you start computing with measures:

Continuity from above needs finite measure. Take A_n = [n, \infty) under Lebesgue measure. These decrease, A_n \downarrow \bigcap_n A_n = \varnothing, so the intersection has measure 0 — yet every \mu(A_n) = \infty, so \mu(A_n) = \infty \not\to 0 = \mu(\varnothing). The theorem fails without \mu(A_1) < \infty. Never drop that hypothesis.
Additivity is countable, not uncountable. A single point has Lebesgue measure 0, and [0,1] = \bigcup_{x \in [0,1]} \{x\} is a union of points — but this union is uncountable, so (M2) does not apply, and "0 + 0 + \dots" over uncountably many points says nothing. \lambda([0,1]) = 1, no contradiction. The countability restriction is load-bearing.
\mu may equal +\infty. The codomain is [0,\infty], not [0,\infty). Sums like \infty + 5 = \infty are fine, but \infty - \infty is forbidden — which is exactly why the subtraction step \mu(B \setminus A) = \mu(B) - \mu(A) is only legal when \mu(A) < \infty.
"Measure zero" is not "empty" or "finite". The rationals are infinite; the Cantor set is uncountable; both are null. Null means negligible for this measure, not small in size or cardinality.
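
Python floats follow IEEE 754, where `math.inf` behaves much like the +\infty of [0,\infty]. This sketch reproduces the first and third traps. The helpers `measure_of_difference` and `interval_length` are hypothetical, and modelling measures as floats is only an illustration.

```python
import math

INF = math.inf

# Sums in [0, inf] behave as the text says:
assert INF + 5 == INF
assert INF + INF == INF

# ...but inf - inf has no meaningful value; IEEE floats return NaN instead of raising.
assert math.isnan(INF - INF)

def measure_of_difference(mu_B, mu_A):
    """mu(B minus A) = mu(B) - mu(A) for A ⊆ B, allowed only when mu(A) is finite."""
    if math.isinf(mu_A):
        raise ValueError("mu(A) is infinite: subtraction is undefined, compute mu(B minus A) directly")
    return mu_B - mu_A

def interval_length(a, b):
    """Lebesgue measure of an interval with endpoints a <= b (b may be inf)."""
    return b - a

# Continuity from above fails here: mu(A_n) stays at inf for A_n = [n, inf),
# although the intersection of the A_n is empty and has measure 0.
rays = [interval_length(n, INF) for n in range(1, 6)]
print(rays)  # [inf, inf, inf, inf, inf]
```

Raising an error, rather than returning `nan`, makes a forbidden \infty - \infty fail loudly instead of propagating silently.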