Independence and Product Measures
Roll a die, then roll it again. The second roll does not care what the first one did — the two are
independent. In an elementary course you met this as a rule about numbers: independent events multiply,
\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).
That formula is correct, and it is where we begin. But measure theory lets us see what independence
really is, structurally: it is the statement that a joint experiment factorises into a
product. Two independent random variables have a joint law that is a
product measure; the whole apparatus of
Fubini's theorem then
applies, and the familiar consequences (\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y],
variances that add) drop out as one-line integrals over a rectangle. This page develops a single idea,
independence = product structure, from several angles.
From events to σ-algebras
Fix a probability space (\Omega, \mathcal{F}, \mathbb{P}). Two events
A, B \in \mathcal{F} are independent when
\mathbb{P}(A\cap B) = \mathbb{P}(A)\mathbb{P}(B). The measure-theoretic move
is to stop talking about single events and talk about whole sub-σ-algebras at once.
Two sub-σ-algebras \mathcal{G}_1, \mathcal{G}_2 \subseteq \mathcal{F} are
independent if
\mathbb{P}(G_1 \cap G_2) = \mathbb{P}(G_1)\,\mathbb{P}(G_2) \qquad\text{for all } G_1 \in \mathcal{G}_1,\ G_2 \in \mathcal{G}_2.
A σ-algebra is the collection of all questions you could answer from some source of
information. Independence of \mathcal{G}_1 and
\mathcal{G}_2 says: every yes/no question answerable from the first
source is probabilistically unlinked from every question answerable from the second. That is a
far stronger, cleaner statement than a single-event identity, and it is the right level of generality —
because random variables carry σ-algebras with them.
A random variable X:\Omega\to\mathbb{R} generates a σ-algebra
\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\} — the information
"what is the value of X?" We say X and
Y are independent random variables precisely when their
generated σ-algebras are independent:
X \perp\!\!\!\perp Y \iff \sigma(X) \text{ and } \sigma(Y) \text{ are independent} \iff \mathbb{P}(X\in C,\ Y \in D) = \mathbb{P}(X\in C)\,\mathbb{P}(Y\in D)
for all Borel sets C, D. In practice you rarely check every Borel pair: a
π-system argument (Dynkin's lemma) shows it is enough to verify the factorisation on a
generating π-system, such as the half-lines (-\infty, x], which is exactly the statement
about distribution functions below.
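On a finite sample space the definition can be checked by brute force, since \sigma(X) is just the set of preimages of subsets of the range of X. The following is a minimal sketch under assumed names: the two-bit space `omega`, the helper `generated_sigma_algebra` and the variables `X`, `Y`, `S` are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Hypothetical finite probability space: two fair bits, uniform on 4 outcomes.
omega = list(product([0, 1], repeat=2))
prob = {w: Fraction(1, 4) for w in omega}

def P(event):
    """Probability of an event, given as a set of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))

def generated_sigma_algebra(rv):
    """sigma(rv) on a finite space: preimages of every subset of the range."""
    values = sorted({rv(w) for w in omega})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )
    return [frozenset(w for w in omega if rv(w) in s) for s in subsets]

def independent(rv1, rv2):
    """True when every event of sigma(rv1) factorises against every event of sigma(rv2)."""
    return all(
        P(g1 & g2) == P(g1) * P(g2)
        for g1 in generated_sigma_algebra(rv1)
        for g2 in generated_sigma_algebra(rv2)
    )

X = lambda w: w[0]          # first bit
Y = lambda w: w[1]          # second bit
S = lambda w: w[0] + w[1]   # sum of the bits, which carries information about X

print(independent(X, Y))  # True
print(independent(X, S))  # False
```

The second check fails because, for instance, the event \{X = 0\} \cap \{S = 2\} is empty while both factors have positive probability.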
The product σ-algebra and the product measure
Suppose we have two probability spaces,
(\Omega_1, \mathcal{F}_1, \mathbb{P}_1) and
(\Omega_2, \mathcal{F}_2, \mathbb{P}_2), and we want to run both
experiments at once. The sample space is the Cartesian product
\Omega_1 \times \Omega_2. What is its σ-algebra? The natural building blocks
are measurable rectangles A \times B with
A \in \mathcal{F}_1, B \in \mathcal{F}_2. The
product σ-algebra is the one they generate:
\mathcal{F}_1 \otimes \mathcal{F}_2 := \sigma\bigl(\{A \times B : A \in \mathcal{F}_1,\ B \in \mathcal{F}_2\}\bigr).
On this σ-algebra the product measure
\mathbb{P}_1 \otimes \mathbb{P}_2 is characterised by what it must do on
rectangles:
-
There is a unique probability measure \mathbb{P}_1 \otimes \mathbb{P}_2
on (\Omega_1\times\Omega_2,\ \mathcal{F}_1\otimes\mathcal{F}_2) with
(\mathbb{P}_1\otimes\mathbb{P}_2)(A\times B) = \mathbb{P}_1(A)\,\mathbb{P}_2(B)
for every measurable rectangle.
-
Uniqueness is a π-system argument: the rectangles are closed under intersection
and generate \mathcal{F}_1\otimes\mathcal{F}_2, so two measures agreeing
on them agree everywhere.
-
Existence is exactly the content of the Fubini–Tonelli construction: the measure of a general set
E is obtained by integrating its slices,
(\mathbb{P}_1\otimes\mathbb{P}_2)(E) = \int_{\Omega_1} \mathbb{P}_2(E_{\omega_1})\, d\mathbb{P}_1(\omega_1).
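For finite spaces both the rectangle rule and the slice formula reduce to sums, so they can be compared directly. This is a sketch under assumptions: the marginals `P1` and `P2`, the set `E` and the helper names are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginals: a biased coin and a three-point space.
P1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
P2 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

# Product measure on points: the mass of (w1, w2) is P1(w1) * P2(w2).
P12 = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}

def measure(E):
    """(P1 x P2)(E), computed by summing point masses over E."""
    return sum((P12[w] for w in E), Fraction(0))

def measure_by_slices(E):
    """Integrate the P2-measure of each slice E_{w1} against P1."""
    total = Fraction(0)
    for w1 in P1:
        slice_w1 = {w2 for (u, w2) in E if u == w1}
        total += P1[w1] * sum((P2[w2] for w2 in slice_w1), Fraction(0))
    return total

# A set that is not a rectangle.
E = {("H", "a"), ("H", "b"), ("T", "c")}

# A measurable rectangle A x B.
A, B = {"H"}, {"a", "c"}
rect = set(product(A, B))

print(measure(E), measure_by_slices(E))   # 5/12 5/12
print(measure(rect))                      # 1/4, which is P1(A) * P2(B)
```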
Independence, restated once more, is now a single equation between measures. Let
\mu_X, \mu_Y be the laws (push-forward distributions) of
X and Y, and let
\mu_{(X,Y)} be the joint law of the pair
(X,Y):\Omega\to\mathbb{R}^2. Then
X \perp\!\!\!\perp Y \iff \mu_{(X,Y)} = \mu_X \otimes \mu_Y.
Independence is the joint law being a product measure. Everything else on this page is
a consequence of that one line.
Three equivalent faces of the same fact
It helps to hold the equivalences together. For random variables X, Y the
following all say X\perp\!\!\!\perp Y:
-
σ-algebras: \sigma(X) and
\sigma(Y) are independent —
\mathbb{P}(X\in C, Y\in D) = \mathbb{P}(X\in C)\mathbb{P}(Y\in D).
-
Measures: the joint law factorises,
\mu_{(X,Y)} = \mu_X \otimes \mu_Y.
-
Distribution functions: the joint CDF factorises,
F_{X,Y}(x,y) = F_X(x)\,F_Y(y) for all
x, y (and, when densities exist,
f_{X,Y}(x,y) = f_X(x)\,f_Y(y)).
The CDF version is the working criterion — it only involves half-lines, and by the π-system lemma that
is enough to force the full product structure. The measure version is the conceptual one — it makes the
multiplicative property below a triviality via Fubini.
Figure: the unit square [0,1]^2 carrying the product of two uniform measures, whose value on a
set is simply its area. The parameters a and
b set the size of the events A = [0,a] (on the
horizontal axis) and B = [0,b] (vertical). The shaded rectangle is the event
A\times B, and its area is always
a\cdot b = \mathbb{P}(A)\,\mathbb{P}(B): independence made visible as the
area of a rectangle equalling the product of its side lengths.
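For discrete laws with finite support, the measure criterion and the distribution-function criterion from the list of equivalences can be tested side by side. The sketch below assumes a joint law stored as a dictionary of point masses; `pmf_factorises`, `cdf_factorises` and the two example laws are hypothetical names introduced here. Because the distribution functions are step functions, checking them at the support points is enough in this finite setting.

```python
from fractions import Fraction

def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf {(x, y): mass}."""
    px, py = {}, {}
    for (x, y), m in joint.items():
        px[x] = px.get(x, Fraction(0)) + m
        py[y] = py.get(y, Fraction(0)) + m
    return px, py

def pmf_factorises(joint):
    """Measure criterion: joint mass equals the product of marginal masses at every point."""
    px, py = marginals(joint)
    return all(joint.get((x, y), Fraction(0)) == px[x] * py[y] for x in px for y in py)

def cdf_factorises(joint):
    """CDF criterion: F_{X,Y}(x, y) == F_X(x) * F_Y(y) at every pair of support points."""
    px, py = marginals(joint)

    def F(x, y):
        return sum((m for (u, v), m in joint.items() if u <= x and v <= y), Fraction(0))

    def FX(x):
        return sum((m for u, m in px.items() if u <= x), Fraction(0))

    def FY(y):
        return sum((m for v, m in py.items() if v <= y), Fraction(0))

    return all(F(x, y) == FX(x) * FY(y) for x in px for y in py)

# A product law built from two hypothetical marginals.
p = {0: Fraction(1, 3), 1: Fraction(2, 3)}
q = {0: Fraction(1, 4), 1: Fraction(3, 4)}
independent_law = {(x, y): p[x] * q[y] for x in p for y in q}

# A law concentrated on the diagonal: X and Y are always equal.
dependent_law = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(pmf_factorises(independent_law), cdf_factorises(independent_law))  # True True
print(pmf_factorises(dependent_law), cdf_factorises(dependent_law))      # False False
```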
Worked example: two fair dice
Take \Omega_1 = \Omega_2 = \{1,2,3,4,5,6\}, each with the uniform measure
\mathbb{P}_i(\{k\}) = \tfrac16. The product space is
\Omega_1\times\Omega_2, the 36 ordered pairs, with
product measure \mathbb{P} = \mathbb{P}_1\otimes\mathbb{P}_2 assigning each
pair mass \tfrac16\cdot\tfrac16 = \tfrac{1}{36}. Let
A = \{\text{first die} = 6\} and
B = \{\text{second die is even}\}. As rectangles,
A = \{6\}\times\Omega_2 and
B = \Omega_1\times\{2,4,6\}, so
\mathbb{P}(A) = \tfrac16, \quad \mathbb{P}(B) = \tfrac12, \quad \mathbb{P}(A\cap B) = \mathbb{P}(\{6\}\times\{2,4,6\}) = \tfrac{3}{36} = \tfrac{1}{12} = \tfrac16\cdot\tfrac12.
The factorisation holds because the two dice live on orthogonal coordinates of the product space —
exactly the picture of the rectangle above. Now let X, Y be the two face
values, each with \mathbb{E}[X] = \tfrac{7}{2}. Because they are independent,
\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y] = \tfrac72\cdot\tfrac72 = \tfrac{49}{4} = 12.25.
You can check this directly by averaging xy over all
36 pairs: it comes to 12.25. The next section
explains why that shortcut is legal.
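The dice calculation can be reproduced by enumerating the 36 pairs with exact fractions. This is a small sketch; the helper `P` and the variable names are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
pairs = list(product(faces, faces))   # the 36 ordered pairs
mass = Fraction(1, 36)                # product measure of each pair

def P(event):
    """Probability of an event given as a predicate on pairs (first, second)."""
    return sum((mass for w in pairs if event(w)), Fraction(0))

A = lambda w: w[0] == 6          # first die shows 6
B = lambda w: w[1] % 2 == 0      # second die is even

p_A, p_B = P(A), P(B)
p_AB = P(lambda w: A(w) and B(w))

E_X = sum(Fraction(x, 6) for x in faces)          # expectation of one die
E_XY = sum(x * y * mass for x, y in pairs)        # direct average over all pairs

print(p_A, p_B, p_AB)      # 1/6 1/2 1/12
print(E_X, E_XY)           # 7/2 49/4
```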
Why \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]: Fubini on the product
This is the payoff, and the reason independence and product measures belong on one page. Let
X, Y be independent and integrable. The expectation of the
product is an integral against the joint law \mu_{(X,Y)} — which, by
independence, is the product measure \mu_X\otimes\mu_Y. So we may
apply Fubini's theorem and
split the double integral into iterated ones:
\mathbb{E}[XY]
= \int_{\mathbb{R}^2} xy \; d\mu_{(X,Y)}(x,y)
= \int_{\mathbb{R}^2} xy \; d(\mu_X\otimes\mu_Y)(x,y)
= \int_{\mathbb{R}}\!\!\int_{\mathbb{R}} xy \; d\mu_X(x)\, d\mu_Y(y).
Inside, y is constant for the x-integral, so it
pulls out, and the two single integrals separate cleanly:
= \int_{\mathbb{R}} y\left(\int_{\mathbb{R}} x\, d\mu_X(x)\right) d\mu_Y(y)
= \left(\int_{\mathbb{R}} x\, d\mu_X(x)\right)\!\left(\int_{\mathbb{R}} y\, d\mu_Y(y)\right)
= \mathbb{E}[X]\,\mathbb{E}[Y].
That is the whole proof. Citing Fubini here is not decoration: if the
joint law were not a product measure, the double integral would not separate in this way, and the identity
could fail. (Integrability of X and Y is what
licenses Fubini rather than merely Tonelli, since xy is signed: Tonelli applied to |xy| gives
\mathbb{E}|XY| = \mathbb{E}|X|\,\mathbb{E}|Y| < \infty, which is exactly the finite
double integral Fubini demands.)
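For discrete laws the integrals in the proof become finite sums, which makes the role of the product structure easy to see. The sketch below uses two hypothetical marginals `mu_X` and `mu_Y`, and compares the product law with `coupled_law`, an invented joint law that has the same marginals but is not a product.

```python
from fractions import Fraction

# Hypothetical marginal laws.
mu_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
mu_Y = {1: Fraction(1, 4), 2: Fraction(3, 4)}

# Joint law under independence: the product measure.
product_law = {(x, y): mu_X[x] * mu_Y[y] for x in mu_X for y in mu_Y}

# A different joint law with the same two marginals (not a product).
coupled_law = {(0, 1): Fraction(1, 4), (0, 2): Fraction(1, 4), (1, 2): Fraction(1, 2)}

def expect_xy(joint):
    """Double sum of x*y against a joint law."""
    return sum(x * y * m for (x, y), m in joint.items())

def iterated_integral():
    """Inner sum over x first, then the outer sum over y, as in the Fubini step."""
    inner = sum(x * m for x, m in mu_X.items())          # this is E[X]
    return sum(y * inner * m for y, m in mu_Y.items())

E_X = sum(x * m for x, m in mu_X.items())
E_Y = sum(y * m for y, m in mu_Y.items())

print(expect_xy(product_law), iterated_integral(), E_X * E_Y)  # 7/8 7/8 7/8
print(expect_xy(coupled_law))                                  # 1
```

With the coupled law the marginals are unchanged, yet the expectation of the product is 1 rather than 7/8, so the factorisation is a property of the joint law and not of the marginals.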
The multiplicative property has an immediate corollary. For independent
X, Y with finite variance, the covariance vanishes —
\operatorname{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 —
and therefore variances add:
\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).
This is the seed of the theory of sums of independent variables: the weak law of large numbers
via Chebyshev's inequality and the \sqrt{n} scaling in the central limit theorem both rest on variances adding.
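The additivity of variance can be confirmed on the two-dice space with exact arithmetic. This is a sketch only; the helpers `E` and `var` are illustrative names, and the last line adds a dependent comparison (a die added to itself) that is not in the text.

```python
from fractions import Fraction
from itertools import product

pairs = list(product(range(1, 7), repeat=2))   # two independent fair dice
mass = Fraction(1, 36)

def E(f):
    """Expectation of f over the 36 equally likely pairs."""
    return sum(Fraction(f(w)) * mass for w in pairs)

def var(f):
    """Variance of f, computed as E[(f - E f)^2]."""
    mean = E(f)
    return E(lambda w: (f(w) - mean) ** 2)

X = lambda w: w[0]
Y = lambda w: w[1]

cov_XY = E(lambda w: X(w) * Y(w)) - E(X) * E(Y)
var_X, var_Y = var(X), var(Y)
var_sum = var(lambda w: X(w) + Y(w))       # independent summands
var_double = var(lambda w: X(w) + X(w))    # dependent summands: X + X

print(cov_XY)               # 0
print(var_X, var_sum)       # 35/12 35/6
print(var_double)           # 35/3, which is not var_X + var_X
```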
Mutual independence of a family
For more than two objects, independence is a statement about all sub-collections at once. A
family of events (A_i)_{i\in I} is mutually independent if
for every finite subset J\subseteq I
\mathbb{P}\!\left(\bigcap_{i\in J} A_i\right) = \prod_{i\in J} \mathbb{P}(A_i).
Likewise a family of σ-algebras (\mathcal{G}_i) is mutually independent if
every finite selection of one event from distinct \mathcal{G}_i's
factorises, and random variables (X_i) are (mutually) independent if
(\sigma(X_i)) are — equivalently the joint law is the full product
\bigotimes_i \mu_{X_i}. It is essential that the factorisation is demanded of
every finite sub-collection, not just pairs. That distinction is the subject of the warning below.
Two classic traps, each an implication that runs in one direction only.
1. Pairwise independence does NOT imply mutual independence. Here is the standard
counterexample. Toss two fair coins independently and set
- A = \{\text{first coin is Heads}\},
- B = \{\text{second coin is Heads}\},
- C = \{\text{the two coins match}\} (both H or both T).
Each has probability \tfrac12. Any pair is independent: e.g.
\mathbb{P}(A\cap C) = \mathbb{P}(\text{HH}) = \tfrac14 = \tfrac12\cdot\tfrac12,
and similarly for A,B and B,C. But the three are
not mutually independent, because C is determined by
A and B together:
\mathbb{P}(A\cap B\cap C) = \mathbb{P}(\text{HH}) = \tfrac14 \ne \tfrac18 = \mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C).
Knowing any two of the three pins down the third exactly — so mutual independence fails even though
every pair is independent.
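The counterexample is small enough to enumerate. The sketch below checks the factorisation for every pair and then for the triple; the helper `factorises` and the dictionary `events` are names introduced for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product("HT", repeat=2))   # HH, HT, TH, TT, equally likely

def P(event):
    """Uniform probability of a set of outcomes."""
    return Fraction(len(event), len(omega))

events = {
    "A": frozenset(w for w in omega if w[0] == "H"),     # first coin is Heads
    "B": frozenset(w for w in omega if w[1] == "H"),     # second coin is Heads
    "C": frozenset(w for w in omega if w[0] == w[1]),    # the two coins match
}

def factorises(names):
    """True when P(intersection) equals the product of the probabilities."""
    intersection = frozenset(omega)
    product_of_probs = Fraction(1)
    for name in names:
        intersection &= events[name]
        product_of_probs *= P(events[name])
    return P(intersection) == product_of_probs

pairwise = all(factorises(pair) for pair in combinations(events, 2))
triple = factorises(("A", "B", "C"))

print(pairwise)  # True
print(triple)    # False
```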
2. Uncorrelated does NOT imply independent. The identity
\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y] is a consequence of
independence, not equivalent to it. Let X be uniform on
\{-1, 0, 1\} and set Y = X^2. Then
\mathbb{E}[X] = 0 and
\mathbb{E}[XY] = \mathbb{E}[X^3] = 0 = \mathbb{E}[X]\mathbb{E}[Y], so they are
uncorrelated. Yet Y is a deterministic function of
X, and the two are not independent:
\mathbb{P}(X = 0,\ Y = 0) = \tfrac13 \ne \tfrac19 = \mathbb{P}(X = 0)\,\mathbb{P}(Y = 0).
Zero covariance detects only a linear relationship; independence is much stronger.
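Both halves of this example, zero covariance and the failure of factorisation, can be checked in a few lines. This is a minimal sketch; the helper `E` and the choice of the event \{X = 0, Y = 0\} as a witness are illustrative.

```python
from fractions import Fraction

support = [-1, 0, 1]        # X is uniform on these three points
mass = Fraction(1, 3)

def E(f):
    """Expectation of f(X)."""
    return sum(Fraction(f(x)) * mass for x in support)

# Covariance of X and Y = X^2.
cov = E(lambda x: x * x**2) - E(lambda x: x) * E(lambda x: x**2)

# Test factorisation on the single event {X = 0, Y = 0}.
p_joint = sum(mass for x in support if x == 0 and x**2 == 0)
p_X0 = sum(mass for x in support if x == 0)
p_Y0 = sum(mass for x in support if x**2 == 0)

print(cov)                    # 0
print(p_joint, p_X0 * p_Y0)   # 1/3 1/9
```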
Product measures account for a remarkable share of probability theory, and this is Kolmogorov's great reorganisation of the subject. An infinite
sequence of independent coin tosses is nothing more than the product measure
\bigotimes_{n=1}^\infty \mathbb{P}_n on the product space
\{0,1\}^{\mathbb{N}}, whose existence is guaranteed by
Kolmogorov's extension theorem. Once you have that single infinite product measure,
an astonishing amount follows: the strong law of large numbers, Kolmogorov's
0–1 law (any event depending only on the "tail"
of the sequence has probability 0 or 1), and much of the
modern theory of stochastic processes. The humble rectangle
A\times B of the picture above, iterated infinitely, is the stage on which
all of independent probability is performed.
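A simulation cannot prove the strong law of large numbers, but it can show the behaviour the law describes: the fraction of heads in a long run of independent fair tosses settles near 1/2. The sketch below uses Python's pseudo-random generator as a stand-in for the infinite product measure; the function name and the fixed seed are assumptions made for reproducibility.

```python
import random

def fraction_of_heads(n, seed=0):
    """Fraction of 1s among n pseudo-random fair coin tosses."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    heads = sum(rng.randint(0, 1) for _ in range(n))
    return heads / n

for n in (100, 10_000, 1_000_000):
    print(n, fraction_of_heads(n))
```

The printed fractions should move closer to 0.5 as n grows, with typical deviations of order 1/\sqrt{n}.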