Independence and Product Measures

Roll a die, then roll it again. The second roll does not care what the first one did — the two are independent. In an elementary course you met this as a rule about numbers: independent events multiply,

\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).

That formula is correct, and it is where we begin. But measure theory lets us see what independence really is, structurally: it is the statement that a joint experiment factorises into a product. Two independent random variables have a joint law that is a product measure; the whole apparatus of Fubini's theorem then applies, and the familiar consequences — \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y], variances that add — drop out as one-line integrals over a rectangle. This page is the single idea that independence = product structure, told from every angle.

From events to σ-algebras

Fix a probability space (\Omega, \mathcal{F}, \mathbb{P}). Two events A, B \in \mathcal{F} are independent when \mathbb{P}(A\cap B) = \mathbb{P}(A)\mathbb{P}(B). The measure-theoretic move is to stop talking about single events and talk about whole sub-σ-algebras at once. Two sub-σ-algebras \mathcal{G}_1, \mathcal{G}_2 \subseteq \mathcal{F} are independent if

\mathbb{P}(G_1 \cap G_2) = \mathbb{P}(G_1)\,\mathbb{P}(G_2) \qquad\text{for all } G_1 \in \mathcal{G}_1,\ G_2 \in \mathcal{G}_2.

A σ-algebra is the collection of all questions you could answer from some source of information. Independence of \mathcal{G}_1 and \mathcal{G}_2 says: every yes/no question answerable from the first source is probabilistically unlinked from every question answerable from the second. That is a far stronger, cleaner statement than a single-event identity, and it is the right level of generality — because random variables carry σ-algebras with them.

A random variable X:\Omega\to\mathbb{R} generates a σ-algebra \sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\} — the information "what is the value of X?" We say X and Y are independent random variables precisely when their generated σ-algebras are independent:

X \perp\!\!\!\perp Y \iff \sigma(X) \text{ and } \sigma(Y) \text{ are independent} \iff \mathbb{P}(X\in C,\ Y \in D) = \mathbb{P}(X\in C)\,\mathbb{P}(Y\in D)

for all Borel sets C, D. In practice you rarely check every Borel pair: a π-system argument (Dynkin's lemma) shows it is enough to verify the factorisation on generating sets — half-lines (-\infty, x] — which is exactly the statement about distribution functions below.
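On a finite sample space the definition can be checked by brute force, which makes the reduction to half-lines concrete. The sketch below is only an illustration under assumed names (`OMEGA`, `generated_sigma_algebra`, `half_lines_factorise` and the three random variables are hypothetical, not part of the text): it lists every event of \sigma(X) and \sigma(Y) for two dice, tests the factorisation on all pairs, and then repeats the test on half-line events only.

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Assumed sample space: ordered pairs of two fair dice, uniform measure.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(OMEGA))

def generated_sigma_algebra(rv):
    """All preimages rv^{-1}(B), with B ranging over subsets of the range of rv."""
    values = sorted({rv(w) for w in OMEGA})
    subsets = chain.from_iterable(
        combinations(values, k) for k in range(len(values) + 1)
    )
    return {frozenset(w for w in OMEGA if rv(w) in s) for s in subsets}

def sigma_algebras_independent(g1, g2):
    """Check P(G1 & G2) = P(G1) P(G2) for every pair of events."""
    return all(prob(a & b) == prob(a) * prob(b) for a in g1 for b in g2)

def half_lines_factorise(rv1, rv2):
    """Check the factorisation only on events {rv1 <= x} and {rv2 <= y}."""
    for x in sorted({rv1(w) for w in OMEGA}):
        for y in sorted({rv2(w) for w in OMEGA}):
            a = frozenset(w for w in OMEGA if rv1(w) <= x)
            b = frozenset(w for w in OMEGA if rv2(w) <= y)
            if prob(a & b) != prob(a) * prob(b):
                return False
    return True

def first(w):
    return w[0]

def second(w):
    return w[1]

def total(w):
    return w[0] + w[1]

print(sigma_algebras_independent(generated_sigma_algebra(first),
                                 generated_sigma_algebra(second)))
print(half_lines_factorise(first, second))
print(half_lines_factorise(first, total))
```

For the two face values both checks agree, while the first die and the total already fail on half-lines. The brute-force check over all 64 × 64 event pairs is feasible only because the space is tiny; the half-line check is the one that scales.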

The product σ-algebra and the product measure

Suppose we have two probability spaces, (\Omega_1, \mathcal{F}_1, \mathbb{P}_1) and (\Omega_2, \mathcal{F}_2, \mathbb{P}_2), and we want to run both experiments at once. The sample space is the Cartesian product \Omega_1 \times \Omega_2. What is its σ-algebra? The natural building blocks are measurable rectangles A \times B with A \in \mathcal{F}_1, B \in \mathcal{F}_2. The product σ-algebra is the one they generate:

\mathcal{F}_1 \otimes \mathcal{F}_2 := \sigma\bigl(\{A \times B : A \in \mathcal{F}_1,\ B \in \mathcal{F}_2\}\bigr).

On this σ-algebra there is a unique probability measure — the product measure \mathbb{P}_1 \otimes \mathbb{P}_2 — characterised by what it must do on rectangles:

There is a unique probability measure \mathbb{P}_1 \otimes \mathbb{P}_2 on (\Omega_1\times\Omega_2,\ \mathcal{F}_1\otimes\mathcal{F}_2) with (\mathbb{P}_1\otimes\mathbb{P}_2)(A\times B) = \mathbb{P}_1(A)\,\mathbb{P}_2(B) for every measurable rectangle.
Uniqueness is a π-system argument: the rectangles are closed under intersection and generate \mathcal{F}_1\otimes\mathcal{F}_2, so two measures agreeing on them agree everywhere.
Existence is exactly the content of the Fubini–Tonelli construction: the measure of a general set E is obtained by integrating its slices, (\mathbb{P}_1\otimes\mathbb{P}_2)(E) = \int_{\Omega_1} \mathbb{P}_2(E_{\omega_1})\, d\mathbb{P}_1(\omega_1).
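For finite spaces the slice formula is a finite sum, so it can be compared directly with the sum of point masses. The following is a minimal sketch under assumed data: the biased coin, the three-point spinner and the set `E` are made up for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def product_measure(p1, p2):
    """Point masses of the product measure on the Cartesian product of two finite spaces."""
    return {
        (w1, w2): m1 * m2
        for (w1, m1), (w2, m2) in product(p1.items(), p2.items())
    }

def measure(p, event):
    """Measure of an event as the sum of its point masses."""
    return sum((p[w] for w in event), Fraction(0))

def measure_by_slices(p1, p2, event):
    """Integrate, over omega_1, the P2-measure of the slice E_{omega_1}."""
    total = Fraction(0)
    for w1, m1 in p1.items():
        slice_at_w1 = {w2 for (v1, w2) in event if v1 == w1}
        total += m1 * measure(p2, slice_at_w1)
    return total

# Hypothetical example spaces: a biased coin and a fair three-point spinner.
coin = {"H": Fraction(2, 3), "T": Fraction(1, 3)}
spinner = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
joint = product_measure(coin, spinner)

# A set that is not a rectangle A x B.
E = {("H", 1), ("H", 2), ("T", 3)}
print(measure(joint, E), measure_by_slices(coin, spinner, E))

# On a rectangle the product rule holds by construction.
A, B = {"H"}, {1, 3}
rectangle = set(product(A, B))
print(measure(joint, rectangle), measure(coin, A) * measure(spinner, B))
```

Both computations of the measure of `E` agree, which is the finite shadow of the existence argument; the general case needs measurability of the slice function, which a finite sum hides.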

Independence, restated one final time, is now a single equation between measures. Let \mu_X, \mu_Y be the laws (push-forward distributions) of X and Y, and let \mu_{(X,Y)} be the joint law of the pair (X,Y):\Omega\to\mathbb{R}^2. Then

X \perp\!\!\!\perp Y \iff \mu_{(X,Y)} = \mu_X \otimes \mu_Y.

Independence is the joint law being a product measure. Everything else on this page is a consequence of that one line.

Three equivalent faces of the same fact

It helps to hold the equivalences together. For random variables X, Y the following all say X\perp\!\!\!\perp Y:

σ-algebras: \sigma(X) and \sigma(Y) are independent — \mathbb{P}(X\in C, Y\in D) = \mathbb{P}(X\in C)\mathbb{P}(Y\in D).
Measures: the joint law factorises, \mu_{(X,Y)} = \mu_X \otimes \mu_Y.
Distribution functions: the joint CDF factorises, F_{X,Y}(x,y) = F_X(x)\,F_Y(y) for all x, y (and, when a joint density exists, f_{X,Y}(x,y) = f_X(x)\,f_Y(y) for almost every (x,y)).

The CDF version is the working criterion — it only involves half-lines, and by the π-system lemma that is enough to force the full product structure. The measure version is the conceptual one — it makes the multiplicative property below a triviality via Fubini.
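For a pair with finitely many values, the CDF criterion and the factorisation of the joint law can both be tested exactly. The sketch below assumes the joint law is given as a dictionary of point masses; the helper names and the two example laws are hypothetical. Evaluating the CDFs at support points is enough here because they are step functions that jump only there.

```python
from fractions import Fraction

def marginals(joint):
    """Marginal point masses of X and Y from a joint law {(x, y): mass}."""
    px, py = {}, {}
    for (x, y), m in joint.items():
        px[x] = px.get(x, Fraction(0)) + m
        py[y] = py.get(y, Fraction(0)) + m
    return px, py

def law_factorises(joint):
    """Measure version: the joint mass equals the product of marginal masses."""
    px, py = marginals(joint)
    return all(
        joint.get((x, y), Fraction(0)) == px[x] * py[y] for x in px for y in py
    )

def cdf_factorises(joint):
    """CDF version: F_{X,Y}(x, y) = F_X(x) F_Y(y) at every pair of support points."""
    px, py = marginals(joint)

    def F(x, y):
        return sum((m for (u, v), m in joint.items() if u <= x and v <= y), Fraction(0))

    def FX(x):
        return sum((m for u, m in px.items() if u <= x), Fraction(0))

    def FY(y):
        return sum((m for v, m in py.items() if v <= y), Fraction(0))

    return all(F(x, y) == FX(x) * FY(y) for x in px for y in py)

# Two independent fair bits, and a fully dependent pair with the same marginals.
independent_bits = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
copied_bit = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(law_factorises(independent_bits), cdf_factorises(independent_bits))
print(law_factorises(copied_bit), cdf_factorises(copied_bit))
```

The two criteria give the same verdict on both examples, as the equivalence says they must, even though `copied_bit` has exactly the same marginals as `independent_bits`.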

Figure (interactive): the unit square [0,1]^2 carrying the product measure, whose value on a set is simply its area. Sliders a and b resize the events A = [0,a] (on the horizontal axis) and B = [0,b] (vertical). The shaded rectangle is the event A\times B, and its area is always a\cdot b = \mathbb{P}(A)\,\mathbb{P}(B): independence made visible as the area of a rectangle equalling the product of its side lengths.

Worked example: two fair dice

Take \Omega_1 = \Omega_2 = \{1,2,3,4,5,6\}, each with the uniform measure \mathbb{P}_i(\{k\}) = \tfrac16. The product space is \Omega_1\times\Omega_2, the 36 ordered pairs, with product measure \mathbb{P} = \mathbb{P}_1\otimes\mathbb{P}_2 assigning each pair mass \tfrac16\cdot\tfrac16 = \tfrac{1}{36}. Let A = \{\text{first die} = 6\} and B = \{\text{second die is even}\}. As rectangles, A = \{6\}\times\Omega_2 and B = \Omega_1\times\{2,4,6\}, so

\mathbb{P}(A) = \tfrac16, \quad \mathbb{P}(B) = \tfrac12, \quad \mathbb{P}(A\cap B) = \mathbb{P}(\{6\}\times\{2,4,6\}) = \tfrac{3}{36} = \tfrac{1}{12} = \tfrac16\cdot\tfrac12.

The factorisation holds because the two dice live on orthogonal coordinates of the product space, exactly the picture of the rectangle above. Now let X, Y be the two face values, so that \mathbb{E}[X] = \mathbb{E}[Y] = \tfrac{7}{2}. Because they are independent,

\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y] = \tfrac72\cdot\tfrac72 = \tfrac{49}{4} = 12.25.

You can check this directly by averaging xy over all 36 pairs: the products sum to (1+2+\cdots+6)^2 = 441, and 441/36 = 12.25. The next section explains why that shortcut is legal.
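The arithmetic of this example can be reproduced with exact fractions. This is a small sketch rather than anything canonical; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
die = {k: Fraction(1, 6) for k in faces}

# Product measure on the 36 ordered pairs: each pair gets mass 1/36.
joint = {(i, j): die[i] * die[j] for i, j in product(faces, faces)}

def prob(event):
    """Sum of the point masses of the outcomes in the event."""
    return sum(joint[w] for w in event)

A = {w for w in joint if w[0] == 6}        # first die shows 6
B = {w for w in joint if w[1] % 2 == 0}    # second die is even

p_A, p_B, p_AB = prob(A), prob(B), prob(A & B)

# E[X] from one die, E[XY] by averaging over all 36 pairs.
e_x = sum(k * m for k, m in die.items())
e_xy = sum(i * j * m for (i, j), m in joint.items())

print(p_A, p_B, p_AB)
print(e_x, e_xy, float(e_xy))
```

Using `Fraction` instead of floats keeps every equality exact, so the comparison of \mathbb{P}(A\cap B) with \mathbb{P}(A)\,\mathbb{P}(B) is a genuine identity rather than a rounding coincidence.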

Why \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]: Fubini on the product

This is the payoff, and the reason independence and product measures belong on one page. Let X, Y be independent and integrable. The expectation of the product is an integral against the joint law \mu_{(X,Y)} — which, by independence, is the product measure \mu_X\otimes\mu_Y. So we may apply Fubini's theorem and split the double integral into iterated ones:

\mathbb{E}[XY] = \int_{\mathbb{R}^2} xy \; d\mu_{(X,Y)}(x,y) = \int_{\mathbb{R}^2} xy \; d(\mu_X\otimes\mu_Y)(x,y) = \int_{\mathbb{R}}\!\!\int_{\mathbb{R}} xy \; d\mu_X(x)\, d\mu_Y(y).

Inside, y is constant for the x-integral, so it pulls out, and the two single integrals separate cleanly:

= \int_{\mathbb{R}} y\left(\int_{\mathbb{R}} x\, d\mu_X(x)\right) d\mu_Y(y) = \left(\int_{\mathbb{R}} x\, d\mu_X(x)\right)\!\left(\int_{\mathbb{R}} y\, d\mu_Y(y)\right) = \mathbb{E}[X]\,\mathbb{E}[Y].

That is the whole proof. Citing Fubini here is not decoration: if the joint law is not a product measure, the double integral need not separate, and the identity can fail. (Integrability of X and Y is what licenses Fubini rather than merely Tonelli, since xy is signed: Tonelli applied to |xy| gives \mathbb{E}|XY| = \mathbb{E}|X|\,\mathbb{E}|Y| < \infty, which is exactly the finite double integral Fubini demands.)

The multiplicative property has an immediate corollary. For independent X, Y with finite variance, the covariance vanishes — \operatorname{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 — and therefore variances add:

\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).
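In the discrete case the Fubini step is just a rearrangement of a finite double sum, and the additivity of variance can be verified on the law of X + Y. The sketch below uses two made-up laws `mu_x` and `mu_y` (assumptions for illustration, as are the helper names) and takes the joint law to be their product, which is precisely the independence hypothesis.

```python
from fractions import Fraction

def expect(law, f=lambda v: v):
    """Integral of f against a finitely supported law {value: mass}."""
    return sum(f(v) * m for v, m in law.items())

def variance(law):
    mean = expect(law)
    return expect(law, lambda v: (v - mean) ** 2)

def law_of_sum(mu_x, mu_y):
    """Law of X + Y when (X, Y) has the product law of mu_x and mu_y."""
    out = {}
    for x, mx in mu_x.items():
        for y, my in mu_y.items():
            out[x + y] = out.get(x + y, Fraction(0)) + mx * my
    return out

# Hypothetical laws, chosen only so the numbers are not symmetric.
mu_x = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}
mu_y = {1: Fraction(1, 3), 4: Fraction(2, 3)}

# E[XY] as a double sum against the product law ...
double_sum = sum(
    x * y * mx * my for x, mx in mu_x.items() for y, my in mu_y.items()
)
# ... and as an iterated sum: inner sum over x, outer sum over y.
inner = sum(x * mx for x, mx in mu_x.items())
iterated = sum(y * my * inner for y, my in mu_y.items())

print(double_sum, iterated, expect(mu_x) * expect(mu_y))
print(variance(law_of_sum(mu_x, mu_y)), variance(mu_x) + variance(mu_y))
```

Replacing the product law in `law_of_sum` by a dependent joint law would in general break the second equality, since the cross term 2\operatorname{Cov}(X,Y) no longer vanishes.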

This is the seed of the whole theory of sums of independent variables — the law of large numbers, the central limit theorem — all of which lean on variances adding.

Mutual independence of a family

For more than two objects, independence is a statement about all sub-collections at once. A family of events (A_i)_{i\in I} is mutually independent if for every finite subset J\subseteq I

\mathbb{P}\!\left(\bigcap_{i\in J} A_i\right) = \prod_{i\in J} \mathbb{P}(A_i).

Likewise a family of σ-algebras (\mathcal{G}_i) is mutually independent if every finite selection of one event from distinct \mathcal{G}_i's factorises, and random variables (X_i) are (mutually) independent if (\sigma(X_i)) are — equivalently the joint law is the full product \bigotimes_i \mu_{X_i}. It is essential that the factorisation is demanded of every sub-collection, not just pairs. That distinction is the subject of the warning below.

Two classic traps, each an implication that holds in one direction only.

1. Pairwise independence does NOT imply mutual independence. Here is the standard counterexample. Toss two fair coins independently and set

A = \{\text{first coin is Heads}\},
B = \{\text{second coin is Heads}\},
C = \{\text{the two coins match}\} (both H or both T).

Each has probability \tfrac12. Any pair is independent: e.g. \mathbb{P}(A\cap C) = \mathbb{P}(\text{HH}) = \tfrac14 = \tfrac12\cdot\tfrac12, and similarly for A,B and B,C. But the three are not mutually independent, because C is determined by A and B together:

\mathbb{P}(A\cap B\cap C) = \mathbb{P}(\text{HH}) = \tfrac14 \ne \tfrac18 = \mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C).

Knowing any two of the three pins down the third exactly — so mutual independence fails even though every pair is independent.

2. Uncorrelated does NOT imply independent. The identity \mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y] is a consequence of independence, not equivalent to it. Let X be uniform on \{-1, 0, 1\} and set Y = X^2. Then \mathbb{E}[X] = 0 and \mathbb{E}[XY] = \mathbb{E}[X^3] = 0 = \mathbb{E}[X]\mathbb{E}[Y], so they are uncorrelated. Yet Y is a deterministic function of X, as dependent as two variables can be: \mathbb{P}(X = 0,\ Y = 0) = \tfrac13, whereas \mathbb{P}(X = 0)\,\mathbb{P}(Y = 0) = \tfrac13\cdot\tfrac13 = \tfrac19. Zero covariance detects only linear association; independence is much stronger.
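Both counterexamples are small enough to enumerate. The sketch below is an illustrative check under assumed names (`COINS`, `factorises` and the rest are hypothetical): it confirms that every pair among A, B, C factorises while the triple does not, and that X and Y = X^2 have zero covariance although a single joint probability already fails to factorise.

```python
from fractions import Fraction
from itertools import combinations, product

# Trap 1: two fair coins, uniform measure on the four outcomes.
COINS = list(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(COINS))

def factorises(events):
    """True if P(intersection of the events) equals the product of their probabilities."""
    intersection = set(COINS).intersection(*events)
    product_of_probs = Fraction(1)
    for e in events:
        product_of_probs *= prob(e)
    return prob(intersection) == product_of_probs

A = {w for w in COINS if w[0] == "H"}     # first coin is Heads
B = {w for w in COINS if w[1] == "H"}     # second coin is Heads
C = {w for w in COINS if w[0] == w[1]}    # the two coins match

pairwise = all(factorises(pair) for pair in combinations([A, B, C], 2))
triple = factorises([A, B, C])
print(pairwise, triple)

# Trap 2: X uniform on {-1, 0, 1}, Y = X^2, joint law as point masses.
law_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): m for x, m in law_x.items()}

e_x = sum(x * m for (x, y), m in joint.items())
e_y = sum(y * m for (x, y), m in joint.items())
e_xy = sum(x * y * m for (x, y), m in joint.items())
covariance = e_xy - e_x * e_y

p_x0 = sum(m for (x, y), m in joint.items() if x == 0)
p_y0 = sum(m for (x, y), m in joint.items() if y == 0)
p_both0 = joint[(0, 0)]
print(covariance, p_both0, p_x0 * p_y0)
```

Mutual independence requires `factorises` to hold for every sub-collection of size at least two, which is why the pairwise check alone is not sufficient.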

A remarkable amount of probability rests on this one construction, and this is Kolmogorov's great reorganisation of the subject. An infinite sequence of independent coin tosses is nothing more than the product measure \bigotimes_{n=1}^\infty \mathbb{P}_n on the product space \{0,1\}^{\mathbb{N}}, whose existence is guaranteed by Kolmogorov's extension theorem. Once you have that single infinite product measure, an astonishing amount follows: the strong law of large numbers, the 0–1 law (any event depending only on the "tail" of the sequence has probability 0 or 1), and much of the modern theory of stochastic processes. The humble rectangle A\times B of the picture above, iterated infinitely, is the stage on which all of independent probability is performed.