Independence and Product Measures

Roll a die, then roll it again. The second roll does not care what the first one did — the two are independent. In an elementary course you met this as a rule about numbers: independent events multiply,

\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).

That formula is correct, and it is where we begin. But measure theory lets us see what independence really is, structurally: it is the statement that a joint experiment factorises into a product. Two independent random variables have a joint law that is a product measure; the whole apparatus of Fubini's theorem then applies, and the familiar consequences (\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y], variances that add) drop out as one-line integrals over a rectangle. This page develops one idea, independence = product structure, from several angles.

From events to σ-algebras

Fix a probability space (\Omega, \mathcal{F}, \mathbb{P}). Two events A, B \in \mathcal{F} are independent when \mathbb{P}(A\cap B) = \mathbb{P}(A)\mathbb{P}(B). The measure-theoretic move is to stop talking about single events and talk about whole sub-σ-algebras at once. Two sub-σ-algebras \mathcal{G}_1, \mathcal{G}_2 \subseteq \mathcal{F} are independent if

\mathbb{P}(G_1 \cap G_2) = \mathbb{P}(G_1)\,\mathbb{P}(G_2) \qquad\text{for all } G_1 \in \mathcal{G}_1,\ G_2 \in \mathcal{G}_2.

A σ-algebra is the collection of all questions you could answer from some source of information. Independence of \mathcal{G}_1 and \mathcal{G}_2 says: every yes/no question answerable from the first source is probabilistically unlinked from every question answerable from the second. That is a far stronger, cleaner statement than a single-event identity, and it is the right level of generality — because random variables carry σ-algebras with them.

A random variable X:\Omega\to\mathbb{R} generates a σ-algebra \sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\} — the information "what is the value of X?" We say X and Y are independent random variables precisely when their generated σ-algebras are independent:

X \perp\!\!\!\perp Y \iff \sigma(X) \text{ and } \sigma(Y) \text{ are independent} \iff \mathbb{P}(X\in C,\ Y \in D) = \mathbb{P}(X\in C)\,\mathbb{P}(Y\in D)

for all Borel sets C, D. In practice you rarely check every Borel pair: a π-system argument (Dynkin's lemma) shows it is enough to verify the factorisation on a generating π-system, for instance the half-lines (-\infty, x], which is exactly the statement about distribution functions below.

The product σ-algebra and the product measure

Suppose we have two probability spaces, (\Omega_1, \mathcal{F}_1, \mathbb{P}_1) and (\Omega_2, \mathcal{F}_2, \mathbb{P}_2), and we want to run both experiments at once. The sample space is the Cartesian product \Omega_1 \times \Omega_2. What is its σ-algebra? The natural building blocks are measurable rectangles A \times B with A \in \mathcal{F}_1, B \in \mathcal{F}_2. The product σ-algebra is the one they generate:

\mathcal{F}_1 \otimes \mathcal{F}_2 := \sigma\bigl(\{A \times B : A \in \mathcal{F}_1,\ B \in \mathcal{F}_2\}\bigr).

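On this σ-algebra there is a unique probability measure, the product measure \mathbb{P}_1 \otimes \mathbb{P}_2, characterised by what it must do on rectangles:

(\mathbb{P}_1 \otimes \mathbb{P}_2)(A \times B) = \mathbb{P}_1(A)\,\mathbb{P}_2(B) \qquad\text{for all } A \in \mathcal{F}_1,\ B \in \mathcal{F}_2.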

Independence, restated one final time, is now a single equation between measures. Let \mu_X, \mu_Y be the laws (push-forward distributions) of X and Y, and let \mu_{(X,Y)} be the joint law of the pair (X,Y):\Omega\to\mathbb{R}^2. Then

X \perp\!\!\!\perp Y \iff \mu_{(X,Y)} = \mu_X \otimes \mu_Y.

Independence is the joint law being a product measure. Everything else on this page is a consequence of that one line.
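On a finite sample space this criterion can be checked mechanically: compute both marginals from a joint table and compare the table with their product. The sketch below is illustrative only. The helper names `marginals` and `is_product_law` and the two example tables are assumptions introduced here, not part of the text above.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the two marginal laws of a joint pmf given as {(x, y): prob}."""
    mu_x, mu_y = {}, {}
    for (x, y), p in joint.items():
        mu_x[x] = mu_x.get(x, 0) + p
        mu_y[y] = mu_y.get(y, 0) + p
    return mu_x, mu_y


def is_product_law(joint):
    """True when joint(x, y) == mu_X(x) * mu_Y(y) for every pair (x, y)."""
    mu_x, mu_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == mu_x[x] * mu_y[y]
        for x, y in product(mu_x, mu_y)
    )


# Hypothetical example 1: two independent fair bits.
independent = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

# Hypothetical example 2: Y is a copy of X, so all mass sits on the diagonal.
copied = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_product_law(independent))  # True
print(is_product_law(copied))       # False
```

Exact rational arithmetic (`Fraction`) is used so that the equality test is not disturbed by floating-point rounding.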

Three equivalent faces of the same fact

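It helps to hold the equivalences together. For random variables X, Y the following all say X\perp\!\!\!\perp Y:

Events version: \mathbb{P}(X\in C,\ Y\in D) = \mathbb{P}(X\in C)\,\mathbb{P}(Y\in D) for all Borel sets C, D.

CDF version: F_{(X,Y)}(x,y) = \mathbb{P}(X\le x,\ Y\le y) = F_X(x)\,F_Y(y) for all x, y \in \mathbb{R}.

Measure version: \mu_{(X,Y)} = \mu_X \otimes \mu_Y.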

The CDF version is the working criterion — it only involves half-lines, and by the π-system lemma that is enough to force the full product structure. The measure version is the conceptual one — it makes the multiplicative property below a triviality via Fubini.

Picture the unit square [0,1]^2 carrying the product measure, whose value on a set is simply its area. Take the events A = [0,a] on the horizontal axis and B = [0,b] on the vertical axis. The rectangle A\times B has area a\cdot b = \mathbb{P}(A)\,\mathbb{P}(B) for every choice of a and b: independence made visible as the area of a rectangle equalling the product of its side lengths.

Worked example: two fair dice

Take \Omega_1 = \Omega_2 = \{1,2,3,4,5,6\}, each with the uniform measure \mathbb{P}_i(\{k\}) = \tfrac16. The product space is \Omega_1\times\Omega_2, the 36 ordered pairs, with product measure \mathbb{P} = \mathbb{P}_1\otimes\mathbb{P}_2 assigning each pair mass \tfrac16\cdot\tfrac16 = \tfrac{1}{36}. Let A = \{\text{first die} = 6\} and B = \{\text{second die is even}\}. As rectangles, A = \{6\}\times\Omega_2 and B = \Omega_1\times\{2,4,6\}, so

\mathbb{P}(A) = \tfrac16, \quad \mathbb{P}(B) = \tfrac12, \quad \mathbb{P}(A\cap B) = \mathbb{P}(\{6\}\times\{2,4,6\}) = \tfrac{3}{36} = \tfrac{1}{12} = \tfrac16\cdot\tfrac12.

The factorisation holds because the two dice live on orthogonal coordinates of the product space — exactly the picture of the rectangle above. Now let X, Y be the two face values, each with \mathbb{E}[X] = \tfrac{7}{2}. Because they are independent,

\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y] = \tfrac72\cdot\tfrac72 = \tfrac{49}{4} = 12.25.

You can check this directly by averaging xy over all 36 pairs: it comes to 12.25. The next section explains why that shortcut is legal.
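The numbers in this example are small enough to verify by brute force over the 36 pairs. The following is a minimal sketch in exact arithmetic; the names `P`, `prob`, `in_A` and `in_B` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Product measure on the 36 ordered pairs: each pair has mass 1/6 * 1/6.
P = {(i, j): Fraction(1, 6) * Fraction(1, 6) for i, j in product(faces, faces)}


def prob(event):
    """Probability of the set of pairs satisfying the predicate `event`."""
    return sum(p for w, p in P.items() if event(w))


def in_A(w):
    return w[0] == 6          # first die shows 6


def in_B(w):
    return w[1] % 2 == 0      # second die is even


p_A, p_B = prob(in_A), prob(in_B)
p_AB = prob(lambda w: in_A(w) and in_B(w))

# Expectations as weighted sums over the 36 pairs.
E_X = sum(i * p for (i, j), p in P.items())
E_Y = sum(j * p for (i, j), p in P.items())
E_XY = sum(i * j * p for (i, j), p in P.items())

print(p_A, p_B, p_AB)   # 1/6 1/2 1/12
print(E_X, E_Y, E_XY)   # 7/2 7/2 49/4
```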

Why \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]: Fubini on the product

This is the payoff, and the reason independence and product measures belong on one page. Let X, Y be independent and integrable. The expectation of the product is an integral against the joint law \mu_{(X,Y)} — which, by independence, is the product measure \mu_X\otimes\mu_Y. So we may apply Fubini's theorem and split the double integral into iterated ones:

\mathbb{E}[XY] = \int_{\mathbb{R}^2} xy \; d\mu_{(X,Y)}(x,y) = \int_{\mathbb{R}^2} xy \; d(\mu_X\otimes\mu_Y)(x,y) = \int_{\mathbb{R}}\!\!\int_{\mathbb{R}} xy \; d\mu_X(x)\, d\mu_Y(y).

Inside, y is constant for the x-integral, so it pulls out, and the two single integrals separate cleanly:

= \int_{\mathbb{R}} y\left(\int_{\mathbb{R}} x\, d\mu_X(x)\right) d\mu_Y(y) = \left(\int_{\mathbb{R}} x\, d\mu_X(x)\right)\!\left(\int_{\mathbb{R}} y\, d\mu_Y(y)\right) = \mathbb{E}[X]\,\mathbb{E}[Y].

That is the whole proof. Citing Fubini here is not decoration: the double integral separates because the joint law is a genuine product measure, and without that the identity can fail. (Integrability of X and Y is what licenses Fubini rather than merely Tonelli, since xy is signed. Tonelli applied to |xy| first gives \mathbb{E}|XY| = \mathbb{E}|X|\,\mathbb{E}|Y| < \infty, which is exactly the finite double integral Fubini demands.)
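For laws with finitely many atoms the integrals above are finite sums, and the Fubini step is just a reordering of a double sum. Here is a minimal sketch, with two made-up discrete laws `mu_X` and `mu_Y` chosen purely for illustration:

```python
from fractions import Fraction

# Hypothetical discrete laws, given as {value: probability}.
mu_X = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}
mu_Y = {1: Fraction(1, 3), 3: Fraction(2, 3)}

# Double sum of x*y against the product measure of mu_X and mu_Y.
double_sum = sum(
    x * y * px * py
    for x, px in mu_X.items()
    for y, py in mu_Y.items()
)

# Iterated form: do the inner sum over x first, then the outer sum over y.
inner = sum(x * px for x, px in mu_X.items())            # this is E[X]
iterated = sum(y * inner * py for y, py in mu_Y.items())

E_X = inner
E_Y = sum(y * py for y, py in mu_Y.items())

print(double_sum, iterated, E_X * E_Y)  # 7/4 7/4 7/4
```

All three quantities agree because the weight of each pair (x, y) is the product of the two marginal weights, which is the discrete shadow of the product-measure hypothesis.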

The multiplicative property has an immediate corollary. For independent X, Y with finite variance, the covariance vanishes — \operatorname{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 — and therefore variances add:

\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).

This is the seed of the whole theory of sums of independent variables: the law of large numbers and the central limit theorem, in their classical finite-variance forms, both lean on variances adding.
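The covariance and variance claims can be checked exactly on two independent fair dice. This is a sketch only; the helpers `expect` and `var` are illustrative names, and each takes a function of the pair of face values.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mass = Fraction(1, 36)  # product measure of each ordered pair


def expect(f):
    """E[f(X, Y)] for two independent fair dice."""
    return sum(f(x, y) * mass for x, y in product(faces, faces))


def var(f):
    """Var(f(X, Y)) computed as E[f^2] - (E[f])^2."""
    return expect(lambda x, y: f(x, y) ** 2) - expect(f) ** 2


cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
var_X = var(lambda x, y: x)
var_Y = var(lambda x, y: y)
var_sum = var(lambda x, y: x + y)

print(cov, var_X, var_Y, var_sum)  # 0 35/12 35/12 35/6
```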

Mutual independence of a family

For more than two objects, independence is a statement about all sub-collections at once. A family of events (A_i)_{i\in I} is mutually independent if for every finite subset J\subseteq I

\mathbb{P}\!\left(\bigcap_{i\in J} A_i\right) = \prod_{i\in J} \mathbb{P}(A_i).

Likewise a family of σ-algebras (\mathcal{G}_i) is mutually independent if every finite selection of one event from distinct \mathcal{G}_i's factorises, and random variables (X_i) are (mutually) independent if (\sigma(X_i)) are — equivalently the joint law is the full product \bigotimes_i \mu_{X_i}. It is essential that the factorisation is demanded of every sub-collection, not just pairs. That distinction is the subject of the warning below.

Two classic traps. In each, an implication holds in one direction only.

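1. Pairwise independence does NOT imply mutual independence. Here is the standard counterexample. Toss two fair coins independently and set

A = \{\text{first coin is heads}\}, \quad B = \{\text{second coin is heads}\}, \quad C = \{\text{the two coins agree}\} = \{\text{HH}, \text{TT}\}.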

Each has probability \tfrac12. Any pair is independent: e.g. \mathbb{P}(A\cap C) = \mathbb{P}(\text{HH}) = \tfrac14 = \tfrac12\cdot\tfrac12, and similarly for A,B and B,C. But the three are not mutually independent, because C is determined by A and B together:

\mathbb{P}(A\cap B\cap C) = \mathbb{P}(\text{HH}) = \tfrac14 \ne \tfrac18 = \mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C).

Knowing any two of the three pins down the third exactly — so mutual independence fails even though every pair is independent.

2. Uncorrelated does NOT imply independent. The identity \mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y] is a consequence of independence, not equivalent to it. Let X be uniform on \{-1, 0, 1\} and set Y = X^2. Then \mathbb{E}[X] = 0 and \mathbb{E}[XY] = \mathbb{E}[X^3] = 0 = \mathbb{E}[X]\mathbb{E}[Y], so they are uncorrelated. Yet Y is a deterministic function of X — as dependent as two variables can be. Zero covariance sees only a linear relationship; independence is much stronger.
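Both traps live on tiny sample spaces, so they can be verified by enumeration. The sketch below assumes, for the first trap, the events A = first coin heads, B = second coin heads, C = the two coins agree (the choice consistent with the probabilities quoted above). All variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

# --- Trap 1: two fair coins, four equally likely outcomes. ---
outcomes = list(product("HT", repeat=2))
mass = Fraction(1, 4)

events = {
    "A": {w for w in outcomes if w[0] == "H"},   # first coin is heads
    "B": {w for w in outcomes if w[1] == "H"},   # second coin is heads
    "C": {w for w in outcomes if w[0] == w[1]},  # the two coins agree
}


def prob(event):
    """Probability of a set of outcomes under the uniform measure."""
    return mass * len(event)


# Does each pair of events factorise?
pairwise = {
    (n1, n2): prob(e1 & e2) == prob(e1) * prob(e2)
    for (n1, e1), (n2, e2) in combinations(events.items(), 2)
}
triple = prob(events["A"] & events["B"] & events["C"])
triple_product = prob(events["A"]) * prob(events["B"]) * prob(events["C"])

print(pairwise)                # every value is True
print(triple, triple_product)  # 1/4 1/8

# --- Trap 2: X uniform on {-1, 0, 1}, Y = X^2. ---
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
E_XY = sum(x * y * p for (x, y), p in joint.items())
cov = E_XY - E_X * E_Y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = joint[(0, 0)]
p_X0 = sum(p for (x, y), p in joint.items() if x == 0)
p_Y0 = sum(p for (x, y), p in joint.items() if y == 0)

print(cov)                   # 0
print(p_joint, p_X0 * p_Y0)  # 1/3 1/9
```

In the second trap the single pair of values (0, 0) already breaks the factorisation, even though the covariance is exactly zero.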

A remarkable amount of probability is built from product measures alone, and this is Kolmogorov's great reorganisation of the subject. An infinite sequence of independent coin tosses is nothing more than the product measure \bigotimes_{n=1}^\infty \mathbb{P}_n on the product space \{0,1\}^{\mathbb{N}}, whose existence is guaranteed by Kolmogorov's extension theorem. Once you have that single infinite product measure, an astonishing amount follows: the strong law of large numbers, the 0-1 law (any event depending only on the "tail" of the sequence has probability 0 or 1), and the entire modern theory of stochastic processes. The humble rectangle A\times B of the picture above, iterated infinitely, is the stage on which all of independent probability is performed.