Conditional Expectation

Ordinary expectation \mathbb{E}[X] collapses a random variable to a single number — the best constant guess of X when you know nothing. But we usually know something. Conditional expectation \mathbb{E}[X \mid \mathcal{G}] is the best guess of X given the information encoded in a sub-\sigma-algebra \mathcal{G} \subseteq \mathcal{F} — and that "best guess" is itself a random variable, not a number.
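
As an illustrative case, roll two fair dice with faces D_1 and D_2, let X = D_1 + D_2, and let \mathcal{G} = \sigma(D_1) carry only the first roll. Linearity, taking out what is known, and independence (all listed below) give

\mathbb{E}[D_1 + D_2 \mid \sigma(D_1)] = D_1 + \mathbb{E}[D_2] = D_1 + \tfrac{7}{2}.

The result moves with the first roll: it equals 4.5 when D_1 = 1 and 9.5 when D_1 = 6. The unconditional \mathbb{E}[X] = 7, by contrast, is a single number. Averaging D_1 + \tfrac{7}{2} over the six equally likely values of D_1 returns \tfrac{7}{2} + \tfrac{7}{2} = 7.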

Formally, for an integrable random variable X (one with \mathbb{E}|X| < \infty), \mathbb{E}[X \mid \mathcal{G}] is the (almost surely unique) \mathcal{G}-measurable, integrable random variable that satisfies

\int_A \mathbb{E}[X \mid \mathcal{G}]\, d\mathbb{P} = \int_A X\, d\mathbb{P} \qquad \text{for every } A \in \mathcal{G}.

It must be measurable with respect to \mathcal{G} (you may use only the information in \mathcal{G} to form it), yet it must have the same integral as X over every event that \mathcal{G} can resolve. Together these two conditions pin it down up to almost-sure equality: any two versions agree except on a set of probability zero.
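
On a finite space both conditions can be checked by brute force. The sketch below assumes \Omega is one fair die and that \mathcal{G} is generated by a finite partition, so every event in \mathcal{G} is a union of blocks. The helper names (integral_over, sigma_algebra_from_partition, is_conditional_expectation) are hypothetical, written only for this illustration. A candidate that is \mathcal{G}-measurable but has the wrong block values is rejected by the averaging test.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative finite space: one fair die, each face with probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def integral_over(A, Z):
    """Integral of Z over event A on a finite space: sum of Z(w) * P(w) for w in A."""
    return sum(Z(w) * prob[w] for w in A)

def sigma_algebra_from_partition(blocks):
    """Every event in the sigma-algebra generated by a finite partition: all unions of blocks."""
    events = []
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            events.append(frozenset().union(*chosen))
    return events

def is_conditional_expectation(Y, X, blocks):
    """Check the two defining conditions for Y = E[X | G], with G generated by `blocks`."""
    # G-measurability on a partition means Y is constant on each block.
    measurable = all(len({Y(w) for w in B}) == 1 for B in blocks)
    # Averaging property: Y and X have the same integral over every A in G.
    averages_match = all(
        integral_over(A, Y) == integral_over(A, X)
        for A in sigma_algebra_from_partition(blocks)
    )
    return measurable and averages_match

X = lambda w: w                                         # the face value
blocks = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]   # G answers "odd or even?"
Y_good = lambda w: 3 if w % 2 else 4   # block averages: (1+3+5)/3 = 3, (2+4+6)/3 = 4
Y_bad = lambda w: Fraction(7, 2)       # E[X] everywhere: measurable, fails on the odd block

print(is_conditional_expectation(Y_good, X, blocks))  # True
print(is_conditional_expectation(Y_bad, X, blocks))   # False
```

The constant candidate \mathbb{E}[X] = 7/2 passes at A = \Omega but fails at A = \{1, 3, 5\}, where it integrates to 7/4 while X integrates to 3/2.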

The partition picture

The cleanest case: \mathcal{G} is generated by a countable partition of \Omega into blocks B_1, B_2, \dots, each with \mathbb{P}(B_i) > 0. Knowing \mathcal{G} means knowing only which block the outcome landed in, not the exact outcome. So the best guess of X must be constant on each block: \mathbb{E}[X \mid \mathcal{G}] is piecewise constant, and on block B it equals the probability-weighted average of X over B,

\mathbb{E}[X \mid \mathcal{G}](\omega) = \frac{1}{\mathbb{P}(B)}\int_B X\, d\mathbb{P}, \qquad \omega \in B.
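
The block-average formula translates directly into code. This is a minimal sketch for a finite \Omega; the function name conditional_expectation and the loaded-die probabilities are illustrative assumptions. A loaded die is used so that the result is visibly a probability-weighted average rather than a plain mean of the face values.

```python
from fractions import Fraction

def conditional_expectation(X, prob, blocks):
    """Return E[X | G] as a dict omega -> value, with G generated by `blocks`.

    On each block B the value is (1 / P(B)) * sum of X(w) * P(w) over w in B.
    Every block must have positive probability.
    """
    result = {}
    for B in blocks:
        p_B = sum(prob[w] for w in B)
        if p_B == 0:
            raise ValueError("block has zero probability; the formula is undefined")
        block_average = sum(X(w) * prob[w] for w in B) / p_B
        for w in B:
            result[w] = block_average   # constant on the block
    return result

# Illustrative loaded die: faces 1 and 6 are twice as likely as the others.
prob = {1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8),
        4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 4)}
X = lambda w: w
blocks = [[1, 2, 3], [4, 5, 6]]   # G answers "low (1-3) or high (4-6)?"

cond = conditional_expectation(X, prob, blocks)
print(cond[1], cond[4])  # 7/4 21/4
```

Each block has probability 1/2, so the low block gets (1/4 + 2/8 + 3/8) / (1/2) = 7/4 rather than the unweighted mean 2.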

Step through the figure: first the raw values of X, then the flat level the conditional expectation snaps each block to — its average.

The properties that make it work

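Four facts do almost all the labour in finance, and each follows from the defining property above in a line or two (for taking out what is known, check it first for indicators \mathbf{1}_C with C \in \mathcal{G}, then extend by approximation).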

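Tower / iterated expectations. If \mathcal{H} \subseteq \mathcal{G}, then \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}\big] = \mathbb{E}[X \mid \mathcal{H}]: conditioning on more information and then on less is the same as conditioning on less. With \mathcal{H} trivial, averaging the average recovers the plain average: \mathbb{E}\!\big[\mathbb{E}[X \mid \mathcal{G}]\big] = \mathbb{E}[X] (take A = \Omega in the definition).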
Taking out what is known. If X is \mathcal{G}-measurable, it acts as a constant: \mathbb{E}[XY \mid \mathcal{G}] = X\,\mathbb{E}[Y \mid \mathcal{G}], provided Y and XY are integrable.
Independence. If X is independent of \mathcal{G}, the extra information is useless and \mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X].
Linearity. \mathbb{E}[aX + bY \mid \mathcal{G}] = a\,\mathbb{E}[X \mid \mathcal{G}] + b\,\mathbb{E}[Y \mid \mathcal{G}].
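
All four properties can be confirmed with exact arithmetic on a small space. The sketch below assumes two fair dice with \mathcal{G} = \sigma(D_1), so the blocks are "first die shows k". The test variables X and Y and the coefficients a = 3, b = -2 are arbitrary illustrative choices, and cond_exp and expect are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Omega: ordered pairs (d1, d2) from two fair dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}
# G = sigma(D1): one block per value k of the first die.
blocks = [[w for w in omega if w[0] == k] for k in range(1, 7)]

def cond_exp(Z):
    """E[Z | G] on the partition: replace Z by its probability-weighted block average."""
    out = {}
    for B in blocks:
        p_B = sum(prob[w] for w in B)
        avg = sum(Z(w) * prob[w] for w in B) / p_B
        out.update({w: avg for w in B})
    return out

def expect(Z):
    """Plain expectation on the finite space."""
    return sum(Z(w) * prob[w] for w in omega)

D1 = lambda w: w[0]                      # G-measurable
D2 = lambda w: w[1]                      # independent of G
X = lambda w: w[0] * w[1] + w[1] ** 2    # arbitrary test variable
Y = lambda w: w[0] + 2 * w[1]            # another arbitrary test variable

# Tower (trivial H): E[E[X | G]] = E[X].
cX = cond_exp(X)
tower = expect(lambda w: cX[w]) == expect(X)

# Taking out what is known: E[D1 * Y | G] = D1 * E[Y | G].
cY = cond_exp(Y)
lhs = cond_exp(lambda w: D1(w) * Y(w))
taking_out = all(lhs[w] == D1(w) * cY[w] for w in omega)

# Independence: E[D2 | G] is the constant E[D2] = 7/2.
cD2 = cond_exp(D2)
independence = all(cD2[w] == expect(D2) == Fraction(7, 2) for w in omega)

# Linearity with a = 3, b = -2.
a, b = 3, -2
lin_lhs = cond_exp(lambda w: a * X(w) + b * Y(w))
linearity = all(lin_lhs[w] == a * cX[w] + b * cY[w] for w in omega)

print(tower, taking_out, independence, linearity)  # True True True True
```

Using Fraction instead of floats makes every check an exact equality, so a True here is not an artefact of rounding tolerance.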

These reappear the moment we meet martingales, where the whole "fair game" definition is a statement about a conditional expectation.
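
For a process M_0, M_1, \dots adapted to a filtration \mathcal{F}_n, the fair-game condition reads \mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = M_n. A minimal sketch, assuming a symmetric \pm 1 random walk S_n over three fair coin flips (an illustrative choice), checks it with the partition recipe: \mathcal{F}_n is generated by one block per possible history of the first n steps. The helper names S and cond_exp_given_first are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 3  # number of fair +1/-1 steps (illustrative choice)
omega = list(product([1, -1], repeat=N))
prob = {w: Fraction(1, 2 ** N) for w in omega}

def S(n, w):
    """Position of the walk after n steps: the sum of the first n steps of w."""
    return sum(w[:n])

def cond_exp_given_first(n, Z):
    """E[Z | F_n], where F_n is generated by the first n steps (one block per history)."""
    out = {}
    for history in product([1, -1], repeat=n):   # n = 0 gives one empty history: F_0 is trivial
        block = [w for w in omega if w[:n] == history]
        p_B = sum(prob[w] for w in block)
        avg = sum(Z(w) * prob[w] for w in block) / p_B
        out.update({w: avg for w in block})
    return out

# Fair game: E[S_{n+1} | F_n] == S_n on every outcome, for n = 0, ..., N - 1.
fair = True
for n in range(N):
    c = cond_exp_given_first(n, lambda w: S(n + 1, w))
    fair = fair and all(c[w] == S(n, w) for w in omega)
print(fair)  # True
```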