Conditional Expectation
Ordinary expectation
\mathbb{E}[X] collapses a random variable to a single number — the
best constant guess of X when you know nothing. But we
usually know something. Conditional expectation
\mathbb{E}[X \mid \mathcal{G}] is the best guess of
X given the information encoded in a sub-\sigma-algebra
\mathcal{G} \subseteq \mathcal{F} — and that "best guess" is itself a
random variable, not a number.
Formally, for an integrable random variable X, \mathbb{E}[X \mid \mathcal{G}] is the (almost surely unique)
integrable, \mathcal{G}-measurable random variable that satisfies
\int_A \mathbb{E}[X \mid \mathcal{G}]\, d\mathbb{P} = \int_A X\, d\mathbb{P} \qquad \text{for every } A \in \mathcal{G}.
It must be measurable with respect to \mathcal{G} (you may only use
information in \mathcal{G} to form it), yet it must carry the
same average mass as X over every set you can resolve with
\mathcal{G}. Together, these two requirements pin it down up to sets of probability zero.
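The uniqueness claim can be sketched in a few lines. The names Y and Y' are introduced here only for illustration: assume both are integrable, \mathcal{G}-measurable, and satisfy the defining property for the same X.

```latex
\begin{align*}
&A := \{Y > Y'\} \in \mathcal{G}, \\
&\int_A (Y - Y')\, d\mathbb{P}
   = \int_A X\, d\mathbb{P} - \int_A X\, d\mathbb{P} = 0
   \;\Longrightarrow\; \mathbb{P}(A) = 0
   \quad (\text{since } Y - Y' > 0 \text{ on } A), \\
&\text{by symmetry } \mathbb{P}(Y' > Y) = 0, \text{ hence } Y = Y' \text{ almost surely.}
\end{align*}
```

The set A is available as a test set only because both candidates are \mathcal{G}-measurable, which is why measurability is part of the definition and not an afterthought.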
The partition picture
The cleanest case: \mathcal{G} is generated by a countable partition
of \Omega into blocks B_1, B_2, \dots, each with positive probability. Knowing
\mathcal{G} means knowing only which block you landed in — not
the exact outcome. So the best guess of X must be the same throughout a
block: \mathbb{E}[X \mid \mathcal{G}] is piecewise constant,
equal on block B to the average of X over
B,
\mathbb{E}[X \mid \mathcal{G}](\omega) = \frac{1}{\mathbb{P}(B)}\int_B X\, d\mathbb{P}, \qquad \omega \in B.
Step through the figure: first the raw values of X, then the flat level
the conditional expectation snaps each block to — its average.
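The block-average formula can be tried on a small finite example. The sketch below is illustrative only: the die, the three blocks, and the helper names `conditional_expectation` and `integral` are assumptions made for this example, not part of the text above. It computes the block averages with exact fractions and then checks the defining property on every set in \mathcal{G}, which for a finite partition means every union of blocks.

```python
from fractions import Fraction
from itertools import combinations


def conditional_expectation(x, p, blocks):
    """Return E[X | G] as a dict outcome -> value, for G generated by `blocks`."""
    ce = {}
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(B), assumed positive
        avg = sum(x[w] * p[w] for w in block) / mass   # average of X over B
        for w in block:
            ce[w] = avg                                # constant on the block
    return ce


# Hypothetical setup: a fair six-sided die, X = face value, and G only
# reveals whether the roll was low {1,2}, middle {3,4} or high {5,6}.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}
x = {w: Fraction(w) for w in omega}
blocks = [(1, 2), (3, 4), (5, 6)]

ce = conditional_expectation(x, p, blocks)


def integral(f, event):
    """Integral of f over `event` with respect to P (a finite weighted sum)."""
    return sum(f[w] * p[w] for w in event)


# Every set in G is a union of blocks (including the empty union).
g_sets = [
    [w for block in combo for w in block]
    for k in range(len(blocks) + 1)
    for combo in combinations(blocks, k)
]

defining_property_holds = all(integral(ce, a) == integral(x, a) for a in g_sets)

print(ce)
print(defining_property_holds)
```

On this example the conditional expectation takes the values 3/2, 7/2 and 11/2 on the three blocks, and the check over all eight sets in \mathcal{G} succeeds.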
The properties that make it work
Four facts do almost all the labour in finance, and each follows from the
defining property above.
- Tower / iterated expectations. Averaging the average recovers the plain
  average: \mathbb{E}\!\big[\mathbb{E}[X \mid \mathcal{G}]\big] = \mathbb{E}[X]
  (take A = \Omega in the definition).
- Taking out what is known. If X is
  \mathcal{G}-measurable and XY is integrable, X acts as a constant:
  \mathbb{E}[XY \mid \mathcal{G}] = X\,\mathbb{E}[Y \mid \mathcal{G}].
- Independence. If X is independent
  of \mathcal{G}, the extra information is useless and
  \mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X].
- Linearity. For constants a and b,
  \mathbb{E}[aX + bY \mid \mathcal{G}] = a\,\mathbb{E}[X \mid \mathcal{G}] + b\,\mathbb{E}[Y \mid \mathcal{G}].
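The four properties listed above can be checked exactly on a toy model. This is a minimal sketch under stated assumptions: two fair coin tosses, with \mathcal{G} taken to be the information in the first toss. The variable names (`known`, `second`, `heads`) and the helpers `expect` and `cond_expect` are hypothetical and chosen for this illustration only.

```python
from fractions import Fraction

# Hypothetical model: two fair coin tosses; G = information in the first toss.
omega = ["HH", "HT", "TH", "TT"]
p = {w: Fraction(1, 4) for w in omega}
blocks = [("HH", "HT"), ("TH", "TT")]   # partition generating G


def expect(f):
    """Plain expectation E[f]."""
    return sum(f[w] * p[w] for w in omega)


def cond_expect(f):
    """E[f | G]: replace f by its probability-weighted average on each block."""
    out = {}
    for block in blocks:
        mass = sum(p[w] for w in block)
        avg = sum(f[w] * p[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out


# G-measurable: depends on the first toss only.
known = {w: Fraction(3) if w[0] == "H" else Fraction(5) for w in omega}
# Independent of G: indicator that the second toss is heads.
second = {w: Fraction(1 if w[1] == "H" else 0) for w in omega}
# Neither: total number of heads.
heads = {w: Fraction(w.count("H")) for w in omega}

a, b = Fraction(2), Fraction(-3)
ce_heads = cond_expect(heads)
ce_second = cond_expect(second)

checks = {
    # E[E[X | G]] = E[X]
    "tower": expect(ce_heads) == expect(heads),
    # E[XY | G] = X E[Y | G] for G-measurable X
    "take_out_known": cond_expect({w: known[w] * heads[w] for w in omega})
    == {w: known[w] * ce_heads[w] for w in omega},
    # E[X | G] = E[X] for X independent of G
    "independence": all(ce_second[w] == expect(second) for w in omega),
    # E[aX + bY | G] = a E[X | G] + b E[Y | G]
    "linearity": cond_expect({w: a * heads[w] + b * second[w] for w in omega})
    == {w: a * ce_heads[w] + b * ce_second[w] for w in omega},
}

print(checks)
```

Exact fractions are used so that each comparison is an equality check with no floating-point tolerance; on this example all four properties hold.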
These reappear the moment we meet
martingales,
where the whole "fair game" definition is a statement about a conditional expectation.