Expectation as a Lebesgue Integral
The expectation of a random variable is an average over the sample space, weighted
by probability. Written as an integral against the probability measure, it is
\mathbb{E}[X] \;=\; \int_{\Omega} X \, d\mathbb{P}.
This is a Lebesgue integral, not a Riemann one, and that choice is what
makes expectation behave so well under limits, conditioning and the convergence theorems we
will lean on later. The Lebesgue integral is built in three stages, each
extending the last. The one idea to carry through all of them is that it slices by
value rather than by input: it groups outcomes according to the value X takes, instead of walking through \Omega point by point.
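To make "slicing by value" concrete, here is a minimal Python sketch on a finite sample space. It assumes two fair dice with X their sum, an illustrative choice not taken from the text. On a finite \Omega the two orders of summation trivially agree; the point is the bookkeeping, since the value-first grouping is the one the Lebesgue construction generalizes.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs from two fair six-sided dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}


def X(w):
    """The random variable: sum of the two dice."""
    return w[0] + w[1]


# Slicing by input: visit each outcome w and weight X(w) by P({w}).
by_input = sum(X(w) * prob[w] for w in omega)

# Slicing by value: group outcomes by the value v that X takes,
# accumulate P(X = v), then weight each value by its probability.
dist = defaultdict(Fraction)
for w in omega:
    dist[X(w)] += prob[w]
by_value = sum(v * p for v, p in dist.items())

print(by_input, by_value)  # 7 7
```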
Stage 1 — simple functions
A simple function takes finitely many values a_1, \dots, a_n, the value a_i on the set A_i,
where A_1, \dots, A_n is a partition of \Omega into measurable sets:
X \;=\; \sum_{i=1}^{n} a_i \, \mathbf{1}_{A_i}.
Its expectation is forced on us. Weight each value a_i by the probability of the set A_i on which X takes it, and add:
\mathbb{E}[X] \;=\; \sum_{i=1}^{n} a_i \, \mathbb{P}(A_i).
In particular, taking X = \mathbf{1}_A recovers
\mathbb{E}[\mathbf{1}_A] = \mathbb{P}(A) — probability is just the
expectation of an indicator.
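The Stage 1 formula can be checked directly on a small finite space. The sketch below assumes a four-outcome \Omega with made-up probabilities; the names `P`, `prob`, `pieces` and `expect_simple` are hypothetical. It represents a simple function as a list of (a_i, A_i) pairs, checks that the A_i partition \Omega, and evaluates \sum_i a_i \, \mathbb{P}(A_i), including the indicator case.

```python
from fractions import Fraction

# Hypothetical finite sample space with a non-uniform probability measure.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}


def prob(event):
    """P(A) for an event A given as a set of outcomes."""
    return sum((P[w] for w in event), Fraction(0))


# X = 3 * 1_{A1} + (-1) * 1_{A2} + 0 * 1_{A3}, with A1, A2, A3 partitioning Omega.
pieces = [(3, {"a"}), (-1, {"b", "c"}), (0, {"d"})]

# The A_i cover Omega and their sizes add up to |Omega|, so they are disjoint.
assert set().union(*(A for _, A in pieces)) == set(P)
assert sum(len(A) for _, A in pieces) == len(P)


def expect_simple(pieces):
    """E[X] = sum_i a_i * P(A_i) for a simple function given as (a_i, A_i) pairs."""
    return sum((a * prob(A) for a, A in pieces), Fraction(0))


print(expect_simple(pieces))  # 9/8

# The indicator 1_A is the simple function equal to 1 on A and 0 on its complement.
A = {"b", "d"}
indicator = [(1, A), (0, set(P) - A)]
print(expect_simple(indicator), prob(A))  # 3/8 3/8
```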
Stage 2 — non-negative functions
For a general X \ge 0 we approximate it from below by
simple functions and take the best such approximation:
\mathbb{E}[X] \;=\; \sup\Big\{\, \mathbb{E}[S] \;:\; S \text{ simple},\; 0 \le S \le X \,\Big\}.
The trick is to partition the range, not the domain: slice the
y-axis into finer and finer levels and, on each level, ask "with
what probability does X reach this high?" Each slicing gives a staircase, a simple function
lying below X. As the slicing refines, the staircase climbs up to X and its
expectations increase to \mathbb{E}[X].
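One standard way to realize this range-slicing is the dyadic staircase S_n = \min(n, \lfloor 2^n X \rfloor / 2^n), which satisfies 0 \le S_n \le S_{n+1} \le X. Written as a sum of level indicators, S_n = \sum_{k=1}^{n 2^n} 2^{-n} \mathbf{1}_{\{X \ge k/2^n\}}, so its expectation needs only the probabilities \mathbb{P}(X \ge k/2^n), which is exactly the question "with what probability does X reach this high?". The sketch below assumes X \sim \text{Exponential}(1), so \mathbb{P}(X \ge t) = e^{-t} and \mathbb{E}[X] = 1; the function names are hypothetical.

```python
import math


def staircase_expectation(survival, n):
    """E[S_n] for the dyadic staircase S_n = min(n, floor(2^n X) / 2^n).

    Slicing the range: S_n = sum_{k=1}^{n 2^n} 2^{-n} * 1{X >= k / 2^n},
    so E[S_n] = sum_k 2^{-n} * P(X >= k / 2^n).
    """
    h = 2.0 ** -n
    return sum(h * survival(k * h) for k in range(1, n * 2 ** n + 1))


def exp_survival(t):
    """Assumed example: P(X >= t) for X ~ Exponential(1)."""
    return math.exp(-t)


approx = [staircase_expectation(exp_survival, n) for n in range(1, 13)]
for n, e in enumerate(approx, start=1):
    print(n, round(e, 5))
```

The printed values should climb monotonically toward 1, with the gap shrinking as n grows, mirroring the supremum in the Stage 2 definition.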
Stage 3 — general functions, and the payoff
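Split any X into its positive and negative parts,
X^{+} = \max(X, 0) and X^{-} = \max(-X, 0), so that X = X^{+} - X^{-}. Both parts are
non-negative, so Stage 2 gives each an expectation, and we define
\mathbb{E}[X] = \mathbb{E}[X^{+}] - \mathbb{E}[X^{-}] whenever the two are not
both infinite. Concretely, this collapses to the formulas you already know, for X with a density
f and for discrete X with PMF p respectively: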
\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx, \qquad \mathbb{E}[X] = \sum_k x_k\, p(x_k).
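Stage 3 can be sketched by computing \mathbb{E}[X^{+}] and \mathbb{E}[X^{-}] separately and subtracting, with X^{+} = \max(X, 0) and X^{-} = \max(-X, 0). The sketch assumes an illustrative three-point PMF and, for the density case, a Normal(1, 1) density integrated by the midpoint rule on a truncated interval; the truncation bounds, step count and function names are arbitrary choices for illustration, not part of the theory.

```python
import math
from fractions import Fraction


def expectation_pmf(pmf):
    """E[X] = E[X^+] - E[X^-] for a finite PMF given as {value: probability}."""
    pos = sum(max(x, 0) * p for x, p in pmf.items())   # E[X^+]
    neg = sum(max(-x, 0) * p for x, p in pmf.items())  # E[X^-]
    return pos - neg


def expectation_density(f, lo, hi, steps=100_000):
    """E[X^+] - E[X^-] by the midpoint rule on [lo, hi].

    Assumes the density f puts negligible mass outside [lo, hi].
    """
    h = (hi - lo) / steps
    pos = neg = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        if x > 0:
            pos += x * f(x) * h    # contribution to E[X^+]
        else:
            neg += -x * f(x) * h   # contribution to E[X^-]
    return pos - neg


def normal_density(x, mu=1.0, sigma=1.0):
    """Assumed example density: Normal(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


pmf = {-2: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
print(expectation_pmf(pmf))  # 3/4

# Should be very close to the true mean mu = 1.
print(expectation_density(normal_density, -12.0, 14.0))
```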
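The two properties we use constantly follow from the construction (first for simple functions, then by approximation):
- Linearity: for integrable X and Y and constants a, b, \mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y].
- Monotonicity: if X \le Y and both expectations exist, then \mathbb{E}[X] \le \mathbb{E}[Y].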
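Both properties can be checked numerically on a finite space, where every random variable is bounded and hence integrable. The sketch assumes a fair die, with X the face shown and Y its square; `E`, `X` and `Y` are illustrative names, and a finite check like this illustrates the properties rather than proving them.

```python
from fractions import Fraction

# Hypothetical finite sample space: a fair die, each face with probability 1/6.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}


def E(Z):
    """Expectation of a random variable Z (a function of the outcome) on this space."""
    return sum((Z(w) * P[w] for w in omega), Fraction(0))


def X(w):
    return w       # the face shown


def Y(w):
    return w * w   # its square; X <= Y pointwise because w >= 1


a, b = 2, -3

# Linearity: E[aX + bY] equals a E[X] + b E[Y].
lhs = E(lambda w: a * X(w) + b * Y(w))
rhs = a * E(X) + b * E(Y)
print(lhs, rhs)  # -77/2 -77/2

# Monotonicity: X <= Y at every outcome, so E[X] <= E[Y].
assert all(X(w) <= Y(w) for w in omega)
print(E(X), E(Y))  # 7/2 91/6
```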