Expectation as a Lebesgue Integral

The expectation of a random variable is an average over the sample space, weighted by probability. Written as an integral against the probability measure, it is

\mathbb{E}[X] \;=\; \int_{\Omega} X \, d\mathbb{P}.

This is a Lebesgue integral, not a Riemann one, and that choice is what makes expectation behave so well under limits, conditioning and the convergence theorems we will lean on later. The Lebesgue integral is built in three deliberate stages, each extending the last. The one idea to carry through all three is that it slices by value (the range of X) rather than by input (the domain).

Stage 1 — simple functions

A simple function takes finitely many values a_1, \dots, a_n on a partition of \Omega into measurable sets A_i:

X \;=\; \sum_{i=1}^{n} a_i \, \mathbf{1}_{A_i}.

Its expectation is forced on us: each value a_i is weighted by the probability of the set A_i on which it is taken, and the results are summed:

\mathbb{E}[X] \;=\; \sum_{i=1}^{n} a_i \, \mathbb{P}(A_i).

In particular, taking X = \mathbf{1}_A recovers \mathbb{E}[\mathbf{1}_A] = \mathbb{P}(A) — probability is just the expectation of an indicator.
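
To make Stage 1 concrete, here is a minimal sketch on a finite sample space. The fair-die setup and the names prob, measure and expect_simple are illustrative assumptions, not part of the construction itself; exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction

# Assumed setup: one roll of a fair six-sided die, so Omega = {1, ..., 6}
# and every outcome has probability 1/6.
prob = {omega: Fraction(1, 6) for omega in range(1, 7)}

def measure(event):
    """P(A) for an event A given as a set of outcomes."""
    return sum(prob[omega] for omega in event)

def expect_simple(values, partition):
    """E[X] = sum_i a_i * P(A_i) for X = sum_i a_i * 1_{A_i}."""
    return sum(a * measure(A) for a, A in zip(values, partition))

# X takes the value 0 on {1, 2}, 1 on {3, 4, 5} and 10 on {6}.
values = [0, 1, 10]
partition = [{1, 2}, {3, 4, 5}, {6}]
e_x = expect_simple(values, partition)

# Indicator of A = {2, 4, 6}: value 1 on A, value 0 on its complement.
A = {2, 4, 6}
e_indicator = expect_simple([1, 0], [A, set(prob) - A])
```

Here e_x works out to 0 · 2/6 + 1 · 3/6 + 10 · 1/6 = 13/6, and e_indicator agrees with measure(A) = 1/2, as the indicator identity says it must.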

Stage 2 — non-negative functions

For a general X \ge 0 we approximate it from below by simple functions and take the best such approximation:

\mathbb{E}[X] \;=\; \sup\Big\{\, \mathbb{E}[S] \;:\; S \text{ simple},\; 0 \le S \le X \,\Big\}.

The trick is to partition the range, not the domain: slice the y-axis into finer and finer levels and, on each level, ask "with what probability does X reach this high?". As the slicing refines, the staircase climbs up to X and the simple-function expectations increase to \mathbb{E}[X].
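
The staircase can be sketched numerically. The block below assumes X is exponential with rate 1, chosen only because P(X \ge t) = e^{-t} is available in closed form, and uses the usual dyadic staircase S_n = \min(\lfloor 2^n X \rfloor / 2^n,\, n); the function name staircase_expectation is hypothetical.

```python
import math

def staircase_expectation(n):
    """E[S_n] for S_n = min(floor(2^n X) / 2^n, n), with X ~ Exponential(1).

    The distribution is an assumption made for illustration: it gives
    P(a <= X < b) = exp(-a) - exp(-b) in closed form.
    """
    step = 2.0 ** -n
    total = 0.0
    for k in range(n * 2 ** n):
        level = k * step
        # Probability that X falls in the slice [level, level + step).
        p = math.exp(-level) - math.exp(-(level + step))
        total += level * p
    # Everything at or above height n is capped at the value n.
    total += n * math.exp(-n)
    return total

# Simple-function expectations for finer and finer slicings of the range.
approximations = [staircase_expectation(n) for n in range(1, 11)]
```

Under this assumption the true mean is 1, and the list approximations increases towards it from below, matching the supremum definition above.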

Stage 3 — general functions, and the payoff

Split any X into its positive and negative parts, X^{+} = \max(X, 0) and X^{-} = \max(-X, 0), so that X = X^{+} - X^{-}, and define \mathbb{E}[X] = \mathbb{E}[X^{+}] - \mathbb{E}[X^{-}] whenever at least one of the two is finite. When X has a density f, or is discrete with PMF p, this collapses to the formulas you already know:

\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx, \qquad \mathbb{E}[X] = \sum_k x_k\, p(x_k).
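
As a small check of Stage 3, the sketch below computes \mathbb{E}[X^{+}] - \mathbb{E}[X^{-}] and the direct PMF sum for a made-up three-point distribution, then approximates the density formula with a midpoint rule. The PMF, the density f(x) = 2x on [0, 1] and the helper density_mean are all assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed PMF with a negative value, so both X^+ and X^- are non-trivial.
pmf = {-2: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)}

e_plus = sum(max(x, 0) * p for x, p in pmf.items())    # E[X^+]
e_minus = sum(max(-x, 0) * p for x, p in pmf.items())  # E[X^-]
e_split = e_plus - e_minus                             # Stage 3 definition
e_direct = sum(x * p for x, p in pmf.items())          # sum_k x_k p(x_k)

def density_mean(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [lo, hi]."""
    h = (hi - lo) / n
    midpoints = (lo + (i + 0.5) * h for i in range(n))
    return sum(x * f(x) for x in midpoints) * h

# Assumed density f(x) = 2x on [0, 1] and zero elsewhere; the exact mean is 2/3.
e_density = density_mean(lambda x: 2.0 * x, 0.0, 1.0)
```

For this PMF, e_plus = 3/2 and e_minus = 1/2, so the split definition and the direct sum both give 1. The midpoint rule is just a convenient stand-in for the integral; any quadrature would do.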

The two properties we use constantly drop straight out of the construction:

Linearity: for integrable X, Y and constants a, b, \mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y].
Monotonicity: if X \le Y almost surely and both expectations are defined, then \mathbb{E}[X] \le \mathbb{E}[Y].
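
Both properties can be verified exactly on a small example. The two-coin sample space and the random variables X and Y below are hypothetical, picked so that X \le Y holds at every outcome.

```python
from fractions import Fraction

# Assumed finite sample space: two fair coin flips.
prob = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
        "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def expect(rv):
    """E[rv] as the sum over outcomes of rv(omega) * P({omega})."""
    return sum(rv(omega) * p for omega, p in prob.items())

def X(omega):
    """Number of heads."""
    return omega.count("H")

def Y(omega):
    """2 if at least one head appears, else 0; so X <= Y at every outcome."""
    return 2 if "H" in omega else 0

a, b = 3, -2
lhs = expect(lambda omega: a * X(omega) + b * Y(omega))  # E[aX + bY]
rhs = a * expect(X) + b * expect(Y)                      # a E[X] + b E[Y]
```

Here expect(X) = 1 and expect(Y) = 3/2, so monotonicity holds, and lhs and rhs both equal 0. Note that linearity does not need a and b to be non-negative.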