The Itô Isometry

The Itô integral of a simple process was easy to write down; extending it to every square-integrable adapted integrand needs one more fact: an identity that controls the size of the integral. It is the Itô isometry:

\mathbb{E}\!\left[\left(\int_0^T H_s\, dW_s\right)^{2}\right] = \mathbb{E}\!\left[\int_0^T H_s^{2}\, ds\right].

On the left is the squared L^2(\Omega) norm of a random variable: the mean-square magnitude of the integral. On the right is the squared L^2(dt \times d\mathbb{P}) norm of the integrand: an ordinary time-integral of H^2, then averaged over paths. The isometry says these two are equal, so the unwieldy stochastic integral on the left is measured by an ordinary Lebesgue integral on the right (a fully deterministic one when H is deterministic). That equality is the engine that drives the entire construction.

The derivation, line by line (simple H)

We prove it for a simple adapted process H_s = \sum_{i=0}^{n-1} H_{t_i}\mathbf{1}_{(t_i, t_{i+1}]}(s), where 0 = t_0 < t_1 < \cdots < t_n = T and each H_{t_i} is \mathcal{F}_{t_i}-measurable and bounded, so every expectation below is finite; the general case then follows by the very approximation the isometry makes possible. Write \Delta W_i = W_{t_{i+1}} - W_{t_i} and \Delta t_i = t_{i+1} - t_i, so the integral is \int_0^T H\, dW = \sum_i H_{t_i}\Delta W_i.

Step 1 — expand the square into a double sum. A finite sum squared is the double sum of all pairwise products:

\left(\sum_{i} H_{t_i}\Delta W_i\right)^{2} = \sum_{i}\sum_{j} H_{t_i} H_{t_j}\,\Delta W_i\,\Delta W_j.

Taking expectations and using linearity,

\mathbb{E}\!\left[\left(\int_0^T H\, dW\right)^{2}\right] = \sum_{i}\sum_{j} \mathbb{E}\big[\,H_{t_i} H_{t_j}\,\Delta W_i\,\Delta W_j\,\big].

Split the double sum into off-diagonal terms (i \neq j) and diagonal terms (i = j). We show every off-diagonal term is zero, then evaluate the diagonal.

Step 2 — the off-diagonal terms vanish. Take i < j (the case i > j is symmetric). Of the four factors, the first three, H_{t_i}, H_{t_j} and \Delta W_i, are all \mathcal{F}_{t_j}-measurable: H_{t_i} and \Delta W_i are determined by time t_{i+1} \le t_j, and H_{t_j} is adapted, set at t_j. Only the last increment \Delta W_j reaches into the future. Condition on \mathcal{F}_{t_j} and pull out everything known:

\mathbb{E}\big[\,H_{t_i} H_{t_j}\,\Delta W_i\,\Delta W_j\,\big] = \mathbb{E}\Big[\,H_{t_i} H_{t_j}\,\Delta W_i\;\mathbb{E}\big[\Delta W_j \mid \mathcal{F}_{t_j}\big]\,\Big].

The future increment \Delta W_j is independent of \mathcal{F}_{t_j} and mean-zero, so the inner conditional expectation is 0, and the whole term collapses:

= \mathbb{E}\big[\,H_{t_i} H_{t_j}\,\Delta W_i \cdot 0\,\big] = 0.

Every cross term is killed by the same "future increment has no correlation with the past" mechanism that made the integral mean-zero. Only the diagonal survives.

Step 3 — the diagonal terms. On the diagonal i = j the term is \mathbb{E}\big[H_{t_i}^2\,(\Delta W_i)^2\big]. Condition on \mathcal{F}_{t_i} and pull out the known coefficient H_{t_i}^2:

\mathbb{E}\big[\,H_{t_i}^2\,(\Delta W_i)^2\,\big] = \mathbb{E}\Big[\,H_{t_i}^2\;\mathbb{E}\big[(\Delta W_i)^2 \mid \mathcal{F}_{t_i}\big]\,\Big].

The squared increment is independent of \mathcal{F}_{t_i}, so its conditional mean is its plain mean — the variance of a mean-zero N(0, \Delta t_i):

\mathbb{E}\big[(\Delta W_i)^2 \mid \mathcal{F}_{t_i}\big] = \mathbb{E}\big[(\Delta W_i)^2\big] = \operatorname{Var}(\Delta W_i) = \Delta t_i.

Therefore each diagonal term is

\mathbb{E}\big[\,H_{t_i}^2\,(\Delta W_i)^2\,\big] = \mathbb{E}\big[\,H_{t_i}^2\,\big]\,\Delta t_i.

Step 4 — sum the diagonal and recognise the time-integral. Adding the surviving terms,

\mathbb{E}\!\left[\left(\int_0^T H\, dW\right)^{2}\right] = \sum_{i=0}^{n-1} \mathbb{E}\big[H_{t_i}^2\big]\,\Delta t_i = \mathbb{E}\!\left[\sum_{i=0}^{n-1} H_{t_i}^2\,\Delta t_i\right].

But \sum_i H_{t_i}^2\,\Delta t_i is, path by path, exactly \int_0^T H_s^2\, ds: H^2 is a step function taking the value H_{t_i}^2 on (t_i, t_{i+1}], so its time-integral is this finite sum with no approximation error. Hence

\mathbb{E}\!\left[\left(\int_0^T H\, dW\right)^{2}\right] = \mathbb{E}\!\left[\int_0^T H_s^2\, ds\right].

Two ingredients did everything: independent increments killed the off-diagonal, and variance = elapsed time ((dW)^2 = dt again) evaluated the diagonal.
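The identity just derived can be sanity-checked numerically. The sketch below is illustrative only: the adapted integrand H_{t_i} = W_{t_i} on a uniform grid, the function name isometry_check, the grid size, the path count and the seed are all assumptions made for the example, not part of the derivation. For this choice of H the right-hand side is \sum_i t_i\,\Delta t = T^2 (n-1)/(2n) exactly, which gives a reference value.

```python
import random


def isometry_check(n_steps=8, T=1.0, n_paths=40_000, seed=0):
    """Monte Carlo estimates of both sides of the isometry for H_{t_i} = W_{t_i}."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = dt ** 0.5                      # standard deviation of one increment
    lhs_total = 0.0                     # accumulates (sum_i H_{t_i} dW_i)^2
    rhs_total = 0.0                     # accumulates sum_i H_{t_i}^2 dt
    for _ in range(n_paths):
        w = 0.0                         # W at the left endpoint of the current cell
        stoch_integral = 0.0
        time_integral = 0.0
        for _ in range(n_steps):
            h = w                       # coefficient fixed before the increment is drawn
            dw = rng.gauss(0.0, sd)     # increment over (t_i, t_{i+1}]
            stoch_integral += h * dw
            time_integral += h * h * dt
            w += dw
        lhs_total += stoch_integral ** 2
        rhs_total += time_integral
    return lhs_total / n_paths, rhs_total / n_paths


n, T = 8, 1.0
lhs, rhs = isometry_check(n_steps=n, T=T)
exact = T ** 2 * (n - 1) / (2 * n)      # closed form for this particular H
print(f"E[(int H dW)^2] ~ {lhs:.4f}   E[int H^2 ds] ~ {rhs:.4f}   exact {exact:.4f}")
```

The two estimates should agree with each other and with the closed form up to Monte Carlo noise; the left-hand estimate is the noisier of the two because it averages a fourth-order quantity.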

Let H be an adapted process with \mathbb{E}\big[\int_0^T H_s^2\, ds\big] < \infty. Then the Itô integral is an isometry from L^2(dt \times d\mathbb{P}) into L^2(\Omega):

\mathbb{E}\!\left[\left(\int_0^T H_s\, dW_s\right)^{2}\right] = \mathbb{E}\!\left[\int_0^T H_s^{2}\, ds\right].

Equivalently, the L^2(\Omega) norm of the integral equals the L^2(dt\times d\mathbb{P}) norm of the integrand:

\big\|\int_0^T H\, dW\big\|_{L^2(\Omega)} = \|H\|_{L^2(dt\times d\mathbb{P})}.

This identity is not a footnote — it is what lets Stage 2 of the construction even make sense. Suppose H is a general adapted L^2 integrand and H^{(m)} is a sequence of simple processes approximating it in L^2(dt\times d\mathbb{P}), so \|H^{(m)} - H^{(k)}\| \to 0 as m, k \to \infty. Apply the isometry to the difference (using linearity of the integral on simple processes):

\mathbb{E}\!\left[\left(\int H^{(m)} dW - \int H^{(k)} dW\right)^{2}\right] = \mathbb{E}\!\left[\int_0^T \big(H^{(m)}_s - H^{(k)}_s\big)^2\, ds\right] \longrightarrow 0.

So a Cauchy sequence of integrands maps to a Cauchy sequence of integrals in L^2(\Omega). Because L^2(\Omega) is complete, that sequence has a limit, and the isometry also forces the limit to be unique — two approximating sequences for the same H differ by something of vanishing norm, hence have the same limit. That limit is the definition of \int_0^T H\, dW for general adapted H. The isometry is the bridge that carries the easy step-function definition across to every square-integrable integrand.
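To see the Cauchy property concretely, here is a small sketch under stated assumptions: the integrand is the deterministic H_s = s, the approximations H^{(m)} are left-endpoint step functions on a uniform grid with m cells, and the function name is made up for the illustration. Because H is deterministic the outer expectation is trivial, so by the isometry the number computed below is also the mean-square distance between \int H^{(m)} dW and \int H^{(2m)} dW. For this choice it works out to T^3/(8m^2).

```python
def sq_distance_between_refinements(m, T=1.0):
    """Squared L^2(dt) distance between the m-cell and 2m-cell left-endpoint
    step approximations of H_s = s on [0, T]."""
    fine_dt = T / (2 * m)
    coarse_dt = T / m
    total = 0.0
    for j in range(2 * m):                    # j indexes the cells of the finer grid
        fine_value = j * fine_dt              # H^(2m) on cell j: its own left endpoint
        coarse_value = (j // 2) * coarse_dt   # H^(m) on cell j: left endpoint of the coarse cell
        # Both step functions are constant on each fine cell, so the sum is exact.
        total += (fine_value - coarse_value) ** 2 * fine_dt
    return total


for m in (1, 2, 4, 8, 16):
    print(m, sq_distance_between_refinements(m))
```

Each doubling of m cuts the squared distance by a factor of four, so the integrands, and with them the integrals, form a Cauchy sequence.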

The analogy is to Parseval / Plancherel: the Fourier transform is an isometry between two L^2 spaces, and that single fact lets it be extended from nice test functions to all of L^2 by continuity. The Itô isometry plays the same role for the stochastic integral: it is an isometry from the adapted processes in L^2(dt\times d\mathbb{P}) into L^2(\Omega), and the extension is "by continuity" in precisely the same sense.

Putting it to work

The isometry is also the everyday tool for computing second moments of stochastic integrals, since the right-hand side is an ordinary integral. Two quick examples for a deterministic integrand (where the outer expectation is trivial):

With H_s \equiv 1,

\mathbb{E}\!\left[\left(\int_0^T 1\, dW\right)^{2}\right] = \int_0^T 1^2\, ds = T,

which is just \mathbb{E}[W_T^2] = T — a sanity check, since \int_0^T dW = W_T. With H_s = s,

\mathbb{E}\!\left[\left(\int_0^T s\, dW_s\right)^{2}\right] = \int_0^T s^2\, ds = \frac{T^3}{3}.

No path-by-path heroics: a hard mean-square of a random integral became a one-line calculus exercise.
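For comparison, the brute-force route that the isometry avoids can be sketched as a simulation. This is a rough check rather than a reference implementation: the integral \int_0^T s\, dW_s is replaced by the left-endpoint sum \sum_i t_i\,\Delta W_i on a uniform grid, and the function name, grid size, path count and seed are arbitrary choices for the example.

```python
import random


def second_moment_of_integral(n_steps=100, T=1.0, n_paths=10_000, seed=1):
    """Monte Carlo estimate of E[(sum_i t_i dW_i)^2], a discretisation of
    E[(int_0^T s dW_s)^2]."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = dt ** 0.5                       # standard deviation of one increment
    total = 0.0
    for _ in range(n_paths):
        integral = 0.0
        for i in range(n_steps):
            # Integrand evaluated at the left endpoint t_i = i * dt.
            integral += (i * dt) * rng.gauss(0.0, sd)
        total += integral ** 2
    return total / n_paths


estimate = second_moment_of_integral()
print(f"simulated {estimate:.4f}   isometry value T^3/3 = {1.0 ** 3 / 3:.4f}")
```

The estimate lands near 1/3 only up to sampling noise and a small discretisation bias, after a million Gaussian draws; the isometry gives the exact value in one line.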

Only the diagonal survives

The picture of the proof: lay out the n \times n grid of pairwise terms \mathbb{E}[H_{t_i} H_{t_j}\,\Delta W_i\,\Delta W_j]. Every off-diagonal cell (i \neq j) is killed by the independent future increment and vanishes; only the diagonal cells survive, each contributing \mathbb{E}[H_{t_i}^2]\,\Delta t_i — and their sum is \mathbb{E}[\int_0^T H^2\, ds].
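The grid can also be estimated numerically. The following sketch assumes, purely for illustration, the adapted integrand H_{t_i} = W_{t_i} on a uniform grid with n = 4 cells; the function name, path count and seed are likewise arbitrary. For this H the diagonal entries should be close to \mathbb{E}[W_{t_i}^2]\,\Delta t = t_i\,\Delta t, and the off-diagonal entries close to zero.

```python
import random


def pairwise_term_grid(n_steps=4, T=1.0, n_paths=40_000, seed=2):
    """Monte Carlo estimate of the n x n grid E[H_{t_i} H_{t_j} dW_i dW_j]
    for the adapted choice H_{t_i} = W_{t_i}."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = dt ** 0.5
    grid = [[0.0] * n_steps for _ in range(n_steps)]
    for _ in range(n_paths):
        w = 0.0
        products = []                    # H_{t_i} * dW_i along this path
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sd)
            products.append(w * dw)      # w is still W_{t_i}, known before dw
            w += dw
        for i in range(n_steps):
            for j in range(n_steps):
                grid[i][j] += products[i] * products[j]
    return [[cell / n_paths for cell in row] for row in grid]


for row in pairwise_term_grid():
    print("  ".join(f"{cell:+.4f}" for cell in row))
```

Since W_0 = 0, the first row and column are exactly zero here; the remaining off-diagonal cells are zero only up to Monte Carlo noise, while the diagonal grows linearly in t_i.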