Quadratic Variation

Here is the single fact that makes stochastic calculus different from ordinary calculus. Partition the interval [0, t] into n pieces 0 = t_0 < t_1 < \cdots < t_n = t and add up the squared increments of a Brownian path. As the mesh \max_i (t_{i+1} - t_i) shrinks to zero, this sum does not vanish — it converges to t:

\sum_{i=0}^{n-1} \left( W_{t_{i+1}} - W_{t_i} \right)^2 \;\xrightarrow{\;L^2\;}\; t.

The convergence is in the mean-square (L^2) sense: the expected squared distance between the random sum and the number t goes to zero as the mesh shrinks. So the quadratic variation of Brownian motion is

[W]_t = t \qquad \text{(not zero, and not random).}

The derivation, line by line

This is the centrepiece, so let us prove it with no steps skipped. Abbreviate the i-th increment and its time-step by

\Delta W_i = W_{t_{i+1}} - W_{t_i}, \qquad \Delta t_i = t_{i+1} - t_i,

and write the random sum we are studying as

Q_n = \sum_{i=0}^{n-1} \big(\Delta W_i\big)^2.

We will show two things — its mean is exactly t, and its variance goes to 0 — and then read off the convergence.

Step 1: the mean is t. Each increment runs over an interval of length \Delta t_i, so by the Gaussian-increments property

\Delta W_i \sim N(0,\, \Delta t_i).

For a mean-zero variable the expected square is the variance, so

\mathbb{E}\big[(\Delta W_i)^2\big] = \operatorname{Var}(\Delta W_i) = \Delta t_i.

Take the expectation of Q_n term by term (expectation is linear) and sum:

\mathbb{E}[Q_n] = \sum_{i=0}^{n-1} \mathbb{E}\big[(\Delta W_i)^2\big] = \sum_{i=0}^{n-1} \Delta t_i.

But the time-steps \Delta t_i = t_{i+1} - t_i telescope when summed: consecutive endpoints cancel, leaving only the outermost,

\sum_{i=0}^{n-1} \Delta t_i = (t_n - t_0) = t - 0 = t.

So \mathbb{E}[Q_n] = t exactly, for every partition, however coarse. The sum is already centred on the right answer; what remains is to show it stops wobbling.
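Step 1 can be checked numerically. The sketch below is a Monte Carlo illustration, not part of the proof: it draws independent Gaussian increments on a deliberately coarse, uneven partition of [0, 2] and averages Q_n over many paths. The grid `coarse_times`, the path count and the seed are arbitrary choices made for this example.

```python
import math
import random

def quadratic_sum(times, rng):
    """Draw one set of Brownian increments on the grid `times`
    and return the sum of their squares."""
    total = 0.0
    for t_left, t_right in zip(times, times[1:]):
        dt = t_right - t_left
        dW = rng.gauss(0.0, math.sqrt(dt))  # increment ~ N(0, dt)
        total += dW * dW
    return total

def mean_quadratic_sum(times, n_paths=20000, seed=0):
    """Average Q_n over n_paths independent paths."""
    rng = random.Random(seed)
    return sum(quadratic_sum(times, rng) for _ in range(n_paths)) / n_paths

# Hypothetical example grid: only four pieces, of very different lengths.
coarse_times = [0.0, 0.1, 0.7, 1.5, 2.0]
estimated_mean = mean_quadratic_sum(coarse_times)

print(f"sum of time-steps = {coarse_times[-1] - coarse_times[0]}")
print(f"estimated E[Q_n]  = {estimated_mean:.3f}")
```

Even with four pieces the estimate should sit close to 2, the length of the interval. Individual paths still scatter widely around that value, which is what Step 2 addresses.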

Step 2: the variance vanishes. The increments \Delta W_i live over disjoint intervals, so by the independent-increments property the terms (\Delta W_i)^2 are independent. The variance of a sum of independent terms is the sum of their variances (no cross terms):

\operatorname{Var}(Q_n) = \sum_{i=0}^{n-1} \operatorname{Var}\big((\Delta W_i)^2\big).

Now we need the variance of a squared Gaussian. For X \sim N(0, \sigma^2) one has \operatorname{Var}(X^2) = 2\sigma^4 (the fourth-moment fact, derived in the vignette below from \mathbb{E}[Z^4] = 3 for a standard normal Z). Here \sigma^2 = \Delta t_i, so

\operatorname{Var}\big((\Delta W_i)^2\big) = 2\,(\Delta t_i)^2.

Substituting back,

\operatorname{Var}(Q_n) = \sum_{i=0}^{n-1} 2\,(\Delta t_i)^2 = 2 \sum_{i=0}^{n-1} (\Delta t_i)^2.

To bound this, pull one factor of \Delta t_i out of each square and replace it by the largest step — the mesh \|\Delta\| = \max_i \Delta t_i:

(\Delta t_i)^2 = \Delta t_i \cdot \Delta t_i \le \|\Delta\| \cdot \Delta t_i.

Summing the bound and using the telescoping identity \sum_i \Delta t_i = t from Step 1,

\operatorname{Var}(Q_n) \le 2 \sum_{i=0}^{n-1} \|\Delta\|\, \Delta t_i = 2\,\|\Delta\| \sum_{i=0}^{n-1} \Delta t_i = 2\,\|\Delta\|\, t.

As the partition refines, the mesh shrinks, \|\Delta\| \to 0, and therefore

\operatorname{Var}(Q_n) \le 2\,\|\Delta\|\, t \longrightarrow 0.
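The bound involves no randomness, so it can be tabulated directly. The sketch below evaluates the exact variance 2 \sum (\Delta t_i)^2 and the bound 2 \|\Delta\| t on a family of uneven partitions of [0, 1]. The grid t (i/n)^2, which clusters points near 0, is an assumption chosen only to show that equal spacing is not needed.

```python
def variance_and_bound(times):
    """Return (2 * sum of squared steps, 2 * mesh * interval length)."""
    steps = [b - a for a, b in zip(times, times[1:])]
    exact_variance = 2.0 * sum(dt * dt for dt in steps)
    mesh = max(steps)
    bound = 2.0 * mesh * (times[-1] - times[0])
    return exact_variance, bound

def uneven_partition(t, n):
    """Hypothetical uneven grid on [0, t]: points t * (i/n)^2."""
    return [t * (i / n) ** 2 for i in range(n + 1)]

t = 1.0
results = []
for n in (4, 16, 64, 256):
    variance, bound = variance_and_bound(uneven_partition(t, n))
    results.append((n, variance, bound))
    print(f"n = {n:4d}   Var(Q_n) = {variance:.5f}   2*mesh*t = {bound:.5f}")
```

In every row the variance sits below the bound, and both shrink as n grows, because the largest step of this grid, (2n - 1)/n^2, goes to zero.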

Step 3: put them together — convergence in L^2. Mean-square (L^2) convergence to the constant t means the expected squared distance \mathbb{E}\big[(Q_n - t)^2\big] goes to zero. Because \mathbb{E}[Q_n] = t exactly, that distance is precisely the variance:

\mathbb{E}\big[(Q_n - t)^2\big] = \mathbb{E}\big[(Q_n - \mathbb{E}[Q_n])^2\big] = \operatorname{Var}(Q_n) \le 2\,\|\Delta\|\, t \to 0.

So Q_n \to t in L^2: the random sum tightens onto the deterministic number t as the mesh shrinks. That limit is the quadratic variation,

[W]_t = \lim_{\|\Delta\| \to 0} \sum_{i=0}^{n-1} (\Delta W_i)^2 = t.

Let (W_t) be a standard Brownian motion and partition [0, t] by 0 = t_0 < \cdots < t_n = t. As the mesh \max_i (t_{i+1} - t_i) \to 0, \sum_{i=0}^{n-1} \big(W_{t_{i+1}} - W_{t_i}\big)^2 \;\xrightarrow{\;L^2\;}\; t, so the quadratic variation is [W]_t = t — a deterministic number, not random and not zero. In differential shorthand this is written (dW)^2 = dt.
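The mean-square statement can be tested by simulation. For a uniform partition every step is t/n, so the variance formula from Step 2 gives \operatorname{Var}(Q_n) = 2 t^2 / n exactly. The sketch below estimates \mathbb{E}[(Q_n - t)^2] by Monte Carlo and prints it next to that value. The path count and seed are arbitrary, and the estimates carry sampling noise of a few percent.

```python
import math
import random

def sample_quadratic_sum(t, n, rng):
    """One draw of Q_n on the uniform partition of [0, t] into n pieces."""
    sd = math.sqrt(t / n)  # each increment ~ N(0, t/n)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))

def mean_square_error(t, n, n_paths, rng):
    """Monte Carlo estimate of E[(Q_n - t)^2]."""
    total = 0.0
    for _ in range(n_paths):
        total += (sample_quadratic_sum(t, n, rng) - t) ** 2
    return total / n_paths

rng = random.Random(1)
t = 1.0
mse_by_n = {}
for n in (10, 100, 1000):
    mse_by_n[n] = mean_square_error(t, n, 1000, rng)
    print(f"n = {n:5d}   estimated MSE = {mse_by_n[n]:.5f}   2*t^2/n = {2 * t * t / n:.5f}")
```

Each tenfold refinement should cut the mean-square error by roughly a factor of ten, in line with the 1/n rate.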

Watch the sum converge

The interactive figure for this section shows one fixed Brownian path on [0, 1]. A slider sets the number of partition points n; the running total \sum (\Delta W)^2 is read off that same path and printed live. As the partition is refined, the sum approaches the dashed target line at [W]_1 = 1 and settles there.
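The same experiment can be sketched offline. The code below is an approximation of the idea, not the page's own widget: it simulates one path on a fine grid of 2^16 steps, then reads the quadratic sum off coarser sub-grids of that same path. The fine-grid size and the seed are assumptions made for this example.

```python
import math
import random

def brownian_path(t, n_fine, seed=0):
    """Simulate one Brownian path on a uniform grid of n_fine steps over [0, t]."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_fine)
    path = [0.0]
    for _ in range(n_fine):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def quadratic_sum_on_subgrid(path, n):
    """Sum of squared increments using only n equally spaced pieces of the path.
    Assumes n divides the number of fine steps."""
    stride = (len(path) - 1) // n
    coarse = path[::stride]
    return sum((b - a) ** 2 for a, b in zip(coarse, coarse[1:]))

N_FINE = 2 ** 16
path = brownian_path(1.0, N_FINE)

sums = {}
for k in range(2, 17, 2):
    n = 2 ** k
    sums[n] = quadratic_sum_on_subgrid(path, n)
    print(f"n = {n:6d}   sum of squared increments = {sums[n]:.4f}")
```

For small n the total can land well away from 1, and it need not move monotonically. What the theory predicts is that the scatter narrows as n grows, so the finest rows should sit close to 1.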

The contrast with smooth functions, and "(dW)² = dt"

Why is this so special? Take a smooth (differentiable) function g. Over a sub-interval of length \Delta t its increment is about g'(\tau)\,\Delta t for some point \tau in that sub-interval, so the squared increment is of order (\Delta t)^2. Summing n \approx t/\Delta t of them gives order t \cdot \Delta t \to 0: a smooth function has quadratic variation zero. Brownian increments are larger, of order \sqrt{\Delta t}, so their squares are of order \Delta t and the sum survives.

The shorthand that bookkeeps all of this is

(dW)^2 = dt.

A Brownian increment squared behaves like dt, not like the negligible (dt)^2 of smooth calculus. This one extra term — kept instead of discarded — is the seed of the correction in Itô's lemma, the chain rule of stochastic calculus.
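One place to see the extra term at work is the square of the path itself. What follows is a sketch rather than a proof. For any numbers a and b, b^2 - a^2 = 2a(b - a) + (b - a)^2. Apply this with a = W_{t_i} and b = W_{t_{i+1}}, then sum over the partition; the left side telescopes to W_t^2 because a standard Brownian motion starts at W_0 = 0:

```latex
W_{t_{i+1}}^2 - W_{t_i}^2 = 2\,W_{t_i}\,\Delta W_i + (\Delta W_i)^2
\quad\Longrightarrow\quad
W_t^2 = 2 \sum_{i=0}^{n-1} W_{t_i}\,\Delta W_i \;+\; \sum_{i=0}^{n-1} (\Delta W_i)^2 .
```

The last sum is the quadratic sum of Brownian increments, which tends to t in L^2. For a smooth function it would vanish and leave the ordinary rule d(g^2) = 2g\,dg; here it survives and gives the shorthand d(W^2) = 2W\,dW + dt. Identifying the limit of the first sum with an Itô integral is assumed here, not shown.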

The hand-wavy "order of magnitude" argument above can be made completely rigorous, and it is worth seeing because it is the exact mirror image of the Brownian computation. Let f be continuously differentiable (C^1) on [0, t], and form the quadratic sum over a partition,

Q_n^f = \sum_{i=0}^{n-1} \big(\Delta f_i\big)^2, \qquad \Delta f_i = f(t_{i+1}) - f(t_i).

Pull one factor of |\Delta f_i| out of each square and bound it by the largest increment over the partition:

Q_n^f = \sum_{i} |\Delta f_i|\cdot |\Delta f_i| \;\le\; \Big(\max_i |\Delta f_i|\Big) \sum_{i} |\Delta f_i|.

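As the mesh shrinks, the first factor tends to 0 because f is uniformly continuous on [0, t]. The second factor stays bounded: by the mean value theorem |\Delta f_i| \le \max_{s \in [0,t]} |f'(s)| \, \Delta t_i, so \sum_i |\Delta f_i| \le t \max_{s \in [0,t]} |f'(s)| < \infty. Hence

Q_n^f \;\le\; \underbrace{\Big(\max_i |\Delta f_i|\Big)}_{\to\, 0} \cdot \underbrace{\sum_i |\Delta f_i|}_{\text{finite (total variation)}} \;\longrightarrow\; 0 \cdot (\text{finite}) = 0.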

A smooth function has quadratic variation 0. The contrast is exact: for the smooth path it was the first-power sum that stayed finite and dragged the second-power sum down to zero; for the Brownian path the first-power sum is infinite, and the second-power sum settles on t.
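The smooth case is deterministic, so it can be checked without simulation. The sketch below takes f = sin on [0, 1], an arbitrary C^1 example chosen for illustration, and prints the quadratic sum next to the bound (\max_i |\Delta f_i|) \cdot \sum_i |\Delta f_i| on uniform partitions.

```python
import math

def quadratic_sum_smooth(f, t, n):
    """Return (sum of squared increments, max increment * sum of absolute increments)
    for f on the uniform partition of [0, t] into n pieces."""
    values = [f(t * i / n) for i in range(n + 1)]
    increments = [abs(b - a) for a, b in zip(values, values[1:])]
    q = sum(d * d for d in increments)
    bound = max(increments) * sum(increments)
    return q, bound

smooth_results = []
for n in (10, 100, 1000, 10000):
    q, bound = quadratic_sum_smooth(math.sin, 1.0, n)
    smooth_results.append((n, q, bound))
    print(f"n = {n:6d}   quadratic sum = {q:.3e}   bound = {bound:.3e}")
```

Both columns fall roughly in proportion to 1/n. The sum of absolute increments stays fixed at sin(1), since sin is increasing on [0, 1], so the whole decay comes from the largest increment.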

Step 2 of the derivation used the fact that a squared centred Gaussian X \sim N(0, \sigma^2) has variance 2\sigma^4. Here it is, line by line, from the fourth moment of a standard normal.

Write X = \sigma Z with Z \sim N(0, 1). By definition,

\operatorname{Var}(X^2) = \mathbb{E}\big[(X^2)^2\big] - \big(\mathbb{E}[X^2]\big)^2 = \mathbb{E}[X^4] - \big(\mathbb{E}[X^2]\big)^2.

Pull out the powers of \sigma, since X^k = \sigma^k Z^k:

\mathbb{E}[X^2] = \sigma^2\, \mathbb{E}[Z^2], \qquad \mathbb{E}[X^4] = \sigma^4\, \mathbb{E}[Z^4].

For a standard normal the second moment is \mathbb{E}[Z^2] = 1 (its variance), and the fourth moment is

\mathbb{E}[Z^4] = 3.

(This is the standard Gaussian fourth moment. It follows, for example, from the moment formula \mathbb{E}[Z^{2k}] = (2k-1)!! = 1\cdot 3 \cdots (2k-1), which for k = 2 gives 3!! = 1 \cdot 3 = 3; it also drops out of one integration by parts, \mathbb{E}[Z^4] = 3\,\mathbb{E}[Z^2] = 3.) Substituting both moments,

\operatorname{Var}(X^2) = \sigma^4 \cdot 3 - \big(\sigma^2 \cdot 1\big)^2 = 3\sigma^4 - \sigma^4 = 2\sigma^4.

The leftover 3 - 1 = 2 is exactly the constant that appears as \operatorname{Var}((\Delta W_i)^2) = 2(\Delta t_i)^2 in the derivation above.
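The two moments can also be confirmed by numerical integration against the standard normal density. The sketch below uses a plain midpoint rule on [-10, 10]; the truncation width, the cell count and the test value sigma = 0.5 are assumptions of this example.

```python
import math

def standard_normal_moment(k, half_width=10.0, n_cells=20000):
    """Midpoint-rule approximation of E[Z^k] for Z ~ N(0, 1)."""
    h = 2.0 * half_width / n_cells
    total = 0.0
    for j in range(n_cells):
        z = -half_width + (j + 0.5) * h
        total += z ** k * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def variance_of_square(sigma):
    """Var(X^2) for X = sigma * Z, assembled as E[X^4] - (E[X^2])^2."""
    second = sigma ** 2 * standard_normal_moment(2)
    fourth = sigma ** 4 * standard_normal_moment(4)
    return fourth - second ** 2

second_moment = standard_normal_moment(2)
fourth_moment = standard_normal_moment(4)
sigma = 0.5
print(f"E[Z^2] ~ {second_moment:.6f}")
print(f"E[Z^4] ~ {fourth_moment:.6f}")
print(f"Var(X^2) ~ {variance_of_square(sigma):.6f}   vs   2*sigma^4 = {2 * sigma ** 4:.6f}")
```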

A Brownian path has infinite total variation (the sum of the absolute increments \sum|\Delta W| blows up) yet finite quadratic variation (\sum(\Delta W)^2 \to t). It is exactly this gap — too rough for the first power, perfectly tame at the second — that ordinary calculus has no machinery for, and that the Itô integral is built to handle.
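The gap between the two powers shows up clearly in simulation. The sketch below is illustrative only, with an arbitrary seed and a fine grid of 2^18 steps. It takes one simulated path on [0, 1] and computes both \sum |\Delta W| and \sum (\Delta W)^2 on nested sub-grids. For a uniform partition the expected first-power sum is \sqrt{2nt/\pi}, since \mathbb{E}|N(0, s^2)| = s\sqrt{2/\pi}, so it grows like \sqrt{n}.

```python
import math
import random

def brownian_path(t, n_fine, seed=0):
    """Simulate one Brownian path on a uniform grid of n_fine steps over [0, t]."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_fine)
    path = [0.0]
    for _ in range(n_fine):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def power_sums(path, n):
    """Return (sum |dW|, sum dW^2) over n equally spaced pieces of the path.
    Assumes n divides the number of fine steps."""
    stride = (len(path) - 1) // n
    coarse = path[::stride]
    increments = [b - a for a, b in zip(coarse, coarse[1:])]
    return sum(abs(d) for d in increments), sum(d * d for d in increments)

N_FINE = 2 ** 18
path = brownian_path(1.0, N_FINE, seed=2)

rows = []
for k in (6, 10, 14, 18):
    n = 2 ** k
    abs_sum, sq_sum = power_sums(path, n)
    rows.append((n, abs_sum, sq_sum))
    print(f"n = {n:7d}   sum|dW| = {abs_sum:8.2f}   sqrt(2n/pi) = "
          f"{math.sqrt(2 * n / math.pi):8.2f}   sum dW^2 = {sq_sum:.4f}")
```

Because the sub-grids are nested, the triangle inequality forces the first-power column to be non-decreasing, and it keeps growing without settling. The second-power column stays near 1.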