Itô's lemma in one dimension extends the chain rule by one term: a smooth function f of a single
Itô process picks up a \tfrac12 f'' correction because
(dW_t)^2 = dt refuses to vanish. Finance, however, is rarely
one-dimensional — a portfolio depends on several prices, an exchange-rate model couples two
diffusions, a stochastic-volatility model has both a price and its volatility wandering
at once. We need the lemma for a vector of processes.
Let X = (X^1, \dots, X^n) be a vector Itô process,
each component driven by (possibly correlated) Brownian motions, and let
f(t, x_1, \dots, x_n) be smooth. The single new ingredient compared
to the scalar case is the family of cross-variations
d[X_i, X_j], the quadratic covariation
between two of the component processes. They are exactly the second-order terms a naive chain
rule would discard, and exactly the terms that survive.
The 2-D case, derived line by line
Everything important is already visible with two processes, so let us do that case in full and
in slow motion. Take two Itô processes X_t and
Y_t and a smooth f(X, Y) (we suppress an
explicit t-dependence for now — it would just add an
f_t\,dt term that has no second-order partner). We want
df = f(X + dX,\, Y + dY) - f(X, Y).
Step 1 — Taylor expand to second order. Ordinary calculus would stop at first
order; the whole point of Itô calculus is that the second-order terms are
not negligible, because squared increments are of order dt,
not (dt)^2. So keep every term up to second order:
df = f_x\,dX + f_y\,dY + \tfrac12\Big( f_{xx}\,(dX)^2 + 2 f_{xy}\,dX\,dY + f_{yy}\,(dY)^2 \Big) + \cdots,
where the dots are genuinely negligible (third order and higher). The mixed partial appears
twice — as f_{xy} and f_{yx} — and since
f is smooth these are equal, giving the factor of
2.
Step 2 — substitute the box-algebra products. This is where stochastic calculus
departs from the deterministic Taylor series. The products of differentials are not all zero;
they are read off the covariation table. Each squared differential becomes a
quadratic variation, and each cross product a quadratic covariation:
(dX)^2 = d[X], \qquad (dY)^2 = d[Y], \qquad dX\,dY = d[X, Y].
(Any product involving a dt — such as dt\,dX
or (dt)^2 — is of order higher than dt and
is dropped, which is exactly why an f_t\,dt term has no second-order
partner.)
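The product rules above can be checked numerically. The following is a minimal sketch, assuming X = W^1 and Y = W^2 are standard Brownian motions with correlation rho, so that d[X] = d[Y] = dt and d[X, Y] = rho dt. The function name `box_products`, the step count, and the seed are illustrative choices, not part of the derivation.

```python
import math
import random

def box_products(rho=0.6, T=1.0, n=200_000, seed=0):
    """Sum (dW1)^2, dW1*dW2 and dt*dW1 over a fine grid on [0, T]."""
    rng = random.Random(seed)
    dt = T / n
    sq = cross = dt_dw = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Correlated increments built from two independent normals.
        dw1 = math.sqrt(dt) * z1
        dw2 = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        sq += dw1 * dw1      # approximates [W1]_T = T
        cross += dw1 * dw2   # approximates [W1, W2]_T = rho * T
        dt_dw += dt * dw1    # higher order, should be close to 0
    return sq, cross, dt_dw

sq, cross, dt_dw = box_products()
print(sq, cross, dt_dw)
```

On a fine grid the three sums should land close to T, rho T, and 0 respectively, up to sampling noise that shrinks as the grid is refined.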
Step 3 — substitute (dX)^2 = d[X]:
\tfrac12 f_{xx}\,(dX)^2 = \tfrac12 f_{xx}\,d[X].
Step 4 — substitute the cross term dX\,dY = d[X, Y].
This is the new ingredient with no one-dimensional analogue:
\tfrac12 \cdot 2 f_{xy}\,dX\,dY = f_{xy}\,d[X, Y].
Step 5 — substitute (dY)^2 = d[Y]:
\tfrac12 f_{yy}\,(dY)^2 = \tfrac12 f_{yy}\,d[Y].
Step 6 — collect everything. Reassembling the first-order terms (which survive
untouched) with the three substituted second-order terms gives the two-dimensional Itô formula:
df = f_x\,dX + f_y\,dY + \tfrac12\Big( f_{xx}\,d[X] + 2 f_{xy}\,d[X, Y] + f_{yy}\,d[Y] \Big).
Compared with the scalar lemma, the only structurally new term is the mixed one,
f_{xy}\,d[X, Y]: the channel through which the
coupling of the two processes feeds into the dynamics of f.
If X and Y were driven by independent
Brownian motions, d[X, Y] = 0 and this term would vanish: the two processes would not "talk". Correlation between the drivers is
precisely what keeps it alive.
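The two-dimensional formula can be tested along a single simulated path. The sketch below assumes X = W^1 and Y = W^2 with correlation rho and uses the illustrative test function f(x, y) = x^2 + xy + y^2, for which f_xx = 2, f_xy = 1, f_yy = 2, so the second-order term integrates to (2 + rho) T. The function name, grid size, and seed are assumptions made for the example.

```python
import math
import random

def check_ito_2d(rho=0.6, T=1.0, n=200_000, seed=1):
    """Compare f(X_T, Y_T) - f(0, 0) with the discretised Ito sum."""
    rng = random.Random(seed)
    dt = T / n
    x = y = 0.0
    first_order = 0.0   # sum of f_x dX + f_y dY (left-endpoint evaluation)
    second_order = 0.0  # sum of 1/2 (f_xx d[X] + 2 f_xy d[X,Y] + f_yy d[Y])
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dx = math.sqrt(dt) * z1
        dy = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        first_order += (2.0 * x + y) * dx + (x + 2.0 * y) * dy
        second_order += 0.5 * (2.0 * dt + 2.0 * 1.0 * rho * dt + 2.0 * dt)
        x += dx
        y += dy
    actual = x * x + x * y + y * y  # f(0, 0) = 0
    return actual, first_order, second_order

actual, first_order, second_order = check_ito_2d()
print(actual, first_order + second_order, first_order)
```

The first-order sum alone misses the true change in f by roughly (2 + rho) T, while adding the second-order term closes the gap up to discretisation noise.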
The general formula. Let X = (X^1, \dots, X^n) be a vector of Itô processes and let
f(t, x_1, \dots, x_n) be C^{1,2} (once continuously differentiable in t, twice in x). Then
df = f_t\,dt + \sum_{i=1}^{n} f_{x_i}\,dX_i + \tfrac12 \sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_i x_j}\,d[X_i, X_j].
The second-order terms are governed by the box-algebra rules:
- dX_i\,dX_j = d[X_i, X_j] — squared / cross differentials become quadratic (co)variations;
- dt\,dX_i = 0 and (dt)^2 = 0 — anything multiplied by dt is higher order;
- for independent Brownian drivers dW_i\,dW_j = 0 when i \ne j, and dW_i\,dW_i = dt.
Most multidimensional models do not use independent drivers — they use
correlated ones. Two Brownian motions with correlation
\rho obey the covariation rule
dW^1\,dW^2 = \rho\,dt,
which interpolates between independence (\rho = 0) and perfect
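lockstep (\rho = \pm 1). Correlated drivers can always be rewritten as linear combinations of independent ones (for instance W^1 = B^1, W^2 = \rho B^1 + \sqrt{1 - \rho^2}\,B^2 with B^1, B^2 independent), so without loss of generality write a general vector diffusion as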
dX_i = a_i\,dt + \sum_{k} b_{ik}\,dW^k,
with independent drivers W^k. Multiplying two such differentials and
using dW^k\,dW^\ell = \delta_{k\ell}\,dt gives the
diffusion (covariance) matrix
d[X_i, X_j] = \Big(\sum_{k} b_{ik}\,b_{jk}\Big)\,dt = (b\,b^{\mathsf T})_{ij}\,dt, \qquad \Sigma = b\,b^{\mathsf T}.
So the entire second-order structure of an n-dimensional Itô
process is encoded in one symmetric, positive-semidefinite matrix
\Sigma = b\,b^{\mathsf T} — the instantaneous covariance of the
increments per unit time.
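As a concrete illustration, the matrix \Sigma = b b^T and the resulting drift of f(t, X) can be computed directly from the coefficients. This is a minimal sketch under stated assumptions: the helper names `covariance_matrix` and `ito_coefficients` are hypothetical, the numerical values are made up for the example, and the test function f(x_1, x_2) = x_1^2 x_2 is chosen only because its derivatives are easy to write down.

```python
import math

def covariance_matrix(b):
    """Return Sigma = b b^T for a loading matrix b given as a list of rows."""
    n = len(b)
    return [[sum(bik * bjk for bik, bjk in zip(b[i], b[j])) for j in range(n)]
            for i in range(n)]

def ito_coefficients(f_t, grad, hess, a, b):
    """Drift and driver loadings of f(t, X) for dX_i = a_i dt + sum_k b_ik dW^k.

    drift       = f_t + sum_i f_i a_i + 1/2 sum_ij f_ij Sigma_ij
    loadings[k] = sum_i f_i b_ik   (coefficient of dW^k)
    """
    n = len(grad)
    sigma = covariance_matrix(b)
    drift = f_t + sum(grad[i] * a[i] for i in range(n))
    drift += 0.5 * sum(hess[i][j] * sigma[i][j]
                       for i in range(n) for j in range(n))
    loadings = [sum(grad[i] * b[i][k] for i in range(n))
                for k in range(len(b[0]))]
    return drift, loadings

# Illustrative state and coefficients (assumed values, not from the text).
x1, x2 = 2.0, 3.0
s1, s2, rho = 0.2, 0.3, 0.5
a = [0.05 * x1, 0.03 * x2]
b = [[s1 * x1, 0.0],
     [s2 * x2 * rho, s2 * x2 * math.sqrt(1.0 - rho ** 2)]]
sigma = covariance_matrix(b)

# f(x1, x2) = x1^2 * x2: gradient and Hessian evaluated at (x1, x2).
grad = [2.0 * x1 * x2, x1 ** 2]
hess = [[2.0 * x2, 2.0 * x1],
        [2.0 * x1, 0.0]]
drift, loadings = ito_coefficients(0.0, grad, hess, a, b)
print(sigma, drift, loadings)
```

The lower-triangular choice of b here is one of many matrices giving the same \Sigma; only the product b b^T enters the drift of f.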
A worked cross term. Take f(X^1, X^2) = X^1 X^2, a
product of two processes. Then f_{x_1} = X^2,
f_{x_2} = X^1, f_{x_1 x_1} = f_{x_2 x_2} = 0,
and f_{x_1 x_2} = 1. The lemma gives the
Itô product rule:
d(X^1 X^2) = X^2\,dX^1 + X^1\,dX^2 + d[X^1, X^2].
The first two terms are the familiar Leibniz product rule; the extra
d[X^1, X^2] = \Sigma_{12}\,dt is the correction that ordinary
calculus lacks. If the two processes are geometric Brownian motions, dX^i = \mu_i X^i\,dt + \sigma_i X^i\,dW^i, whose drivers W^1 and W^2 have correlation
\rho, that term is \rho\,\sigma_1\sigma_2\,X^1 X^2\,dt.
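The product rule can be checked on a simulated path. The sketch below assumes both processes are geometric Brownian motions, dX^i = mu_i X^i dt + sigma_i X^i dW^i, with driver correlation rho, discretised with a plain Euler scheme. The parameter values, the function name `product_rule_check`, and the seed are illustrative assumptions.

```python
import math
import random

def product_rule_check(mu=(0.05, 0.03), sig=(0.2, 0.3), rho=0.5,
                       T=1.0, n=200_000, seed=2):
    """Split the change in X1*X2 into Leibniz terms and the cross term."""
    rng = random.Random(seed)
    dt = T / n
    x1 = x2 = 1.0
    leibniz = cross = predicted = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dw1 = math.sqrt(dt) * z1
        dw2 = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        dx1 = mu[0] * x1 * dt + sig[0] * x1 * dw1
        dx2 = mu[1] * x2 * dt + sig[1] * x2 * dw2
        leibniz += x2 * dx1 + x1 * dx2   # X2 dX1 + X1 dX2
        cross += dx1 * dx2               # discrete d[X1, X2]
        predicted += rho * sig[0] * sig[1] * x1 * x2 * dt
        x1 += dx1
        x2 += dx2
    return x1 * x2 - 1.0, leibniz, cross, predicted

total, leibniz, cross, predicted = product_rule_check()
print(total, leibniz + cross, cross, predicted)
```

On the discrete grid the identity total = leibniz + cross holds exactly (up to rounding), and the accumulated cross term should sit close to the integral of rho sigma_1 sigma_2 X^1 X^2 dt along the same path.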