We have a controlled stochastic system with dynamics
dx = f\,dt + \sigma\,dW, an expected cost
J = \mathbb{E}[\phi + \int L\,dt], and a promise that the
value-function method survives the move to randomness. This page makes good on that promise. We rerun the
derivation of the
Hamilton–Jacobi–Bellman
equation exactly as before — with one surgical change: where the deterministic version
expanded V(x + dx, t + dt) with the ordinary chain rule, we expand it with
Itô's
lemma. That single substitution adds one term, and that term is the whole of stochastic
optimal control.
As always, define the value function as the best expected cost-to-go from state
x at time t:
V(x, t) = \min_{u(\cdot)} \mathbb{E}\!\left[\, \phi\big(x(T)\big) + \int_t^T L\big(x(s), u(s)\big)\,ds \;\Big|\; x(t) = x \,\right].
Deriving the stochastic HJB
Step 1 — the principle of optimality over a tiny step. Split the horizon into a
short slice [t, t + dt] and the rest. Bellman's principle says the
optimal cost-to-go is the best running cost over the slice plus the optimal cost-to-go from where we
land — and because the landing point is now random, that continuation is an
expectation:
V(x, t) = \min_{u}\, \mathbb{E}\!\left[\, L(x, u)\,dt + V\big(x + dx,\; t + dt\big) \,\right].
Step 2 — expand V(x + dx, t + dt) with Itô's lemma. This
is the one and only departure from the deterministic derivation. For
dx = f\,dt + \sigma\,dW, Itô's lemma — the Taylor expansion that keeps the
second-order term because (dW)^2 = dt — gives the differential of
V as
dV = \Big(\,V_t + V_x^{\mathsf{T}} f + \tfrac12\,\mathrm{tr}\!\big(\sigma\sigma^{\mathsf{T}} V_{xx}\big)\,\Big)\,dt + V_x^{\mathsf{T}}\,\sigma\,dW.
The first three terms are the drift of V; the last is its martingale
(Brownian) part. The novelty is the third drift term,
\tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx}), the
multivariate Itô correction, built from the Hessian V_{xx}
of the value function (see
multidimensional
Itô). Ordinary calculus would have stopped at V_t + V_x^{\mathsf{T}} f.
Step 3 — take the expectation; the dW term dies. The Itô
integral has zero mean (a fair bet against a fair game), and because V_x and \sigma are evaluated at the
known current state (x, t) they factor out of the expectation, so the Brownian part contributes nothing:
\mathbb{E}\big[\,V_x^{\mathsf{T}}\sigma\,dW\,\big] = V_x^{\mathsf{T}}\sigma\,\mathbb{E}[dW] = 0.
What survives is the drift, so
\mathbb{E}[V(x+dx, t+dt)] = V(x,t) + \big(V_t + V_x^{\mathsf{T}} f + \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}}V_{xx})\big)dt.
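The expectation formula above can be checked numerically. The following is a minimal sketch, not part of the derivation: it assumes a scalar state, a time-independent test function V(x) = x^4 (chosen only because its curvature is not constant), and constant illustrative values for f and \sigma. It samples one small step dx = f\,dt + \sigma\,dW many times and compares the average growth rate of V with the ordinary chain-rule prediction V_x f and with the Itô prediction V_x f + \tfrac12\sigma^2 V_{xx}.

```python
import math
import random


def v(y):
    """Illustrative test function V(x) = x^4 (an assumption, not from the text)."""
    return y ** 4


def one_step_rate(x, f, sigma, dt, n_samples, seed=0):
    """Monte Carlo estimate of (E[V(x + dx)] - V(x)) / dt over one small step."""
    rng = random.Random(seed)
    root_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_samples):
        dw = root_dt * rng.gauss(0.0, 1.0)  # Brownian increment with variance dt
        dx = f * dt + sigma * dw
        total += v(x + dx) - v(x)
    return total / n_samples / dt


# Hypothetical numbers chosen for illustration only.
x, f, sigma, dt = 1.0, 0.5, 1.0, 0.01
v_x = 4 * x ** 3    # first derivative of x^4
v_xx = 12 * x ** 2  # second derivative of x^4

chain_rule_rate = v_x * f                          # what ordinary calculus predicts
ito_rate = v_x * f + 0.5 * sigma ** 2 * v_xx       # drift from Ito's lemma
mc_rate = one_step_rate(x, f, sigma, dt, n_samples=200_000)

print(f"chain rule: {chain_rule_rate:.3f}")
print(f"Ito drift:  {ito_rate:.3f}")
print(f"simulated:  {mc_rate:.3f}")
```

With these values the chain rule predicts a rate of 2 and Itô's lemma predicts 8. The simulated rate should land near the Itô value, up to Monte Carlo error and a small bias of order dt, because the dW term averages out while the (dW)^2 term does not.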
Step 4 — substitute and cancel V(x,t). Put this back into
the Bellman relation of Step 1:
V(x,t) = \min_u\Big[\, L\,dt + V(x,t) + \big(V_t + V_x^{\mathsf{T}} f + \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}}V_{xx})\big)\,dt \,\Big].
The V(x,t) on the left cancels the one on the right (it carries no
u, so it leaves the minimisation untouched). The term
V_t also has no u and pulls outside the
\min.
Step 5 — divide by dt and collect. Dropping the common
dt and moving V_t to the left gives the
stochastic Hamilton–Jacobi–Bellman equation:
\boxed{\;-V_t = \min_u\Big[\, L(x,u) + V_x^{\mathsf{T}} f(x,u) + \tfrac12\,\mathrm{tr}\!\big(\sigma\sigma^{\mathsf{T}} V_{xx}\big) \,\Big],\qquad V(x,T) = \phi(x).\;}
The terminal condition is unchanged — at the final instant the cost-to-go is the terminal penalty.
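To see the boxed equation at work, here is a worked example under assumptions that are not part of the derivation above: a scalar state, control-affine drift f = a x + b u, running cost L = \tfrac12(q x^2 + r u^2) with r > 0, terminal cost \phi = \tfrac12 p_T x^2, and constant \sigma. The symbols a, b, q, r and p_T are illustrative.

```latex
\begin{align*}
% minimise the bracket over u; sigma is constant, so the noise term carries no u
0 &= \frac{\partial}{\partial u}\Big[\tfrac12 q x^2 + \tfrac12 r u^2 + V_x\,(a x + b u)\Big] = r u + b\,V_x
  \;\Longrightarrow\; u^* = -\frac{b}{r}\,V_x, \\
% substitute u^* back into the stochastic HJB
-V_t &= \tfrac12 q x^2 + a x\,V_x - \frac{b^2}{2r}\,V_x^2 + \tfrac12\,\sigma^2 V_{xx}, \\
% quadratic ansatz V = (1/2) P(t) x^2 + c(t), so V_x = P x and V_xx = P
-\dot P &= q + 2 a P - \frac{b^2}{r}\,P^2, \qquad P(T) = p_T, \\
-\dot c &= \tfrac12\,\sigma^2 P, \qquad c(T) = 0.
\end{align*}
```

Under these assumptions the noise never enters the equation for P or the feedback law u^* = -(b/r) P x. It only adds the offset c(t) = \tfrac12\sigma^2 \int_t^T P(s)\,ds to the value function.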
One term, and what it means
Set it beside the deterministic HJB:
\text{(deterministic)}\quad -V_t = \min_u\big[\, L + V_x^{\mathsf{T}} f \,\big],
\text{(stochastic)}\quad\;\; -V_t = \min_u\big[\, L + V_x^{\mathsf{T}} f + \underbrace{\tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx})}_{\text{noise term}} \,\big].
The only difference is the extra
\tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx}). Two readings:
- Noise couples to curvature. The new term pairs the diffusion
\sigma\sigma^{\mathsf{T}} with the Hessian
V_{xx} — the curvature of the value function. Where the cost-to-go is
sharply bowed, randomness matters most; where V is flat
(V_{xx} = 0) the noise washes out and the equation looks deterministic.
- It vanishes when the noise does. Set \sigma = 0 and the
term is identically zero — the stochastic HJB collapses exactly onto the deterministic one. The
two theories agree in the noiseless limit, as they must.
In summary:
- The value function of the stochastic control problem solves
-V_t = \min_u\big[L + V_x^{\mathsf{T}} f + \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx})\big],
with V(x,T) = \phi(x).
- It is the deterministic HJB plus one term,
\tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx}), born from the
Itô correction (dW)^2 = dt.
- That term couples the noise \sigma\sigma^{\mathsf{T}} to the Hessian
V_{xx} and disappears when \sigma = 0. The
dW contribution, by contrast, drops out for every \sigma because \mathbb{E}[dW] = 0.
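The coupling between noise and curvature is easy to evaluate directly for small matrices. The following is a minimal sketch: the helper name noise_term and the example matrices are hypothetical, chosen only to show that the term \tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx}) is zero when either the noise or the curvature is zero.

```python
def noise_term(sigma, hessian):
    """Return 0.5 * tr(sigma sigma^T H).

    sigma is an n x m matrix and hessian an n x n matrix, both nested lists.
    """
    n = len(sigma)
    m = len(sigma[0])
    # Diffusion matrix a = sigma sigma^T, of size n x n.
    a = [[sum(sigma[i][k] * sigma[j][k] for k in range(m)) for j in range(n)]
         for i in range(n)]
    # tr(a H) = sum over i, j of a[i][j] * H[j][i].
    return 0.5 * sum(a[i][j] * hessian[j][i] for i in range(n) for j in range(n))


# Hypothetical 2 x 2 example values.
sigma = [[0.3, 0.0],
         [0.1, 0.2]]
curved = [[2.0, 0.5],
          [0.5, 1.0]]
flat = [[0.0, 0.0],
        [0.0, 0.0]]
no_noise = [[0.0, 0.0],
            [0.0, 0.0]]

print(noise_term(sigma, curved))     # noise meets curvature: nonzero
print(noise_term(sigma, flat))       # flat value function: zero
print(noise_term(no_noise, curved))  # sigma = 0: zero
```

Working the first case by hand, \sigma\sigma^{\mathsf{T}} has entries 0.09, 0.03, 0.03, 0.05, so the trace against the Hessian is 0.26 and the noise term is 0.13.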
Adding the \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}}V_{xx}) term
turns HJB from a first-order PDE into a second-order one — its highest derivative
is now the Hessian. That is no accident: second-order parabolic PDEs are exactly the equations
satisfied by expectations of diffusions, the content of the Feynman–Kac formula. The same
\tfrac12\sigma^2 \partial_{xx} operator that appears here is the
generator that turns the Black–Scholes hedging argument into a diffusion equation. Stochastic
control, option pricing, and the heat equation are, under the hood, the same mathematics — Itô's
second-order term wearing different clothes.
Watching the noise lift the value
Take the scalar LQ value function V_{\text{det}}(x) = \tfrac12 x^2, so the
curvature is constant, V_{xx} = 1. The noise term \tfrac12\sigma^2 V_{xx} is then the same at every state and time, so over a horizon
T = 1 it accumulates to a fixed amount of expected
cost at t = 0, lifting the parabola to
V_{\text{stoch}}(x) = \tfrac12 x^2 + \tfrac12\,\sigma^2\,V_{xx}\,T = \tfrac12 x^2 + \tfrac12\,\sigma^2.
As \sigma increases, the deterministic curve stays put while the stochastic curve
rides upward by exactly \tfrac12\sigma^2 — the irreducible cost the noise
injects through the value function's curvature. At \sigma = 0 the two
curves coincide, as the stochastic HJB collapses onto the deterministic one.
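The lift can be reproduced by simulation. The following is a minimal sketch under assumptions the text does not spell out: one concrete LQ problem whose deterministic value is exactly \tfrac12 x^2, namely dx = u\,dt + \sigma\,dW with running cost \tfrac12(x^2 + u^2) and terminal cost \tfrac12 x^2, for which the feedback is u = -V_x = -x. The function name expected_cost and all numerical settings are illustrative. It uses an Euler-Maruyama discretisation, so the estimates carry a small time-stepping bias on top of Monte Carlo error.

```python
import math
import random


def expected_cost(x0, sigma, horizon=1.0, n_steps=200, n_paths=4000, seed=0):
    """Estimate E[ int_0^T 0.5*(x^2 + u^2) dt + 0.5*x(T)^2 ] under u = -x."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    root_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(n_steps):
            u = -x                                  # feedback u = -V_x (assumed problem)
            cost += 0.5 * (x * x + u * u) * dt      # running cost over the slice
            x += u * dt + sigma * root_dt * rng.gauss(0.0, 1.0)  # Euler-Maruyama step
        total += cost + 0.5 * x * x                 # add the terminal penalty
    return total / n_paths


x0, sigma = 1.0, 0.5
det = expected_cost(x0, 0.0, n_paths=1)   # sigma = 0: one path suffices
stoch = expected_cost(x0, sigma)
predicted = 0.5 * x0 ** 2 + 0.5 * sigma ** 2

print(f"deterministic cost: {det:.4f}")
print(f"stochastic cost:    {stoch:.4f}")
print(f"predicted:          {predicted:.4f}")
```

For x0 = 1 and \sigma = 0.5 the formula predicts \tfrac12 + \tfrac12(0.25) = 0.625, and the simulated stochastic cost should sit close to that value while the noiseless run stays near \tfrac12.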