The Stochastic HJB Equation

We have a controlled stochastic system with dynamics dx = f\,dt + \sigma\,dW and expected cost J = \mathbb{E}[\phi + \int L\,dt], and a promise that the value-function method survives the move to randomness. This page makes good on that promise. We rerun the derivation of the Hamilton–Jacobi–Bellman equation exactly as before, with one surgical change: where the deterministic version expanded V(x + dx, t + dt) with the ordinary chain rule, we expand it with Itô's lemma. That single substitution adds one term, and that term is the whole of stochastic optimal control.

As always, define the value function as the best expected cost-to-go from state x at time t:

V(x, t) = \min_{u(\cdot)} \mathbb{E}\!\left[\, \phi\big(x(T)\big) + \int_t^T L\big(x(s), u(s)\big)\,ds \;\Big|\; x(t) = x \,\right].

Deriving the stochastic HJB

Step 1 — the principle of optimality over a tiny step. Split the horizon into a short slice [t, t + dt] and the rest. Bellman's principle says the optimal cost-to-go is the best running cost over the slice plus the optimal cost-to-go from where we land — and because the landing point is now random, that continuation is an expectation:

V(x, t) = \min_{u}\, \mathbb{E}\!\left[\, L(x, u)\,dt + V\big(x + dx,\; t + dt\big) \,\right].

Step 2 — expand V(x + dx, t + dt) with Itô's lemma. This is the one and only departure from the deterministic derivation. For dx = f\,dt + \sigma\,dW, Itô's lemma — the Taylor expansion that keeps the second-order term because (dW)^2 = dt — gives the differential of V as

dV = \Big(\,V_t + V_x^{\mathsf{T}} f + \tfrac12\,\mathrm{tr}\!\big(\sigma\sigma^{\mathsf{T}} V_{xx}\big)\,\Big)\,dt + V_x^{\mathsf{T}}\,\sigma\,dW.

The first three terms are the drift of V; the last is its martingale (Brownian) part. The novelty is the third drift term, \tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx}): the multivariate Itô correction, built from the Hessian V_{xx} of the value function (see multidimensional Itô). Ordinary calculus would have stopped at V_t + V_x^{\mathsf{T}} f.

Step 3 — take the expectation; the dW term dies. The Itô integral has zero mean — a fair bet against a fair game — so the Brownian part contributes nothing in expectation:

\mathbb{E}\big[\,V_x^{\mathsf{T}}\sigma\,dW\,\big] = V_x^{\mathsf{T}}\sigma\,\mathbb{E}[dW] = 0.

What survives is the drift, so \mathbb{E}[V(x+dx, t+dt)] = V(x,t) + \big(V_t + V_x^{\mathsf{T}} f + \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}}V_{xx})\big)dt.
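The one-step expectation can be checked numerically. The sketch below is illustrative only: the test function V(x) = x^4 (time-independent, so V_t = 0), the scalar values of x, f, \sigma and dt, and the helper name `one_step_expectation` are all assumptions chosen for the demonstration, not part of the derivation. It samples dx = f\,dt + \sigma\,dW many times and compares the sample mean of V(x + dx) with the Itô prediction and with the chain-rule prediction that omits the second-order term.

```python
import numpy as np

def one_step_expectation(V, x, f, sigma, dt, n_samples=400_000, seed=0):
    """Monte Carlo estimate of E[V(x + dx)] for dx = f*dt + sigma*dW (scalar case)."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_samples)  # dW ~ N(0, dt)
    dx = f * dt + sigma * dW
    return np.mean(V(x + dx))

# Hypothetical test function with non-constant curvature:
# V = x^4, so V_x = 4 x^3 and V_xx = 12 x^2.
V = lambda x: x**4
x, f, sigma, dt = 1.0, 0.5, 0.8, 1e-2

mc = one_step_expectation(V, x, f, sigma, dt)
# Ito prediction: V + (V_x f + 0.5 sigma^2 V_xx) dt
ito = V(x) + (4 * x**3 * f + 0.5 * sigma**2 * 12 * x**2) * dt
# Chain-rule prediction: V + V_x f dt (no second-order term)
chain = V(x) + 4 * x**3 * f * dt

print(f"Monte Carlo: {mc:.4f}")
print(f"Ito:         {ito:.4f}")
print(f"Chain rule:  {chain:.4f}")
```

With these values the Itô prediction should agree with the simulation up to sampling noise and an O(dt^2) remainder, while the chain-rule prediction falls short by roughly \tfrac12\sigma^2 V_{xx}\,dt.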

Step 4 — substitute and cancel V(x,t). Put this back into the Bellman relation of Step 1:

V(x,t) = \min_u\Big[\, L\,dt + V(x,t) + \big(V_t + V_x^{\mathsf{T}} f + \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}}V_{xx})\big)\,dt \,\Big].

The V(x,t) on the left cancels the one on the right (it carries no u, so it leaves the minimisation untouched). The term V_t also has no u and pulls outside the \min.

Step 5 — divide by dt and collect. Dropping the common dt and moving V_t to the left gives the stochastic Hamilton–Jacobi–Bellman equation:

\boxed{\;-V_t = \min_u\Big[\, L(x,u) + V_x^{\mathsf{T}} f(x,u) + \tfrac12\,\mathrm{tr}\!\big(\sigma\sigma^{\mathsf{T}} V_{xx}\big) \,\Big],\qquad V(x,T) = \phi(x).\;}

The terminal condition is unchanged — at the final instant the cost-to-go is the terminal penalty.
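A worked instance may help make the boxed equation concrete. The following sketch assumes a scalar linear-quadratic problem that is not specified in the text: dynamics dx = (a x + b u)\,dt + \sigma\,dW with constant \sigma, running cost L = \tfrac12(q x^2 + r u^2), and terminal cost \phi = \tfrac12 q_T x^2. The symbols a, b, q, r, q_T, P and c are introduced here for illustration only. Substituting a quadratic ansatz into the stochastic HJB and matching powers of x gives:

```latex
% Assumed scalar LQ problem (illustrative, not from the text):
%   dx = (a x + b u) dt + \sigma dW,  L = (q x^2 + r u^2)/2,  \phi = q_T x^2 / 2
\begin{aligned}
&\text{Ansatz:} && V(x,t) = \tfrac12 P(t)\,x^2 + c(t), \quad V_x = P x,\; V_{xx} = P,\; V_t = \tfrac12 \dot P x^2 + \dot c \\
&\text{Minimiser:} && \partial_u\big[\tfrac12 r u^2 + P x\, b u\big] = 0 \;\Rightarrow\; u^* = -\tfrac{b}{r} P x \\
&\text{HJB:} && -\tfrac12 \dot P x^2 - \dot c = \tfrac12\big(q + 2 a P - \tfrac{b^2}{r} P^2\big) x^2 + \tfrac12 \sigma^2 P \\
&x^2 \text{ terms:} && -\dot P = q + 2 a P - \tfrac{b^2}{r} P^2, \quad P(T) = q_T \\
&x^0 \text{ terms:} && -\dot c = \tfrac12 \sigma^2 P, \quad c(T) = 0 \;\Rightarrow\; c(t) = \tfrac12 \sigma^2 \int_t^T P(s)\,ds
\end{aligned}
```

Under these assumptions the Riccati equation for P does not contain \sigma, so the feedback law u^* is the same as in the deterministic problem; the noise enters only through the offset c(t), which is exactly the accumulated \tfrac12\sigma^2 V_{xx} term. When P \equiv 1 (for example a = 0 and b = q = r = q_T = 1), this gives c(0) = \tfrac12\sigma^2 T.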

One term, and what it means

Set it beside the deterministic HJB:

\text{(deterministic)}\quad -V_t = \min_u\big[\, L + V_x^{\mathsf{T}} f \,\big],

\text{(stochastic)}\quad\;\; -V_t = \min_u\big[\, L + V_x^{\mathsf{T}} f + \underbrace{\tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx})}_{\text{noise term}} \,\big].

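The only difference is the extra \tfrac12\,\mathrm{tr}(\sigma\sigma^{\mathsf{T}} V_{xx}). It can be read in two ways: as a cost that the noise injects through the curvature of the value function (the noise covariance \sigma\sigma^{\mathsf{T}} weighted by the Hessian V_{xx}), and as a change in the type of the equation.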

Adding the \tfrac12\mathrm{tr}(\sigma\sigma^{\mathsf{T}}V_{xx}) term turns HJB from a first-order PDE into a second-order one — its highest derivative is now the Hessian. That is no accident: second-order parabolic PDEs are exactly the equations satisfied by expectations of diffusions, the content of the Feynman–Kac formula. The same \tfrac12\sigma^2 \partial_{xx} operator that appears here is the generator that turns the Black–Scholes hedging argument into a diffusion equation. Stochastic control, option pricing, and the heat equation are, under the hood, the same mathematics — Itô's second-order term wearing different clothes.
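The Feynman–Kac connection can be illustrated on the simplest case. The sketch below assumes an uncontrolled scalar diffusion dx = \sigma\,dW with no running cost and a hypothetical terminal function g(x) = \cos x; these choices, and the helper name `feynman_kac_mc`, are for illustration only. For this case w(x,t) = \cos(x)\,e^{-\sigma^2 (T-t)/2} satisfies -w_t = \tfrac12\sigma^2 w_{xx} with w(x,T) = \cos x, and Feynman–Kac says the same w is the expectation of g at the terminal time.

```python
import numpy as np

def feynman_kac_mc(g, x, sigma, tau, n_samples=200_000, seed=0):
    """Estimate E[g(x(T)) | x(t) = x] for dx = sigma*dW, with tau = T - t."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    # For pure Brownian dynamics, x(T) = x + sigma * sqrt(tau) * Z exactly.
    return g(x + sigma * np.sqrt(tau) * z).mean()

x, sigma, tau = 0.3, 0.7, 1.0

mc = feynman_kac_mc(np.cos, x, sigma, tau)
# Solution of -w_t = 0.5 sigma^2 w_xx with terminal data cos(x)
closed_form = np.cos(x) * np.exp(-0.5 * sigma**2 * tau)

print(f"expectation of the diffusion: {mc:.4f}")
print(f"solution of the PDE:          {closed_form:.4f}")
```

The two numbers should agree up to Monte Carlo error, which is the sense in which a second-order parabolic PDE and an expectation over a diffusion describe the same object.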

Watching the noise lift the value

Take the scalar LQ value function V_{\text{det}}(x) = \tfrac12 x^2, so the curvature is constant, V_{xx} = 1. Over a horizon T = 1 the stochastic HJB's noise term adds a fixed amount of expected cost, lifting the parabola to

V_{\text{stoch}}(x) = \tfrac12 x^2 + \tfrac12\,\sigma^2\,V_{xx}\,T = \tfrac12 x^2 + \tfrac12\,\sigma^2.

As \sigma increases, the deterministic curve stays put while the stochastic curve rides upward by exactly \tfrac12\sigma^2, the irreducible cost the noise injects through the value function's curvature. At \sigma = 0 the two curves coincide, as the stochastic HJB collapses onto the deterministic one.
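The \tfrac12\sigma^2 T lift can be checked by simulation. The text does not say which LQ problem produces V_{\text{det}}(x) = \tfrac12 x^2, so the sketch below assumes one that does: dx = u\,dt + \sigma\,dW, L = \tfrac12(x^2 + u^2), \phi = \tfrac12 x^2, for which the feedback u = -x is optimal. The function name `closed_loop_cost` and the step and path counts are likewise assumptions. It runs an Euler–Maruyama simulation of the closed loop and averages the realised cost over many paths.

```python
import numpy as np

def closed_loop_cost(x0, sigma, T=1.0, n_steps=400, n_paths=40_000, seed=0):
    """Monte Carlo estimate of the expected cost under the feedback u = -x.

    Assumed problem (not from the text): dx = u dt + sigma dW,
    L = (x^2 + u^2) / 2, phi = x^2 / 2.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        u = -x                                  # feedback law for this problem
        cost += 0.5 * (x**2 + u**2) * dt        # accumulate running cost
        # Euler-Maruyama step of dx = u dt + sigma dW
        x = x + u * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    cost += 0.5 * x**2                          # terminal cost
    return cost.mean()

x0, T = 1.0, 1.0
for sigma in (0.0, 0.5, 1.0):
    estimate = closed_loop_cost(x0, sigma, T)
    predicted = 0.5 * x0**2 + 0.5 * sigma**2 * T
    print(f"sigma={sigma:.1f}  simulated={estimate:.3f}  predicted={predicted:.3f}")
```

Under these assumptions the simulated cost should track \tfrac12 x_0^2 + \tfrac12\sigma^2 T up to sampling noise and a small time-discretisation bias, and it reduces to \tfrac12 x_0^2 at \sigma = 0.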