Maximum Principle vs Dynamic Programming

We now hold the two great methods of optimal control: Pontryagin's maximum principle and the Hamilton–Jacobi–Bellman equation. They look utterly different — ODEs along one path versus a PDE over all states — yet they describe the same optimum. This capstone shows they are two faces of one truth, joined by a single bridge: the costate is the gradient of the value function.

Two methods, side by side

The bridge: \lambda(t) = \nabla_x V(x^*(t), t)

The claim is that, evaluated along the optimal trajectory, the maximum principle's costate is exactly the gradient of the HJB value function. Let us define it that way and watch the costate equation fall out of HJB.

Step 1 — define the costate as the value gradient. Along the optimal path x^*(t), set

\lambda(t) := \nabla_x V\big(x^*(t), t\big).

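Step 2 — differentiate it in time. By the chain rule, with \dot{x}^* = f,

\dot{\lambda} = \nabla_x V_t + \big(\nabla_x^2 V\big)\,\dot{x}^* = \nabla_x V_t + \big(\nabla_x^2 V\big) f,

where \nabla_x^2 V is the Hessian of the value function. We assume V is twice continuously differentiable, so the Hessian exists and the x- and t-derivatives of V may be taken in either order.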

Step 3 — differentiate HJB in x. Take -V_t = H(x, u^*, \nabla_x V) and apply \nabla_x. Because u^* minimises H, \partial H/\partial u = 0 there (for an unconstrained minimiser), so the dependence on x through u^* contributes nothing; this is the envelope theorem. What remains is the explicit x-dependence and the dependence through \lambda = \nabla_x V:

-\nabla_x V_t = \frac{\partial H}{\partial x} + \big(\nabla_x^2 V\big)\frac{\partial H}{\partial \lambda}.
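Written out component by component, the differentiation in Step 3 looks as follows. This is only a sketch: the indices i and j, and the assumption that the minimiser is a differentiable feedback u^*(x, t), are introduced here for illustration and are not part of the derivation above.

```latex
% Component i of the x-gradient of  -V_t = H(x, u^*(x,t), \nabla_x V(x,t)).
% Assumptions: u^*(x,t) is differentiable in x, V is twice continuously differentiable.
-\frac{\partial}{\partial x_i} V_t
  = \frac{\partial H}{\partial x_i}
  + \sum_j \underbrace{\frac{\partial H}{\partial u_j}}_{=\,0 \text{ at } u^*}
           \frac{\partial u_j^*}{\partial x_i}
  + \sum_j \frac{\partial^2 V}{\partial x_i\,\partial x_j}\,
           \frac{\partial H}{\partial \lambda_j}
```

The middle sum is the term the envelope argument removes. The last sum is the Hessian acting on \partial H/\partial\lambda, which is the second term of the vector equation.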

Step 4 — use \partial H/\partial \lambda = f. The state equation says \partial H/\partial\lambda = f, so

-\nabla_x V_t = \frac{\partial H}{\partial x} + \big(\nabla_x^2 V\big) f.

Step 5 — combine. Substitute \nabla_x V_t = -\partial H/\partial x - (\nabla_x^2 V) f into Step 2; the Hessian terms cancel:

\dot{\lambda} = \Big(-\frac{\partial H}{\partial x} - (\nabla_x^2 V) f\Big) + (\nabla_x^2 V) f = -\frac{\partial H}{\partial x}.

This is precisely the costate equation of the maximum principle. Pontryagin's adjoint dynamics are HJB differentiated along the optimal path — the costate \lambda(t) is the gradient \nabla_x V riding along the trajectory, and its "shadow price" meaning is now literal: it is how the optimal cost changes as the state is nudged.
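The shadow-price reading can be checked numerically on the scalar problem \dot{x} = u with running cost x^2 + u^2, for which V(x) = x^2 and the optimal feedback is u = -x. The sketch below nudges the initial state and compares the change in accumulated cost with the predicted costate 2x_0. The function names, the horizon `t_end` (a finite stand-in for the infinite horizon), the step count and the nudge `eps` are all illustrative assumptions.

```python
def cost_to_go(x0, t_end=20.0, n_steps=20000):
    """Accumulate the running cost x^2 + u^2 under the feedback u = -x.

    Integrates x' = u together with the cost using classical RK4.
    The finite horizon t_end is an assumption standing in for infinity;
    the neglected tail is of order x0^2 * exp(-2 * t_end).
    """
    h = t_end / n_steps

    def rhs(x):
        u = -x                      # optimal feedback for this problem
        return u, x * x + u * u     # (dx/dt, d(cost)/dt)

    x, cost = x0, 0.0
    for _ in range(n_steps):
        k1x, k1c = rhs(x)
        k2x, k2c = rhs(x + 0.5 * h * k1x)
        k3x, k3c = rhs(x + 0.5 * h * k2x)
        k4x, k4c = rhs(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        cost += h * (k1c + 2 * k2c + 2 * k3c + k4c) / 6.0
    return cost


def shadow_price(x0, eps=1e-4):
    """Central-difference sensitivity of the optimal cost to the initial state."""
    return (cost_to_go(x0 + eps) - cost_to_go(x0 - eps)) / (2.0 * eps)


if __name__ == "__main__":
    x0 = 1.5
    print("cost from x0      :", cost_to_go(x0))    # should be close to x0**2
    print("finite-diff price :", shadow_price(x0))  # should be close to 2*x0
    print("costate 2*x0      :", 2.0 * x0)
```

The finite-difference sensitivity and the costate should agree to within the integration error, which is the sense in which the costate prices a small change in the state.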

Which to use: the trade-off

Because they compute different things, the choice is practical, not philosophical.

The costate riding the value landscape

Take the worked HJB example again: \dot{x} = u with cost \int_0^\infty (x^2 + u^2)\,dt, value function V(x) = x^2 and feedback u^* = -x. The optimal trajectory is x^*(t) = x_0 e^{-t}, gliding to the origin. As t increases, the state slides down the value landscape V(x) = x^2; the tangent at the current point has slope V'(x) = 2x, and that slope is the costate \lambda(t) = 2x^*(t) = 2x_0 e^{-t}. It satisfies the costate equation: here H = x^2 + u^2 + \lambda u, so -\partial H/\partial x = -2x^* = -2x_0 e^{-t} = \dot{\lambda}. One Pontryagin trajectory, read straight off the gradient of the dynamic-programming value function.
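The same example can be run from the Pontryagin side. The sketch below integrates the state and costate equations forward, with the initial costate taken from the value gradient, \lambda(0) = V'(x_0) = 2x_0, and then measures how far the result drifts from x_0 e^{-t} and from the bridge \lambda = 2x. The function names, the horizon and the step count are illustrative assumptions. Any deviation from the bridge grows like e^{t} under forward integration of this system, so the horizon is kept short.

```python
import math


def pontryagin_rhs(x, lam):
    """State/costate dynamics for x' = u with H = x^2 + u^2 + lam * u."""
    u = -lam / 2.0           # minimiser of H over u (dH/du = 2u + lam = 0)
    return u, -2.0 * x       # (dx/dt, dlam/dt) with dlam/dt = -dH/dx


def integrate(x0, lam0, t_end, n_steps):
    """Classical RK4 on the coupled (x, lam) system; returns (t, x, lam) samples."""
    h = t_end / n_steps
    x, lam = x0, lam0
    samples = [(0.0, x, lam)]
    for k in range(n_steps):
        k1x, k1l = pontryagin_rhs(x, lam)
        k2x, k2l = pontryagin_rhs(x + 0.5 * h * k1x, lam + 0.5 * h * k1l)
        k3x, k3l = pontryagin_rhs(x + 0.5 * h * k2x, lam + 0.5 * h * k2l)
        k4x, k4l = pontryagin_rhs(x + h * k3x, lam + h * k3l)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        lam += h * (k1l + 2 * k2l + 2 * k3l + k4l) / 6.0
        samples.append(((k + 1) * h, x, lam))
    return samples


def bridge_errors(x0=1.5, t_end=3.0, n_steps=3000):
    """Largest deviation from x0*exp(-t) and from the bridge lam = V'(x) = 2x."""
    samples = integrate(x0, 2.0 * x0, t_end, n_steps)  # lam(0) from V'(x0)
    err_state = max(abs(x - x0 * math.exp(-t)) for t, x, _ in samples)
    err_bridge = max(abs(lam - 2.0 * x) for _, x, lam in samples)
    return err_state, err_bridge


if __name__ == "__main__":
    err_state, err_bridge = bridge_errors()
    print("max |x - x0 e^{-t}| :", err_state)
    print("max |lam - 2x|      :", err_bridge)
```

Both errors should stay at the level of the integration error. Starting from any other \lambda(0) moves the trajectory off the optimal path, which is why the maximum principle on its own has to solve a boundary-value problem for the initial costate.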

Think of V(x, t) as a landscape over all states and times. HJB surveys the whole landscape and reads the steepest-descent feedback at every point. The maximum principle instead follows the single streamline launched from the given initial state, and along that streamline the costate it carries is nothing other than the local gradient of the very same landscape. Necessary conditions and sufficient conditions, open-loop and closed-loop, ODE and PDE: two cross-sections of one optimum.