Maximum Principle vs Dynamic Programming

We now hold the two great methods of optimal control: Pontryagin's maximum principle and the Hamilton–Jacobi–Bellman equation. They look utterly different — ODEs along one path versus a PDE over all states — yet they describe the same optimum. This capstone shows they are two faces of one truth, joined by a single bridge: the costate is the gradient of the value function.

Two methods, side by side

What they assert. PMP gives necessary conditions; HJB gives a sufficient one. A trajectory can satisfy PMP without being optimal, whereas a smooth solution V of HJB certifies that the control minimising the Hamiltonian is globally optimal.
Scope. PMP describes one optimal trajectory; HJB describes every starting state at once through the value function V(x, t).
Form. PMP is a system of ODEs (state and costate, a two-point boundary value problem); HJB is a single PDE for V.
What you get. PMP yields an open-loop control u^\*(t) along the path; HJB yields a closed-loop feedback law u^\*(x, t) for any state.
Central object. PMP carries the costate \lambda(t); HJB carries the value function V(x, t) — and \lambda = \nabla_x V ties them together.
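The open-loop versus closed-loop distinction can be made concrete with a small simulation. The sketch below assumes the scalar problem \dot{x} = u with running cost x^2 + u^2, for which the feedback law is u^\* = -x and the open-loop plan from a start x_0 is u^\*(t) = -x_0 e^{-t}. The start values, the horizon, the forward-Euler integrator, and all function names are illustrative choices, not part of the original text.

```python
import math

X0_PLANNED = 1.0  # start the open-loop plan was computed for (hypothetical)
X0_ACTUAL = 1.5   # perturbed start the system really begins from (hypothetical)


def simulate(x_start, control, t_final=10.0, steps=10000):
    """Forward-Euler integration of x' = u; returns (final state, accumulated cost)."""
    dt = t_final / steps
    x, cost = x_start, 0.0
    for k in range(steps):
        t = k * dt
        u = control(x, t)
        cost += (x * x + u * u) * dt  # running cost x^2 + u^2
        x += u * dt                   # dynamics x' = u
    return x, cost


def open_loop(x, t):
    # PMP-style plan u*(t) = -x0 e^{-t}: a function of time only,
    # so it ignores where the state actually is.
    return -X0_PLANNED * math.exp(-t)


def closed_loop(x, t):
    # HJB feedback u*(x) = -x: reacts to the current state.
    return -x


if __name__ == "__main__":
    for name, law in (("open-loop", open_loop), ("closed-loop", closed_loop)):
        x_end, cost = simulate(X0_ACTUAL, law)
        print(f"{name:12s} final x = {x_end:.4f}  cost = {cost:.3f}")
```

Under the open-loop plan the state follows x(t) = 1.5 - (1 - e^{-t}) and stalls near 0.5, because the plan was built for a different start. The feedback law still drives the state to the origin and accumulates less cost.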

The bridge: \lambda(t) = \nabla_x V(x^\*(t), t)

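The claim is that, evaluated along the optimal trajectory, the maximum principle's costate is exactly the gradient of the HJB value function. Assume V is twice continuously differentiable, so that its mixed partial derivatives commute and \nabla_x V_t is unambiguous. Let us define the costate that way and watch the costate equation fall out of HJB.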

Step 1 — define the costate as the value gradient. Along the optimal path x^\*(t), set

\lambda(t) := \nabla_x V\big(x^\*(t), t\big).

Step 2 — differentiate it in time. By the chain rule, with \dot{x}^\* = f,

\dot{\lambda} = \nabla_x V_t + \big(\nabla_x^2 V\big)\,\dot{x}^\* = \nabla_x V_t + \big(\nabla_x^2 V\big) f,

where \nabla_x^2 V is the Hessian of the value function.

Step 3 — differentiate HJB in x. Take -V_t = H(x, u^\*, \nabla_x V) and apply \nabla_x. Because u^\* is the minimiser of H, the term through \partial H/\partial u vanishes (it is zero at the minimum — the envelope theorem), leaving the explicit x-dependence and the dependence through \lambda = \nabla_x V:

-\nabla_x V_t = \frac{\partial H}{\partial x} + \big(\nabla_x^2 V\big)\frac{\partial H}{\partial \lambda}.
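Step 3 can be checked by finite differences on a concrete case. The sketch below assumes the scalar stationary problem \dot{x} = u with running cost x^2 + u^2, where V(x) = x^2, \lambda = 2x, u^\* = -x and V_t = 0, so the left-hand side of the identity is zero. The test point x = 0.7, the step size, and the helper names are illustrative assumptions.

```python
def hamiltonian(x, u, lam):
    # H = running cost + lam * f, with f = u for the dynamics x' = u
    return x**2 + u**2 + lam * u


def value_gradient(x):
    # lam = V'(x) for the value function V(x) = x^2
    return 2.0 * x


def optimal_control(x):
    # minimiser of H in u, evaluated at lam = V'(x)
    return -x


def central_diff(g, z, h=1e-5):
    """Central finite-difference approximation of g'(z)."""
    return (g(z + h) - g(z - h)) / (2.0 * h)


def step3_terms(x):
    """Return the individual terms of Step 3 at the state x."""
    u, lam = optimal_control(x), value_gradient(x)
    dH_du = central_diff(lambda v: hamiltonian(x, v, lam), u)      # vanishes at the minimiser
    dH_dx = central_diff(lambda z: hamiltonian(z, u, lam), x)      # explicit x-dependence
    dH_dlam = central_diff(lambda l: hamiltonian(x, u, l), lam)    # equals f = u
    hess_V = central_diff(value_gradient, x)                       # V''(x)
    # Total derivative of the minimised Hamiltonian along (u*(x), lam(x)).
    total = central_diff(
        lambda z: hamiltonian(z, optimal_control(z), value_gradient(z)), x
    )
    return {
        "dH_du": dH_du,
        "dH_dx": dH_dx,
        "hess_times_dH_dlam": hess_V * dH_dlam,
        "total": total,
    }


if __name__ == "__main__":
    for name, value in step3_terms(0.7).items():
        print(f"{name:20s} {value: .6f}")
```

The total derivative should match the sum of the explicit term and the Hessian term, with no contribution from \partial H/\partial u. In this example the two surviving terms are 2x and -2x, so both sides are zero, consistent with V_t = 0.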

Step 4 — use \partial H/\partial \lambda = f. The state equation says \partial H/\partial\lambda = f, so

-\nabla_x V_t = \frac{\partial H}{\partial x} + \big(\nabla_x^2 V\big) f.

Step 5 — combine. Substitute \nabla_x V_t = -\partial H/\partial x - (\nabla_x^2 V) f into Step 2; the Hessian terms cancel:

\dot{\lambda} = \Big(-\frac{\partial H}{\partial x} - (\nabla_x^2 V) f\Big) + (\nabla_x^2 V) f = -\frac{\partial H}{\partial x}.

This is precisely the costate equation of the maximum principle. Pontryagin's adjoint dynamics are HJB differentiated along the optimal path — the costate \lambda(t) is the gradient \nabla_x V riding along the trajectory, and its "shadow price" meaning is now literal: it is how the optimal cost changes as the state is nudged.
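As a sanity check, the five steps can be traced by hand on a small finite-horizon problem. The problem below is an illustrative assumption rather than one taken from the text: minimise \int_t^T u^2\,ds + x(T)^2 subject to \dot{x} = u, so that H = u^2 + \lambda u. The quadratic guess V = p(t)\,x^2 is likewise an assumption, verified by substitution.

```latex
% Hypothetical problem: \dot{x} = u, cost \int_t^T u^2\,ds + x(T)^2, H = u^2 + \lambda u.
\begin{align*}
u^* &= -\tfrac{1}{2}\lambda, \qquad -V_t = -\tfrac{1}{4}(V_x)^2, \qquad V(x,T) = x^2 \\
V(x,t) &= p(t)\,x^2, \qquad \dot{p} = p^2, \quad p(T) = 1
  \;\Rightarrow\; p(t) = \frac{1}{1 + T - t} \\
\lambda(t) &= V_x\big(x^*(t),t\big) = 2p\,x^*, \qquad f = u^* = -p\,x^* \\
\dot{\lambda} &= \underbrace{V_{xt}}_{2\dot{p}\,x^* \,=\, 2p^2 x^*}
  + \underbrace{V_{xx}\,f}_{2p\,(-p\,x^*)}
  = 0 = -\frac{\partial H}{\partial x}
\end{align*}
```

Here H has no explicit x-dependence, so the costate is constant along the optimal path. The time-derivative term V_{xt} and the Hessian term V_{xx} f are each nonzero, yet they cancel exactly as in Step 5.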

Which to use: the trade-off

Because they compute different things, the choice is practical, not philosophical.

The maximum principle integrates ODEs along a single trajectory, so its cost grows mildly with the state dimension and it scales to high dimensions. The catch: it returns only an open-loop plan for one set of boundary conditions; perturb the start and you re-solve.
Dynamic programming / HJB returns a global feedback law, robust to disturbances and valid from any state. The catch: it must represent V(x, t) over the whole state space, whose size grows exponentially with dimension — the curse of dimensionality — so the PDE is intractable beyond a few dimensions.
In practice they are combined: HJB ideas give feedback and verify optimality in low dimensions; PMP-style trajectory optimisation scales up, often warm-started or corrected by value-function approximations.
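The scaling gap is easy to quantify with a rough count. The sketch below compares the number of nodes needed to tabulate V on a tensor-product grid with the number of ODE variables (state plus costate) that the maximum principle integrates. The figure of 100 grid points per axis and the function names are assumptions for illustration, and the count ignores time discretisation and the iterations a boundary value solver needs.

```python
def hjb_grid_points(points_per_axis, state_dim):
    """Nodes needed to store V on a tensor-product grid over the state space."""
    return points_per_axis ** state_dim


def pmp_ode_variables(state_dim):
    """State and costate components integrated along one trajectory."""
    return 2 * state_dim


if __name__ == "__main__":
    print(f"{'dim':>4} {'HJB grid nodes':>28} {'PMP ODE variables':>18}")
    for n in (1, 2, 3, 6, 12):
        print(f"{n:>4} {hjb_grid_points(100, n):>28} {pmp_ode_variables(n):>18}")
```

The grid count multiplies by 100 with every added state dimension, while the ODE count only adds two.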

The costate riding the value landscape

Take the worked HJB example again: dynamics \dot{x} = u, cost \int_0^\infty (x^2 + u^2)\,dt, value function V(x) = x^2, and feedback u^\* = -x. The optimal trajectory is x^\*(t) = x_0 e^{-t}, gliding to the origin. As t advances, the point (x^\*(t), V(x^\*(t))) slides down the value landscape V(x) = x^2; the tangent at that point has slope V'(x) = 2x, and that slope is the costate \lambda(t) = 2x^\*(t) = 2x_0 e^{-t}. One Pontryagin trajectory, read straight off the gradient of the dynamic-programming value function.
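This can be checked numerically. The sketch below integrates the maximum principle's state and costate equations for this example, with H = x^2 + u^2 + \lambda u minimised by u = -\lambda/2, starting the costate at V'(x_0) = 2x_0. It then compares the integrated costate with the gradient of V at the integrated state. The RK4 integrator, the horizon of 3 time units, the step count, and the function names are illustrative choices.

```python
import math


def hamiltonian_rhs(x, lam):
    """Right-hand side of the PMP system for H = x^2 + u^2 + lam*u."""
    u = -lam / 2.0        # minimiser of H in u
    x_dot = u             # state equation, dH/dlam
    lam_dot = -2.0 * x    # costate equation, -dH/dx
    return x_dot, lam_dot


def rk4_step(x, lam, dt):
    """One classical Runge-Kutta step for the coupled (x, lam) system."""
    k1x, k1l = hamiltonian_rhs(x, lam)
    k2x, k2l = hamiltonian_rhs(x + 0.5 * dt * k1x, lam + 0.5 * dt * k1l)
    k3x, k3l = hamiltonian_rhs(x + 0.5 * dt * k2x, lam + 0.5 * dt * k2l)
    k4x, k4l = hamiltonian_rhs(x + dt * k3x, lam + dt * k3l)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    lam_new = lam + dt * (k1l + 2.0 * k2l + 2.0 * k3l + k4l) / 6.0
    return x_new, lam_new


def integrate_pmp(x0, t_final=3.0, steps=3000):
    """Integrate state and costate forward from lam(0) = V'(x0) = 2*x0."""
    dt = t_final / steps
    x, lam = x0, 2.0 * x0
    for _ in range(steps):
        x, lam = rk4_step(x, lam, dt)
    return x, lam


if __name__ == "__main__":
    x_end, lam_end = integrate_pmp(1.0)
    print("x(3)              =", x_end)
    print("x0 * exp(-3)      =", math.exp(-3.0))
    print("costate lam(3)    =", lam_end)
    print("gradient V'(x(3)) =", 2.0 * x_end)
```

The horizon is kept short on purpose. The state-costate system has eigenvalues +1 and -1, so forward integration amplifies any error in the initial costate, which is why practical PMP solvers treat this as a two-point boundary value problem rather than a pure initial value problem.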

Think of V(x, t) as a landscape over all states and times. HJB surveys the whole landscape and reads the steepest-descent feedback at every point. The maximum principle instead follows the single streamline that the optimal start launches — and along that streamline the costate it carries is nothing other than the local gradient of the very same landscape. Necessary and sufficient, open-loop and closed-loop, ODE and PDE: two cross-sections of one optimum.