The Hamiltonian and Costate

We have a system whose state x(t) obeys the dynamics \dot{x} = f(x, u, t), steered by a control u(t), and we want the control that minimises a cost functional

J = \varphi\big(x(T)\big) + \int_0^T L(x, u, t)\,dt.

The first piece \varphi(x(T)) is the terminal cost — how good the final state is — and the integral accumulates a running cost L along the way. The catch is that we cannot pick x(t) and u(t) freely: they are tied together at every instant by the dynamics. This page builds the two objects — the costate and the Hamiltonian — that let us handle that constraint. The full stationarity conditions follow on the next page.
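To make the cost functional concrete, here is a minimal sketch that rolls the dynamics forward with forward-Euler steps and accumulates J along the way. The function name simulate_cost, the step count and the terminal cost \varphi(x) = \tfrac12 x^2 are illustrative assumptions, not part of the text; the dynamics f = x + u and running cost L = \tfrac12 u^2 are the scalar example used later on this page.

```python
def simulate_cost(f, L, phi, u, x0, T, steps=1000):
    """Roll x' = f(x, u(t), t) forward with Euler steps; return (x(T), J)."""
    dt = T / steps
    x, running = x0, 0.0
    for k in range(steps):
        t = k * dt
        uk = u(t)
        running += L(x, uk, t) * dt   # left Riemann sum of the running cost
        x = x + f(x, uk, t) * dt      # one Euler step of the dynamics
    return x, phi(x) + running


# Scalar example: f = x + u and L = u^2 / 2; phi = x^2 / 2 is an assumed terminal cost.
f = lambda x, u, t: x + u
L = lambda x, u, t: 0.5 * u * u
phi = lambda x: 0.5 * x * x

# The constant control u = -1 holds x at 1, so J = 1/2 (running) + 1/2 (terminal).
xT, J = simulate_cost(f, L, phi, lambda t: -1.0, x0=1.0, T=1.0)
print(xT, J)
```

Note that the control is the only free input: once u(t) and x(0) are chosen, the loop leaves no freedom in x(t), which is exactly the constraint the rest of the page prices in.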

The dynamics is a constraint at every instant

Recall how Lagrange multipliers handle a single constraint g(x) = 0 while minimising F(x): introduce a multiplier \lambda and make the combination F + \lambda g stationary. The multiplier is the exact "exchange rate" that prices the constraint into the objective.

Here the constraint is the differential equation, rewritten with everything on one side:

f(x, u, t) - \dot{x} = 0 \qquad \text{for every } t \in [0, T].

That is not one constraint but a continuum of them — one for each instant t. So instead of a single number \lambda we need a whole time-varying multiplier \lambda(t), a vector with one component per state equation. This function is the costate (also called the adjoint variable):

\lambda(t) \in \mathbb{R}^n, \qquad \text{a Lagrange multiplier for the dynamics at time } t.

Just as a Lagrange multiplier measures how hard the constraint pushes back, the costate \lambda(t) measures the marginal cost of the state — how much the optimal total cost would change if the state at time t were nudged. It is the shadow price of being where you are.

The Hamiltonian

Adjoining the constraint instant by instant puts the costate-weighted residual \lambda^\top(f - \dot{x}) into the integrand. The two pieces that do not involve \dot{x}, the running cost L and the costate-weighted dynamics \lambda^\top f, bundle into a single function, the Hamiltonian:

H(x, u, \lambda, t) \;=\; L(x, u, t) \;+\; \lambda^\top f(x, u, t).

Read it as two terms with a shared unit of "cost rate":

L(x, u, t) — the running cost you pay right now for being in state x and applying control u.
\lambda^\top f(x, u, t) — the costate-weighted dynamics: the velocity f that u produces, each component priced by its shadow price \lambda. It charges the control for where it is pushing the state.

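A control is good when it keeps this total small: a low running cost now, and a velocity f that pushes the state where the costate says it is cheap, so that \lambda^\top f is small or negative. That single trade-off, packed into H, is the whole engine of the maximum principle (a minimum principle under the sign convention used here).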
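The definition translates almost word for word into code. The sketch below assumes the state, costate and velocity are plain Python lists; the function name hamiltonian and the two-state system (position and velocity driven by a force u) are hypothetical, chosen only to show the inner product \lambda^\top f with more than one component.

```python
def hamiltonian(L, f, x, u, lam, t):
    """H = L(x, u, t) + lam^T f(x, u, t) for list-valued x, lam and f."""
    velocity = f(x, u, t)
    priced_velocity = sum(l_i * f_i for l_i, f_i in zip(lam, velocity))
    return L(x, u, t) + priced_velocity


# Hypothetical two-state system (position, velocity) driven by a force u.
f = lambda x, u, t: [x[1], u]        # position' = velocity, velocity' = u
L = lambda x, u, t: 0.5 * u * u      # quadratic control effort

H = hamiltonian(L, f, x=[0.0, 3.0], u=-1.0, lam=[1.0, 2.0], t=0.0)
print(H)  # 0.5 + 1*3 + 2*(-1) = 1.5
```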

The costate \lambda(t) \in \mathbb{R}^n is a time-varying Lagrange multiplier for the dynamic constraint \dot{x} = f(x, u, t).
The Hamiltonian collects the running cost and the costate-weighted dynamics, H = L + \lambda^\top f.
The costate's value is the marginal cost of the state: to first order, a small change \delta x in x(t) changes the optimal total cost by \lambda^\top \delta x.

Adjoining the constraint: the augmented cost

Step 1 — price each constraint into the cost. Because f - \dot{x} = 0 holds at every instant, we may add \lambda^\top(f - \dot{x}) to the integrand without changing the value of J on any trajectory that actually obeys the dynamics. Call the result the augmented cost \bar{J}:

\bar{J} = \varphi\big(x(T)\big) + \int_0^T \Big[\, L + \lambda^\top\big(f - \dot{x}\big) \,\Big]\,dt.

Step 2 — recognise the Hamiltonian. The terms L + \lambda^\top f are exactly H, leaving only the -\lambda^\top \dot{x} piece outside it:

\bar{J} = \varphi\big(x(T)\big) + \int_0^T \Big[\, H(x, u, \lambda, t) - \lambda^\top \dot{x} \,\Big]\,dt.

This compact form is the launch pad. On any admissible trajectory \bar{J} = J for every choice of \lambda(t), so we are free to pick the costate later to make the algebra collapse. Notice that the state velocity \dot{x} now appears only in the lone term -\lambda^\top \dot{x}, and that is precisely the term we will integrate by parts on the next page to expose the costate equation. For now the structure is set: the necessary conditions for minimising J subject to the dynamics come from making the unconstrained \bar{J} stationary in x, u and \lambda together.
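The claim that \bar{J} = J for every \lambda(t) on an admissible trajectory can be checked numerically. The sketch below is an illustration under assumptions: it uses the scalar example f = x + u, L = \tfrac12 u^2, an assumed terminal cost \varphi = \tfrac12 x^2, an arbitrary control u = \cos t, and a forward-Euler trajectory whose discrete velocity is taken as \dot{x}. The function name costs is hypothetical.

```python
import math


def costs(lam, steps=1000, T=1.0, x0=1.0):
    """Return (J, J_bar) on an Euler trajectory of x' = x + u with u = cos(t)."""
    dt = T / steps
    x, J, J_bar = x0, 0.0, 0.0
    for k in range(steps):
        t = k * dt
        u = math.cos(t)                      # arbitrary control (assumption)
        run = 0.5 * u * u                    # running cost L
        fx = x + u                           # dynamics f
        x_next = x + fx * dt                 # Euler step, so the trajectory is admissible
        x_dot = (x_next - x) / dt            # velocity along the trajectory
        H = run + lam(t) * fx                # Hamiltonian
        J += run * dt
        J_bar += (H - lam(t) * x_dot) * dt   # augmented integrand
        x = x_next
    terminal = 0.5 * x * x                   # assumed terminal cost phi
    return J + terminal, J_bar + terminal


J, J_bar = costs(lambda t: 2.0 + math.sin(3.0 * t))
J2, J_bar2 = costs(lambda t: -5.0 * t)
print(J, J_bar, J_bar2)
```

Two very different multiplier functions give the same augmented cost up to rounding, because the residual f - \dot{x} they multiply vanishes along the trajectory.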

The minimising control u*

Freeze the state and the costate and look at H as a function of the single variable u. Take the textbook quadratic example L = \tfrac12 u^2 with dynamics f = x + u, so

H(u) = \tfrac12 u^2 + \lambda(x + u) = \tfrac12 u^2 + \lambda u + \lambda x.

The constant \lambda x only shifts the curve vertically; its shape is a parabola in u that opens upward. Changing the costate \lambda moves the vertex of the parabola, which sits at u = -\lambda. The minimiser is where the slope \partial H/\partial u = u + \lambda is zero, that is, u^* = -\lambda; since \partial^2 H/\partial u^2 = 1 > 0, this stationary point is indeed a minimum. Choosing the control that minimises the Hamiltonian at each instant is the central idea the next page promotes to a theorem.
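A brute-force check of the same result, as a minimal sketch: freeze x and \lambda at arbitrary values, evaluate H on a grid of controls, and compare the grid minimiser with the analytic u^* = -\lambda. The helper name argmin_u, the grid bounds and the chosen values of x and \lambda are assumptions made for illustration.

```python
def H(u, x, lam):
    """Hamiltonian of the scalar example: L = u^2 / 2, f = x + u."""
    return 0.5 * u * u + lam * (x + u)


def argmin_u(x, lam, lo=-5.0, hi=5.0, n=10001):
    """Brute-force minimiser of H over an evenly spaced grid of controls."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda u: H(u, x, lam))


x, lam = 2.0, 1.5                 # frozen state and costate (arbitrary values)
u_star = argmin_u(x, lam)
print(u_star, -lam)               # grid minimiser next to the analytic u* = -lam
```

The frozen state x does not affect the minimiser in this example, because it enters H only through the constant \lambda x.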

The name is borrowed from classical mechanics, where the Hamiltonian H = T + V is the total energy and Hamilton's canonical equations \dot{q} = \partial H/\partial p, \dot{p} = -\partial H/\partial q govern motion. Pontryagin's control Hamiltonian is the direct analogue: the costate \lambda plays the role of the conjugate momentum p, and — as the next page shows — the state and costate obey the very same pair of canonical equations, \dot{x} = \partial H/\partial \lambda and \dot{\lambda} = -\partial H/\partial x. Control theory inherited a two-century-old machine.
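The costate equation \dot{\lambda} = -\partial H/\partial x also lets the shadow-price reading of \lambda be tested numerically. The sketch below is an illustration under stated assumptions, not a derivation: it uses the scalar example with the control fixed at u = 0 (not the optimal control), an assumed terminal cost \varphi = \tfrac12 x^2, and the terminal condition \lambda(T) = \partial\varphi/\partial x at x(T), a standard result that this page does not state. For a fixed control, \lambda(0) should then equal the sensitivity of J to the initial state. The function names are hypothetical.

```python
def rollout(x0, steps=1000, T=1.0):
    """Euler trajectory of x' = x + u with u = 0; returns the states and J."""
    dt = T / steps
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + xs[-1] * dt)   # f = x + u with u = 0
    return xs, 0.5 * xs[-1] ** 2          # running cost is zero; assumed phi = x^2 / 2


def costate_at_zero(xs, steps=1000, T=1.0):
    """Integrate lam' = -dH/dx = -lam backward from lam(T) = phi'(x(T)) = x(T)."""
    dt = T / steps
    lam = xs[-1]
    for _ in range(steps):
        lam = lam + lam * dt              # stepping backward in time flips the sign
    return lam


xs, J = rollout(1.0)
lam0 = costate_at_zero(xs)

eps = 1e-4
fd = (rollout(1.0 + eps)[1] - rollout(1.0 - eps)[1]) / (2 * eps)  # dJ/dx(0)
print(lam0, fd)
```

The backward-integrated costate and the finite-difference sensitivity agree closely, which is the sense in which \lambda(t) prices a nudge of the state into the total cost.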