Numerical Optimal Control

Every closed form we have prized — the LQR gain, a bang-bang switching curve, a tidy two-point boundary problem — lives in a small, friendly corner of the subject. Step outside it, into nonlinear dynamics, awkward constraints, and high dimensions, and the pencil runs out. There the only way forward is to compute. Numerical optimal control turns the maximum principle and the Hamilton–Jacobi–Bellman equation into algorithms a machine can run, and it splits into two great families, distinguished by when you discretise.

Indirect methods: optimise, then discretise

Indirect methods take the maximum principle at its word. First you derive the exact optimality conditions by hand — the Hamiltonian, the costate equation, the stationarity condition — which assemble into a two-point boundary-value problem (TPBVP): the state x is pinned at the start, the costate \lambda is pinned at the end, and the coupled dynamics must thread between them:

\dot{x} = \frac{\partial H}{\partial \lambda}, \qquad \dot{\lambda} = -\frac{\partial H}{\partial x}, \qquad x(0) = x_0,\ \ \lambda(T) = \frac{\partial \varphi}{\partial x}\big(x(T)\big).

Only then do you reach for a computer, to solve that boundary problem numerically. The classic tool is shooting:

Because you optimise on paper before discretising, the slogan is “optimise then discretise.” The reward is high accuracy from the exact conditions; the catch is that the costate is invisible and unintuitive, the Hamiltonian must be derived afresh for every problem, and the integration is exquisitely sensitive to the initial guess — a small error in \lambda(0) can blow up by time T.

Direct methods: discretise, then optimise

Direct methods reverse the order. They never derive a costate at all. Instead they chop the trajectory into a finite set of nodes, replace the continuous problem with a large but ordinary nonlinear program (NLP) in those node variables, and hand the whole thing to an off-the-shelf optimiser — hence “discretise then optimise.” Three styles dominate:

In every case the continuous problem becomes

\min_{z}\ J(z) \quad \text{s.t.}\quad c_{\text{dyn}}(z) = 0,\ \ c_{\text{path}}(z) \le 0,

where the vector z stacks the node states and controls, the equality constraints c_{\text{dyn}} impose the dynamics, and the inequalities carry the path and input limits. A finer mesh — more nodes — converges to the true optimum at the cost of a bigger NLP.

Why direct methods took over

There is a third route we have already met: grid the state space and solve HJB / dynamic programming directly. It returns the global optimal feedback over the whole space — but it falls to the curse of dimensionality. A grid of m points per axis in n dimensions has m^n cells, so the work explodes exponentially with the number of states. A handful of dimensions is fine; a robot with dozens is hopeless.

Direct NLP methods scale far more gently: their cost grows with the number of nodes along one trajectory, not with the volume of the whole state space, and modern sparse NLP solvers exploit the banded structure of the constraints. That is why, for high-dimensional problems — legged robots, spacecraft, full vehicle models — direct collocation and multiple shooting are the workhorses, while grid-based HJB is reserved for low-dimensional problems where a global feedback law is worth the price.

Refining the mesh

Below is the heart of a direct method. The smooth curve is the true optimal trajectory x^{\*}(t); the kinked curve is its direct transcription on N nodes, with the dynamics enforced as equality constraints between adjacent nodes. Slide number of nodes N: a coarse mesh is a crude polygon that misses the curve, but as the mesh refines the approximation closes on x^{\*} and the worst-case gap shrinks toward zero — the discretised NLP converging to the continuous optimum, at the price of more variables.

Numerical optimal control is no longer an academic afterthought — it is shipped software. Direct collocation packages plan the trajectories of Mars landers and re-entry vehicles, and the same transcription-into-an-NLP idea, solved fast enough to run in a loop, is what lets a humanoid robot plan each footstep in real time. The pattern that began with Pontryagin's hand-derived conditions and Bellman's grids has become, in its computational form, a general-purpose engine: write down a model, a cost and the constraints, discretise, and let a sparse NLP solver find the optimal motion.