Numerical Optimal Control

Every closed form we have prized — the LQR gain, a bang-bang switching curve, a tidy two-point boundary problem — lives in a small, friendly corner of the subject. Step outside it, into nonlinear dynamics, awkward constraints, and high dimensions, and the pencil runs out. There the only way forward is to compute. Numerical optimal control turns the maximum principle and the Hamilton–Jacobi–Bellman equation into algorithms a machine can run, and it splits into two great families, distinguished by when you discretise.

Indirect methods: optimise, then discretise

Indirect methods take the maximum principle at its word. First you derive the exact optimality conditions by hand — the Hamiltonian, the costate equation, the stationarity condition — which assemble into a two-point boundary-value problem (TPBVP): the state x is pinned at the start, the costate \lambda is pinned at the end, and the coupled dynamics must thread between them:

\dot{x} = \frac{\partial H}{\partial \lambda}, \qquad \dot{\lambda} = -\frac{\partial H}{\partial x}, \qquad x(0) = x_0,\ \ \lambda(T) = \frac{\partial \varphi}{\partial x}\big(x(T)\big).

Only then do you reach for a computer, to solve that boundary problem numerically. The classic tool is shooting:

Guess the missing initial costate \lambda(0).
Integrate the coupled (x, \lambda) dynamics forward to time T.
Check the terminal condition on \lambda(T); it will be missed.
Adjust the guess (a Newton step on the miss) and repeat until the boundary is hit.

Because you optimise on paper before discretising, the slogan is “optimise then discretise.” The reward is high accuracy from the exact conditions; the catch is that the costate is invisible and unintuitive, the Hamiltonian must be derived afresh for every problem, and the integration is exquisitely sensitive to the initial guess — a small error in \lambda(0) can blow up by time T.

Direct methods: discretise, then optimise

Direct methods reverse the order. They never derive a costate at all. Instead they chop the trajectory into a finite set of nodes, replace the continuous problem with a large but ordinary nonlinear program (NLP) in those node variables, and hand the whole thing to an off-the-shelf optimiser — hence “discretise then optimise.” Three styles dominate:

Single shooting. Make only the controls the decision variables; simulate the trajectory forward from them and minimise the resulting cost. Simple, but it inherits the same sensitivity as indirect shooting over a long horizon.
Multiple shooting. Cut the horizon into several segments, integrate each independently from its own starting state, and add continuity constraints forcing the segments to join. Breaking up the long integration tames the blow-up and makes the problem far better conditioned.
Collocation. Make the states and controls at every node decision variables, approximate the trajectory by polynomials between nodes, and enforce the dynamics as equality constraints at chosen collocation points — the defect x_{i+1} - x_i - \int f \,\approx\, 0 must vanish at each. The dynamics become constraints in the NLP rather than something you integrate.

In every case the continuous problem becomes

\min_{z}\ J(z) \quad \text{s.t.}\quad c_{\text{dyn}}(z) = 0,\ \ c_{\text{path}}(z) \le 0,

where the vector z stacks the node states and controls, the equality constraints c_{\text{dyn}} impose the dynamics, and the inequalities carry the path and input limits. A finer mesh — more nodes — converges to the true optimum at the cost of a bigger NLP.

Why direct methods took over

There is a third route we have already met: grid the state space and solve HJB / dynamic programming directly. It returns the global optimal feedback over the whole space — but it falls to the curse of dimensionality. A grid of m points per axis in n dimensions has m^n cells, so the work explodes exponentially with the number of states. A handful of dimensions is fine; a robot with dozens is hopeless.

Direct NLP methods scale far more gently: their cost grows with the number of nodes along one trajectory, not with the volume of the whole state space, and modern sparse NLP solvers exploit the banded structure of the constraints. That is why, for high-dimensional problems — legged robots, spacecraft, full vehicle models — direct collocation and multiple shooting are the workhorses, while grid-based HJB is reserved for low-dimensional problems where a global feedback law is worth the price.

Indirect (“optimise then discretise”): solve the maximum principle's two-point boundary-value problem numerically, e.g. by shooting — guess the initial costate, integrate, adjust to hit the boundary. Accurate but delicate.
Direct (“discretise then optimise”): turn the trajectory into a finite nonlinear program via single/multiple shooting or collocation (dynamics as equality constraints at collocation points). Robust and general.
Grid-based HJB/DP gives a global feedback law but suffers the curse of dimensionality (m^n cells), so direct methods dominate in high dimensions.

Refining the mesh

Below is the heart of a direct method. The smooth curve is the true optimal trajectory x^{\*}(t); the kinked curve is its direct transcription on N nodes, with the dynamics enforced as equality constraints between adjacent nodes. Slide number of nodes N: a coarse mesh is a crude polygon that misses the curve, but as the mesh refines the approximation closes on x^{\*} and the worst-case gap shrinks toward zero — the discretised NLP converging to the continuous optimum, at the price of more variables.

Numerical optimal control is no longer an academic afterthought — it is shipped software. Direct collocation packages plan the trajectories of Mars landers and re-entry vehicles, and the same transcription-into-an-NLP idea, solved fast enough to run in a loop, is what lets a humanoid robot plan each footstep in real time. The pattern that began with Pontryagin's hand-derived conditions and Bellman's grids has become, in its computational form, a general-purpose engine: write down a model, a cost and the constraints, discretise, and let a sparse NLP solver find the optimal motion.