Numerical Optimal Control
Every closed form we have prized — the
LQR gain,
a bang-bang switching curve, a tidy two-point boundary problem — lives in a small, friendly corner of
the subject. Step outside it, into nonlinear dynamics, awkward constraints, and high dimensions, and
the pencil runs out. There the only way forward is to compute. Numerical optimal
control turns the
maximum
principle and the
Hamilton–Jacobi–Bellman
equation into algorithms a machine can run, and it splits into two great families,
distinguished by when you discretise.
Indirect methods: optimise, then discretise
Indirect methods take the maximum principle at its word. First you derive the exact
optimality conditions by hand — the Hamiltonian, the costate equation, the stationarity condition —
which assemble into a two-point boundary-value problem (TPBVP): the state
x is pinned at the start, the costate \lambda is
pinned at the end, and the coupled dynamics must thread between them:
\dot{x} = \frac{\partial H}{\partial \lambda}, \qquad \dot{\lambda} = -\frac{\partial H}{\partial x}, \qquad x(0) = x_0,\ \ \lambda(T) = \frac{\partial \varphi}{\partial x}\big(x(T)\big).
Only then do you reach for a computer, to solve that boundary problem numerically. The
classic tool is shooting:
- Guess the missing initial costate \lambda(0).
- Integrate the coupled (x, \lambda) dynamics forward to
time T.
- Check the terminal condition on \lambda(T); it will be
missed.
- Adjust the guess (a Newton step on the miss) and repeat until the boundary is
hit.
Because you optimise on paper before discretising, the slogan is “optimise then
discretise.” The reward is high accuracy from the exact conditions; the catch is that
the costate is invisible and unintuitive, the Hamiltonian must be derived afresh for every problem,
and the integration is exquisitely sensitive to the initial guess — a small error in
\lambda(0) can blow up by time T.
Direct methods: discretise, then optimise
Direct methods reverse the order. They never derive a costate at all. Instead they
chop the trajectory into a finite set of nodes, replace the continuous problem with a large but
ordinary nonlinear program (NLP) in those node variables, and hand the whole thing to
an off-the-shelf optimiser — hence “discretise then optimise.” Three
styles dominate:
-
Single shooting. Make only the controls the decision variables; simulate the
trajectory forward from them and minimise the resulting cost. Simple, but it inherits the same
sensitivity as indirect shooting over a long horizon.
-
Multiple shooting. Cut the horizon into several segments, integrate each
independently from its own starting state, and add continuity constraints forcing
the segments to join. Breaking up the long integration tames the blow-up and makes the problem far
better conditioned.
-
Collocation. Make the states and controls at every node decision
variables, approximate the trajectory by polynomials between nodes, and enforce the dynamics as
equality constraints at chosen collocation points — the defect
x_{i+1} - x_i - \int f \,\approx\, 0 must vanish at each. The dynamics
become constraints in the NLP rather than something you integrate.
In every case the continuous problem becomes
\min_{z}\ J(z) \quad \text{s.t.}\quad c_{\text{dyn}}(z) = 0,\ \ c_{\text{path}}(z) \le 0,
where the vector z stacks the node states and controls, the equality
constraints c_{\text{dyn}} impose the dynamics, and the inequalities carry
the path and input limits. A finer mesh — more nodes — converges to the true optimum at the cost of a
bigger NLP.
Why direct methods took over
There is a third route we have already met: grid the state space and solve HJB / dynamic programming
directly. It returns the global optimal feedback over the whole space — but it falls to the
curse of dimensionality. A grid of m points per axis in
n dimensions has m^n cells, so the work explodes
exponentially with the number of states. A handful of dimensions is fine; a robot with dozens is
hopeless.
Direct NLP methods scale far more gently: their cost grows with the number of nodes along one
trajectory, not with the volume of the whole state space, and modern sparse NLP solvers exploit
the banded structure of the constraints. That is why, for high-dimensional problems — legged robots,
spacecraft, full vehicle models — direct collocation and multiple shooting are the workhorses, while
grid-based HJB is reserved for low-dimensional problems where a global feedback law is worth the price.
-
Indirect (“optimise then discretise”): solve the maximum principle's
two-point boundary-value problem numerically, e.g. by shooting — guess the
initial costate, integrate, adjust to hit the boundary. Accurate but delicate.
-
Direct (“discretise then optimise”): turn the trajectory into a
finite nonlinear program via single/multiple shooting or
collocation (dynamics as equality constraints at collocation points). Robust and
general.
-
Grid-based HJB/DP gives a global feedback law but suffers the curse of
dimensionality (m^n cells), so direct methods dominate in high
dimensions.
Refining the mesh
Below is the heart of a direct method. The smooth curve is the true optimal trajectory
x^{\*}(t); the kinked curve is its direct transcription
on N nodes, with the dynamics enforced as equality constraints between
adjacent nodes. Slide number of nodes N: a coarse mesh is
a crude polygon that misses the curve, but as the mesh refines the approximation closes on
x^{\*} and the worst-case gap shrinks toward zero — the discretised NLP
converging to the continuous optimum, at the price of more variables.
Numerical optimal control is no longer an academic afterthought — it is shipped software. Direct
collocation packages plan the trajectories of Mars landers and re-entry vehicles, and the same
transcription-into-an-NLP idea, solved fast enough to run in a loop, is what lets a humanoid robot
plan each footstep in real time. The pattern that began with Pontryagin's hand-derived conditions
and Bellman's grids has become, in its computational form, a general-purpose engine: write down a
model, a cost and the constraints, discretise, and let a sparse NLP solver find the optimal motion.