What Is Optimal Control?
A controlled system is a machine with a knob. Turn the knob over time and the machine
follows some path; turn it differently and it follows another. Optimal control
asks the natural next question: of all the ways we could turn the knob, which one is
best? It is the calculus of decisions that unfold in time — steering a rocket,
a portfolio, or a power grid — and it rests on exactly four ingredients.
The four ingredients
1 — The state x(t). A vector that records
everything we need to know about the system right now to predict its future: a rocket's
height and velocity, a car's speed, an account balance. It lives in
\mathbb{R}^n and evolves in time.
2 — The control u(t). The knob we are free to
turn — the engine thrust, the throttle, the trading rate. We get to choose this
function over the whole horizon [0, T]; everything else follows
from it.
3 — The dynamics \dot{x} = f(x, u, t). The law
that says how the knob moves the state. It is a differential equation: given the current
state and control, it fixes the instantaneous rate of change \dot{x}.
\dot{x}(t) = f\big(x(t),\, u(t),\, t\big), \qquad x(0) = x_0 \text{ given}.
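One way to get a feel for the dynamics is to step them forward numerically. The sketch below is a minimal illustration, not part of the theory itself: it uses forward Euler (the crudest integrator) and a hypothetical unit-mass rocket with state (height, velocity) and thrust as the control. The names `simulate` and `rocket`, the value of `G`, and the initial conditions are all assumptions chosen for the example.

```python
def simulate(f, x0, u, T, n_steps):
    """Integrate x' = f(x, u(t), t) from x(0) = x0 with forward Euler.

    Returns a list of (t, state) pairs, including the initial point.
    """
    dt = T / n_steps
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    for _ in range(n_steps):
        dx = f(x, u(t), t)                       # instantaneous rate of change
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        t += dt
        path.append((t, tuple(x)))
    return path


G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def rocket(x, u, t):
    """Hypothetical unit-mass rocket: height' = velocity, velocity' = thrust - G."""
    velocity = x[1]
    return (velocity, u - G)


# Thrust exactly cancelling gravity: the rocket hovers at its initial height.
hover = simulate(rocket, x0=(100.0, 0.0), u=lambda t: G, T=5.0, n_steps=500)

# Zero thrust: the rocket falls freely for two seconds.
free_fall = simulate(rocket, x0=(100.0, 0.0), u=lambda t: 0.0, T=2.0, n_steps=2000)

print("hover, final state:    ", hover[-1][1])
print("free fall, final state:", free_fall[-1][1])
```

The same control function fed to the same dynamics always yields the same path, which is the sense in which everything else follows from the choice of u.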
4 — The cost J. A single number scoring how good
a run was — fuel burned, time taken, energy spent, error accumulated. We want it
small. A typical cost integrates a running penalty along the path and adds a
penalty on where we end up:
J[u] = \underbrace{\phi\big(x(T)\big)}_{\text{terminal}} + \int_0^T \underbrace{L\big(x(t), u(t), t\big)}_{\text{running}} \, dt.
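The cost formula can be evaluated numerically for any candidate control. The following is a rough sketch for a scalar state, assuming forward Euler for the dynamics and a left Riemann sum for the integral; the function name `evaluate_cost` and the example problem (dynamics x' = u, running penalty u^2, terminal penalty 10 x^2) are illustrative assumptions, not taken from the text.

```python
def evaluate_cost(f, L, phi, x0, u, T, n_steps=1000):
    """Approximate J[u] = phi(x(T)) + integral_0^T L(x, u, t) dt for scalar x."""
    dt = T / n_steps
    x, t, running = x0, 0.0, 0.0
    for _ in range(n_steps):
        ut = u(t)
        running += L(x, ut, t) * dt   # accumulate the running penalty
        x += f(x, ut, t) * dt         # advance the state by one Euler step
        t += dt
    return phi(x) + running           # add the terminal penalty at x(T)


# Illustrative problem (assumed): drive x from 1 toward 0 within one time unit.
def dynamics(x, u, t):
    return u


def running_penalty(x, u, t):
    return u * u                      # control effort


def terminal_penalty(x):
    return 10.0 * x * x               # distance from the origin at the end


j_idle = evaluate_cost(dynamics, running_penalty, terminal_penalty,
                       x0=1.0, u=lambda t: 0.0, T=1.0)
j_brake = evaluate_cost(dynamics, running_penalty, terminal_penalty,
                        x0=1.0, u=lambda t: -1.0, T=1.0)

print("J for u = 0: ", j_idle)
print("J for u = -1:", j_brake)
```

Doing nothing costs nothing along the way but pays the full terminal penalty; steering to the origin pays along the way and nothing at the end. The cost J turns that comparison into two numbers.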
The problem
With those four pieces in hand, the optimal control problem is a single sentence: choose the
control function u(\cdot) on [0, T] to
minimise J[u], subject to the state obeying the
dynamics and starting at x_0.
\min_{u(\cdot)} \; J[u] \qquad \text{subject to} \qquad \dot{x} = f(x, u, t), \quad x(0) = x_0.
This is a minimisation not over a number, nor a point, but over an entire function —
the whole control history. That is what makes the subject a child of the
calculus of variations rather than ordinary calculus. A few motivating instances:
- Soft-landing a rocket. State = height and velocity; control = thrust;
dynamics = Newton's law; cost = fuel burned. Land gently using the least propellant.
- Swinging a robot arm to a target. State = joint angle and angular
velocity; control = motor torque; cost = elapsed time. Reach the target as fast as
possible: a minimum-time problem.
- Cruise control. State = the gap between actual and target speed; control =
throttle; cost = that error plus fuel use. Hold the set speed smoothly.
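To make "minimising over an entire function" concrete, one crude approach is to chop the horizon into short intervals, hold the control constant on each, and run gradient descent on the resulting list of numbers. The sketch below does this for a scalar toy problem assumed here purely for illustration: dynamics x' = u, x(0) = 2, horizon [0, 2], running penalty x^2 + u^2 and no terminal penalty. The finite-difference gradient and the hand-picked step size are assumptions that happen to work for this problem; this is not the method the later theory develops.

```python
X0, T, N = 2.0, 2.0, 40      # initial state, horizon, number of control intervals
DT = T / N


def cost(u):
    """Discretised J for x' = u with running penalty x^2 + u^2 (Euler, left sum)."""
    x, total = X0, 0.0
    for uk in u:
        total += (x * x + uk * uk) * DT
        x += uk * DT
    return total


def gradient(u, h=1e-4):
    """Central finite-difference gradient of cost with respect to each u_k."""
    grad = []
    for k in range(len(u)):
        up = u[:k] + [u[k] + h] + u[k + 1:]
        down = u[:k] + [u[k] - h] + u[k + 1:]
        grad.append((cost(up) - cost(down)) / (2 * h))
    return grad


def optimise(iterations=100, step=0.25):
    """Plain gradient descent starting from u = 0 (step size tuned by hand)."""
    u = [0.0] * N
    for _ in range(iterations):
        g = gradient(u)
        u = [uk - step * gk / DT for uk, gk in zip(u, g)]
    return u


u_best = optimise()
j_zero = cost([0.0] * N)      # do nothing
j_guess = cost([-1.0] * N)    # a constant guess
j_best = cost(u_best)         # the optimised control history

print("J(u = 0)      =", round(j_zero, 3))
print("J(u = -1)     =", round(j_guess, 3))
print("J(optimised)  =", round(j_best, 3))
print("first and last control values:", round(u_best[0], 3), round(u_best[-1], 3))
```

The optimised control is not constant: it pushes hard at the start and tapers to zero by the end of the horizon, something no single number could express.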
Open-loop versus feedback
There are two fundamentally different ways to express a control, and the distinction
runs through the whole subject.
Open-loop control writes the knob as a function of time alone,
u = u(t): a plan computed in advance and played back blindly, like
a pianola roll. It is optimal for the model we wrote down — but it has no way to react if the
real system drifts off the predicted path.
Feedback (closed-loop) control writes the knob as a function of the current
state, u = u(x): a rule that looks at where the system
actually is and responds. A thermostat reading the room temperature is feedback; a heating
timer is open-loop. Feedback corrects for disturbances and modelling error, and it is what we
ultimately want — the crowning result of the linear-quadratic theory will be exactly such a
rule, u = -Kx.
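The difference shows up as soon as the real system departs from the model. The sketch below is a small illustration under assumed values: the model is x' = u with x(0) = 2, the feedback rule is u = -x (that is, K = 1), and the open-loop plan is the control signal that same rule would produce on the model, u(t) = -2 e^{-t}. The "real" system adds a constant disturbance of 0.5 that neither controller was designed for. The function names and all numbers are hypothetical.

```python
import math


def run(control, disturbance, x0=2.0, T=5.0, n_steps=5000):
    """Euler-simulate x' = u + disturbance, where control(t, x) returns u."""
    dt = T / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        x += (control(t, x) + disturbance) * dt
        t += dt
    return x


def open_loop(t, x):
    """Precomputed plan: depends on time only and ignores the actual state."""
    return -2.0 * math.exp(-t)


def feedback(t, x):
    """State feedback u = -K x with K = 1: depends on where the system is."""
    return -x


nominal_ol = run(open_loop, disturbance=0.0)
nominal_fb = run(feedback, disturbance=0.0)
disturbed_ol = run(open_loop, disturbance=0.5)
disturbed_fb = run(feedback, disturbance=0.5)

print("no disturbance:   open-loop", round(nominal_ol, 3), " feedback", round(nominal_fb, 3))
print("with disturbance: open-loop", round(disturbed_ol, 3), " feedback", round(disturbed_fb, 3))
```

With no disturbance the two runs end at almost the same state, as they should. With the disturbance, the open-loop state drifts to about 2.5 by the end of the run, while the feedback state settles near 0.5: feedback does not remove the error entirely here, but it keeps it bounded.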
An optimal control problem is specified by four objects and one verb:
- a state x(t) \in \mathbb{R}^n describing the
system, and a control u(t) \in \mathbb{R}^m we
are free to choose;
- dynamics \dot{x} = f(x, u, t) with
x(0) = x_0, linking the control to the state's evolution;
- a scalar cost
J[u] = \phi(x(T)) + \int_0^T L(x, u, t)\,dt to be
minimised over the control function;
- the solution is a control, expressed either open-loop
(u = u(t), planned ahead) or, better, as
feedback (u = u(x), reacting to the state).
Feel the trade-off
Take the simplest possible system, \dot{x} = u, starting at
x_0 = 2, and hold the control at a single constant value
u over [0, 2]. The state then runs along
the straight line x(t) = x_0 + u\,t. Vary u
and two things change at once: the tilt of the trajectory, and the total cost
J(u) = \int_0^2 \big(x(t)^2 + u(t)^2\big)\, dt.
A negative u drives the state toward zero (a smaller state penalty) but
spends control effort (a larger u^2 penalty), and a very negative
u overshoots zero and pays on both counts; u = 0 is lazy
and lets the state sit far from zero. Somewhere in between lies the constant control that
minimises J, a first glimpse of an optimum.
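For this toy problem the trade-off can be worked out exactly. Substituting x(t) = 2 + u t into the integral gives J(u) = 8 + 8u + (14/3) u^2, a parabola whose minimum sits at u = -6/7 ≈ -0.857 with J = 32/7 ≈ 4.571. The sketch below checks this numerically by sweeping constant controls over a grid; the grid range, spacing and function names are arbitrary choices made for illustration.

```python
def cost(u, x0=2.0, T=2.0, n_steps=2000):
    """Midpoint-rule approximation of J(u) for the constant control u."""
    dt = T / n_steps
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * dt
        x = x0 + u * t                    # exact trajectory for constant u
        total += (x * x + u * u) * dt
    return total


def closed_form(u):
    """J(u) = 8 + 8u + (14/3) u^2, valid for x0 = 2 and T = 2 only."""
    return 8.0 + 8.0 * u + (14.0 / 3.0) * u * u


candidates = [-2.0 + 0.01 * k for k in range(201)]   # u from -2 to 0
best_u = min(candidates, key=cost)

print("best constant control on the grid:", round(best_u, 2))
print("numerical cost there:  ", round(cost(best_u), 4))
print("closed-form cost there:", round(closed_form(best_u), 4))
```

Note that the best constant control is milder than u = -1, the value that would bring the state exactly to zero at the final time: a little leftover state error is cheaper than the extra control effort.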
The instinct to optimise a path is old — Johann Bernoulli's 1696 brachistochrone, the curve
of fastest descent, is the founding problem of the calculus of variations. But optimal
control as we know it is a child of the Space Age. In the late 1950s, racing to guide
rockets and intercept missiles, Lev Pontryagin's school in Moscow proved the
maximum principle while, in the United States, Richard Bellman built
dynamic programming. Their methods helped plan fuel-limited trajectories for Apollo, and
today they steer systems ranging from spacecraft to data-centre cooling. We will meet both pillars
later; this stage builds the language they speak.