What Is Optimal Control?

A controlled system is a machine with a knob. Turn the knob over time and the machine follows some path; turn it differently and it follows another. Optimal control asks the natural next question: of all the ways we could turn the knob, which one is best? It is the calculus of decisions that unfold in time — steering a rocket, a portfolio, or a power grid — and it rests on exactly four ingredients.

The four ingredients

1 — The state x(t). A vector that records everything we need to know about the system right now to predict its future: a rocket's height and velocity, a car's speed, an account balance. It lives in \mathbb{R}^n and evolves in time.

2 — The control u(t). The knob we are free to turn — the engine thrust, the throttle, the trading rate. We get to choose this function over the whole horizon [0, T]; everything else follows from it.

3 — The dynamics \dot{x} = f(x, u, t). The law that says how the knob moves the state. It is a differential equation: given the current state and control, it fixes the instantaneous rate of change \dot{x}.

\dot{x}(t) = f\big(x(t),\, u(t),\, t\big), \qquad x(0) = x_0 \text{ given}.

4 — The cost J. A single number scoring how good a run was — fuel burned, time taken, energy spent, error accumulated. We want it small. A typical cost integrates a running penalty along the path and adds a penalty on where we end up:

J[u] = \underbrace{\phi\big(x(T)\big)}_{\text{terminal}} + \int_0^T \underbrace{L\big(x(t), u(t), t\big)}_{\text{running}} \, dt.
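To make the four ingredients concrete, here is a minimal sketch that scores one control history. It assumes a scalar state and a scalar control (n = m = 1). It integrates the dynamics with forward Euler and approximates the running-cost integral with a left-endpoint sum on the same time grid. The function name evaluate and the default step count are illustrative choices, not part of the original text.

```python
from typing import Callable


def evaluate(f: Callable[[float, float, float], float],
             L: Callable[[float, float, float], float],
             phi: Callable[[float], float],
             u: Callable[[float], float],
             x0: float, T: float, n_steps: int = 1000) -> float:
    """Approximate J[u] = phi(x(T)) + integral_0^T L(x, u, t) dt for a scalar system.

    The dynamics xdot = f(x, u, t), x(0) = x0, are integrated with forward Euler,
    and the running cost is accumulated with a left-endpoint sum on the same grid.
    """
    dt = T / n_steps
    x, running = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        uk = u(t)                      # the control chosen at this instant
        running += L(x, uk, t) * dt    # running-cost contribution of this step
        x += f(x, uk, t) * dt          # one Euler step of the dynamics
    return phi(x) + running            # add the terminal penalty on x(T)
```

For example, with \dot{x} = u, u(t) = -1, x_0 = 2, T = 2, L = x^2 + u^2 and \phi = 0, evaluate returns about 4.67. The exact value is 14/3, and the left-endpoint sum overshoots slightly because x^2 decreases along this path.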

The problem

With those four pieces in hand, the optimal control problem is a single sentence: choose the control function u(\cdot) on [0, T] to minimise J[u], subject to the state obeying the dynamics and starting at x_0.

\min_{u(\cdot)} \; J[u] \qquad \text{subject to} \qquad \dot{x} = f(x, u, t), \quad x(0) = x_0.

This is a minimisation not over a number, nor a point, but over an entire function — the whole control history. That is what makes the subject a child of the calculus of variations rather than ordinary calculus. A few motivating instances:

Soft-landing a rocket. State = height and velocity; control = thrust; dynamics = Newton's law; cost = fuel burned. Land gently using the least propellant.
Swinging a robot arm to a target. State = joint angle and angular velocity; control = motor torque; cost = elapsed time. Reach the target as fast as possible — a minimum-time problem.
Cruise control. State = the gap between actual and target speed; control = throttle; cost = that error plus fuel use. Hold the set speed smoothly.
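The soft-landing example can be written out as the four ingredients in a few lines. This is a minimal sketch under loudly stated assumptions. The motion is vertical only. The mass MASS is a hypothetical 1000 kg held constant, so the mass lost as fuel burns is ignored. Fuel burned per second is taken to be |thrust| with a proportionality constant of 1. The helper names (dynamics, fuel_rate, fly) are illustrative. The sketch only evaluates one candidate thrust plan; it does not search for the optimal one, and it ignores thrust limits and the constraint h ≥ 0.

```python
from typing import Callable, Tuple

G = 9.81       # gravitational acceleration in m/s^2
MASS = 1000.0  # hypothetical lander mass in kg, held constant (fuel mass loss ignored)

State = Tuple[float, float]  # (height h in m, vertical velocity v in m/s, up is positive)


def dynamics(x: State, thrust: float) -> State:
    """f(x, u): Newton's law for a vertical lander, h' = v and v' = thrust / MASS - G."""
    _, v = x  # the height does not enter the dynamics
    return v, thrust / MASS - G


def fuel_rate(thrust: float) -> float:
    """Running cost L(x, u): fuel per second, assumed proportional to |thrust| (constant 1)."""
    return abs(thrust)


def fly(x0: State, plan: Callable[[float], float], T: float, n_steps: int) -> Tuple[State, float]:
    """Play an open-loop thrust plan u(t) with forward Euler.

    Returns the final state x(T) and the fuel used (left-endpoint sum of fuel_rate).
    """
    dt = T / n_steps
    h, v = x0
    fuel = 0.0
    for k in range(n_steps):
        u = plan(k * dt)
        dh, dv = dynamics((h, v), u)
        fuel += fuel_rate(u) * dt
        h, v = h + dh * dt, v + dv * dt
    return (h, v), fuel
```

Starting 100 m up and falling at 20 m/s, a constant thrust MASS * (G + 2) decelerates the lander at 2 m/s^2. That removes 20 m/s of speed in 10 s over exactly v_0^2 / (2a) = 100 m. So fly((100.0, -20.0), lambda t: MASS * (G + 2.0), 10.0, 10000) ends near h = 0, v = 0, having used about 118100 units of fuel. The optimal control question is whether some other thrust history lands just as gently on less.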

Open-loop versus feedback

There are two fundamentally different ways to express a control, and the distinction runs through the whole subject.

Open-loop control writes the knob as a function of time alone, u = u(t): a plan computed in advance and played back blindly, like a pianola roll. It is optimal for the model we wrote down — but it has no way to react if the real system drifts off the predicted path.

Feedback (closed-loop) control writes the knob as a function of the current state, u = u(x): a rule that looks at where the system actually is and responds. A thermostat reading the room temperature is feedback; a heating timer is open-loop. Feedback corrects for disturbances and modelling error, and it is what we ultimately want — the crowning result of the linear-quadratic theory will be exactly such a rule, u = -Kx.
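The difference shows up as soon as the real system departs from the model. The sketch below designs both controllers on the model \dot{x} = u with x_0 = 2 and T = 2. It then runs them on a "real" plant \dot{x} = u + d, where d = 0.5 is a hypothetical constant disturbance the model does not know about. The open-loop plan u(t) = -1 reaches x(2) = 0 on the model. The feedback rule u = -Kx uses an arbitrary illustrative gain K = 2, not an optimised one.

```python
from typing import Callable

Policy = Callable[[float, float], float]  # policy(t, x) -> control u


def run(policy: Policy, x0: float = 2.0, T: float = 2.0,
        disturbance: float = 0.5, n_steps: int = 2000) -> float:
    """Simulate the real plant xdot = u + disturbance with forward Euler and return x(T).

    The disturbance is a hypothetical constant push absent from the design model xdot = u.
    """
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        u = policy(k * dt, x)
        x += (u + disturbance) * dt
    return x


def open_loop(t: float, x: float) -> float:
    """Plan computed on the model to take x from 2 to 0 over [0, 2]: u(t) = -1.

    It ignores the measured state x entirely.
    """
    return -1.0


def feedback(t: float, x: float, K: float = 2.0) -> float:
    """Feedback rule u = -K x with an illustrative, non-optimised gain K = 2."""
    return -K * x
```

Without the disturbance (run(open_loop, disturbance=0.0)) the plan lands exactly at 0. With it, run(open_loop) ends at 2 + (-1 + 0.5) \cdot 2 = 1, because the plan cannot notice the drift. run(feedback) ends near 0.28, since the exact solution is 0.25 + 1.75 e^{-4}. The rule keeps pushing back against the error it measures. It does not remove a constant disturbance completely, though: this simple proportional rule settles at the offset d/K = 0.25.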

In short, an optimal control problem is specified by four objects (state, control, dynamics, cost) and one verb, minimise:

a state x(t) \in \mathbb{R}^n describing the system, and a control u(t) \in \mathbb{R}^m we are free to choose;
dynamics \dot{x} = f(x, u, t) with x(0) = x_0, linking the control to the state's evolution;
a scalar cost J[u] = \phi(x(T)) + \int_0^T L(x, u, t)\,dt to be minimised over the control function;
the solution is a control, either open-loop (u = u(t), planned ahead) or, better, feedback (u = u(x), reacting to the state).

Feel the trade-off

Take the simplest possible system, \dot{x} = u, starting at x_0 = 2, and hold the control at a single constant value u over [0, 2]. The state then runs along the straight line x(t) = x_0 + u\,t. Changing u changes two things at once: the tilt of the trajectory, and the cost

J(u) = \int_0^2 \big(x(t)^2 + u(t)^2\big)\, dt,

which here has only a running part (\phi = 0). A negative u drives the state toward zero, shrinking the x^2 penalty, but every bit of control effort is charged through the u^2 term. Push too hard (u < -1) and the constant control overshoots past zero, so the state penalty grows again before t = 2. u = 0 is lazy: it spends no effort but leaves the state sitting at x = 2. Somewhere between lies the constant control that minimises J, the first whiff of an optimum.
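For this example the optimum can be found by hand. Because x(t) = 2 + ut, the cost integrates in closed form to J(u) = 8 + 8u + \tfrac{14}{3}u^2. Setting J'(u) = 8 + \tfrac{28}{3}u = 0 gives u^* = -6/7 \approx -0.857 and J(u^*) = 32/7 \approx 4.571, compared with J(0) = 8. By contrast, J(-2) \approx 10.67, so an aggressive constant push is worse than doing nothing. The sketch below checks this numerically. It assumes a midpoint-rule quadrature and a brute-force scan over constant controls on a grid of step 0.01; both are illustrative choices.

```python
def J(u: float, x0: float = 2.0, T: float = 2.0, n: int = 200) -> float:
    """Cost of the constant control u for xdot = u: integral over [0, T] of x(t)^2 + u^2,
    with x(t) = x0 + u t, approximated by the midpoint rule on n sub-intervals."""
    dt = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt          # midpoint of the k-th sub-interval
        x = x0 + u * t              # state along the straight-line trajectory
        total += (x * x + u * u) * dt
    return total


# Brute-force scan of constant controls from -3 to 1 in steps of 0.01.
candidates = [-3.0 + 0.01 * i for i in range(401)]
best_u = min(candidates, key=J)
```

The scan picks best_u close to -0.86, the grid point nearest -6/7, and J(best_u) agrees with 32/7 to about four decimal places. A grid search is used only because the decision here is a single number. When u(t) is a whole function, this brute-force approach no longer scales, which is why the theory is needed.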

The instinct to optimise a path is old — Johann Bernoulli's 1696 brachistochrone, the curve of fastest descent, is the founding problem of the calculus of variations. But optimal control as we know it is a child of the Space Age. In the late 1950s, racing to guide rockets and intercept missiles, Lev Pontryagin's school in Moscow proved the maximum principle while, in the United States, Richard Bellman built dynamic programming. Their methods sent Apollo to the Moon on a fuel budget, and today they steer everything from spacecraft to data-centre cooling. We will meet both pillars later; this stage builds the language they speak.