The Optimal Feedback Law
Every thread of this stage now ties off. The
Riccati
equation handed us a matrix P; the
algebraic
version made it a constant. What does P buy us? A
single, strikingly simple controller: the optimal control is a linear state
feedback, and the loop it closes is automatically stable.
The gain and the feedback law
The minimisation inside HJB already produced the optimal control
u^\* = -R^{-1} B^{\mathsf{T}} P\, x. Bundle the constant matrices into a
single gain matrix
K = R^{-1} B^{\mathsf{T}} P,
and the entire optimal controller is the one-line feedback law
u^\*(t) = -K\, x(t).
This is the prize. The control at any instant is a fixed linear function of the
current measured state — no trajectory to precompute, no replanning. Knock the system off
course and the law instantly prescribes the corrective control from where you actually are. That is
what makes feedback robust where an open-loop plan, fixed in advance, is brittle. The gain
K is computed once, offline, from the
inverse
R^{-1} and the Riccati solution P.
The closed loop is stable
Substitute the feedback into the dynamics
\dot{x} = A x + B u. The control becomes part of the system, and the
closed-loop dynamics are autonomous:
\dot{x} = A x + B(-K x) = (A - B K)\, x.
Everything now hinges on the matrix A - BK. From the
matrix-exponential
solution x(t) = e^{(A - BK)t} x(0), the state decays to zero
precisely when every
eigenvalue
of A - BK has negative real part. The remarkable fact is
that the LQR gain guarantees this: the optimal feedback always stabilises the system (given
stabilisability and a detectable Q).
Why — a Lyapunov argument. Use the value function itself as an energy,
V(x) = \tfrac12 x^{\mathsf{T}} P x with
P \succ 0. Differentiate along the closed-loop flow and substitute the
ARE; the algebra collapses to
\dot{V} = -\tfrac12\, x^{\mathsf{T}}\big(Q + K^{\mathsf{T}} R K\big) x \;\le\; 0.
Because Q \succeq 0 and R \succ 0, the matrix
Q + K^{\mathsf{T}} R K is positive-semi-definite, so
V can only decrease. A positive energy that only ever falls must
drain to its floor: x \to 0. The same P that
scores the cost doubles as the certificate of stability.
-
Gain: K = R^{-1} B^{\mathsf{T}} P, with
P the (stabilising) Riccati solution.
-
Control: the optimal law is the linear feedback
u^\*(t) = -K\, x(t) — a function of the current state only.
-
Closed loop: \dot{x} = (A - BK)\, x, whose
eigenvalues all have negative real part, so x(t) \to 0.
Closing the scalar loop
Continue the scalar example. With gain K = b p / r the closed-loop
coefficient is a - bK = a - b^2 p / r. For the worked numbers
a = 0, b = 1, q = 1,
r = 1 we had p = 1, so
K = 1 and
a - bK = 0 - 1\cdot 1 = -1 \;<\; 0,
a single closed-loop eigenvalue at -1. The open loop
\dot{x} = u just sat there (eigenvalue 0, a
marginal integrator); the feedback pulls it firmly into the left half-plane, and the state decays as
x(t) = x_0 e^{-t}. More generally
a - bK = -\sqrt{a^2 + b^2 q/r}, which is negative for any
weights — the closed loop is stable no matter how you tune the cost.
Driving the state to zero
Here is the open-loop system \dot{x} = a x with
a = 0.3 (slightly unstable, so it grows) alongside the
closed loop \dot{x} = (a - bK)x under the LQR feedback,
for b = 1, q = 1 and
x_0 = 1. Slide the control weight r. A
small r makes control cheap: the gain
K is large, the closed-loop pole a - bK moves
far left, and the state is slammed to zero. A large r
makes control expensive: a gentle gain, a lazy pole near the origin, a slow glide home. The pole
never crosses into the right half-plane — the closed loop is always stable.
Classical control also stabilises by moving eigenvalues — “pole placement” — but the
engineer must choose where to put them, by taste. LQR places the closed-loop poles too,
yet the placement is derived: it is wherever minimising
\int (x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u)\,dt sends them. Tuning
Q and R slides the poles along optimal loci,
so the designer dials an intention — “track harder” vs “spend less” — and
the mathematics returns the corresponding optimal pole pattern. The
final
page runs the whole pipeline on a real system.