The Optimal Feedback Law

Every thread of this stage now ties off. The Riccati equation handed us a matrix P; the algebraic version made it a constant. What does P buy us? A single, strikingly simple controller: the optimal control is a linear state feedback, and the loop it closes is automatically stable.

The gain and the feedback law

The minimisation inside the HJB equation already produced the optimal control u^\* = -R^{-1} B^{\mathsf{T}} P\, x. Bundle the constant matrices into a single gain matrix

K = R^{-1} B^{\mathsf{T}} P,

and the entire optimal controller is the one-line feedback law

u^\*(t) = -K\, x(t).

This is the prize. The control at any instant is a fixed linear function of the current measured state: no trajectory to precompute, no replanning. Knock the system off course and the law instantly prescribes the corrective control from where you actually are. That is what makes feedback robust where an open-loop plan, fixed in advance, is brittle. The gain K is computed once, offline, from B, R and the Riccati solution P.
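As a minimal sketch of that offline computation, the snippet below builds K with SciPy's `solve_continuous_are`, which solves A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q = 0. The double-integrator plant, the weights and the helper name `lqr_gain` are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def lqr_gain(A, B, Q, R):
    """Return (K, P): P solves the continuous-time ARE, K = R^{-1} B^T P."""
    P = solve_continuous_are(A, B, Q, R)
    # Solve R K = B^T P instead of forming R^{-1} explicitly.
    K = np.linalg.solve(R, B.T @ P)
    return K, P


# Hypothetical example plant: a double integrator (position, velocity).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)            # state weight (assumed)
R = np.array([[1.0]])    # control weight (assumed)

K, P = lqr_gain(A, B, Q, R)   # done once, offline

# Online, the controller is a single matrix-vector product.
x = np.array([1.0, 0.0])      # current measured state
u = -K @ x
print("K =", K)
print("u =", u)
```

For this particular plant the ARE can be solved by hand, giving K = [1, \sqrt{3}], so the numerical result is easy to check.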

The closed loop is stable

Substitute the feedback into the dynamics \dot{x} = A x + B u. The control becomes part of the system, and the closed-loop dynamics are autonomous:

\dot{x} = A x + B(-K x) = (A - B K)\, x.

Everything now hinges on the matrix A - BK. From the matrix-exponential solution x(t) = e^{(A - BK)t} x(0), the state decays to zero from every initial condition precisely when every eigenvalue of A - BK has negative real part. The remarkable fact is that the LQR gain guarantees this: the optimal feedback always stabilises the system, provided (A, B) is stabilisable and (A, Q^{1/2}) is detectable.
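The eigenvalue test is easy to try numerically. The sketch below uses an assumed two-state plant with one unstable open-loop mode (chosen for illustration only), forms A - BK, and compares the spectra before and after feedback. It also propagates an initial state with the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are

# Hypothetical plant, not from the text: open-loop eigenvalues are +1 and -2.
A = np.array([[0.0, 1.0],
              [2.0, -1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # K = R^{-1} B^T P
A_cl = A - B @ K                  # closed-loop matrix

open_eigs = np.linalg.eigvals(A)
closed_eigs = np.linalg.eigvals(A_cl)
print("open-loop eigenvalues:  ", open_eigs)
print("closed-loop eigenvalues:", closed_eigs)

# x(t) = exp((A - BK) t) x(0), evaluated at t = 10.
x0 = np.array([1.0, 0.0])
x_T = expm(A_cl * 10.0) @ x0
print("|x(10)| =", np.linalg.norm(x_T))
```

With these weights the closed-loop eigenvalues work out to -1 and -\sqrt{5}, both in the left half-plane, even though the open loop has an eigenvalue at +1.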

Why — a Lyapunov argument. Use the value function itself as an energy, V(x) = \tfrac12 x^{\mathsf{T}} P x with P \succ 0. Differentiate along the closed-loop flow and substitute the ARE; the algebra collapses to

\dot{V} = -\tfrac12\, x^{\mathsf{T}}\big(Q + K^{\mathsf{T}} R K\big) x \;\le\; 0.

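Because Q \succeq 0 and R \succ 0, the matrix Q + K^{\mathsf{T}} R K is positive semi-definite, so V never increases along trajectories. If Q \succ 0 the decrease is strict whenever x \neq 0, and a positive energy that keeps falling must drain to its floor: x \to 0. If Q is only semi-definite, detectability supplies the missing step: any motion along which \dot{V} stays at zero has Kx = 0 and Q^{1/2} x = 0, so it evolves under \dot{x} = A x while invisible to the cost, and detectability forces exactly such motions to decay. The same P that scores the cost doubles as the certificate of stability.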
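For readers who want the intermediate steps, the collapse of the algebra can be written out as follows. This assumes the ARE in its standard form A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q = 0 with P and R symmetric, which is taken here to match the earlier page.

```latex
\begin{aligned}
\dot{V} &= \tfrac12\,\dot{x}^{\mathsf{T}} P x + \tfrac12\, x^{\mathsf{T}} P \dot{x}
         = \tfrac12\, x^{\mathsf{T}}\big[(A - BK)^{\mathsf{T}} P + P (A - BK)\big] x \\
        &= \tfrac12\, x^{\mathsf{T}}\big[A^{\mathsf{T}} P + P A - 2\, P B R^{-1} B^{\mathsf{T}} P\big] x
        && \text{using } K = R^{-1} B^{\mathsf{T}} P \\
        &= \tfrac12\, x^{\mathsf{T}}\big[-Q - P B R^{-1} B^{\mathsf{T}} P\big] x
        && \text{using the ARE} \\
        &= -\tfrac12\, x^{\mathsf{T}}\big(Q + K^{\mathsf{T}} R K\big) x
        && \text{since } K^{\mathsf{T}} R K = P B R^{-1} B^{\mathsf{T}} P .
\end{aligned}
```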

Closing the scalar loop

Continue the scalar example. With gain K = b p / r the closed-loop coefficient is a - bK = a - b^2 p / r. For the worked numbers a = 0, b = 1, q = 1, r = 1 we had p = 1, so K = 1 and

a - bK = 0 - 1\cdot 1 = -1 \;<\; 0,

a single closed-loop eigenvalue at -1. With u = 0 the open loop \dot{x} = u just sits there (eigenvalue 0, a marginal integrator); the feedback pulls it firmly into the left half-plane, and the state decays as x(t) = x_0 e^{-t}. More generally, substituting the positive root of the scalar ARE gives a - bK = -\sqrt{a^2 + b^2 q/r}, which is negative for any q > 0, r > 0 and b \neq 0: the closed loop is stable no matter how you tune the cost.
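The scalar numbers can be reproduced with a few lines of Python. The sketch assumes the scalar ARE has the form 2ap - b^2 p^2 / r + q = 0 (the scalar case of the standard matrix ARE) and picks its positive root. The function name `scalar_lqr` is hypothetical.

```python
import math

import numpy as np


def scalar_lqr(a, b, q, r):
    """Return (p, K, pole) for the scalar LQR problem.

    p is the positive root of -(b^2/r) p^2 + 2 a p + q = 0,
    K = b p / r is the gain, and pole = a - b K is the closed-loop eigenvalue.
    """
    roots = np.roots([-(b * b) / r, 2.0 * a, q])
    p = float(max(roots.real))   # the stabilising solution is the positive root
    K = b * p / r
    pole = a - b * K
    return p, K, pole


# Worked numbers from the text: a = 0, b = 1, q = 1, r = 1.
p, K, pole = scalar_lqr(0.0, 1.0, 1.0, 1.0)
print("p =", p, " K =", K, " pole =", pole)

# Compare against the closed form a - bK = -sqrt(a^2 + b^2 q / r).
a, b, q, r = 0.3, 1.0, 1.0, 2.0
_, _, pole_general = scalar_lqr(a, b, q, r)
closed_form = -math.sqrt(a * a + b * b * q / r)
print("numerical pole =", pole_general, " closed form =", closed_form)
```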

Driving the state to zero

Compare the open-loop system \dot{x} = a x with a = 0.3 (slightly unstable, so it grows) against the closed loop \dot{x} = (a - bK)x under the LQR feedback, for b = 1, q = 1 and x_0 = 1, as the control weight r varies. A small r makes control cheap: the gain K is large, the closed-loop pole a - bK = -\sqrt{a^2 + b^2 q/r} moves far left, and the state is slammed to zero. A large r makes control expensive: a gentle gain, a lazy pole, a slow glide home. Even then the pole only approaches -|a| = -0.3 as r \to \infty, the mirror image of the unstable open-loop pole; it never crosses into the right half-plane, so the closed loop is always stable.
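The same sweep can be sketched in code. The snippet below uses the closed form a - bK = -\sqrt{a^2 + b^2 q/r} from the scalar example, tabulates the gain and pole for a handful of r values, and evaluates both responses at t = 2. The grid of r values and the sample time are arbitrary choices for illustration.

```python
import math


def closed_loop_pole(a, b, q, r):
    """Closed-loop eigenvalue a - bK of the scalar LQR problem."""
    return -math.sqrt(a * a + b * b * q / r)


def gain(a, b, q, r):
    """Recover K from the pole using pole = a - b K."""
    return (a - closed_loop_pole(a, b, q, r)) / b


a, b, q, x0 = 0.3, 1.0, 1.0, 1.0
t = 2.0                                   # sample time (arbitrary)
x_open = x0 * math.exp(a * t)             # open loop grows

rows = []
for r in (0.01, 0.1, 1.0, 10.0, 100.0):   # cheap control -> expensive control
    pole = closed_loop_pole(a, b, q, r)
    K = gain(a, b, q, r)
    x_closed = x0 * math.exp(pole * t)    # closed loop decays
    rows.append((r, K, pole, x_closed))
    print(f"r={r:7.2f}  K={K:8.4f}  pole={pole:9.4f}  x({t})={x_closed:.4e}")

print(f"open loop x({t}) = {x_open:.4f}")
```

Every row shows a negative pole, and the pole drifts toward -0.3 as r grows, while the open-loop value exceeds x_0.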

Classical control also stabilises by moving eigenvalues — “pole placement” — but the engineer must choose where to put them, by taste. LQR places the closed-loop poles too, yet the placement is derived: it is wherever minimising \int (x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u)\,dt sends them. Tuning Q and R slides the poles along optimal loci, so the designer dials an intention — “track harder” vs “spend less” — and the mathematics returns the corresponding optimal pole pattern. The final page runs the whole pipeline on a real system.