LQG and the Separation Principle

This is the summit of the stage. We have built two optimal machines independently: the LQR optimal feedback law u = -Kx, which knows what to do if it can see the state, and the Kalman filter, which reconstructs the state from noisy partial measurements. The obvious question is whether they fit together — and the answer is one of the most elegant results in all of control theory.

The setting is LQG: Linear dynamics, Quadratic cost, Gaussian noise — the stochastic version of LQR where the controller no longer sees the state, only noisy measurements of it:

dx = (Ax + Bu)\,dt + w, \qquad y = Cx + v,

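with process noise w (covariance Q_n) and measurement noise v (covariance R_n), and the quadratic cost J = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}\big[\int_0^T (x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u)\,dt\big]. The cost is averaged over time because the noise never stops, so the un-normalised integral over [0, \infty) would diverge.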

The separation principle

Here is the punchline. The optimal LQG controller is simply the LQR feedback gain K applied to the Kalman-filtered state estimate \hat{x}:

\boxed{\;u^* = -K\,\hat{x}.\;}

You do not co-design the controller and the estimator in some tangled joint optimisation. You design them separately — each optimal in its own right — and bolt them together. The result is globally optimal. Concretely:

Estimation and control decouple. Two problems that looked hopelessly entangled — how should my steering depend on how well I can see? — turn out to be solvable one at a time.

Certainty equivalence

There is a second, equivalent way to state the miracle, called certainty equivalence: act as if your best estimate were the true state. The optimal controller uses the very same gain K it would use with perfect state knowledge — it simply substitutes \hat{x} for the unavailable x and proceeds with full confidence, as though the estimate carried no uncertainty at all.

What makes this remarkable is what the controller is allowed to ignore. The estimation-error covariance P_e — how uncertain the estimate is — does not change the gain. A foggier sensor produces a worse \hat{x}, and therefore a higher expected cost, but the controller's response to the estimate it has is unchanged. The feedback law does not become timid in proportion to its uncertainty; certainty equivalence says the optimal policy treats the estimate as the truth. (This clean decoupling is special to the linear Gaussian world; for nonlinear systems the controller generally should hedge against its uncertainty, and separation fails.)
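A minimal numerical sketch of this claim, for a scalar plant only. The values a = 0.3, b = c = 1 and the unit weights q, r, q_n are assumptions chosen for illustration, and `riccati_root` and `lqg_summary` are hypothetical helper names. The cost expression used, J = P q_n + P_e K^2 r, is the standard steady-state time-averaged LQG cost specialised to one dimension, with P the control Riccati solution.

```python
import math

def riccati_root(a, g, w):
    """Positive root P of the scalar algebraic Riccati equation 2*a*P - g*P**2 + w = 0."""
    return (a + math.sqrt(a * a + g * w)) / g

def lqg_summary(a, b, c, q, r, q_n, r_n):
    """Return (K, P_e, cost) for a scalar LQG problem.

    K depends only on (a, b, q, r); P_e depends only on (a, c, q_n, r_n).
    """
    P = riccati_root(a, b * b / r, q)         # control Riccati solution
    K = b * P / r                             # LQR gain
    P_e = riccati_root(a, c * c / r_n, q_n)   # estimation-error variance
    cost = P * q_n + P_e * K * K * r          # steady-state average cost
    return K, P_e, cost

# Assumed illustrative values: sweep the sensor noise from sharp to foggy.
for r_n in (0.01, 1.0, 100.0):
    K, P_e, cost = lqg_summary(a=0.3, b=1.0, c=1.0, q=1.0, r=1.0, q_n=1.0, r_n=r_n)
    print(f"r_n={r_n:7.2f}  K={K:.4f}  P_e={P_e:.4f}  cost={cost:.4f}")
```

Across the sweep the gain K should print identically on every row, while P_e and the cost grow with r_n: the sensor quality changes how well the loop performs, not how the controller reacts to its estimate.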

The two Riccati equations make the decoupling vivid. The control Riccati equation is swept backward in time and produces K; the estimation Riccati equation is swept forward in time and produces K_f. Neither contains the other's unknown or the other's design weights: K depends only on (A, B, Q, R), and K_f only on (A, C, Q_n, R_n). The controller designer and the filter designer can work in separate rooms and the assembled system is still optimal. The transpose duality we first met for the feedback law thus closes into a complete, optimal, output-feedback controller.
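For reference, the two equations can be written side by side in their standard textbook form, using the symbols defined above. The terminal weight Q_T and the initial covariance P_e(0) are assumptions introduced here only to fix the boundary conditions; in the infinite-horizon problem both time derivatives are set to zero.

```latex
% Control: integrated backward in time from P(T) = Q_T
-\dot{P} = A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q,
\qquad K = R^{-1} B^{\mathsf{T}} P

% Estimation: integrated forward in time from P_e(0)
\dot{P}_e = A P_e + P_e A^{\mathsf{T}} - P_e C^{\mathsf{T}} R_n^{-1} C P_e + Q_n,
\qquad K_f = P_e C^{\mathsf{T}} R_n^{-1}

% Duality: one equation maps onto the other under
(A,\; B,\; Q,\; R) \;\longleftrightarrow\; (A^{\mathsf{T}},\; C^{\mathsf{T}},\; Q_n,\; R_n),
\qquad K \;\longleftrightarrow\; K_f^{\mathsf{T}}
```

The cost weights Q and R appear only in the first equation and the noise covariances Q_n and R_n only in the second, which is the decoupling in symbols.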

The closed loop, assembled

Putting the pieces in series gives the full LQG compensator. The plant is driven by u; its noisy output y feeds the Kalman filter, which emits the estimate \hat{x}; the gain produces u = -K\hat{x}, which drives the plant — a closed loop of estimate and action:

\text{plant} \;\xrightarrow{\;y = Cx + v\;}\; \text{Kalman filter} \;\xrightarrow{\;\hat{x}\;}\; (-K) \;\xrightarrow{\;u = -K\hat{x}\;}\; \text{plant}.

Internally the estimator carries its own dynamics, \dot{\hat{x}} = A\hat{x} + Bu + K_f(y - C\hat{x}), while the true state obeys dx = (Ax + Bu)\,dt + w. The combined closed loop has eigenvalues that split cleanly into the controller eigenvalues of A - BK and the estimator eigenvalues of A - K_f C — the separation principle visible in the spectrum itself. Design each set where you like, independently.
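The eigenvalue split can be checked by hand on a scalar plant. The sketch below writes the closed loop in (x, \hat{x}) coordinates with the noise terms dropped, and compares the eigenvalues of the resulting 2x2 matrix with a - bK and a - K_f c. The function name and the gains K = 2, K_f = 5 are arbitrary assumptions for illustration, not optimal designs; the split holds for any gains.

```python
import cmath

def closed_loop_eigs(a, b, c, K, Kf):
    """Eigenvalues of the noise-free scalar closed loop in (x, x_hat) coordinates:

        x_dot     = a*x - b*K*x_hat
        x_hat_dot = Kf*c*x + (a - b*K - Kf*c)*x_hat
    """
    m11, m12 = a, -b * K
    m21, m22 = Kf * c, a - b * K - Kf * c
    trace = m11 + m22
    det = m11 * m22 - m12 * m21
    disc = cmath.sqrt(trace * trace - 4.0 * det)   # complex sqrt guards against round-off
    return (trace - disc) / 2.0, (trace + disc) / 2.0

a, b, c = 0.3, 1.0, 1.0        # assumed plant values
K, Kf = 2.0, 5.0               # arbitrary illustrative gains
eigs = closed_loop_eigs(a, b, c, K, Kf)
print("closed-loop eigenvalues:", eigs)
print("a - b*K  =", a - b * K)
print("a - Kf*c =", a - Kf * c)
```

The reason is visible algebraically: the trace of the matrix is (a - bK) + (a - K_f c) and its determinant is (a - bK)(a - K_f c), so those two numbers are exactly its eigenvalues.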

The separation principle is not just pretty; it is a large part of why optimal output-feedback control was tractable enough to fly. The Apollo guidance and navigation system was, in essence, a Kalman filter fusing optical star sightings and inertial measurements into a state estimate, feeding a control law that steered the spacecraft: estimator and controller designed apart and joined, exactly as separation permits. The same architecture runs underneath much of modern aerospace. Aircraft autopilots, missile guidance, and satellite attitude control are LQG controllers in spirit, an optimal filter handing a clean state to a regulator, and the inertial-navigation systems in airliners supply the filtering half of that pairing. Rudolf Kálmán's filter and the LQR regulator, separately optimal, became jointly indispensable.

Watching estimate and control close the loop

Below, the scalar plant dx = (ax + u)\,dt + \sigma\,dW (with a = 0.3 > 0, so the open-loop plant is unstable) is steered by u = -K\hat{x}, where \hat{x} comes from a Kalman observer \dot{\hat{x}} = a\hat{x} + u + K_f(x - \hat{x}); the measurement here is y = x, so C = 1. The true state starts at x(0) = 2; the estimate starts wrong at \hat{x}(0) = 0. Watch \hat{x} lock onto x (the estimator working) while the feedback drives both toward zero (the controller working). Raise the noise \sigma and the pair jitters more, but the loop keeps the unstable state regulated: separately designed gains, jointly holding the system.
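The experiment described above can be reproduced offline with a simple Euler-Maruyama loop. This is a sketch under stated assumptions: the text does not give K, K_f, \sigma, the step size, or the horizon, so the defaults below (gains of about 1.344, which are what unit weights would give for a = 0.3, plus \sigma = 0.2, dt = 10^{-3}, T = 10) are illustrative choices, and `simulate` is a hypothetical function name.

```python
import math
import random

def simulate(a=0.3, K=1.344, Kf=1.344, sigma=0.2,
             x0=2.0, xhat0=0.0, dt=1e-3, T=10.0, seed=0):
    """Euler-Maruyama simulation of the scalar plant with observer-based feedback.

    Returns a list of (t, x, x_hat) samples. All default values are assumptions.
    """
    rng = random.Random(seed)
    x, xhat = x0, xhat0
    history = [(0.0, x, xhat)]
    steps = int(round(T / dt))
    for k in range(1, steps + 1):
        u = -K * xhat                                   # certainty-equivalent control
        dW = rng.gauss(0.0, math.sqrt(dt))              # Brownian increment
        x_next = x + (a * x + u) * dt + sigma * dW      # true (noisy) plant
        xhat_next = xhat + (a * xhat + u + Kf * (x - xhat)) * dt  # observer, y = x
        x, xhat = x_next, xhat_next
        history.append((k * dt, x, xhat))
    return history

# Print a few samples: x_hat should approach x while both move toward zero.
run = simulate()
for t, x, xhat in run[::2000]:
    print(f"t={t:5.2f}  x={x:+.4f}  x_hat={xhat:+.4f}")
```

Calling `simulate(sigma=0.0)` removes the jitter and shows the two decay rates alone; larger values of `sigma` make both traces noisier without destabilising the loop.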