The Kalman Filter

The Luenberger observer drives the estimation error to zero by choosing the gain L to place the eigenvalues of A - LC. But it leaves a question open: which gain is best? Place the eigenvalues too far left and the observer reacts violently, amplifying sensor noise; too close to the imaginary axis and it crawls. When both the process and the measurements are corrupted by noise, there is an optimal answer — the Kalman filter, the minimum-variance state estimator for a linear Gaussian system.

Model the system with two independent white-noise sources:

\dot{x} = Ax + Bu + w, \qquad y = Cx + v,

where w is process noise (gusts, unmodelled forces) with covariance Q_n, and v is measurement noise (sensor error) with covariance R_n. Both are zero-mean and Gaussian.

A Gaussian belief, propagated

Because everything is linear and every noise is Gaussian, the filter's knowledge of the state stays Gaussian for all time. So the entire belief is carried by just two objects:

The filter runs a perpetual predict–update cycle: roll the belief forward by the dynamics, then snap it back toward each new measurement.

This is exactly the Luenberger observer — a model copy corrected by the residual — but now with a statistically optimal, time-varying gain in place of a hand-placed one.

The Kalman gain and its Riccati equation

Step 1 — the optimal gain. Minimising the error covariance P at the update step (a least-squares trade-off between trusting the prediction and trusting the measurement) gives the Kalman gain

\boxed{\;K_f = P\,C^{\mathsf{T}} R_n^{-1}.\;}

Read it as a ratio of confidences. A large state uncertainty P or a small measurement noise R_n makes K_f large — trust the sensor, correct hard. A small P or a noisy sensor (large R_n) makes K_f small — trust the model, barely budge.

Step 2 — the continuous covariance flow. Combining the predict growth AP + PA^{\mathsf{T}} + Q_n with the update shrinkage through K_f = PC^{\mathsf{T}}R_n^{-1} gives a single differential equation for the error covariance:

\dot{P} = AP + PA^{\mathsf{T}} + Q_n - P\,C^{\mathsf{T}} R_n^{-1} C\,P.

Step 3 — recognise it. This is a matrix Riccati equation — the same species we met for optimal control. Place the two side by side:

\text{(control)}\quad -\dot{P} = A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q, \text{(estimation)}\quad \phantom{-}\dot{P} = A\,P + P A^{\mathsf{T}} + Q_n - P C^{\mathsf{T}} R_n^{-1} C\,P.

They are identical under transposition. Send the control Riccati through the dictionary

A \leftrightarrow A^{\mathsf{T}}, \qquad B \leftrightarrow C^{\mathsf{T}}, \qquad Q \leftrightarrow Q_n, \qquad R \leftrightarrow R_n,

and the estimator Riccati appears. The control Riccati runs backward from a terminal condition (cost-to-go); the estimator Riccati runs forward from an initial condition (accumulated knowledge) — the same equation, swept in opposite directions of time. This is the precise sense in which the Kalman filter is the estimation dual of the LQR controller.

The discrete predict–update equations

In practice measurements arrive at discrete instants k = 0, 1, 2, \dots, and the filter alternates two stamped steps. With \hat{x}^- the predicted (prior) mean and \hat{x}^+ the corrected (posterior) mean:

Predict (extrapolate by the dynamics, inflate by process noise):

\hat{x}^-_k = A\,\hat{x}^+_{k-1} + B u_{k-1}, \qquad P^-_k = A\,P^+_{k-1} A^{\mathsf{T}} + Q_n.

Update (form the gain, correct by the residual, shrink the covariance):

K_k = P^-_k C^{\mathsf{T}}\big(C P^-_k C^{\mathsf{T}} + R_n\big)^{-1}, \hat{x}^+_k = \hat{x}^-_k + K_k\big(y_k - C\hat{x}^-_k\big), \qquad P^+_k = (I - K_k C)\,P^-_k.

The discrete gain K_k = P^- C^{\mathsf{T}}(CP^-C^{\mathsf{T}} + R_n)^{-1} is the continuous PC^{\mathsf{T}}R_n^{-1} with the measurement-noise denominator made explicit: when R_n dwarfs the predicted output variance CP^-C^{\mathsf{T}}, the gain is tiny and the filter ignores a noisy reading; when R_n \to 0 the gain drives the estimate straight onto the measurement.

The scalar headline: balancing trust

Strip it to one dimension to see the gain do its job. Take a slowly drifting scalar quantity (a random walk, A = 1, C = 1) with process variance q and measurement variance r. The steady-state error variance p settles where predict-growth balances update-shrinkage, and the steady Kalman gain is

k = \frac{p}{p + r}, \qquad \text{a number between } 0 \text{ and } 1.

The corrected estimate is then a weighted average of where the model thought it was and what the sensor just said:

\hat{x}^+ = (1 - k)\,\hat{x}^- + k\,y.

With a precise sensor (r small) k \to 1 and the estimate follows the measurement; with a noisy sensor (r large) k \to 0 and the estimate leans on the smooth model prediction, ignoring the jitter. The Kalman gain is the dial between believing your model and believing your eyes.

Seeing it track

Below, a smooth true signal (in one colour) is observed only through the noisy dots. The Kalman estimate (a second colour) is an exponential smoothing of those dots with the steady gain k = p/(p+r), wrapped in a shaded \pm\sqrt{p} uncertainty band. Raise the measurement noise and the gain falls: the estimate grows smoother but lags the truth, trusting the model over the jittery sensor. Raise the process noise and the gain climbs: the estimate chases every wiggle, trusting the measurements more.

Rudolf Kálmán's 1960–61 papers might have stayed a control-theory curiosity but for a visit to NASA Ames, where Stanley Schmidt saw that the filter was exactly what the Apollo guidance computer needed to fuse noisy star-sightings and accelerometer data into a spacecraft trajectory. The on-board "Kalman filter" — in its extended, nonlinear form — became the navigation backbone of Apollo, and has flown on essentially every spacecraft, aircraft, missile, and now phone and self-driving car since. Few equations have travelled so far on the strength of a single elegant idea: keep a Gaussian, and update it optimally.