The Kalman Filter
The
Luenberger
observer drives the estimation error to zero by choosing the gain
L to place the eigenvalues of A - LC. But it
leaves a question open: which gain is best? Place the eigenvalues too far left and
the observer reacts violently, amplifying sensor noise; too close to the imaginary axis and it
crawls. When both the process and the measurements are corrupted by noise, there is an
optimal answer — the Kalman filter, the minimum-variance state estimator
for a linear Gaussian system.
Model the system with two independent
white-noise
sources:
\dot{x} = Ax + Bu + w, \qquad y = Cx + v,
where w is process noise (gusts, unmodelled forces) with
covariance Q_n, and v is
measurement noise (sensor error) with covariance R_n.
Both are zero-mean and
Gaussian.
A Gaussian belief, propagated
Because everything is linear and every noise is Gaussian, the filter's knowledge of the state stays
Gaussian for all time. So the entire belief is carried by just two objects:
-
a mean \hat{x} — the best estimate, which is the
conditional
expectation \hat{x} = \mathbb{E}[x \mid y_{0:t}] of the
state given all measurements so far;
-
a covariance P = \mathbb{E}[(x - \hat{x})(x - \hat{x})^{\mathsf{T}}]
— the estimation-error covariance, our quantified uncertainty.
The filter runs a perpetual predict–update cycle: roll the belief forward by the
dynamics, then snap it back toward each new measurement.
-
Predict. Push mean and covariance forward through the model. The mean follows the
deterministic dynamics, while the covariance grows — the process noise
Q_n makes us less sure as time passes:
\dot{\hat{x}} = A\hat{x} + Bu, \qquad \dot{P} = AP + PA^{\mathsf{T}} + Q_n.
-
Update. A measurement arrives. Correct the mean with the residual
y - C\hat{x} through the Kalman gain
K_f, and shrink the covariance — a measurement makes us
more sure:
\hat{x} \leftarrow \hat{x} + K_f\big(y - C\hat{x}\big), \qquad P \leftarrow P - K_f C P.
This is exactly the Luenberger observer — a model copy corrected by the residual — but now with a
statistically optimal, time-varying gain in place of a hand-placed one.
The Kalman gain and its Riccati equation
Step 1 — the optimal gain. Minimising the error covariance
P at the update step (a least-squares trade-off between trusting the
prediction and trusting the measurement) gives the Kalman gain
\boxed{\;K_f = P\,C^{\mathsf{T}} R_n^{-1}.\;}
Read it as a ratio of confidences. A large state uncertainty P or a small
measurement noise R_n makes K_f large — trust
the sensor, correct hard. A small P or a noisy sensor (large
R_n) makes K_f small — trust the model, barely
budge.
Step 2 — the continuous covariance flow. Combining the predict growth
AP + PA^{\mathsf{T}} + Q_n with the update shrinkage through
K_f = PC^{\mathsf{T}}R_n^{-1} gives a single differential equation for the
error covariance:
\dot{P} = AP + PA^{\mathsf{T}} + Q_n - P\,C^{\mathsf{T}} R_n^{-1} C\,P.
Step 3 — recognise it. This is a matrix Riccati equation — the
same species we met for optimal control. Place the two side by side:
\text{(control)}\quad -\dot{P} = A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q,
\text{(estimation)}\quad \phantom{-}\dot{P} = A\,P + P A^{\mathsf{T}} + Q_n - P C^{\mathsf{T}} R_n^{-1} C\,P.
They are identical under transposition. Send the
control
Riccati through the dictionary
A \leftrightarrow A^{\mathsf{T}}, \qquad B \leftrightarrow C^{\mathsf{T}}, \qquad Q \leftrightarrow Q_n, \qquad R \leftrightarrow R_n,
and the estimator Riccati appears. The control Riccati runs backward from a
terminal condition (cost-to-go); the estimator Riccati runs forward from an initial
condition (accumulated knowledge) — the same equation, swept in opposite directions of time. This is
the precise sense in which the Kalman filter is the estimation dual of the LQR
controller.
-
The optimal estimate is a Gaussian belief
(\hat{x}, P) propagated by predict–update.
-
The optimal gain is K_f = P\,C^{\mathsf{T}} R_n^{-1}, weighing model
confidence against sensor noise.
-
The error covariance solves the Riccati equation
\dot{P} = AP + PA^{\mathsf{T}} + Q_n - PC^{\mathsf{T}}R_n^{-1}CP, the
exact transpose dual of the LQR control Riccati.
The discrete predict–update equations
In practice measurements arrive at discrete instants k = 0, 1, 2, \dots,
and the filter alternates two stamped steps. With \hat{x}^- the predicted
(prior) mean and \hat{x}^+ the corrected (posterior) mean:
Predict (extrapolate by the dynamics, inflate by process noise):
\hat{x}^-_k = A\,\hat{x}^+_{k-1} + B u_{k-1}, \qquad P^-_k = A\,P^+_{k-1} A^{\mathsf{T}} + Q_n.
Update (form the gain, correct by the residual, shrink the covariance):
K_k = P^-_k C^{\mathsf{T}}\big(C P^-_k C^{\mathsf{T}} + R_n\big)^{-1},
\hat{x}^+_k = \hat{x}^-_k + K_k\big(y_k - C\hat{x}^-_k\big), \qquad P^+_k = (I - K_k C)\,P^-_k.
The discrete gain K_k = P^- C^{\mathsf{T}}(CP^-C^{\mathsf{T}} + R_n)^{-1}
is the continuous PC^{\mathsf{T}}R_n^{-1} with the measurement-noise
denominator made explicit: when R_n dwarfs the predicted output variance
CP^-C^{\mathsf{T}}, the gain is tiny and the filter ignores a noisy
reading; when R_n \to 0 the gain drives the estimate straight onto the
measurement.
The scalar headline: balancing trust
Strip it to one dimension to see the gain do its job. Take a slowly drifting scalar quantity
(a random walk, A = 1, C = 1) with process
variance q and measurement variance r. The
steady-state error variance p settles where predict-growth balances
update-shrinkage, and the steady Kalman gain is
k = \frac{p}{p + r}, \qquad \text{a number between } 0 \text{ and } 1.
The corrected estimate is then a weighted average of where the model thought it was and what the
sensor just said:
\hat{x}^+ = (1 - k)\,\hat{x}^- + k\,y.
With a precise sensor (r small) k \to 1 and the
estimate follows the measurement; with a noisy sensor (r large)
k \to 0 and the estimate leans on the smooth model prediction, ignoring
the jitter. The Kalman gain is the dial between believing your model and believing your
eyes.
Seeing it track
Below, a smooth true signal (in one colour) is observed only through the noisy dots. The Kalman
estimate (a second colour) is an exponential smoothing of those dots with the steady gain
k = p/(p+r), wrapped in a shaded
\pm\sqrt{p} uncertainty band. Raise the measurement noise
and the gain falls: the estimate grows smoother but lags the truth, trusting the model over
the jittery sensor. Raise the process noise and the gain climbs: the estimate
chases every wiggle, trusting the measurements more.
Rudolf Kálmán's 1960–61 papers might have stayed a control-theory curiosity but for a visit to
NASA Ames, where Stanley Schmidt saw that the filter was exactly what the Apollo guidance computer
needed to fuse noisy star-sightings and accelerometer data into a spacecraft trajectory. The
on-board "Kalman filter" — in its extended, nonlinear form — became the navigation backbone of
Apollo, and has flown on essentially every spacecraft, aircraft, missile, and now phone and
self-driving car since. Few equations have travelled so far on the strength of a single elegant
idea: keep a Gaussian, and update it optimally.