The Riccati Equation

We have the Linear-Quadratic problem and we have the Hamilton–Jacobi–Bellman equation, a sufficient condition that hands back the optimal feedback for every state at once. This page bolts them together. One inspired guess about the shape of the value function turns the HJB partial differential equation into a single ordinary differential equation for a matrix — the Riccati equation. This is the centrepiece derivation of the whole stage.

The quadratic ansatz

The HJB equation for the LQ problem, with running cost L = \tfrac12(x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u) and dynamics f = A x + B u, reads

-V_t = \min_{u}\Big[\, \tfrac12\big(x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u\big) + V_x^{\mathsf{T}}\big(A x + B u\big) \,\Big].

The cost is quadratic and the dynamics linear, so we guess that the value function (the cost-to-go) is itself a pure quadratic form in the state, with a symmetric, time-varying matrix P(t) carrying the coefficients:

V(x, t) = \tfrac12\, x^{\mathsf{T}} P(t)\, x, \qquad P(t) = P(t)^{\mathsf{T}}.

This is the LQ counterpart of the parabola V(x) = x^2 that HJB produced for the scalar example. From it the two derivatives HJB needs fall straight out:

V_x = P(t)\,x, \qquad V_t = \tfrac12\, x^{\mathsf{T}} \dot{P}(t)\, x.
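
To see where the first of these comes from, it may help to differentiate the quadratic form without assuming symmetry. For a general square matrix M (a placeholder symbol, not used elsewhere on this page), the product rule gives

\frac{\partial}{\partial x}\Big[\tfrac12\, x^{\mathsf{T}} M x\Big] = \tfrac12\,(M + M^{\mathsf{T}})\,x,

which reduces to V_x = P x when M = P = P^{\mathsf{T}}. Only the symmetric part of the matrix ever enters the quadratic form, so taking P symmetric loses nothing. The time derivative is simpler still: x is held fixed in the partial derivative V_t, so only P(t) is differentiated.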

Deriving the differential Riccati equation

Step 1 — substitute the ansatz into HJB. Replacing V_x = Px and V_t = \tfrac12 x^{\mathsf{T}}\dot{P}x,

-\tfrac12\, x^{\mathsf{T}} \dot{P}\, x = \min_{u}\Big[\, \tfrac12 x^{\mathsf{T}} Q x + \tfrac12 u^{\mathsf{T}} R u + (P x)^{\mathsf{T}}(A x + B u) \,\Big].

Step 2 — minimise over u. Only two terms in the bracket involve u — the control penalty \tfrac12 u^{\mathsf{T}} R u and the coupling (Px)^{\mathsf{T}} B u = x^{\mathsf{T}} P B u (using P^{\mathsf{T}} = P). Differentiate the bracket with respect to u and set it to zero:

\frac{\partial}{\partial u}\Big[\, \tfrac12 u^{\mathsf{T}} R u + x^{\mathsf{T}} P B u \,\Big] = R\,u + B^{\mathsf{T}} P\, x = 0.

Because R \succ 0 it is invertible, so we can solve for the minimiser outright — and since the bracket is convex in u (its u-Hessian is R \succ 0) this stationary point is the genuine minimum:

u^\* = -R^{-1} B^{\mathsf{T}} P\, x.

The optimal control is already a linear feedback in the state — the whole point of the stage, falling out at the first minimisation.
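
The minimisation in Step 2 can be checked numerically. The sketch below is illustrative only: the dimensions, the random matrices and the helper name u_terms are assumptions rather than part of the text. It builds a random symmetric P and a positive-definite R, forms u^\* from the stationarity condition, and confirms that the gradient vanishes there and that random perturbations never lower the cost.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2  # illustrative state and control dimensions (assumed)

# Random problem data: P symmetric, R symmetric positive definite.
B = rng.standard_normal((n, m))
M = rng.standard_normal((n, n))
P = M + M.T
G = rng.standard_normal((m, m))
R = G @ G.T + m * np.eye(m)
x = rng.standard_normal(n)


def u_terms(u):
    """The u-dependent part of the HJB bracket: 1/2 u'Ru + x'PBu."""
    return 0.5 * u @ R @ u + x @ P @ B @ u


# Candidate minimiser from the stationarity condition R u + B'P x = 0.
u_star = -np.linalg.solve(R, B.T @ P @ x)

# The gradient at u_star should vanish up to round-off.
grad = R @ u_star + B.T @ P @ x

# Every perturbed control should cost more than u_star.
perturbed = [u_terms(u_star + 0.1 * rng.standard_normal(m)) for _ in range(1000)]
min_gap = min(perturbed) - u_terms(u_star)

print("gradient norm at u*:", np.linalg.norm(grad))
print("smallest cost increase over perturbations:", min_gap)
```

Using np.linalg.solve instead of forming R^{-1} explicitly is a numerical convenience; it computes the same u^\*.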

Step 3 — substitute u^\* back in. We evaluate the two u-terms at u^\* = -R^{-1}B^{\mathsf{T}}Px. Because P and R are symmetric, the transpose is u^{\*\mathsf{T}} = -x^{\mathsf{T}} P B R^{-1}. The control penalty, using R^{-1} R\, R^{-1} = R^{-1}, is

\tfrac12\, u^{\*\mathsf{T}} R\, u^\* = \tfrac12\, x^{\mathsf{T}} P B\, R^{-1} R\, R^{-1} B^{\mathsf{T}} P\, x = \tfrac12\, x^{\mathsf{T}} P B R^{-1} B^{\mathsf{T}} P\, x,

and the coupling term is

x^{\mathsf{T}} P B\, u^\* = -\,x^{\mathsf{T}} P B R^{-1} B^{\mathsf{T}} P\, x.

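Adding these two leaves a single -\tfrac12\, x^{\mathsf{T}} P B R^{-1} B^{\mathsf{T}} P\, x (the positive half minus the whole). The remaining terms \tfrac12 x^{\mathsf{T}} Q x and (Px)^{\mathsf{T}} A x = x^{\mathsf{T}} P A x do not involve u and carry over unchanged, so the minimised HJB equation becomes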

-\tfrac12\, x^{\mathsf{T}} \dot{P}\, x = \tfrac12 x^{\mathsf{T}} Q x + x^{\mathsf{T}} P A x - \tfrac12\, x^{\mathsf{T}} P B R^{-1} B^{\mathsf{T}} P\, x.

Step 4 — symmetrise the lone PA term. The scalar x^{\mathsf{T}} P A x equals its own transpose x^{\mathsf{T}} A^{\mathsf{T}} P x (using P^{\mathsf{T}} = P), so we may replace it by the symmetric average \tfrac12 x^{\mathsf{T}}(P A + A^{\mathsf{T}} P) x. Now every term has the form \tfrac12 x^{\mathsf{T}}(\cdots) x:

-\tfrac12\, x^{\mathsf{T}} \dot{P}\, x = \tfrac12\, x^{\mathsf{T}}\Big[\, A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q \,\Big] x.
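
The algebra of Steps 2 to 4 can be sanity-checked with random numbers. The following sketch assumes arbitrary dimensions and randomly generated A, B, Q, R, P and x (none taken from the text); the helper names random_symmetric and random_spd are hypothetical. It evaluates the HJB bracket at u^\* directly and compares it with the symmetrised quadratic form.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2  # illustrative dimensions (assumed)


def random_symmetric(k):
    M = rng.standard_normal((k, k))
    return M + M.T


def random_spd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)


A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q = random_spd(n)
R = random_spd(m)
P = random_symmetric(n)
x = rng.standard_normal(n)

u_star = -np.linalg.solve(R, B.T @ P @ x)

# Left: the HJB bracket evaluated at the minimiser (Steps 1 to 3).
bracket = (0.5 * x @ Q @ x
           + 0.5 * u_star @ R @ u_star
           + (P @ x) @ (A @ x + B @ u_star))

# Right: the symmetrised quadratic form from Step 4.
riccati_rhs = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
quadratic_form = 0.5 * x @ riccati_rhs @ x

print("bracket at u*      :", bracket)
print("1/2 x'[...]x       :", quadratic_form)
```

The two printed numbers should agree to round-off for any choice of x, which is the property the derivation relies on when the x's are stripped away.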

Step 5 — strip the x's. This must hold for every state x. The matrices on both sides are symmetric, and a symmetric matrix is fully determined by its quadratic form, so the two matrices must be equal. The x's and the \tfrac12's cancel, leaving a pure matrix differential equation, the differential Riccati equation:

-\dot{P} = A^{\mathsf{T}} P + P A - P B R^{-1} B^{\mathsf{T}} P + Q, \qquad P(T) = S.

The terminal condition comes from HJB's own terminal condition V(x, T) = \tfrac12 x^{\mathsf{T}} S x, matching the terminal cost of the LQ problem: at the final instant the cost-to-go is the terminal penalty, so P(T) = S.
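
A minimal sketch of how the equation might be integrated numerically, assuming a fixed-step classical Runge-Kutta (RK4) scheme in the time-to-go \tau = T - t. The function names, the step count and the double-integrator example are illustrative assumptions, not part of the text.

```python
import numpy as np


def riccati_rhs(P, A, B, Q, R):
    """Right-hand side of -dP/dt = A'P + PA - P B R^{-1} B' P + Q."""
    return A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q


def solve_riccati_backward(A, B, Q, R, S, T, steps):
    """Integrate from P(T) = S down to t = 0 with classical RK4.

    In time-to-go tau = T - t the equation reads dP/dtau = riccati_rhs(P),
    so marching forward in tau is marching backward in t.
    Returns the ascending time grid and the matching list of P(t).
    """
    h = T / steps
    P = S.copy()
    history = [P]
    for _ in range(steps):
        k1 = riccati_rhs(P, A, B, Q, R)
        k2 = riccati_rhs(P + 0.5 * h * k1, A, B, Q, R)
        k3 = riccati_rhs(P + 0.5 * h * k2, A, B, Q, R)
        k4 = riccati_rhs(P + h * k3, A, B, Q, R)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        P = 0.5 * (P + P.T)  # remove round-off asymmetry
        history.append(P)
    times = np.linspace(0.0, T, steps + 1)
    return times, history[::-1]  # entry 0 is P(0), last entry is P(T)


# Illustrative example (assumed, not from the text): a double integrator.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
S = np.zeros((2, 2))

times, P_of_t = solve_riccati_backward(A, B, Q, R, S, T=10.0, steps=2000)
K0 = np.linalg.solve(R, B.T @ P_of_t[0])  # gain at t = 0, so that u = -K0 x
print("P(0) =\n", P_of_t[0])
print("K(0) =", K0)
```

The integration starts from the known end value S and fills in P(t) toward t = 0, mirroring the terminal condition. A production code would more likely use an adaptive solver; the fixed step here is only for transparency.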

Watching p(t) integrate backward

In one dimension the matrices are numbers. Take a = 0, b = 1, r = 1; the differential Riccati equation collapses to -\dot{p} = q - p^2, started at p(T) = S and solved backward in time. Vary the state weight q > 0 and the terminal value S \ge 0 and follow the curve: integrating from the right edge t = T leftward, whatever value it starts at, p(t) converges to the same steady value p_\infty = \sqrt{q} (where \dot p = 0), with the gap shrinking roughly like e^{-2\sqrt{q}\,(T - t)}. That plateau, reached well before t = 0 when the horizon T is long compared with 1/\sqrt{q}, is the constant the next page solves for directly.
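
The scalar case can be reproduced in a few lines. The sketch below assumes q = 4, T = 5 and a handful of terminal values S chosen purely for illustration. It integrates dp/d\tau = q - p^2 in the time-to-go \tau = T - t with RK4 and compares the result with the closed form p(\tau) = \sqrt{q}\,\big(S + \sqrt{q}\tanh(\sqrt{q}\,\tau)\big) / \big(\sqrt{q} + S\tanh(\sqrt{q}\,\tau)\big), which satisfies the same equation and starting value.

```python
import math


def p_closed_form(tau, q, S):
    """Closed-form solution of dp/dtau = q - p^2, p(0) = S, for q > 0."""
    c = math.sqrt(q)
    th = math.tanh(c * tau)
    return c * (S + c * th) / (c + S * th)


def integrate_backward(q, S, T, steps):
    """RK4 on dp/dtau = q - p^2 in time-to-go tau = T - t; returns p at t = 0."""
    def f(p):
        return q - p * p

    h = T / steps
    p = S
    for _ in range(steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return p


q, T = 4.0, 5.0  # illustrative values (assumed)
for S in (0.0, 1.0, 2.0, 10.0):
    p0 = integrate_backward(q, S, T, steps=5000)
    exact = p_closed_form(T, q, S)
    print(f"S = {S:5.1f}   p(0) = {p0:.6f}   closed form = {exact:.6f}")
```

Each row should land close to \sqrt{q} = 2 regardless of S, which is the plateau described above.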

The Riccati equation carries a terminal condition, not an initial one: we know P at the end, P(T) = S, because that is where the cost-to-go is pinned to the terminal penalty. Dynamic programming always reasons from the finish line backward, and the Riccati equation inherits that direction. The system trajectory x(t) runs forward from x(0); the cost matrix P(t) runs backward from P(T). The backward sweep supplies the feedback gain R^{-1} B^{\mathsf{T}} P(t) at every instant, and the forward sweep applies it through u^\* = -R^{-1} B^{\mathsf{T}} P(t)\, x(t).
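
The two sweeps can be sketched for the scalar problem a = 0, b = 1, r = 1. The values of q, S, T and x_0, and the use of explicit Euler steps on a fine grid, are assumptions made for brevity. The backward pass stores p(t); the forward pass applies u = -p(t)\,x and accumulates the cost, which should come out close to the value-function prediction \tfrac12\, p(0)\, x_0^2.

```python
import numpy as np

# Scalar problem: a = 0, b = 1, r = 1, so dx/dt = u.
q, S, T, x0 = 4.0, 0.5, 2.0, 1.0  # illustrative values (assumed)
steps = 20000
h = T / steps

# Backward sweep: march -dp/dt = q - p^2 from p(T) = S down to t = 0.
p = np.empty(steps + 1)
p[steps] = S
for k in range(steps, 0, -1):
    p[k - 1] = p[k] + h * (q - p[k] ** 2)  # explicit Euler in time-to-go

# Forward sweep: apply u = -p(t) x and accumulate the running cost.
x = x0
cost = 0.0
for k in range(steps):
    u = -p[k] * x
    cost += h * 0.5 * (q * x ** 2 + u ** 2)
    x += h * u
cost += 0.5 * S * x ** 2  # terminal penalty

predicted = 0.5 * p[0] * x0 ** 2
print("simulated cost        :", cost)
print("predicted 1/2 p(0) x0^2:", predicted)
```

The small remaining mismatch between the two numbers comes from the Euler discretisation and shrinks as steps grows.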