The Inverted Pendulum
Time to spend everything we have built. The inverted pendulum on a cart — balance
a pole upright by sliding the cart beneath it — is the classic benchmark of control, unstable and
irresistibly visual. We will run the full LQR pipeline on it: model, check, weight, solve, feed
back. This is the stage finale, the moment the feedback law meets a real machine.
The linearised model
The cart slides horizontally; the pole hinges on it. Four numbers capture the state (cart position
and velocity, pole angle and angular rate), so the state-space model lives in \mathbb{R}^4:
x = \begin{bmatrix} p \\ \dot{p} \\ \theta \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} \text{cart position} \\ \text{cart velocity} \\ \text{pole angle} \\ \text{pole angular velocity} \end{bmatrix}, \qquad u = \text{force on cart}.
The true dynamics are nonlinear (a \sin\theta from gravity), but balance
is about the upright equilibrium \theta = 0. Linearising there
(the very “linearise the dynamics” step that makes LQ the universal local model)
replaces \sin\theta with \theta and \cos\theta with 1, and gives a linear time-invariant
system \dot{x} = Ax + Bu. With cart mass
M, pole mass m, pole half-length
\ell (the hinge-to-centre-of-mass distance, with the pole idealised as a point mass there) and gravity g, the standard structure is
A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -\dfrac{mg}{M} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & \dfrac{(M+m)g}{M\ell} & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ \dfrac{1}{M} \\ 0 \\ -\dfrac{1}{M\ell} \end{bmatrix}.
The telltale entry is (M+m)g/(M\ell) > 0 in the angular-acceleration
row: a small lean produces an acceleration that increases the lean. That positive feedback
shows up as an eigenvalue of A at +\sqrt{(M+m)g/(M\ell)}, in the right half-plane, so the open-loop pendulum
falls. Our job is to move that eigenvalue (along with the two at the origin that come from the free-sliding cart) into the left half-plane with feedback.
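The instability can be checked numerically. The following is a minimal sketch in Python with NumPy. The parameter values (M = 1 kg, m = 0.1 kg, \ell = 0.5 m, g = 9.81 m/s^2) and the helper name cart_pole_matrices are illustrative assumptions, not values taken from the text. It builds A and B exactly as written above and compares the largest eigenvalue of A with \sqrt{(M+m)g/(M\ell)}.

```python
import numpy as np

def cart_pole_matrices(M=1.0, m=0.1, ell=0.5, g=9.81):
    """Return (A, B) of the linearised cart-pole for state [p, p_dot, theta, theta_dot].

    The default parameter values are illustrative assumptions only.
    """
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -m * g / M, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
    ])
    B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
    return A, B

M, m, ell, g = 1.0, 0.1, 0.5, 9.81
A, B = cart_pole_matrices(M, m, ell, g)

eigvals = np.linalg.eigvals(A)
growth_rate = np.sqrt((M + m) * g / (M * ell))  # predicted unstable eigenvalue

print("eigenvalues of A:", np.round(eigvals, 4))
print("largest real part:", eigvals.real.max())
print("sqrt((M+m)g/(M*ell)):", growth_rate)
```

The reciprocal of that eigenvalue is the time over which a small lean grows by a factor of e; for the assumed values it is roughly a fifth of a second, which is why the pole cannot be left alone for long.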
Is it controllable?
Before optimising, check that control can reach every mode; otherwise no gain can stabilise the
fall. The test is controllability: the matrix
\mathcal{C} = \begin{bmatrix} B & AB & A^2 B & A^3 B \end{bmatrix}
must have full rank 4. For the cart-pole it does: pushing the cart
couples through to the pole angle, so a single horizontal force can influence all four states. The
system is controllable, which, together with a Q that leaves no unstable mode unpenalised (any
positive-definite Q will do), guarantees that the Algebraic Riccati Equation has a unique
stabilising solution. We are cleared to apply LQR.
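The rank test is a few lines of NumPy. This is a minimal sketch under the same kind of assumption as any numerical check here: the parameter values are illustrative, not taken from the text. It stacks B, AB, A^2 B and A^3 B side by side and asks for the numerical rank.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])

n = A.shape[0]

# Build [B, AB, A^2 B, A^3 B] by repeatedly multiplying the last block by A.
blocks = [B]
for _ in range(n - 1):
    blocks.append(A @ blocks[-1])
ctrb = np.hstack(blocks)

rank = np.linalg.matrix_rank(ctrb)
print("controllability matrix:")
print(np.round(ctrb, 3))
print("rank:", rank, "of", n)
```

For a 4-state model this direct test is adequate. On larger or badly scaled systems the powers of A can make the rank decision numerically fragile, so treat the printed rank as a check on this small example rather than a general-purpose method.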
Running the LQR pipeline
Now the recipe, end to end.
- Choose the weights. Penalise what matters: a diagonal Q = \mathrm{diag}(q_p, q_{\dot p}, q_\theta, q_{\dot\theta}) with a large q_\theta (keep the pole upright) and a modest q_p (keep the cart near centre), and a scalar R = \rho > 0 on the force.
- Solve the ARE. Feed A, B, Q, R to the Algebraic Riccati Equation and get the symmetric positive-definite P: one call to a numerical solver, as the 4\times 4 case is far past hand algebra.
- Form the gain. K = R^{-1} B^{\mathsf{T}} P, here a row vector K = [\,k_1\ k_2\ k_3\ k_4\,], one number per state.
- Feed back. Apply u = -Kx = -(k_1 p + k_2 \dot p + k_3 \theta + k_4 \dot\theta). The closed loop \dot{x} = (A - BK)x now has all four eigenvalues in the left half-plane, so a perturbed pole returns smoothly to upright.
We do not grind the 4\times 4 Riccati solution by hand — that is the
computer's job — but the structure is everything from this stage: a quadratic cost, a
Riccati matrix, a linear feedback gain, a stable closed loop. Conceptually
K comes out with a sizeable k_3 (react hard to
angle) and a smaller k_1 (nudge the cart back), and the pole balances.
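The four steps above can be sketched in Python with SciPy, whose scipy.linalg.solve_continuous_are solves the continuous-time ARE. The physical parameters and the entries of Q and R below are illustrative assumptions chosen only to follow the "heavy on angle, modest on position" advice; they are not values from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative parameters and weights (assumptions, not values from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])

# Step 1: weights. q_theta is the largest entry, q_p is modest.
Q = np.diag([1.0, 0.1, 10.0, 0.1])  # q_p, q_pdot, q_theta, q_thetadot
R = np.array([[0.1]])               # rho

# Step 2: solve A^T P + P A - P B R^{-1} B^T P + Q = 0 for the stabilising P.
P = solve_continuous_are(A, B, Q, R)

# Step 3: K = R^{-1} B^T P, via a linear solve rather than an explicit inverse.
K = np.linalg.solve(R, B.T @ P)

# Step 4: closed loop x_dot = (A - B K) x.
closed_loop_eigs = np.linalg.eigvals(A - B @ K)

# Residual of the ARE, using B R^{-1} B^T P = B K.
residual = A.T @ P + P @ A - P @ B @ K + Q

print("K =", np.round(K, 3))
print("closed-loop eigenvalues:", np.round(closed_loop_eigs, 3))
print("max |ARE residual|:", np.abs(residual).max())
print("|k_3| > |k_1|:", abs(K[0, 2]) > abs(K[0, 0]))
```

The signs of the individual gains depend on the sign conventions chosen for \theta and for the force, so it is the magnitudes that are worth comparing: the angle gain k_3 dominates the position gain k_1.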
- Linearise about upright to get a controllable 4-state LTI model \dot{x} = Ax + Bu.
- Pick Q \succeq 0 (heavy on angle) and R = \rho \succ 0; solve the ARE for P; set K = R^{-1} B^{\mathsf{T}} P.
- The feedback u = -Kx stabilises the upright equilibrium; raising \rho trades a gentler, lower-effort controller against a slower recovery.
Catching the fall
Knock the pole off vertical and the LQR feedback catches it. Under u = -Kx the pole angle
\theta(t) follows a damped oscillation decaying back to upright, \theta = 0. Two quantities
shape the response: the initial lean \theta_0, which sets the size of the perturbation, and the
force weight R = \rho. A small \rho makes
force cheap, so the gain is large and the pole snaps upright fast and tight; a large
\rho charges heavily for force, so the controller is gentle and the pole
sways back slowly. That is the aggressiveness-versus-effort dial of LQR, made physical.
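The dial can also be explored numerically. The sketch below is a hedged illustration, not a definitive simulator: it propagates the linearised closed loop \dot{x} = (A - BK)x (not the nonlinear plant) from an initial lean \theta_0, for one small and one large \rho. The parameters, the weights in Q, the two \rho values, \theta_0 = 0.2 rad and the helper names lqr_gain and simulate are all assumptions made for the example.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are

# Illustrative parameters and weights (assumptions, not values from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
Q = np.diag([1.0, 0.1, 10.0, 0.1])

def lqr_gain(rho):
    """LQR gain K = R^{-1} B^T P for the scalar force weight R = rho."""
    R = np.array([[rho]])
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

def simulate(K, theta0, t_end=10.0, dt=0.01):
    """Propagate x_dot = (A - B K) x on a fixed grid using the exact step matrix."""
    Phi = expm((A - B @ K) * dt)  # state transition over one step of length dt
    steps = int(round(t_end / dt))
    xs = np.empty((steps + 1, 4))
    xs[0] = [0.0, 0.0, theta0, 0.0]  # cart at rest at centre, pole leaning
    for k in range(steps):
        xs[k + 1] = Phi @ xs[k]
    return xs

theta0 = 0.2  # initial lean in radians
results = {}
for rho in (0.01, 10.0):
    K = lqr_gain(rho)
    xs = simulate(K, theta0)
    force = -(xs @ K.T).ravel()  # u = -K x along the trajectory
    results[rho] = {
        "K": K,
        "theta": xs[:, 2],
        "peak_force": np.max(np.abs(force)),
        "eigs": np.linalg.eigvals(A - B @ K),
    }
    print(f"rho={rho}: K={np.round(K.ravel(), 2)}, "
          f"peak |u|={results[rho]['peak_force']:.2f}, "
          f"|theta(2 s)|={abs(xs[200, 2]):.4f}")
```

Comparing the two printed lines shows the trade-off in numbers: the gain vector, the peak force demanded, and how much lean is left two seconds after the knock. Plotting results[rho]["theta"] against time gives the recovery curves themselves.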
Cart-pole is the rite of passage twice over. In control it is the first nonlinear, unstable plant
every student stabilises — small enough to grasp, hard enough to matter. In
reinforcement learning it is “CartPole”, the canonical first
environment: an agent that knows nothing of A,
B or a Riccati equation learns to balance the pole by trial and reward.
The two cultures meet on the same little cart: LQR computes the optimal feedback from a model;
RL discovers a policy without one, and on the linearised problem with the same quadratic cost it can converge toward the
very same linear feedback LQR hands you in closed form.