The Inverted Pendulum

Time to spend everything we have built. The inverted pendulum on a cart — balance a pole upright by sliding the cart beneath it — is the classic benchmark of control, unstable and irresistibly visual. We will run the full LQR pipeline on it: model, check, weight, solve, feed back. This is the stage finale, the moment the feedback law meets a real machine.

The linearised model

The cart slides horizontally; the pole hinges on it. Four numbers capture the state — cart position and velocity, pole angle and angular rate — so the state-space model lives in \mathbb{R}^4:

x = \begin{bmatrix} p \\ \dot{p} \\ \theta \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} \text{cart position} \\ \text{cart velocity} \\ \text{pole angle} \\ \text{pole angular velocity} \end{bmatrix}, \qquad u = \text{force on cart}.

The true dynamics are nonlinear (a \sin\theta from gravity), but balance is about the upright equilibrium \theta = 0. Linearising there (the very “linearise the dynamics” step that makes LQ the universal local model) replaces \sin\theta \approx \theta and gives a linear time-invariant system \dot{x} = Ax + Bu. With cart mass M, pole mass m, pole half-length \ell (the pole's mass is treated as a point at its centre, a distance \ell from the hinge), gravity g and no friction, the standard structure is

A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -\dfrac{mg}{M} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & \dfrac{(M+m)g}{M\ell} & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ \dfrac{1}{M} \\ 0 \\ -\dfrac{1}{M\ell} \end{bmatrix}.
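For readers who want to see where these entries come from, here is a sketch of the linearisation. It assumes a frictionless cart and a massless rod carrying the pole mass m at distance \ell from the hinge, with \theta measured from upright and positive when the pole leans toward positive p; other sign conventions flip some signs. Under those assumptions the nonlinear equations of motion are

(M+m)\ddot{p} + m\ell\ddot{\theta}\cos\theta - m\ell\dot{\theta}^2\sin\theta = u, \qquad \ell\ddot{\theta} + \ddot{p}\cos\theta - g\sin\theta = 0.

Setting \sin\theta \approx \theta and \cos\theta \approx 1, and dropping the higher-order term \dot{\theta}^2\theta, leaves

(M+m)\ddot{p} + m\ell\ddot{\theta} = u, \qquad \ell\ddot{\theta} + \ddot{p} = g\theta.

Substituting \ddot{p} = g\theta - \ell\ddot{\theta} into the first equation and solving gives the two non-trivial rows of A and B:

\ddot{\theta} = \frac{(M+m)g}{M\ell}\,\theta - \frac{1}{M\ell}\,u, \qquad \ddot{p} = -\frac{mg}{M}\,\theta + \frac{1}{M}\,u.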

The telltale entry is (M+m)g/(M\ell) > 0 in the angular-acceleration row: a small lean produces an acceleration that increases the lean. That positive feedback shows up as an eigenvalue of A at +\sqrt{(M+m)g/(M\ell)}, in the right half-plane, so without control the pole falls. Our job is to move that eigenvalue into the left half-plane with feedback.
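The instability can be checked numerically. The sketch below builds A and B with NumPy and prints the open-loop eigenvalues; the parameter values (M = 1 kg, m = 0.1 kg, \ell = 0.5 m) are illustrative assumptions, not values taken from the text.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81

# State x = [p, p_dot, theta, theta_dot]; input u = force on the cart.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])

eigvals = np.linalg.eigvals(A)
unstable = eigvals.real.max()  # largest real part = the falling mode

print("open-loop eigenvalue real parts:", np.sort(eigvals.real))
print("unstable eigenvalue:", unstable)
print("sqrt((M+m)g/(M*ell)):", np.sqrt((M + m) * g / (M * ell)))
```

The largest real part should match \sqrt{(M+m)g/(M\ell)}, alongside its mirror image in the left half-plane and a pair of eigenvalues at (numerically, near) zero from the cart's free motion.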

Is it controllable?

Before optimising, check that control can reach every mode — otherwise no gain can stabilise the fall. The test is controllability: the matrix

\mathcal{C} = \begin{bmatrix} B & AB & A^2 B & A^3 B \end{bmatrix}

must have full rank 4. For the cart-pole it does: pushing the cart couples through to the pole angle, so a single horizontal force can influence all four states. The system is controllable. Controllability of (A, B), together with detectability of (A, Q^{1/2}) (which holds for the positive state weights chosen below), guarantees that the Algebraic Riccati Equation has a unique stabilising solution. We are cleared to apply LQR.
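The rank test is short to run. This is a minimal sketch using NumPy, with the same illustrative parameter values as assumptions; it stacks B, AB, A^2B, A^3B column by column and asks for the numerical rank.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])

# Build [B, AB, A^2 B, A^3 B]: each new column is A times the previous one.
blocks = [B]
for _ in range(A.shape[0] - 1):
    blocks.append(A @ blocks[-1])
ctrb = np.hstack(blocks)

rank = np.linalg.matrix_rank(ctrb)
print("controllability matrix:\n", ctrb)
print("rank:", rank, "of", A.shape[0])
```

A numerical rank depends on a tolerance, so for badly scaled parameters it is worth inspecting the singular values of the matrix rather than trusting the integer alone.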

Running the LQR pipeline

Now the recipe, end to end.

Choose the weights. Penalise what matters: a diagonal Q = \mathrm{diag}(q_p, q_{\dot p}, q_\theta, q_{\dot\theta}) with a large q_\theta (keep the pole upright) and a modest q_p (keep the cart near centre), and a scalar R = \rho > 0 on the force.
Solve the ARE. Feed A, B, Q, R to the Algebraic Riccati Equation and get the symmetric positive-definite P — one call to a numerical solver, as the 4\times 4 case is far past hand algebra.
Form the gain. K = R^{-1} B^{\mathsf{T}} P — here a row vector K = [\,k_1\ k_2\ k_3\ k_4\,], one number per state.
Feed back. Apply u = -Kx = -(k_1 p + k_2 \dot p + k_3 \theta + k_4 \dot\theta). The closed loop \dot{x} = (A - BK)x now has all four eigenvalues in the left half-plane, so a perturbed pole returns smoothly to upright.

We do not grind the 4\times 4 Riccati solution by hand (that is the computer's job), but the structure draws on everything from this stage: a quadratic cost, a Riccati matrix, a linear feedback gain, a stable closed loop. Conceptually K comes out with a k_3 of sizeable magnitude (react hard to angle) and a smaller k_1 (nudge the cart back), and the pole balances.
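The four steps can be sketched in a few lines. The example below uses SciPy's `solve_continuous_are` as the numerical solver; the parameter values and the particular weights Q = diag(1, 1, 100, 1), R = 1 are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative parameter values (assumptions, not from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])

# Step 1: weights (assumed values): heavy on angle, modest elsewhere.
Q = np.diag([1.0, 1.0, 100.0, 1.0])
R = np.array([[1.0]])

# Step 2: solve A'P + PA - P B R^{-1} B' P + Q = 0 for P.
P = solve_continuous_are(A, B, Q, R)

# Step 3: gain K = R^{-1} B' P, a 1x4 row vector.
K = np.linalg.solve(R, B.T @ P)

# Step 4: closed loop x_dot = (A - BK) x.
closed_loop = np.linalg.eigvals(A - B @ K)

print("K =", K.round(3))
print("closed-loop eigenvalues:", closed_loop.round(3))
```

The signs of the individual gains depend on the sign conventions in A and B, so it is the magnitudes, and the closed-loop eigenvalues all landing in the left half-plane, that are worth reading off.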

Linearise about upright to get a controllable 4-state LTI model \dot{x} = Ax + Bu.
Pick Q \succeq 0 (heavy on angle) and R = \rho \succ 0; solve the ARE for P; set K = R^{-1} B^{\mathsf{T}} P.
The feedback u = -Kx stabilises the upright equilibrium; raising \rho trades a gentler, lower-effort controller against a slower recovery.

Catching the fall

Knock the pole off vertical and watch the LQR feedback catch it. Under u = -Kx the pole angle \theta(t) follows a damped transient that decays back to upright, \theta = 0. Two quantities shape that curve: the initial lean \theta_0, which sets the size of the perturbation, and the force weight R = \rho. A small \rho makes force cheap, so the gain is large and the pole snaps upright fast and tight; a large \rho charges heavily for force, so the controller is gentle and the pole sways back slowly. That is the aggressiveness-versus-effort dial of LQR, made physical.
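The effect of \rho can be reproduced with a short simulation of the linearised closed loop. This sketch steps \dot{x} = (A - BK)x with a matrix exponential for two values of \rho; the parameters, the weights Q, the initial lean of 0.1 rad and the helper names `lqr_gain` and `simulate_theta` are all assumptions introduced for illustration.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are

# Illustrative parameter values and weights (assumptions, not from the text).
M, m, ell, g = 1.0, 0.1, 0.5, 9.81
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * g / M, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
Q = np.diag([1.0, 1.0, 100.0, 1.0])


def lqr_gain(rho):
    """Return K = R^{-1} B' P for the scalar force weight R = rho."""
    R = np.array([[rho]])
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)


def simulate_theta(rho, theta0=0.1, t_end=10.0, dt=0.01):
    """Simulate the linear closed loop from an initial lean theta0 (rad)."""
    K = lqr_gain(rho)
    step = expm((A - B @ K) * dt)  # exact one-step map of the linear system
    x = np.array([0.0, 0.0, theta0, 0.0])
    thetas = [x[2]]
    for _ in range(int(round(t_end / dt))):
        x = step @ x
        thetas.append(x[2])
    return np.array(thetas), K


for rho in (0.01, 10.0):
    theta, K = simulate_theta(rho)
    late = np.abs(theta[200:]).max()  # largest |theta| after t = 2 s
    print(f"rho={rho}: |K|={np.linalg.norm(K):.2f}, max|theta| after 2 s={late:.2e}")
```

With these assumed numbers the smaller \rho should give the larger gain norm and the smaller residual lean after two seconds. The simulation uses the linear model only, so it says nothing about leans large enough to invalidate \sin\theta \approx \theta.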

Cart-pole is the rite of passage twice over. In control it is the first nonlinear, unstable plant every student stabilises: small enough to grasp, hard enough to matter. In reinforcement learning it is “CartPole”, the canonical first environment: an agent that knows nothing of A, B or a Riccati equation learns to balance the pole by trial and reward. The two cultures meet on the same little cart. LQR computes the optimal feedback from a model; RL discovers a policy without one, and on the linearised problem with the same quadratic cost it can converge toward the same linear feedback that LQR computes from the model.