The Inverted Pendulum

Time to spend everything we have built. The inverted pendulum on a cart — balance a pole upright by sliding the cart beneath it — is the classic benchmark of control, unstable and irresistibly visual. We will run the full LQR pipeline on it: model, check, weight, solve, feed back. This is the stage finale, the moment the feedback law meets a real machine.

The linearised model

The cart slides horizontally; the pole hinges on it. Four numbers capture the state — cart position and velocity, pole angle and angular rate — so the state-space model lives in \mathbb{R}^4:

x = \begin{bmatrix} p \\ \dot{p} \\ \theta \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} \text{cart position} \\ \text{cart velocity} \\ \text{pole angle} \\ \text{pole angular velocity} \end{bmatrix}, \qquad u = \text{force on cart}.

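The true dynamics are nonlinear (gravity enters through \sin\theta, and the cart-pole coupling through \cos\theta and \dot{\theta}^2 terms), but balance is about the upright equilibrium \theta = 0. Linearising there, the very “linearise the dynamics” step that makes LQ the universal local model, replaces \sin\theta \approx \theta and \cos\theta \approx 1, drops the second-order \dot{\theta}^2 term, and gives a linear time-invariant system \dot{x} = Ax + Bu. With cart mass M, pole mass m, pole half-length \ell (the distance from the hinge to the pole's centre of mass), gravity g, and the pole's moment of inertia about its centre of mass neglected, the matrices are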

A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -\dfrac{mg}{M} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & \dfrac{(M+m)g}{M\ell} & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ \dfrac{1}{M} \\ 0 \\ -\dfrac{1}{M\ell} \end{bmatrix}.

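The telltale entry is (M+m)g/(M\ell) > 0 in the angular-acceleration row: a small lean produces an angular acceleration that increases the lean. That positive feedback is an eigenvalue of A in the right half-plane. Expanding the determinant gives \det(sI - A) = s^2\left(s^2 - \dfrac{(M+m)g}{M\ell}\right), so the eigenvalues are 0, 0 and \pm\sqrt{(M+m)g/(M\ell)}; the positive one is why the open-loop pole falls. Our job is to move it, along with the double eigenvalue at zero (the cart free to drift), into the left half-plane with feedback.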
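As a quick numerical check of the eigenvalue claim, here is a minimal NumPy sketch. The parameter values (M = 1 kg, m = 0.1 kg, \ell = 0.5 m, g = 9.81 m/s²) and the helper name `cart_pole_matrices` are illustrative assumptions, not taken from the text; the matrices themselves are the A and B above.

```python
import numpy as np


def cart_pole_matrices(M=1.0, m=0.1, ell=0.5, g=9.81):
    """Linearised cart-pole (A, B) about theta = 0, as written in the text.

    State x = [p, p_dot, theta, theta_dot], input u = force on cart.
    Default parameter values are illustrative assumptions.
    """
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -m * g / M, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
    ])
    B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
    return A, B


M, m, ell, g = 1.0, 0.1, 0.5, 9.81
A, B = cart_pole_matrices(M, m, ell, g)
eigs = np.linalg.eigvals(A)

# Predicted +/- pair from the factor s^2 - (M+m)g/(M ell) of det(sI - A)
predicted = np.sqrt((M + m) * g / (M * ell))

print("eigenvalues of A:", np.round(np.sort(eigs.real), 4))
print("predicted unstable eigenvalue:", round(predicted, 4))
```

For these values the unstable eigenvalue is about 4.65 s⁻¹, so a small lean grows by a factor of e roughly every 0.2 s.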

Is it controllable?

Before optimising, check that the force can reach every mode, and in particular the unstable one: if the falling mode were out of reach, no gain could stabilise it. The test is controllability: the matrix

\mathcal{C} = \begin{bmatrix} B & AB & A^2 B & A^3 B \end{bmatrix}

must have full rank 4. For the cart-pole it does: pushing the cart couples through to the pole angle, so a single horizontal force can influence all four states. Controllability, together with R > 0 and a state weight Q \succeq 0 that “sees” the unstable mode (formally, (A, Q^{1/2}) detectable, which any Q \succ 0 satisfies), guarantees that the Algebraic Riccati Equation has a unique stabilising solution. We are cleared to apply LQR.
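The rank test is a one-liner once \mathcal{C} is assembled. A minimal NumPy sketch, using the same assumed parameter values as before and a hypothetical helper `controllability_matrix`:

```python
import numpy as np


def cart_pole_matrices(M=1.0, m=0.1, ell=0.5, g=9.81):
    """Linearised cart-pole (A, B); parameter values are assumptions."""
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -m * g / M, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
    ])
    B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
    return A, B


def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] side by side."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)


M, m, ell, g = 1.0, 0.1, 0.5, 9.81
A, B = cart_pole_matrices(M, m, ell, g)
C = controllability_matrix(A, B)
rank = np.linalg.matrix_rank(C)

print("rank of controllability matrix:", rank)    # 4: controllable
print("|det C| =", abs(np.linalg.det(C)))         # equals (g / (M^2 ell^2))^2
```

For these matrices the determinant works out by hand to \pm\left(g/(M^2\ell^2)\right)^2, which is nonzero for any positive M and \ell as long as g \neq 0, so the full-rank conclusion does not depend on the particular numbers chosen.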

Running the LQR pipeline

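Now the recipe, end to end. Model: take (A, B) from the linearisation above. Check: confirm \operatorname{rank}\mathcal{C} = 4. Weight: pick a state weight Q \succeq 0, heaviest on the angle \theta, and a force weight R = \rho > 0 in the cost J = \int_0^\infty \left(x^\top Q x + \rho u^2\right) dt. Solve: find the stabilising solution P of the Algebraic Riccati Equation A^\top P + PA - PBR^{-1}B^\top P + Q = 0. Feed back: apply u = -Kx with K = R^{-1}B^\top P = \begin{bmatrix} k_1 & k_2 & k_3 & k_4 \end{bmatrix}.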

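We do not grind the 4\times 4 Riccati solution by hand; that is the computer's job. But the structure is everything from this stage: a quadratic cost, a Riccati matrix, a linear feedback gain, a stable closed loop. Conceptually, K comes out with a large |k_3| (react hard to angle) and, for typical weights, a smaller |k_1| (bring the cart back), and the pole balances. The size of |k_3| is not a matter of taste: with the sign conventions of A and B above, the closed-loop characteristic polynomial can have all-positive coefficients (necessary for stability) only if |k_3| > (M+m)g, so the angle gain must at least overpower gravity.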
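The whole pipeline fits in a few lines with SciPy, whose `scipy.linalg.solve_continuous_are(a, b, q, r)` returns the P solving A^\top P + PA - PBR^{-1}B^\top P + Q = 0. A minimal sketch under stated assumptions: the weights Q = \mathrm{diag}(1, 1, 10, 1) and R = \rho = 1 are illustrative choices that weight the angle most, and the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def cart_pole_matrices(M=1.0, m=0.1, ell=0.5, g=9.81):
    """Linearised cart-pole (A, B); parameter values are assumptions."""
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -m * g / M, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
    ])
    B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
    return A, B


def lqr(A, B, Q, R):
    """Infinite-horizon continuous-time LQR: returns (K, P) for u = -K x."""
    P = solve_continuous_are(A, B, Q, R)   # stabilising ARE solution
    K = np.linalg.solve(R, B.T @ P)        # K = R^{-1} B^T P
    return K, P


M, m, ell, g = 1.0, 0.1, 0.5, 9.81
A, B = cart_pole_matrices(M, m, ell, g)
Q = np.diag([1.0, 1.0, 10.0, 1.0])   # assumed: angle weighted most heavily
R = np.array([[1.0]])                # assumed force weight rho = 1

K, P = lqr(A, B, Q, R)
closed_loop = A - B @ K

print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(closed_loop))
```

With these sign conventions every entry of K comes out negative (stability of the closed loop requires it), and |k_3| exceeds (M+m)g \approx 10.8 N/rad, in line with the angle gain having to overpower gravity.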

Catching the fall

Knock the pole off vertical by an initial lean \theta_0 and the LQR feedback catches it: under u = -Kx the pole angle \theta(t) follows a damped transient that settles back to upright, \theta = 0. The force weight R = \rho sets the character of the catch. A small \rho makes force cheap, so the gain is large and the pole snaps upright fast and tight; a large \rho charges heavily for force, so the controller is gentle and the pole sways back slowly. That is the aggressiveness-versus-effort dial of LQR, made physical.
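To see the \rho dial numerically, the sketch below solves for K at two force weights and propagates the linearised closed loop \dot{x} = (A - BK)x exactly with the matrix exponential, x(t) = e^{(A-BK)t}x_0. The values Q = \mathrm{diag}(1, 1, 10, 1), \theta_0 = 0.2 rad and \rho \in \{0.01, 10\} are assumptions for illustration, and the helper names are hypothetical. It reports the initial force |u(0)| = |k_3|\theta_0 and the slowest closed-loop decay rate. Note that it simulates the linear model, not the nonlinear cart-pole, so it is only trustworthy for small leans.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_are


def cart_pole_matrices(M=1.0, m=0.1, ell=0.5, g=9.81):
    """Linearised cart-pole (A, B); parameter values are assumptions."""
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -m * g / M, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
    ])
    B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])
    return A, B


def lqr_gain(A, B, Q, rho):
    """LQR gain K for scalar force weight R = rho."""
    R = np.array([[rho]])
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)


def simulate_angle(A, B, K, x0, t):
    """Exact linear closed-loop response; returns theta(t) = x(t)[2]."""
    Acl = A - B @ K
    return np.array([(expm(Acl * ti) @ x0)[2] for ti in t])


A, B = cart_pole_matrices()
Q = np.diag([1.0, 1.0, 10.0, 1.0])      # assumed state weights
x0 = np.array([0.0, 0.0, 0.2, 0.0])     # assumed initial lean theta0 = 0.2 rad
t = np.linspace(0.0, 10.0, 501)

results = {}
for rho in (0.01, 10.0):
    K = lqr_gain(A, B, Q, rho)
    theta = simulate_angle(A, B, K, x0, t)
    # Slowest mode: eigenvalue of A - BK closest to the imaginary axis
    slowest = -np.max(np.linalg.eigvals(A - B @ K).real)
    u0 = float(-(K @ x0)[0])            # initial force u(0) = -K x0
    results[rho] = (K, theta, slowest, u0)
    print(f"rho={rho:5}: |u(0)| = {abs(u0):6.2f} N, "
          f"slowest decay rate = {slowest:.2f} 1/s")
```

The cheap-force setting should show both a much larger initial push and a faster slowest mode; the `theta` arrays stored in `results` hold \theta(t) if you want to plot the two curves side by side.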

Cart-pole is the rite of passage twice over. In control it is the first nonlinear, unstable plant most students stabilise: small enough to grasp, hard enough to matter. In reinforcement learning it is “CartPole”, the canonical first environment: an agent that knows nothing of A, B or a Riccati equation learns to balance the pole by trial and reward. The two cultures meet on the same little cart. LQR computes the optimal feedback for the linearised model; RL discovers a policy without any model. When RL is posed on the linearised model with the same quadratic cost, model-free methods such as policy gradient can converge to the very same linear gain K that LQR hands you in closed form. The standard CartPole benchmark, with its push-left-or-push-right actions and reward for staying upright, poses a different problem, so the policies learned there need not match the LQR gain.