The Cart-Pole

The cart-pole is the "hello, world" of reinforcement learning and control. A pole is hinged on top of a cart that can only slide left or right along a track. Left alone, the pole topples. The task: keep it balanced upright by pushing the cart back and forth — like balancing a broom on your palm.

It is beloved because it is the simplest problem that is still genuinely hard: the system is unstable (do nothing and it falls), its state is continuous, and the right action at any moment depends on both the pole's angle and how fast it is tipping.

States, actions, and reward

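The cart-pole is captured by four numbers, which together form its state: the cart's position, the cart's velocity, the pole's angle from vertical, and the pole's angular velocity.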

At each timestep the agent picks one of two actions, push left or push right, and earns a reward of +1 for every timestep the pole stays up. An episode ends when the pole tips past a threshold angle or the cart runs off the track, so maximising total reward means balancing for as long as possible. That single, simple reward is enough for an agent to discover a balancing policy from scratch.
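The loop described above (four-number state, two actions, +1 per surviving timestep, termination on tilt or track limits) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the masses, pole length, push force, timestep, 12-degree angle limit, and 2.4 m track limit are assumed values borrowed from the classic benchmark, and the names `step`, `run_episode`, `always_left`, and `follow_the_lean` are hypothetical.

```python
import math
import random

# Assumed constants: classic benchmark values, not taken from the text.
GRAVITY = 9.8            # m/s^2
CART_MASS = 1.0          # kg
POLE_MASS = 0.1          # kg
POLE_HALF_LENGTH = 0.5   # m, hinge to the pole's centre of mass
PUSH_FORCE = 10.0        # N, magnitude of a single push
DT = 0.02                # s, duration of one timestep
ANGLE_LIMIT = math.radians(12)   # assumed threshold angle ending an episode
TRACK_LIMIT = 2.4        # m, assumed half-width of the track

LEFT, RIGHT = 0, 1       # the two actions


def step(state, action):
    """Apply one push and return (next_state, reward, done)."""
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if action == RIGHT else -PUSH_FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_ml = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - pole_ml * theta_acc * cos_t / total_mass

    # Explicit Euler integration over one timestep.
    next_state = (x + DT * x_dot, x_dot + DT * x_acc,
                  theta + DT * theta_dot, theta_dot + DT * theta_acc)
    done = abs(next_state[2]) > ANGLE_LIMIT or abs(next_state[0]) > TRACK_LIMIT
    reward = 0.0 if done else 1.0   # +1 for each timestep the pole stays up
    return next_state, reward, done


def run_episode(policy, max_steps=500, seed=0):
    """Roll out one episode from a small random tilt; return total reward."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total


def always_left(state):
    """A deliberately bad policy: ignore the state entirely."""
    return LEFT


def follow_the_lean(state):
    """Hand-written heuristic: push toward the side the pole is tipping."""
    _, _, theta, theta_dot = state
    return RIGHT if theta + 0.5 * theta_dot > 0 else LEFT


if __name__ == "__main__":
    print("always_left    :", run_episode(always_left))
    print("follow_the_lean:", run_episode(follow_the_lean))
```

Comparing the two returns shows why the total reward is a useful training signal: a policy that ignores the state is cut off within a handful of timesteps, while one that reacts to the angle and its rate of change keeps collecting +1.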

One problem, two philosophies

The cart-pole sits exactly on the border between two fields, control theory and reinforcement learning, and comparing how each solves it is illuminating.

If you know the cart's mass, the pole's length, and gravity, control theory hands you a balancing law before the cart ever moves, and for the dynamics linearized around the upright position that law (a linear-quadratic regulator, for example) is provably optimal. If you don't know them, or the physics is too messy to write down, reinforcement learning can still learn to balance from experience alone, and the cart-pole is the standard bench where the two approaches are compared. It's a tiny gadget that captures the deepest question in autonomy: do you model the world, or learn from it?
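To make the contrast concrete, here is a rough sketch that runs both philosophies on the same simulated cart-pole. Several things in it are assumptions rather than anything stated above: the physical constants are the classic benchmark values; the model-based side is a discrete-time LQR computed by iterating the Riccati recursion on the linearized physics, with illustrative cost weights (`q_diag`, `r`), and it applies a continuous force rather than the two discrete pushes; and the learning side uses plain random search over the weights of a linear push-left/push-right rule, which is about the crudest form of policy search and only stands in for a real reinforcement learning algorithm. All function names are hypothetical.

```python
import math
import random

# Assumed benchmark constants; none of these values come from the text.
G, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
M_TOTAL = M_CART + M_POLE
MAX_FORCE, DT = 10.0, 0.02
ANGLE_LIMIT, TRACK_LIMIT = math.radians(12), 2.4


def step(state, force):
    """One explicit-Euler step of the nonlinear cart-pole dynamics."""
    x, x_dot, th, th_dot = state
    sin_t, cos_t = math.sin(th), math.cos(th)
    temp = (force + M_POLE * HALF_LEN * th_dot ** 2 * sin_t) / M_TOTAL
    th_acc = (G * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / M_TOTAL))
    x_acc = temp - M_POLE * HALF_LEN * th_acc * cos_t / M_TOTAL
    return (x + DT * x_dot, x_dot + DT * x_acc,
            th + DT * th_dot, th_dot + DT * th_acc)


def episode_return(policy, start=(0.0, 0.0, 0.05, 0.0), max_steps=500):
    """Timesteps survived from a small initial tilt (+1 per step upright)."""
    state, total = start, 0
    for _ in range(max_steps):
        force = max(-MAX_FORCE, min(MAX_FORCE, policy(state)))
        state = step(state, force)
        if abs(state[2]) > ANGLE_LIMIT or abs(state[0]) > TRACK_LIMIT:
            break
        total += 1
    return total


# --- Philosophy 1: model the world (LQR on the linearized physics) ---
def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def lqr_gain(q_diag=(1.0, 1.0, 10.0, 1.0), r=0.1, iterations=2000):
    """Feedback gain K from the known masses, length, and gravity."""
    d = HALF_LEN * (4.0 / 3.0 - M_POLE / M_TOTAL)
    a_th = G / d                     # d(th_acc)/d(th) at upright
    b_th = -1.0 / (M_TOTAL * d)      # d(th_acc)/d(force) at upright
    k = M_POLE * HALF_LEN / M_TOTAL
    # Discrete-time model s' = A s + B u, matching the Euler step above.
    A = [[1.0, DT, 0.0, 0.0],
         [0.0, 1.0, -DT * k * a_th, 0.0],
         [0.0, 0.0, 1.0, DT],
         [0.0, 0.0, DT * a_th, 1.0]]
    B = [[0.0], [DT * (1.0 / M_TOTAL - k * b_th)], [0.0], [DT * b_th]]
    At = [list(row) for row in zip(*A)]
    Bt = [[row[0] for row in B]]
    Q = [[q_diag[i] if i == j else 0.0 for j in range(4)] for i in range(4)]
    P, K = Q, [0.0] * 4
    for _ in range(iterations):      # iterate the Riccati recursion
        AtP = matmul(At, P)
        AtPA, AtPB = matmul(AtP, A), matmul(AtP, B)
        s = r + matmul(matmul(Bt, P), B)[0][0]
        K = [AtPB[j][0] / s for j in range(4)]
        P = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * AtPB[j][0] / s
              for j in range(4)] for i in range(4)]
    return K


def linear_feedback(K):
    """Continuous force u = -K s (clipped inside episode_return)."""
    return lambda s: -sum(k * v for k, v in zip(K, s))


# --- Philosophy 2: learn from experience (random search, no physics) ---
def bang_bang(w):
    """Two-action policy: push right if w . s > 0, otherwise push left."""
    return lambda s: MAX_FORCE if sum(a * b for a, b in zip(w, s)) > 0 else -MAX_FORCE


def learn_weights(trials=200, seed=0):
    """Keep whichever random weight vector earns the highest return."""
    rng = random.Random(seed)
    best_w, best_ret = None, -1
    for _ in range(trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        ret = episode_return(bang_bang(w))
        if ret > best_ret:
            best_w, best_ret = w, ret
    return best_w, best_ret


if __name__ == "__main__":
    print("do nothing   :", episode_return(lambda s: 0.0))
    print("LQR          :", episode_return(linear_feedback(lqr_gain())))
    print("random search:", learn_weights()[1])
```

The design difference is visible in what each side is allowed to read: `lqr_gain` uses the masses, pole length, and gravity but never runs an episode, while `learn_weights` never touches those constants and sees only the return of each trial. The learned policy here is scored on a single fixed starting tilt, so treat its return as an illustration rather than a benchmark result.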