The Cart-Pole

The cart-pole is the "hello, world" of reinforcement learning and control. A pole is hinged on top of a cart that can only slide left or right along a track. Left alone, the pole topples. The task: keep it balanced upright by pushing the cart back and forth — like balancing a broom on your palm.

It is beloved because it is about the simplest problem that is still genuinely hard: the system is unstable (do nothing and it falls), its state is continuous, and the right action at any moment depends on both the pole's angle and how fast it is tipping.

States, actions, and reward

The cart-pole is captured by four numbers — its state:

the cart's position x and velocity \dot{x};
the pole's angle \theta from vertical and its angular velocity \dot{\theta}.

At each timestep the agent picks one of two actions, push left or push right, and earns a reward of +1 for every timestep the pole stays up. An episode ends when the pole tips past a threshold angle or the cart runs off the track, so maximising total reward means balancing for as long as possible. That single, simple reward is enough for an agent to discover a balancing policy from scratch.

State: cart position & velocity, pole angle & angular velocity.
Actions: push the cart left or right.
Reward: +1 per timestep upright — so the goal is to balance as long as possible.
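
To make the state, actions, and reward concrete, here is a minimal simulation sketch. The text gives no physical constants, thresholds, or timestep, so every number below is an assumption (chosen to resemble common cart-pole benchmarks), as are the names step and run_episode. The dynamics are the standard frictionless uniform-pole equations, integrated with a simple Euler step.

```python
import math
import random

# Assumed constants (not from the text); typical benchmark-style values.
GRAVITY = 9.8                      # m/s^2
CART_MASS = 1.0                    # kg
POLE_MASS = 0.1                    # kg
POLE_HALF_LENGTH = 0.5             # m
PUSH_FORCE = 10.0                  # N, magnitude of each push
DT = 0.02                          # s per timestep
ANGLE_LIMIT = math.radians(12.0)   # assumed "threshold angle"
TRACK_LIMIT = 2.4                  # m, assumed half-length of the track
MAX_STEPS = 500                    # cap so an episode always ends


def step(state, action):
    """Advance one timestep. action: 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if action == 1 else -PUSH_FORCE
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Frictionless cart-pole equations of motion (uniform pole).
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass

    # Explicit Euler integration.
    next_state = (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )
    done = abs(next_state[0]) > TRACK_LIMIT or abs(next_state[2]) > ANGLE_LIMIT
    reward = 0.0 if done else 1.0  # +1 for each timestep that ends with the pole up
    return next_state, reward, done


def run_episode(policy, rng, max_steps=MAX_STEPS):
    """Run one episode and return the total reward collected."""
    # Start near upright with a small random perturbation (an assumption).
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total_reward = 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_policy = lambda state: rng.randrange(2)  # ignores the state entirely
print("return of a random policy:", run_episode(random_policy, rng))
```

A policy here is just a function from the four-number state to 0 or 1, so any controller or learned policy can be dropped into run_episode unchanged.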

One problem, two philosophies

The cart-pole sits exactly on the border between two fields, and comparing how each solves it is illuminating:

Control theory writes down the physics, linearises it near "upright", and derives a controller: for the inverted pendulum, the linear-quadratic regulator (LQR) gives a feedback law that is optimal for the linearised model and a chosen quadratic cost, provided you know the equations.
Reinforcement learning throws the equations away and learns a policy purely from the +1 rewards, by trial and error — no model of the physics required.
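
As a sketch of what the control-theory route looks like on paper, assume a frictionless cart of mass m_c, a uniform pole of mass m_p and half-length l, and a continuous horizontal force F instead of two discrete pushes. These symbols, and the weights Q and R below, are assumptions introduced for illustration. For small angles the standard equations of motion linearise to a system that LQR can act on:

```latex
% Linearisation about upright (\theta \approx 0, \sin\theta \approx \theta, \cos\theta \approx 1)
\begin{aligned}
\ddot{\theta} &\approx \frac{g\,\theta - F/(m_c + m_p)}
                            {l\left(\tfrac{4}{3} - \tfrac{m_p}{m_c + m_p}\right)}, \\
\ddot{x} &\approx \frac{F - m_p\, l\, \ddot{\theta}}{m_c + m_p}.
\end{aligned}

% State-space form with s = (x, \dot{x}, \theta, \dot{\theta})^{\top}
\dot{s} = A s + B F

% LQR: choose Q \succeq 0 and R > 0, then minimise
J = \int_0^{\infty} \left( s^{\top} Q s + R F^{2} \right) dt

% The minimiser is linear state feedback
F = -K s, \qquad K = R^{-1} B^{\top} P,
\qquad A^{\top} P + P A - P B R^{-1} B^{\top} P + Q = 0 .
```

The resulting law is optimal only for this linearised model and this quadratic cost; away from upright the approximation degrades.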

If you know the masses of the cart and pole, the pole's length, and gravity, control theory hands you a balancing law that is provably optimal for the linearised model before the cart ever moves. If you don't, or the physics is too messy to write down, reinforcement learning can still learn to balance from experience alone, and the cart-pole is a standard benchmark on which the two approaches are compared. It's a tiny gadget that captures one of the deepest questions in autonomy: do you model the world, or learn from it?
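
To illustrate learning "from experience alone", the sketch below uses random search over linear policies. This is one of the crudest trial-and-error methods, swapped in here for brevity; it is not Q-learning, policy gradients, or any specific algorithm the text has in mind. The learner only ever sees episode lengths (the summed +1 rewards); the simulator stands in for the real world, and all its constants, along with the names episode_length and random_search, are assumptions.

```python
import math
import random

# Assumed simulator constants (not from the text).
GRAVITY, CART_MASS, POLE_MASS, POLE_HALF_LENGTH = 9.8, 1.0, 0.1, 0.5
PUSH_FORCE, DT = 10.0, 0.02
ANGLE_LIMIT, TRACK_LIMIT, MAX_STEPS = math.radians(12.0), 2.4, 200


def step(state, action):
    """One Euler step of the frictionless cart-pole; action is 0 (left) or 1 (right)."""
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if action == 1 else -PUSH_FORCE
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    state = [x + DT * x_dot, x_dot + DT * x_acc,
             theta + DT * theta_dot, theta_dot + DT * theta_acc]
    done = abs(state[0]) > TRACK_LIMIT or abs(state[2]) > ANGLE_LIMIT
    return state, done


def episode_length(weights, rng):
    """Total reward of one episode: the number of timesteps the pole stayed up."""
    state = [rng.uniform(-0.05, 0.05) for _ in range(4)]
    for t in range(MAX_STEPS):
        # Linear policy: push right if the weighted sum of the state is positive.
        action = 1 if sum(w * s for w, s in zip(weights, state)) > 0 else 0
        state, done = step(state, action)
        if done:
            return t
    return MAX_STEPS


def random_search(trials, rng):
    """Try random weight vectors and keep whichever balanced longest."""
    best_weights, best_length = None, -1
    for _ in range(trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        length = episode_length(weights, rng)
        if length > best_length:
            best_weights, best_length = weights, length
        if best_length == MAX_STEPS:
            break  # balanced for the whole episode; stop searching
    return best_weights, best_length


best_weights, best_length = random_search(300, random.Random(0))
print("best weights:", best_weights)
print("timesteps balanced:", best_length)
```

Note that the search never touches GRAVITY or the masses directly: it treats step as a black box, which is the sense in which no model of the physics is required. Scoring each candidate on a single episode is noisy; a more careful version would average over several starts.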

The pole's angle alone isn't enough to act well — you also need its angular velocity. A pole that is upright but tipping fast needs a very different push from one that is upright and still.
The system is unstable: doing nothing is not a safe default. Balance is an active, constant correction, not a resting state.
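
Both points can be checked numerically. The sketch below, under assumed constants and a fixed starting tilt of 0.05 rad, compares three hypothetical policies: applying no force at all, pushing toward whichever side the pole leans (angle only), and pushing according to the angle plus a hand-picked multiple of the angular velocity. The 0.5 weight and all function names are illustrative choices, not tuned or taken from the text.

```python
import math

# Assumed simulator constants (not from the text).
GRAVITY, CART_MASS, POLE_MASS, POLE_HALF_LENGTH = 9.8, 1.0, 0.1, 0.5
PUSH_FORCE, DT = 10.0, 0.02
ANGLE_LIMIT, TRACK_LIMIT, MAX_STEPS = math.radians(12.0), 2.4, 500


def step(state, force):
    """One Euler step of the frictionless cart-pole under a horizontal force."""
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def steps_balanced(policy, state=(0.0, 0.0, 0.05, 0.0)):
    """Count timesteps until the pole falls or the cart leaves the track."""
    for t in range(MAX_STEPS):
        state = step(state, policy(state))
        if abs(state[0]) > TRACK_LIMIT or abs(state[2]) > ANGLE_LIMIT:
            return t
    return MAX_STEPS


def do_nothing(state):
    # No force at all: tests whether "upright" is a resting state.
    return 0.0


def angle_only(state):
    # Push toward the side the pole is leaning, ignoring how fast it moves.
    return PUSH_FORCE if state[2] > 0 else -PUSH_FORCE


def angle_and_rate(state):
    # Also account for angular velocity (0.5 is a hand-picked weight).
    return PUSH_FORCE if state[2] + 0.5 * state[3] > 0 else -PUSH_FORCE


for policy in (do_nothing, angle_only, angle_and_rate):
    print(policy.__name__, steps_balanced(policy))
```

With no force the small initial tilt grows until the pole falls, and the angle-only rule reacts too late on each swing, so its oscillations grow; adding the angular-velocity term damps them. None of these policies looks at the cart's position, so even the last one can eventually drift off the track.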