The Cart-Pole
The cart-pole is the "hello, world" of reinforcement learning and control. A pole is hinged on top of a cart that can only slide left or
right along a track. Left alone, the pole topples. The task: keep it balanced upright
by pushing the cart back and forth — like balancing a broom on your palm.
It is beloved because it is about the simplest problem that is still genuinely hard: the system
is unstable (do nothing and it falls), its state is continuous, and the right action depends
on both the pole's angle and how fast it is tipping.
States, actions, and reward
The cart-pole is captured by four numbers — its state:
- the cart's position x and velocity \dot{x};
- the pole's angle \theta from vertical and its angular velocity \dot{\theta}.
At each timestep the agent picks one of two actions, push left or push right, and earns a reward of +1 for
every timestep the pole stays up. An episode ends when the pole tips past a threshold angle or the
cart runs off the track, so maximising total reward means balancing for as long as possible.
That single, simple reward is enough for an agent to discover a balancing policy from scratch.
- State: cart position & velocity, pole angle & angular velocity.
- Actions: push the cart left or right.
- Reward: +1 per timestep upright, so the goal is to balance as long as possible.
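The state, action, and reward loop summarised above can be sketched as a small simulator. This is a minimal illustration, not a reference implementation: the masses, pole length, push force, timestep, and the 12-degree and 2.4 m limits are assumed values of the kind commonly used for this benchmark, and none of them come from the text. The function names `step`, `run_episode`, and `random_policy` are hypothetical.

```python
import math
import random

# Assumed constants (SI units); the text above does not specify any of these.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
PUSH_FORCE = 10.0
DT = 0.02                        # seconds per timestep
ANGLE_LIMIT = math.radians(12)   # assumed threshold angle
TRACK_LIMIT = 2.4                # assumed half-width of the track


def step(state, action):
    """Advance one timestep. action: 0 = push left, 1 = push right.

    Returns (next_state, reward, done).
    """
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if action == 1 else -PUSH_FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_ml = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Frictionless cart-pole equations of motion.
    temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_ml * theta_acc * cos_t / total_mass

    # Explicit Euler integration over one timestep.
    next_state = (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )
    done = abs(next_state[0]) > TRACK_LIMIT or abs(next_state[2]) > ANGLE_LIMIT
    # +1 for every step taken (counting the final step is a convention assumed here).
    return next_state, 1.0, done


def random_policy(state, rng):
    """Ignore the state and push left or right at random."""
    return rng.randrange(2)


def run_episode(policy, max_steps=500, seed=0):
    """Run one episode from a small random initial state; return the total reward."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state, rng))
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    print("return of a random policy:", run_episode(random_policy))
```

Because the reward is +1 per step, the value returned by `run_episode` is simply the number of timesteps the episode lasted, which is why maximising it amounts to balancing for as long as possible.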
One problem, two philosophies
The cart-pole sits exactly on the border between two fields, and comparing how each solves it is
illuminating:
- Control theory writes down the physics, linearises it near "upright", and derives a controller: the inverted-pendulum LQR gives a feedback law that is optimal for the linearised model and a quadratic cost, provided you know the equations.
- Reinforcement learning throws the equations away and learns a policy purely from the +1 rewards, by trial and error, with no model of the physics required.
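The control-theory route in the first bullet can be sketched in plain Python. This is an illustration under stated assumptions rather than a definitive design: the physical parameters are assumed, the linearisation is a simple Euler discretisation about upright, the cost weights (identity `Q`, `r = 1`) are arbitrary choices, and the controller applies a continuous force rather than the two discrete pushes described earlier. The helper names are hypothetical.

```python
# Assumed parameters (SI units); not given in the text.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
DT = 0.02


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def linearised_model():
    """Euler-discretised linearisation about upright.

    State order: [x, x_dot, theta, theta_dot]; input: force on the cart.
    """
    total = CART_MASS + POLE_MASS
    denom = POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS / total)
    ml = POLE_MASS * POLE_HALF_LENGTH
    a_theta = GRAVITY / denom                 # effect of theta on theta_acc
    b_theta = -1.0 / (total * denom)          # effect of force on theta_acc
    a_x = -ml * a_theta / total               # effect of theta on x_acc
    b_x = 1.0 / total - ml * b_theta / total  # effect of force on x_acc
    A = [[1.0, DT, 0.0, 0.0],
         [0.0, 1.0, DT * a_x, 0.0],
         [0.0, 0.0, 1.0, DT],
         [0.0, 0.0, DT * a_theta, 1.0]]
    B = [[0.0], [DT * b_x], [0.0], [DT * b_theta]]
    return A, B


def lqr_gain(A, B, Q, r, iterations=2000):
    """Iterate the discrete Riccati recursion; return K for the law u = -K . s."""
    n = len(A)
    P = [row[:] for row in Q]
    At, Bt = transpose(A), transpose(B)
    for _ in range(iterations):
        BtP = matmul(Bt, P)
        scale = r + matmul(BtP, B)[0][0]
        K = [[v / scale for v in matmul(BtP, A)[0]]]
        BK = matmul(B, K)
        closed = [[A[i][j] - BK[i][j] for j in range(n)] for i in range(n)]
        AtPcl = matmul(matmul(At, P), closed)
        P = [[Q[i][j] + AtPcl[i][j] for j in range(n)] for i in range(n)]
    return K[0]


def simulate(A, B, K, state, steps):
    """Run the linear model in closed loop with u = -K . s."""
    n = len(A)
    for _ in range(steps):
        u = -sum(k * s for k, s in zip(K, state))
        state = [sum(A[i][j] * state[j] for j in range(n)) + B[i][0] * u
                 for i in range(n)]
    return state


if __name__ == "__main__":
    A, B = linearised_model()
    Q = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    K = lqr_gain(A, B, Q, r=1.0)
    print("gain K:", K)
    print("state after 30 s:", simulate(A, B, K, [0.0, 0.0, 0.1, 0.0], 1500))
```

Note that the gain is computed entirely from the model before any simulation runs, and that it is only guaranteed to behave well where the linearisation holds, that is, for small angles around upright.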
If you know the cart's mass, the pole's length, and gravity, control theory hands you a
balancing law that is provably optimal near upright before the cart ever moves. If you don't, or the physics
is too messy to write down, reinforcement learning can still learn to balance from experience
alone, and the cart-pole is the standard bench where the two approaches are compared. It's a tiny
gadget that captures the deepest question in autonomy: do you model the world, or learn from it?
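The "learn from experience alone" side can be illustrated with a deliberately crude learner. The sketch below uses plain random search over linear policies, swapped in here as a simple stand-in for a real reinforcement-learning algorithm. The learner treats the simulator as a black box and sees only how many +1 rewards each candidate policy collects. All constants, the fixed starting tilt, and the function names are assumptions made for illustration.

```python
import math
import random

# Assumed simulator constants (SI units); the learner never reads these.
GRAVITY, CART_MASS, POLE_MASS, HALF_LEN = 9.8, 1.0, 0.1, 0.5
PUSH_FORCE, DT = 10.0, 0.02
ANGLE_LIMIT, TRACK_LIMIT = math.radians(12), 2.4


def step(state, push_right):
    """Black-box simulator: one Euler step of the frictionless cart-pole."""
    x, x_dot, theta, theta_dot = state
    force = PUSH_FORCE if push_right else -PUSH_FORCE
    total = CART_MASS + POLE_MASS
    ml = POLE_MASS * HALF_LEN
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + ml * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total)
    )
    x_acc = temp - ml * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def episode_return(weights, max_steps=200):
    """Total reward (+1 per step) for the policy 'push right if weights . state > 0'."""
    state = (0.0, 0.0, 0.05, 0.0)  # fixed small tilt so returns are comparable
    for t in range(max_steps):
        score = sum(w * s for w, s in zip(weights, state))
        state = step(state, score > 0)
        if abs(state[0]) > TRACK_LIMIT or abs(state[2]) > ANGLE_LIMIT:
            return t + 1
    return max_steps


def random_search(trials=200, seed=0):
    """Keep whichever randomly sampled weight vector earns the most reward."""
    rng = random.Random(seed)
    best_w = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    best_r = episode_return(best_w)
    history = [best_r]
    for _ in range(trials):
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        r = episode_return(candidate)
        if r > best_r:
            best_w, best_r = candidate, r
        history.append(best_r)
    return best_w, best_r, history


if __name__ == "__main__":
    weights, best, history = random_search()
    print("first return:", history[0], "best return:", best)
    print("best weights:", weights)
```

Nothing in `random_search` refers to mass, length, or gravity: the only feedback is the episode return, which is the sense in which the policy is learned rather than derived.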
- The pole's angle alone isn't enough to act well: you also need its angular velocity. A pole that is upright but tipping fast needs a very different push from one that is upright and still.
- The system is unstable: doing nothing is not a safe default. Balance is an active, constant correction, not a resting state.
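Both points can be checked with a few lines of code. The sketch below is illustrative only: the rule "push toward where the pole is heading" with weight `VELOCITY_WEIGHT = 0.5` is a hypothetical heuristic, not a tuned controller, and the do-nothing experiment uses an assumed small-angle model with assumed masses and pole length.

```python
import math

# Assumed parameters (SI units); not given in the text.
GRAVITY, CART_MASS, POLE_MASS, HALF_LEN = 9.8, 1.0, 0.1, 0.5
DT = 0.02
ANGLE_LIMIT = math.radians(12)
VELOCITY_WEIGHT = 0.5  # hypothetical weighting of angular velocity


def angle_only_push(state):
    """Decide using the angle alone (ignores how fast the pole is tipping)."""
    _, _, theta, _ = state
    return "right" if theta > 0 else "left"


def angle_and_velocity_push(state):
    """Decide using the angle plus a weighted angular velocity."""
    _, _, theta, theta_dot = state
    return "right" if theta + VELOCITY_WEIGHT * theta_dot > 0 else "left"


def steps_until_fall(theta0):
    """Apply no force and count timesteps until the pole passes the angle limit.

    Uses the small-angle model theta_acc = k * theta, valid near upright.
    """
    k = GRAVITY / (HALF_LEN * (4.0 / 3.0 - POLE_MASS / (CART_MASS + POLE_MASS)))
    theta, theta_dot, steps = theta0, 0.0, 0
    while abs(theta) <= ANGLE_LIMIT:
        theta, theta_dot = theta + DT * theta_dot, theta_dot + DT * k * theta
        steps += 1
    return steps


if __name__ == "__main__":
    still = (0.0, 0.0, 0.01, 0.0)      # nearly upright, not moving
    tipping = (0.0, 0.0, 0.01, -1.0)   # same angle, swinging left fast
    for name, state in (("still", still), ("tipping", tipping)):
        print(name, angle_only_push(state), angle_and_velocity_push(state))
    print("steps to fall with no push:", steps_until_fall(0.01))
```

The two example states share the same angle, so the angle-only rule cannot tell them apart, while the rule that also looks at angular velocity pushes in opposite directions. The do-nothing loop always terminates for a nonzero starting tilt, because in this model any small tilt grows without bound.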