The Learning Rate

The learning rate \alpha sets how big a step gradient descent takes on each iteration: every update moves the parameters downhill by \alpha times the gradient. It's usually the most important dial to get right, and it's a Goldilocks problem:

Too small — the steps are tiny, so training crawls and may take forever to reach the bottom.
Too large — the steps overshoot the minimum, bouncing from wall to wall, and can even diverge, flying off to ever-worse costs.
Just right — brisk steps that settle smoothly into the minimum.
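
To see where the three regimes come from, it can help to work one tiny case by hand. The sketch below assumes a one-dimensional cost J(w) = w^2, which is chosen here purely for illustration and is not taken from the text above. For this cost, each update just multiplies w by a fixed factor:

```latex
J(w) = w^2, \qquad \frac{dJ}{dw} = 2w

w_{t+1} = w_t - \alpha \cdot 2 w_t = (1 - 2\alpha)\, w_t
\quad\Longrightarrow\quad
w_t = (1 - 2\alpha)^t \, w_0
```

Under that assumption, w shrinks toward the minimum at 0 only when |1 - 2\alpha| < 1, that is, when 0 < \alpha < 1. With \alpha = 0.01 the factor is 0.98, so each step removes just 2% of the remaining distance (too small). With \alpha = 0.9 the factor is -0.8, so w flips sign every step while slowly shrinking (overshooting). With \alpha = 1.1 the factor is -1.2, so w grows every step (diverging). The exact thresholds belong to this particular cost; other problems have different ones, but the same three behaviours.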

Too slow, too wild, just right

Switch between three learning rates and watch the path down the bowl. The tiny rate barely moves; the huge rate ricochets off the sides and climbs away from the answer; the middle one glides to the bottom. Same algorithm, same data — only the step size differs.
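
The same comparison can be roughly reproduced in code. The following is a minimal sketch, not the demo's actual implementation: the bowl J(w) = w1^2 + w2^2, the starting point (3, 4), the 20 steps, and the three rates 0.01, 0.3, and 1.1 are all assumptions picked for illustration.

```python
def cost(w):
    # Assumed bowl-shaped cost: J(w) = w1^2 + w2^2, with its minimum at (0, 0).
    return w[0] ** 2 + w[1] ** 2


def gradient(w):
    # Gradient of the assumed cost above.
    return [2 * w[0], 2 * w[1]]


def descend(alpha, start=(3.0, 4.0), steps=20):
    """Run plain gradient descent and return the cost after every step."""
    w = list(start)
    history = [cost(w)]
    for _ in range(steps):
        g = gradient(w)
        # The update rule: move each parameter by alpha times its gradient.
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
        history.append(cost(w))
    return history


# Hypothetical rates chosen to show the three regimes on this bowl.
rates = {"too small": 0.01, "just right": 0.3, "too large": 1.1}
results = {label: descend(alpha) for label, alpha in rates.items()}

for label, history in results.items():
    print(f"{label:>10}: start cost {history[0]:.3g}, final cost {history[-1]:.3g}")
```

Only `alpha` changes between the three runs. On this bowl the smallest rate is still far from the bottom after 20 steps, the middle rate is essentially at the minimum, and the largest rate ends with a higher cost than it started with.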

Finding a good one

There's no universal best value, because it depends on the problem, so in practice you try a few (often powers of ten: 0.001, 0.01, 0.1) and watch the cost from one iteration to the next. A cost that falls steadily means a good rate; one that explodes means turn it down; one that barely moves means turn it up. Adaptive optimizers such as Adam and RMSProp even adjust the effective step size as they go, but the intuition you just built still applies.
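
The try-a-few-and-watch routine can be sketched as a small sweep. Everything specific here is an assumption made for illustration: the four-point dataset (generated by y = 2x), the one-parameter model, the 30 steps, the extra candidate 1.0, and especially the cutoff in the hypothetical `diagnose` helper, which calls a run "barely moving" if the cost has not at least halved.

```python
import math

# Toy data, assumed for illustration: y = 2x exactly, so the best w is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def cost(w):
    # Mean squared error (halved) for the one-parameter model y = w * x.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def gradient(w):
    # Derivative of the cost above with respect to w.
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(alpha, steps=30, w=0.0):
    """Run gradient descent from w and return the cost after every step."""
    history = [cost(w)]
    for _ in range(steps):
        w -= alpha * gradient(w)
        history.append(cost(w))
    return history


def diagnose(history):
    """Hypothetical rule of thumb; the 0.5 cutoff is an arbitrary assumption."""
    first, last = history[0], history[-1]
    if not math.isfinite(last) or last > first:
        return "turn it down"  # cost exploded
    if last > 0.5 * first:
        return "turn it up"    # cost barely moved
    return "looks good"        # cost fell substantially


# Candidate rates: powers of ten, as suggested in the text, plus 1.0.
verdicts = {alpha: diagnose(train(alpha)) for alpha in (0.001, 0.01, 0.1, 1.0)}
for alpha, verdict in verdicts.items():
    print(f"alpha={alpha}: {verdict}")
```

In a real project you would more likely plot each cost curve and judge it by eye than rely on a fixed cutoff; the helper just makes the three verdicts explicit.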