The Learning Rate

The learning rate \alpha sets how big a step gradient descent takes each round. It's the single most important dial to get right, and it's a Goldilocks problem:

Too slow, too wild, just right

Switch between three learning rates and watch the path down the bowl. The tiny rate barely moves; the huge rate ricochets off the sides and climbs away from the answer; the middle one glides to the bottom. Same algorithm, same data — only the step size differs.

Finding a good one

There's no universal best value — it depends on the problem — so in practice you try a few (often powers of ten: 0.001, 0.01, 0.1) and watch the cost. A cost that falls steadily means a good rate; one that explodes means turn it down; one that barely moves means turn it up. Clever optimizers even adapt \alpha as they go — but the intuition you just built never changes.