Gradient Descent

Gradient descent is the algorithm that finds the bottom of the bowl. Imagine a ball on the cost surface: it rolls downhill. The downhill direction is given by the slope of the cost — its derivative, the gradient. So we just step the parameters in the opposite direction of the gradient, again and again:

w \leftarrow w - \alpha\,\frac{\partial J}{\partial w}.

The gradient points uphill; the minus sign turns us around to go down. The number \alpha is the learning rate — how big a step to take. Repeat until the slope flattens out: you've reached the minimum.

Roll to the bottom

Step the algorithm forward. The dot starts high on the wall, reads the slope under its feet, and takes a step downhill — bigger steps where the bowl is steep, smaller as it nears the flat bottom. Watch it settle into the minimum, where the slope is zero and learning naturally stops.

The engine of all of machine learning

This one idea — follow the slope downhill — trains almost everything: regression, logistic regression, and every neural network ever built. In higher dimensions the gradient is just the slope in every parameter direction at once, and the step adjusts them all together. It is, without much exaggeration, the algorithm behind the modern AI boom.