Gradient descent is the algorithm that finds the bottom of the bowl. Imagine a
ball on the cost surface: it rolls downhill. The downhill direction is given by the
The gradient points uphill; the minus sign turns us around to go down. The number
Step the algorithm forward. The dot starts high on the wall, reads the slope under its feet, and takes a step downhill — bigger steps where the bowl is steep, smaller as it nears the flat bottom. Watch it settle into the minimum, where the slope is zero and learning naturally stops.
This one idea — follow the slope downhill — trains almost everything: regression, logistic
regression, and every