Visualizing the Cost

The cost J is a function of the parameters. Hold the bias fixed and plot J against the weight w, and a beautiful shape appears: a smooth bowl. Because the error is squared, J is quadratic in w, so the curve is a parabola opening upward with a single lowest point: the best weight for that bias.

This is the secret that makes linear regression so friendly. The cost surface has no false bottoms (local minima) to get stuck in, just one global minimum sitting at the bottom of the bowl. Learning is simply the journey down to it.
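The bowl shape can be checked numerically. The sketch below is only an illustration: the toy data, the helper name cost, and the choice of mean squared error as J are assumptions, and the lesson's own J may be scaled differently (for example by 1/(2n)), which would not change where the minimum sits.

```python
# Hypothetical toy data, made up for illustration: points on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def cost(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b (assumed form of J)."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


b_fixed = 1.0                          # hold the bias fixed
ws = [i * 0.5 for i in range(9)]       # candidate weights 0.0, 0.5, ..., 4.0
costs = [cost(w, b_fixed, xs, ys) for w in ws]

# The lowest point of the sampled curve is the best weight on this grid.
best_w = ws[costs.index(min(costs))]

for w, j in zip(ws, costs):
    print(f"w = {w:3.1f}   J = {j:7.3f}")
print("best w on the grid:", best_w)
```

On this data the printed costs fall to zero at w = 2.0 and rise symmetrically on either side (J is 1.875 at both w = 1.5 and w = 2.5), which is the parabola described above.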

Move along the bowl

Slide the weight and watch the dot ride the cost curve. The bottom of the bowl is the weight that fits the data best; anywhere else, up the sides, the line is worse and the cost is higher. The slope of the curve at the dot tells you which way to go: downhill always leads toward the bottom, and a steeper wall means the dot is farther from it. The next page turns this idea into an algorithm.
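To make the "which way to go" hint concrete, here is a minimal sketch that estimates the slope of the cost curve at a chosen weight using a central finite difference. The data, the fixed bias, and the helper names cost_w, slope and downhill_direction are all assumptions for illustration; it is not the algorithm the next page introduces, only a way to read the slope's sign.

```python
# Hypothetical toy data, made up for illustration: points on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
B_FIXED = 1.0  # bias held fixed, as in the one-knob picture


def cost_w(w):
    """Mean squared error as a function of the weight alone (assumed J)."""
    n = len(xs)
    return sum((w * x + B_FIXED - y) ** 2 for x, y in zip(xs, ys)) / n


def slope(f, w, h=1e-6):
    """Central-difference estimate of the slope of f at w."""
    return (f(w + h) - f(w - h)) / (2 * h)


def downhill_direction(w, tol=1e-6):
    """Say which way the weight should move to lower the cost."""
    s = slope(cost_w, w)
    if abs(s) < tol:
        return "stay (at the bottom)"
    # Negative slope: cost falls as w grows, so increase w.
    return "increase w" if s < 0 else "decrease w"


for w in (0.5, 2.0, 3.5):
    print(f"w = {w}: {downhill_direction(w)}")
```

Left of the minimum the slope is negative, so the weight should increase; right of it the slope is positive, so the weight should decrease; at the bottom the slope is essentially zero.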

From one knob to two (and to millions)

With both w and b free, the curve becomes a bowl-shaped surface over the (w, b) plane: same idea, one more dimension. Larger models can have millions of parameters, so the cost surface lives in a space with millions of dimensions that no one can picture, and beyond linear regression it is not always a single clean bowl. But the principle never changes: find the lowest point. The tool for that is gradient descent.
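The two-knob picture can be sampled the same way as the one-knob curve. The sketch below evaluates the cost on a small grid of (w, b) pairs and picks the lowest cell. The data, grid ranges, and names are assumptions for illustration, and brute-force search like this is not gradient descent: it is only practical for one or two parameters, which is why a smarter tool is needed as the number of knobs grows.

```python
# Hypothetical toy data, made up for illustration: points on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def cost(w, b):
    """Mean squared error of the line y = w*x + b (assumed form of J)."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


ws = [i * 0.5 for i in range(9)]           # weights 0.0 .. 4.0
bs = [-1.0 + j * 0.5 for j in range(9)]    # biases -1.0 .. 3.0

# Evaluate J at every grid cell: this table is the sampled surface.
surface = {(w, b): cost(w, b) for w in ws for b in bs}

# The lowest cell approximates the bottom of the bowl.
best = min(surface, key=surface.get)
print("lowest grid point (w, b):", best, "with J =", surface[best])
```

With this data the lowest grid point is (2.0, 1.0), the line the points were generated from. A 9 by 9 grid already needs 81 evaluations, and each extra parameter multiplies that count again.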