Visualizing the Cost
The cost J is a function of the parameters. Hold the bias fixed and plot J against the
weight w, and a beautiful shape appears: a smooth bowl. Because the prediction is linear
in w and the error is squared, J is a quadratic in w: an upward-opening parabola with one
single lowest point, the best weight.
This is the secret that makes linear regression so friendly. The cost surface has
no false bottoms (local minima) to get stuck in, just one global minimum sitting at the
bottom of the bowl. Learning is simply the journey down to it.
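To make the bowl concrete, here is a minimal sketch that sweeps w with the bias held fixed and evaluates the cost at each value. The toy data, the helper name `cost`, and the 1/(2m) scaling of the squared error are assumptions made for illustration; the page itself does not fix them.

```python
import numpy as np

# Toy data, made up for illustration: points lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

def cost(w, b, x, y):
    """Squared-error cost with the common 1/(2m) scaling (an assumption)."""
    residuals = w * x + b - y
    return np.mean(residuals ** 2) / 2.0

b_fixed = 1.0                       # hold the bias fixed
ws = np.linspace(-1.0, 5.0, 61)     # candidate weights, step 0.1
costs = np.array([cost(w, b_fixed, x, y) for w in ws])

# The lowest point of the sampled curve.
best_w = ws[np.argmin(costs)]

# For a parabola, second differences on an even grid are constant and positive.
second_diffs = np.diff(costs, n=2)

print("best w on the grid:", best_w)
print("curve bends upward everywhere:", bool(np.all(second_diffs > 0)))
```

The constant, positive second differences are the numerical fingerprint of a parabola: the curve bends upward by the same amount everywhere, so it can only have one bottom.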
Move along the bowl
Slide the weight and watch the dot ride the cost curve. The bottom of the bowl is the weight that
fits the data best; anywhere else, up the sides, the line is worse and the cost is higher. The
slope of the wall beside the dot hints at which way to go: downhill is toward the bottom. That is
the idea the next page turns into an algorithm.
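The "hint" from the wall can be read off numerically. The sketch below assumes the cost is J = (1/(2m)) Σ (w·x + b − y)², in which case its slope with respect to w is the mean of (w·x + b − y)·x. The data and the helper name `cost_slope_w` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Toy data, made up for illustration: points lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

def cost_slope_w(w, b, x, y):
    """Slope dJ/dw, assuming J = (1/(2m)) * sum((w*x + b - y)**2)."""
    return np.mean((w * x + b - y) * x)

b_fixed = 1.0
slopes = {}
for w in (0.0, 2.0, 4.0):
    slope = cost_slope_w(w, b_fixed, x, y)
    slopes[w] = slope
    if slope < 0:
        hint = "cost falls as w grows, so increase w"
    elif slope > 0:
        hint = "cost rises as w grows, so decrease w"
    else:
        hint = "flat: this is the bottom of the bowl"
    print(f"w = {w}: slope = {slope:+.1f} -> {hint}")
```

On the left wall the slope is negative, on the right wall it is positive, and at the bottom it is zero. In every case, stepping against the sign of the slope moves the dot toward the minimum.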
From one knob to two (and to millions)
With both w and b free, the bowl becomes a 3-D valley over the (w, b) plane: same idea, one
more dimension. Real models have millions of parameters, so the "bowl" lives in a
million-dimensional space no one can picture. But the principle never changes: find the lowest
point. The tool for that is gradient descent.
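With only two knobs, the valley is still small enough to map by brute force. This sketch evaluates the cost on a grid over the (w, b) plane and picks the lowest cell. The toy data, the grid ranges, and the 1/(2m) scaling are assumptions for illustration. A grid like this stops being practical long before a million dimensions, which is why a smarter tool is needed.

```python
import numpy as np

# Toy data, made up for illustration: points lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Candidate weights and biases (grid ranges chosen arbitrarily).
ws = np.linspace(0.0, 4.0, 41)
bs = np.linspace(-1.0, 3.0, 41)
W, B = np.meshgrid(ws, bs, indexing="ij")   # each has shape (41, 41)

# Broadcast over the data: residuals has shape (41, 41, 5).
residuals = W[..., None] * x + B[..., None] - y

# Cost surface over the (w, b) plane, with the assumed 1/(2m) scaling.
J = np.mean(residuals ** 2, axis=-1) / 2.0

# Locate the lowest cell of the valley.
i, j = np.unravel_index(np.argmin(J), J.shape)
best_w, best_b = ws[i], bs[j]

print("lowest grid point: w =", best_w, ", b =", best_b)
```

The grid here has 41 × 41 = 1,681 cells. The same resolution over ten parameters would need 41¹⁰ cells, more than 10¹⁶, so exhaustive search is only a visualization aid.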