Generalization

The entire goal of machine learning is generalization: doing well on data the model has never seen, not just the examples it trained on. A model that nails its training data but flops on new data has learned nothing useful — it merely memorized.

The villain is overfitting: a model so flexible it bends to fit every wiggle and noise spike in the training data, capturing accidents instead of the real pattern. It looks brilliant on the training set and falls apart on the test set.

Memorize, or understand

Both curves below pass close to the training dots. The wild overfit curve threads every point exactly — but lurches wildly between them and badly misses the held-out test points. The simple straight fit ignores the noise and predicts the test points well. Toggle and watch the test error tell the real story.

Simpler is usually wiser

Good generalization is a balancing act: complex enough to capture the true pattern, simple enough to ignore the noise. That tension has a name — the bias–variance tradeoff — and managing it, with tools like regularization, is the central craft of the field. With the foundations in place, we're ready to build our first real model.