Training a Network

Now assemble the whole machine. Training a neural network is the training loop you met at the very start, with each piece now filled in:

  1. Forward — run the batch through the network to get predictions.
  2. Loss — score them against the true labels (cross-entropy or squared error).
  3. Backward — backpropagate to get every weight's gradient.
  4. Update — nudge every weight a small step downhill: subtract its gradient scaled by the learning rate.

Repeat over every batch, for many epochs (full passes over the data), and the network slowly sculpts itself to fit the data.
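
The four steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name train, the one-hidden-layer shape, the tanh and sigmoid activations, the hyperparameter values, and the toy XOR-style data are all assumptions chosen to keep the example small.

```python
import numpy as np

def train(X, y, hidden=8, lr=0.5, epochs=200, batch_size=16, seed=0):
    """Train a one-hidden-layer network with mini-batch gradient descent.

    Returns the mean training loss of each epoch (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    history = []
    for epoch in range(epochs):
        order = rng.permutation(n)          # reshuffle the data each epoch
        total = 0.0
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx].reshape(-1, 1)
            m = len(idx)

            # 1. Forward: run the batch through the network.
            h = np.tanh(xb @ W1 + b1)
            p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

            # 2. Loss: binary cross-entropy against the true labels.
            eps = 1e-12
            loss = -np.mean(yb * np.log(p + eps) + (1 - yb) * np.log(1 - p + eps))
            total += loss * m

            # 3. Backward: backpropagate to get every weight's gradient.
            dz2 = (p - yb) / m               # gradient at the output pre-activation
            dW2 = h.T @ dz2
            db2 = dz2.sum(axis=0)
            dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
            dW1 = xb.T @ dz1
            db1 = dz1.sum(axis=0)

            # 4. Update: step each weight against its gradient.
            W1 -= lr * dW1
            b1 -= lr * db1
            W2 -= lr * dW2
            b2 -= lr * db2
        history.append(total / n)
    return history

# Toy data (an assumption for illustration): label is 1 when x0 and x1 share a sign.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

history = train(X, y)
print(f"first epoch loss {history[0]:.3f}, last epoch loss {history[-1]:.3f}")
```

Each pass through the inner loop is one repetition of the four steps; the outer loop counts epochs.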

Watch a boundary form

These two classes spiral together — no straight line can separate them. Step through training and watch the network bend its decision boundary into a curve that wraps around the data, while the loss falls. A linear model could never do this; the hidden layers' non-linearity is what makes the curved boundary possible.
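
One way to see why the non-linearity matters: without it, stacked layers collapse into a single linear layer, so the boundary stays a straight line no matter how deep the network is. The sketch below checks this numerically. The random weights, the layer sizes, and the choice of tanh are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                 # five arbitrary 2-D inputs
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

# Two layers with no activation in between.
stacked = (X @ W1 + b1) @ W2 + b2

# The single linear layer they are algebraically equal to.
W = W1 @ W2
b = b1 @ W2 + b2
collapsed = X @ W + b

same_without_activation = np.allclose(stacked, collapsed)
print(same_without_activation)  # True

# Insert a tanh and the network is no longer that linear map.
with_tanh = np.tanh(X @ W1 + b1) @ W2 + b2
same_with_activation = np.allclose(with_tanh, collapsed)
print(same_with_activation)
```

The first check holds because (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2). Once tanh sits between the layers, that identity no longer applies, which is what lets the boundary curve.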

The craft of training

In practice, training is part science, part art: choosing the learning rate, the batch size, the network shape, the regularization, and watching the validation loss to stop before overfitting. But the core is always those four steps on repeat. Everything from a tiny classifier to a giant language model is trained by exactly this loop — just with more data, more weights, and more patience.
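
Watching the validation loss to stop before overfitting is often done with a patience rule: stop once the loss has not improved for a set number of epochs, and keep the weights from the best epoch. The sketch below is one simple version of that idea. The function name early_stopping_epoch, the patience value, and the loss numbers are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best loss seen so far (hypothetical helper).
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Never ran out of patience: training went the full length.
    return len(val_losses) - 1, best_epoch

# Illustrative numbers only: the loss falls, then creeps back up.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.57, 0.60, 0.64, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In a real loop the validation loss would be computed after each epoch on held-out data, and the weights from best_epoch would be the ones kept.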