Multiclass Classification

Most real problems have more than two classes — digits 0–9, dozens of animal species, hundreds of product categories. Two clean strategies extend our two-class tools to many.

One-vs-rest. Train one binary classifier per class ("class 3 vs everything else"). To predict, run them all and pick the class whose classifier is most confident it's a match.
Softmax. Generalize the sigmoid to output a whole probability distribution over the classes at once — one number per class, all positive and summing to 1.
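To make the one-vs-rest prediction step concrete, here is a minimal sketch. It assumes each binary classifier is a linear model followed by a sigmoid; the weights, biases, and the function name `predict_one_vs_rest` are hand-picked for illustration and are not from any trained model.

```python
import math

def sigmoid(z):
    """Squash a raw score into a confidence between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_rest(x, classifiers):
    """Run every binary classifier and return (best class, all confidences).

    classifiers: list of (weights, bias) pairs, one linear model per class.
    """
    confidences = []
    for weights, bias in classifiers:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        confidences.append(sigmoid(z))
    # Pick the class whose "this class vs everything else" model is most confident.
    best = max(range(len(confidences)), key=lambda c: confidences[c])
    return best, confidences

# Hypothetical parameters for three classes on 2D inputs (illustration only).
classifiers = [
    ([1.0, 0.0], 0.0),    # class 0: confident when x[0] is large
    ([0.0, 1.0], 0.0),    # class 1: confident when x[1] is large
    ([-1.0, -1.0], 0.0),  # class 2: confident when both coordinates are negative
]

label, confidences = predict_one_vs_rest([2.0, 0.5], classifiers)
print(label, [round(c, 3) for c in confidences])
```

Because each classifier is scored independently, the confidences generally do not sum to 1. That is one practical difference from softmax, which produces a single distribution over all classes.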

Three classes, three regions

Each class has a representative point (a centroid). The query is assigned to the class whose centroid is nearest, which carves the plane into three regions. Drag the query across a border and watch its predicted class switch: this is the multiclass version of a decision boundary.
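The nearest-centroid rule described above can be sketched in a few lines. The centroid coordinates and the function name `nearest_centroid` are assumptions chosen for illustration; the distance is ordinary Euclidean distance.

```python
import math

def nearest_centroid(query, centroids):
    """Return the index of the centroid closest to the query point."""
    distances = [math.dist(query, c) for c in centroids]
    return min(range(len(centroids)), key=lambda k: distances[k])

# Hypothetical centroids for three classes in the plane.
centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]

# Moving the query from left to right crosses the border between class 0 and class 1.
for query in [(1.0, 0.5), (1.9, 0.5), (2.1, 0.5), (3.0, 0.5)]:
    print(query, "->", nearest_centroid(query, centroids))
```

With these centroids, the border between classes 0 and 1 is the vertical line halfway between them, so the predicted class flips as the query passes x = 2.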

Softmax, the standard choice

Softmax takes the raw scores $z_1, \dots, z_C$ and turns them into probabilities $p_c = \dfrac{e^{z_c}}{\sum_j e^{z_j}}$. The exponentials make every term positive and the division makes them sum to one, so the output reads directly as "how likely is each class." Paired with cross-entropy, it's the standard final layer of almost every classification neural network, and the last stop of Stage C.
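As a rough sketch of the formula above, the following computes softmax and the cross-entropy loss for a single example. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the formula as written; it leaves the result unchanged because the common factor cancels in the ratio. The function names and example scores are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores z_1..z_C into probabilities that sum to 1."""
    m = max(scores)                       # shift for stability; cancels in the ratio
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_class])

scores = [2.0, 1.0, 0.1]                  # example raw scores for three classes
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)

print([round(p, 3) for p in probs])       # roughly [0.659, 0.242, 0.099]
print(round(loss, 3))                     # roughly 0.417
```

With only two classes and scores `[z, 0]`, the first softmax output equals the sigmoid of `z`, which is the sense in which softmax generalizes the sigmoid.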