Multiclass Classification
Most real problems have more than two classes — digits 0–9, dozens of animal species, hundreds of
product categories. Two clean strategies extend our two-class tools to many.
- One-vs-rest. Train one binary classifier per class ("class 3 vs everything
else"). To predict, run them all and pick the one most confident it's a match.
- Softmax. Generalize the sigmoid to output a whole probability distribution over the classes
at once: one number per class, all positive and summing to 1.
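To make the one-vs-rest recipe concrete, here is a minimal sketch in plain Python. It assumes logistic regression as the binary classifier and a made-up three-cluster dataset. The function names (`train_binary`, `train_one_vs_rest`, `predict`), the learning rate, and the step count are illustrative choices, not anything the method prescribes.

```python
import math

def sigmoid(t):
    # Logistic function, written to avoid overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def score(w, b, x):
    # Linear score w . x + b for one point.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_binary(points, targets, steps=2000, lr=0.1):
    """Fit logistic-regression weights (w, b) with plain gradient descent."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(points, targets):
            err = sigmoid(score(w, b, x)) - t  # prediction minus target
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_one_vs_rest(points, labels):
    """Train one 'class c vs everything else' classifier per class."""
    models = {}
    for c in sorted(set(labels)):
        targets = [1.0 if y == c else 0.0 for y in labels]
        models[c] = train_binary(points, targets)
    return models

def predict(models, x):
    """Run every binary classifier and return the most confident class."""
    return max(models, key=lambda c: sigmoid(score(*models[c], x)))

# Hypothetical toy data: three small clusters in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.5),
          (4.0, 0.0), (4.5, 0.3), (3.8, 0.4),
          (2.0, 4.0), (2.3, 4.4), (1.8, 3.7)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_rest(points, labels)
print(predict(models, (0.1, 0.1)), predict(models, (4.2, 0.1)), predict(models, (2.0, 4.2)))
```

Each of the three classifiers only ever answers a yes/no question. The multiclass decision comes entirely from comparing their confidences in `predict`.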
Three classes, three regions
Each class has a representative point (a centroid). The query is assigned to whichever class it's
closest to, carving the plane into three regions. Drag the query across a border
and watch its predicted class switch — the multiclass version of a decision boundary.
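The nearest-centroid rule described above can be sketched in a few lines. The centroid coordinates and class names here are made up for illustration, and `nearest_class` is a hypothetical helper name.

```python
import math

# Hypothetical centroids: one representative point per class.
centroids = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (2.0, 3.0)}

def nearest_class(query, centroids):
    """Return the class whose centroid is closest to the query (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(query, centroids[c]))

# A and B are equidistant from every point on the line x = 2,
# so nudging the query across that line flips the prediction.
print(nearest_class((1.9, 0.0), centroids))  # A
print(nearest_class((2.1, 0.0), centroids))  # B
```

With this rule, the border between two regions is the set of points equally far from both centroids, which is why the boundaries come out as straight lines.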
Softmax, the standard choice
Softmax takes the raw scores $z_1, \dots, z_C$ and turns them into
probabilities $p_c = \dfrac{e^{z_c}}{\sum_j e^{z_j}}$. The exponentials
make every score positive and the division makes them sum to one, so the output reads directly as
"how likely is each class." Paired with cross-entropy, it's the standard final layer of almost
every classification neural network, the last stop of Stage C.
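As a small worked example of the formula, the sketch below computes softmax and the cross-entropy loss for one example in plain Python. The scores are arbitrary illustrative numbers. Subtracting the largest score before exponentiating is a common numerical-stability trick that is assumed here rather than taken from the text; it leaves the result unchanged because the same factor cancels in the numerator and denominator.

```python
import math

def softmax(scores):
    """Turn raw scores z_1..z_C into probabilities p_c = e^{z_c} / sum_j e^{z_j}."""
    m = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(probs[true_class])

scores = [2.0, 1.0, 0.1]          # illustrative raw scores for three classes
probs = softmax(scores)
print([round(p, 3) for p in probs])            # [0.659, 0.242, 0.099]
print(round(cross_entropy(probs, 0), 3))       # 0.417

# With two classes and scores [z, 0], softmax reduces to the sigmoid of z.
z = 1.5
print(abs(softmax([z, 0.0])[0] - 1.0 / (1.0 + math.exp(-z))) < 1e-12)  # True
```

The loss is small when the true class gets a high probability and grows without bound as that probability approaches zero, which is what makes cross-entropy a natural partner for softmax outputs.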