"Accuracy" — the fraction of predictions that are correct — is the obvious score, but it can lie. On a dataset that's 99% "not spam", a model that always says "not spam" scores 99% accuracy while catching zero spam. For imbalanced problems we need sharper tools, built from the confusion matrix: counts of true/false positives and negatives.
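Each example has a true label and a model score; the model flags an example as positive when its score clears the decision threshold. Two metrics come straight from the confusion matrix: precision, the fraction of flagged examples that really are positive, TP / (TP + FP), and recall, the fraction of real positives that get flagged, TP / (TP + FN). Slide the decision threshold: raise it and the model flags fewer positives, so precision typically rises but recall falls; lower it and the reverse happens. The confusion-matrix counts and the three metrics update live. There's no free lunch: you trade precision against recall.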
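The threshold trade-off can also be reproduced offline with a short sketch. The labels, the scores, the thresholds, and the function name `precision_recall` below are made up for illustration. The rule "flag when score >= threshold" and the convention that precision is 1.0 when nothing is flagged are assumptions as well.

```python
def precision_recall(y_true, scores, threshold):
    """Flag an example as positive when its score is >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: with no flagged examples, report precision as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical true labels and model scores for ten examples.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.95]

# Sweep the threshold upward and watch the two metrics move in opposite directions.
for threshold in (0.3, 0.5, 0.7, 0.9):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, precision climbs from 0.57 to 1.00 as the threshold rises from 0.3 to 0.9, while recall drops from 1.00 to 0.25. Real score distributions are noisier, so precision need not rise at every single step.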
The right metric depends on the cost of each mistake. For cancer screening, a missed case (false negative) is disastrous, so you favour recall. For a spam filter, junking a real email (false positive) is annoying, so you favour precision. The F1 score — the harmonic mean of the two — balances them when both matter. Never trust accuracy alone on imbalanced data. With evaluation in hand, we're ready for the most powerful models of all: neural networks.
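As a short sketch of why F1 uses the harmonic rather than the arithmetic mean, the snippet below compares the two on a lopsided model. The function name `f1_score` and the precision and recall values are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical model: very precise, but it misses most positives.
precision, recall = 0.9, 0.1

arithmetic_mean = (precision + recall) / 2
f1 = f1_score(precision, recall)

print(f"arithmetic mean={arithmetic_mean:.2f} F1={f1:.2f}")
```

The arithmetic mean reports a comfortable 0.50, while F1 is only 0.18: the harmonic mean stays low unless precision and recall are both reasonably high.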