Accuracy, Precision, Recall

"Accuracy" — the fraction of predictions that are correct — is the obvious score, but it can lie. On a dataset that's 99% "not spam", a model that always says "not spam" scores 99% accuracy while catching zero spam. For imbalanced problems we need sharper tools, built from the confusion matrix: counts of true/false positives and negatives.

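$\text{Precision} = \frac{TP}{TP + FP}$: of everything flagged positive, how much really was? (Avoids crying wolf.)
$\text{Recall} = \frac{TP}{TP + FN}$: of all the real positives, how many did we catch? (Avoids missing them.)
Here TP, FP, and FN are the counts of true positives, false positives, and false negatives.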
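
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`confusion_counts`, `safe_ratio`, `summarize`) and the toy labels are invented for this example, and returning 0.0 when a denominator is zero is an assumed convention rather than something the definitions dictate.

```python
def confusion_counts(labels, predictions):
    """Count (tp, fp, fn, tn) for binary labels, where 1 marks the positive class."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, predictions):
        if p == 1 and y == 1:
            tp += 1      # flagged, and really positive
        elif p == 1 and y == 0:
            fp += 1      # flagged, but actually negative
        elif p == 0 and y == 1:
            fn += 1      # missed a real positive
        else:
            tn += 1      # correctly left alone
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    # Assumed convention: report 0.0 when the ratio is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


def summarize(labels, predictions):
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
        "precision": safe_ratio(tp, tp + fp),
        "recall": safe_ratio(tp, tp + fn),
    }


# Made-up toy data: 4 real positives followed by 4 real negatives.
toy = summarize([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0])
print(toy)  # accuracy 0.625, precision 2/3, recall 0.5

# The always-"not spam" model from the text: 10 spam emails out of 1000.
spam_labels = [1] * 10 + [0] * 990
always_negative = summarize(spam_labels, [0] * 1000)
print(always_negative)  # accuracy 0.99, precision 0.0, recall 0.0
```

The second call reproduces the spam example: accuracy looks excellent at 0.99, yet recall is 0.0 because not a single spam email was caught.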

Move the threshold

Each example has a true label and a model score, and the model flags an example as positive when its score clears the decision threshold. Raise the threshold and the model flags fewer positives: precision usually rises, while recall can only fall or stay the same. Lower it and the reverse happens. The confusion-matrix counts, and with them accuracy, precision, and recall, change every time the threshold moves. There's no free lunch: you trade precision against recall.
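
To see the trade-off in numbers, here is a small sketch that sweeps a threshold over a handful of scored examples. The `(label, score)` pairs are invented for illustration, `metrics_at` is a hypothetical helper, and treating "score at or above the threshold" as a positive prediction is an assumption about the convention.

```python
# Hypothetical (true label, model score) pairs; 1 is the positive class.
examples = [
    (1, 0.95), (1, 0.80), (0, 0.70), (1, 0.60),
    (0, 0.40), (1, 0.35), (0, 0.20), (0, 0.05),
]


def metrics_at(threshold, examples):
    """Return (precision, recall, accuracy) when scores >= threshold are flagged positive."""
    tp = sum(1 for y, s in examples if s >= threshold and y == 1)
    fp = sum(1 for y, s in examples if s >= threshold and y == 0)
    fn = sum(1 for y, s in examples if s < threshold and y == 1)
    tn = sum(1 for y, s in examples if s < threshold and y == 0)
    # Assumed convention: an undefined ratio (nothing flagged) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(examples)
    return precision, recall, accuracy


for threshold in (0.30, 0.50, 0.75, 0.90):
    precision, recall, accuracy = metrics_at(threshold, examples)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  "
          f"recall={recall:.2f}  accuracy={accuracy:.2f}")
```

On this toy data, moving the threshold from 0.30 to 0.75 lifts precision from about 0.67 to 1.00 while recall drops from 1.00 to 0.50, and accuracy stays at 0.75 throughout, so accuracy alone would hide the trade entirely.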

Pick the metric that matters

The right metric depends on the cost of each mistake. For cancer screening, a missed case (false negative) is disastrous, so you favour recall. For a spam filter, junking a real email (false positive) is annoying, so you favour precision. The F1 score — the harmonic mean of the two — balances them when both matter. Never trust accuracy alone on imbalanced data. With evaluation in hand, we're ready for the most powerful models of all: neural networks.
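
The F1 score described above can be written as $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$. The sketch below compares it with a plain average to show why the harmonic mean is used; the function name `f1_score` and the sample values are chosen for illustration, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)


# Illustrative values: a model that is always right when it flags, but flags rarely.
precision, recall = 1.0, 0.25

f1 = f1_score(precision, recall)
arithmetic_mean = (precision + recall) / 2

print(f"F1 = {f1:.3f}")                            # F1 = 0.400
print(f"arithmetic mean = {arithmetic_mean:.3f}")  # arithmetic mean = 0.625
```

The harmonic mean is pulled toward the smaller of the two values, so a model cannot earn a high F1 by excelling at precision while neglecting recall, or the other way round.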