Probabilistic Models
Many models draw a line and answer "cat or dog." A probabilistic model answers
something richer: "85% cat, 15% dog." Instead of committing to a hard boundary, it models
the probability of each outcome — which lets it express uncertainty, combine
evidence sensibly, and update its beliefs as new data arrives. The engine underneath is one of the
most important formulas in all of science:
Bayes' theorem.
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}.
Read it as: the probability of a hypothesis H after seeing data
D (the posterior) is the
likelihood P(D\mid H) times your
prior belief P(H), divided by P(D), how likely the data was
overall across all hypotheses. Evidence updates belief.
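The denominator P(D) is often the least obvious part. When the hypothesis is either true (H) or false (\neg H), the law of total probability splits it into the two ways the data could have arisen. That turns Bayes' theorem into a form you can compute directly:

```latex
\begin{aligned}
P(D) &= P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H), \\
P(H \mid D) &= \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)}.
\end{aligned}
```

In the disease-test example that follows, the first term of the denominator is the true-positive rate and the second is the false-positive rate.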
Why the prior matters: a "90% accurate" test
Probabilistic reasoning routinely overturns intuition. Suppose a disease affects
1% of people, and a test is 90% accurate both ways: it flags 90% of people who have the disease
and correctly clears 90% of people who don't. You test positive. How worried should you be?
Most people guess "90%." Bayes disagrees:
const prevalence = 0.01; // P(disease): only 1% of people have it (the PRIOR)
const sensitivity = 0.9; // P(positive | disease)
const specificity = 0.9; // P(negative | no disease); so false-positive rate = 0.1
// Bayes: P(disease | positive) = P(pos|dis)P(dis) / P(pos)
const posTrue = sensitivity * prevalence; // true positives
const posFalse = (1 - specificity) * (1 - prevalence); // false positives
const posterior = posTrue / (posTrue + posFalse); // P(pos) = true positives + false positives
console.log("P(disease | positive test) =", (posterior * 100).toFixed(1) + "%");
console.log("So even after a positive test, it's more likely you DON'T have it.");
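Only about 8%! Because the disease is rare, the many
false positives from the healthy 99% swamp the few true
positives: 0.1 × 0.99 = 0.099 of the population tests falsely positive, against only
0.9 × 0.01 = 0.009 who test truly positive. Ignoring the base rate (the prior), known as base-rate
neglect, is one of the most common reasoning errors there is; applying Bayes' theorem with the
correct prior accounts for it automatically.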
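The same answer falls out if you count people instead of multiplying probabilities. The minimal sketch below assumes a hypothetical population of 10,000 and introduces a helper, `posteriorGivenPositive`, whose name is made up for illustration. It also shows how the answer moves as the base rate changes while the test stays fixed:

```javascript
// Hypothetical helper (not from the original): P(disease | positive) via
// Bayes' theorem for a test with the given sensitivity and specificity.
function posteriorGivenPositive(prevalence, sensitivity, specificity) {
  const truePos = sensitivity * prevalence;              // P(positive and disease)
  const falsePos = (1 - specificity) * (1 - prevalence); // P(positive and healthy)
  return truePos / (truePos + falsePos);                 // divide by P(positive)
}

// Natural-frequency version: count people in an assumed population of 10,000.
const population = 10000;
const sick = Math.round(population * 0.01);        // 100 people have the disease
const healthy = population - sick;                 // 9,900 do not
const sickPositive = Math.round(sick * 0.9);       // 90 sick people test positive
const healthyPositive = Math.round(healthy * 0.1); // 990 healthy people test positive
console.log(`${sickPositive} of ${sickPositive + healthyPositive} positives are truly sick`);

// How the answer depends on the base rate, holding the test at 90% / 90%.
for (const prevalence of [0.001, 0.01, 0.1, 0.5]) {
  const p = posteriorGivenPositive(prevalence, 0.9, 0.9);
  console.log(`prevalence ${prevalence}: P(disease | positive) = ${(p * 100).toFixed(1)}%`);
}
```

Counting gives 90 true positives out of 1,080 positives, the same 8.3% as the probability calculation. The sweep makes the role of the prior visible. At 50% prevalence the posterior rises to the 90% that intuition suggests, while at 0.1% prevalence it falls below 1%.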
From Bayes to a classifier
A Naive Bayes classifier turns this into a machine-learning model. To classify an
email as spam or not, it uses Bayes' theorem on the words it contains, with one bold simplifying
assumption: that the words are independent given the class. That's rarely quite
true (hence "naive"), yet it works remarkably well, trains quickly, and needs relatively little
training data. It's a classic generative model: it models how the data is produced, through the
class prior P(\text{class}) and the word likelihoods P(\text{words}\mid\text{class}), rather than
modeling P(\text{class}\mid\text{words}) or the boundary directly, as a discriminative model does.
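Under the independence assumption, the posterior for a class given words w_1, \dots, w_n factorizes as P(\text{class}\mid w_1,\dots,w_n) \propto P(\text{class})\prod_i P(w_i\mid\text{class}). The sketch below implements that rule under several assumptions: a tiny made-up training set, lowercase whitespace tokenization, add-one (Laplace) smoothing, and hypothetical function names `train`, `classify` and `tokenize`. It adds log-probabilities rather than multiplying raw ones, a standard trick that keeps long messages from underflowing to zero:

```javascript
// Hypothetical toy training data: [text, label] pairs.
const trainingData = [
  ["win money now", "spam"],
  ["cheap money offer", "spam"],
  ["meeting at noon", "ham"],
  ["lunch meeting tomorrow", "ham"],
];

// Assumed tokenizer: lowercase, split on whitespace.
const tokenize = (text) => text.toLowerCase().trim().split(/\s+/);

// Count class frequencies and per-class word frequencies.
function train(data) {
  const classCounts = {}; // label -> number of training messages
  const wordCounts = {};  // label -> { word -> count }
  const totalWords = {};  // label -> total word tokens in that class
  const vocab = new Set();
  for (const [text, label] of data) {
    classCounts[label] = (classCounts[label] || 0) + 1;
    wordCounts[label] = wordCounts[label] || {};
    totalWords[label] = totalWords[label] || 0;
    for (const word of tokenize(text)) {
      wordCounts[label][word] = (wordCounts[label][word] || 0) + 1;
      totalWords[label] += 1;
      vocab.add(word);
    }
  }
  return { classCounts, wordCounts, totalWords, vocab, n: data.length };
}

// Return posterior probabilities for each class, normalized to sum to 1.
function classify(model, text) {
  const logScores = {};
  for (const label of Object.keys(model.classCounts)) {
    // log P(class): the prior, estimated from class frequencies
    let score = Math.log(model.classCounts[label] / model.n);
    for (const word of tokenize(text)) {
      // log P(word | class) with add-one smoothing, so unseen words never give 0
      const count = model.wordCounts[label][word] || 0;
      score += Math.log((count + 1) / (model.totalWords[label] + model.vocab.size));
    }
    logScores[label] = score;
  }
  // Turn log scores into probabilities, subtracting the max for numerical stability.
  const maxLog = Math.max(...Object.values(logScores));
  const probs = {};
  let total = 0;
  for (const [label, s] of Object.entries(logScores)) {
    probs[label] = Math.exp(s - maxLog);
    total += probs[label];
  }
  for (const label of Object.keys(probs)) probs[label] /= total;
  return probs;
}

const model = train(trainingData);
console.log(classify(model, "cheap money"));
console.log(classify(model, "meeting tomorrow"));
```

On this toy data, "cheap money" comes out spam with probability 6/7 (about 0.86), and "meeting tomorrow" comes out ham with the same probability. A real filter would use a far larger corpus and better tokenization, but the counting idea is the same.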
- Predict a probability for each outcome, not just a hard label — capturing
uncertainty.
- Built on Bayes' theorem: posterior ∝ likelihood × prior.
- Bayesian inference updates beliefs as data arrives; the prior
(base rate) genuinely matters.
The Bayesian view frames learning as belief-updating: start with a prior over possible
models, and each new piece of data multiplies in its likelihood to sharpen the posterior. Its great
gift is honest uncertainty: instead of a single answer, you get a whole
distribution, so the model can say "I'm not sure." That matters enormously wherever a confident
wrong answer is dangerous or costly, and medical diagnosis, self-driving cars, spam filtering, and
A/B testing all rely on probabilistic reasoning of this kind.
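To see a posterior sharpen as data arrives, the sketch below approximates the distribution over an unknown success rate θ (say, a click-through rate in an A/B test) on a grid of 101 candidate values. It starts from a uniform prior and multiplies in the likelihood of each observation in turn. The grid, the uniform prior, the observation sequence and the names `update` and `summarize` are all assumptions for illustration. A conjugate Beta prior would give the same posterior in closed form.

```javascript
// Grid of candidate values for the unknown success rate theta: 0, 0.01, ..., 1.
const grid = Array.from({ length: 101 }, (_, i) => i / 100);

// Uniform prior: every candidate value starts equally plausible.
let belief = grid.map(() => 1 / grid.length);

// Multiply in the likelihood of one observation (1 = success, 0 = failure),
// then renormalize so the probabilities sum to 1.
function update(dist, outcome) {
  const unnormalized = dist.map((p, i) => p * (outcome === 1 ? grid[i] : 1 - grid[i]));
  const total = unnormalized.reduce((a, b) => a + b, 0);
  return unnormalized.map((p) => p / total);
}

// Summaries of the current belief: posterior mean and standard deviation.
function summarize(dist) {
  const mean = dist.reduce((acc, p, i) => acc + p * grid[i], 0);
  const variance = dist.reduce((acc, p, i) => acc + p * (grid[i] - mean) ** 2, 0);
  return { mean, sd: Math.sqrt(variance) };
}

// Hypothetical stream of observations: 7 successes in 10 trials.
const observations = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1];
observations.forEach((outcome, step) => {
  belief = update(belief, outcome);
  const { mean, sd } = summarize(belief);
  console.log(`after ${step + 1} observations: mean ${mean.toFixed(3)}, sd ${sd.toFixed(3)}`);
});
```

The standard deviation is the model's "I'm not sure" made explicit. It shrinks as observations accumulate, while the mean settles near the observed success rate, pulled slightly toward 0.5 by the uniform prior.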
- Don't ignore the prior / base rate. Even a fairly accurate test for a rare
condition can yield mostly false alarms, as in the "90% test" trap above.
- Naive Bayes' independence assumption is usually false; it's a useful
approximation, not the literal truth. Also, a word never seen with a class in training gets an
estimated probability of exactly 0 for that class, which zeroes out the whole product; a little
smoothing fixes this.
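A standard form of that smoothing is additive (Laplace) smoothing: add a pseudo-count α > 0 to every word count before normalizing, with α = 1 giving classic add-one smoothing. The notation here is introduced for illustration: \operatorname{count}(w, c) is how often word w appears in training messages of class c, and V is the vocabulary. The smoothed likelihood estimate is then:

```latex
\hat{P}(w \mid c) = \frac{\operatorname{count}(w, c) + \alpha}{\sum_{w' \in V} \operatorname{count}(w', c) + \alpha\,|V|}
```

Since every numerator is at least α, no estimate can be exactly 0, and for each class the estimates still sum to 1 over the vocabulary.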