Probabilistic Models

Many models draw a line and answer "cat or dog." A probabilistic model answers something richer: "85% cat, 15% dog." Instead of committing to a hard boundary, it models the probability of each outcome — which lets it express uncertainty, combine evidence sensibly, and update its beliefs as new data arrives. The engine underneath is one of the most important formulas in all of science: Bayes' theorem.

P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}.

Read it as: the probability of a hypothesis H after seeing data D (the posterior) is the likelihood P(D\mid H) times your prior belief P(H), divided by P(D), how likely the data was overall (the evidence). Evidence updates belief.
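For a yes/no hypothesis, the denominator is usually not given directly but can be expanded with the law of total probability. As a sketch, writing \neg H (a symbol introduced here, not used above) for "H is false":

P(D) = P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H),

so that

P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)}.

The first term in the denominator counts the cases where the data shows up and H is true; the second counts the cases where the data shows up anyway even though H is false.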

Why the prior matters: a "90% accurate" test

Probabilistic reasoning routinely overturns intuition. Suppose a disease affects 1% of people, and a test is 90% accurate both ways: it flags 90% of the people who have the disease and clears 90% of the people who don't. You test positive. How worried should you be? Most people guess "90%." Bayes disagrees:

const prevalence = 0.01;  // P(disease): only 1% of people have it (the PRIOR)
const sensitivity = 0.9;  // P(positive | disease)
const specificity = 0.9;  // P(negative | no disease); so false-positive rate = 0.1

// Bayes: P(disease | positive) = P(pos|dis)P(dis) / P(pos)
const posTrue = sensitivity * prevalence;               // true positives
const posFalse = (1 - specificity) * (1 - prevalence);  // false positives
const posterior = posTrue / (posTrue + posFalse);

console.log("P(disease | positive test) =", (posterior * 100).toFixed(1) + "%");
console.log("So even after a positive test, it's more likely you DON'T have it.");

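Only about 8%! Because the disease is rare, the many false positives from the healthy 99% swamp the few true positives. In a group of 1,000 people, about 10 have the disease and 9 of them test positive, while 99 of the 990 healthy people test positive too, so only 9 of the 108 positive results are real. Ignoring the base rate (the prior) is one of the most common reasoning errors there is, and a model built on Bayes' theorem accounts for it by construction.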
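To see how strongly the answer depends on the prior, here is a small sketch that repeats the same calculation for several prevalences. The function name `posteriorGivenPositive` and the list of prevalences are illustrative choices, not part of the original example.

```javascript
// Posterior probability of disease given a positive test, via Bayes' theorem.
// The function name and the prevalence values below are illustrative assumptions.
function posteriorGivenPositive(prevalence, sensitivity, specificity) {
  const truePositives = sensitivity * prevalence;
  const falsePositives = (1 - specificity) * (1 - prevalence);
  return truePositives / (truePositives + falsePositives);
}

// Same 90%/90% test, applied to populations with different base rates.
const prevalences = [0.001, 0.01, 0.1, 0.5];
const posteriors = prevalences.map((p) => posteriorGivenPositive(p, 0.9, 0.9));

prevalences.forEach((p, i) => {
  const prior = (p * 100).toFixed(1);
  const post = (posteriors[i] * 100).toFixed(1);
  console.log(`prevalence ${prior}% -> P(disease | positive) = ${post}%`);
});
```

The posterior climbs from under 1% at a 0.1% prevalence to about 8% at 1%, 50% at 10%, and 90% at 50%. The intuitive "90%" answer is only right when the disease is as common as not.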

From Bayes to a classifier

A Naive Bayes classifier turns this into a machine-learning model. To classify an email as spam or not, it uses Bayes' theorem on the words it contains, with one bold simplifying assumption: that the words are independent given the class. That's rarely quite true — hence "naive" — yet it works remarkably well, and it's fast and needs little data. It's a classic generative model: it models how the data is produced (P(\text{words}\mid\text{class})), rather than drawing a boundary directly like a discriminative model.
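The idea can be sketched in a few dozen lines. This is a minimal illustration, not a production filter: the four training messages are made up, the names `train` and `predict` are hypothetical, and the word probabilities use add-one smoothing (an assumption of this sketch) so that no word gets probability zero.

```javascript
// Toy training set: made-up messages, purely for illustration.
const trainingData = [
  { text: "win money now", label: "spam" },
  { text: "free money offer", label: "spam" },
  { text: "meeting at noon", label: "ham" },
  { text: "lunch meeting tomorrow", label: "ham" },
];

const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Count messages per class and word occurrences per class.
function train(examples) {
  const docCounts = new Map();   // label -> number of messages
  const wordCounts = new Map();  // label -> Map(word -> count)
  const totalWords = new Map();  // label -> total word tokens
  const vocabulary = new Set();
  for (const { text, label } of examples) {
    docCounts.set(label, (docCounts.get(label) ?? 0) + 1);
    if (!wordCounts.has(label)) wordCounts.set(label, new Map());
    const counts = wordCounts.get(label);
    for (const word of tokenize(text)) {
      counts.set(word, (counts.get(word) ?? 0) + 1);
      totalWords.set(label, (totalWords.get(label) ?? 0) + 1);
      vocabulary.add(word);
    }
  }
  return { docCounts, wordCounts, totalWords, vocabulary, numDocs: examples.length };
}

// Return P(class | words) for every class.
function predict(model, text) {
  const labels = [...model.docCounts.keys()];
  const logScores = labels.map((label) => {
    // Start from the log prior, log P(class).
    let logScore = Math.log(model.docCounts.get(label) / model.numDocs);
    for (const word of tokenize(text)) {
      // Naive assumption: add log P(word | class) for each word independently.
      const count = model.wordCounts.get(label).get(word) ?? 0;
      const denominator = (model.totalWords.get(label) ?? 0) + model.vocabulary.size;
      logScore += Math.log((count + 1) / denominator); // add-one smoothing
    }
    return logScore;
  });
  // Convert log scores back to probabilities that sum to 1.
  const maxLog = Math.max(...logScores);
  const weights = logScores.map((s) => Math.exp(s - maxLog));
  const total = weights.reduce((a, b) => a + b, 0);
  const posterior = {};
  labels.forEach((label, i) => { posterior[label] = weights[i] / total; });
  return posterior;
}

const model = train(trainingData);
const result = predict(model, "free money");
console.log(result);
```

Working in log space avoids multiplying many small probabilities together, which would underflow on long messages. The final normalization is Bayes' denominator: it divides each class score by the sum over all classes.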

Predict a probability for each outcome, not just a hard label — capturing uncertainty.
Built on Bayes' theorem: posterior ∝ likelihood × prior.
Bayesian inference updates beliefs as data arrives; the prior (base rate) genuinely matters.

The Bayesian view reframes all of learning as belief-updating: start with a prior over possible models, and each piece of data multiplies in its likelihood to sharpen the posterior. Its great gift is honest uncertainty: instead of a single answer, you get a whole distribution, so the model can say "I'm not sure." That matters wherever a confident wrong answer is costly; medical diagnosis, self-driving cars, spam filtering, and A/B testing all lean on it.
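The "multiply in the likelihood, then renormalize" loop can be sketched over a handful of candidate models. Everything here is an illustrative assumption: the five candidate success rates, the uniform prior, and the made-up sequence of observations (think of conversions in an A/B test).

```javascript
// Candidate "models": possible success rates (illustrative values).
const candidateRates = [0.1, 0.3, 0.5, 0.7, 0.9];

// Uniform prior: every candidate starts out equally plausible (an assumption).
let belief = candidateRates.map(() => 1 / candidateRates.length);

// One Bayesian update: multiply each belief by the likelihood of the
// observation under that candidate, then renormalize so the total is 1.
function update(prior, rates, success) {
  const unnormalized = prior.map(
    (p, i) => p * (success ? rates[i] : 1 - rates[i])
  );
  const evidence = unnormalized.reduce((a, b) => a + b, 0);
  return unnormalized.map((u) => u / evidence);
}

// Made-up data: 6 successes and 2 failures.
const observations = [true, true, false, true, true, true, false, true];
for (const outcome of observations) {
  belief = update(belief, candidateRates, outcome);
}

candidateRates.forEach((rate, i) => {
  console.log(`rate ${rate}: ${(belief[i] * 100).toFixed(1)}%`);
});
```

After eight observations the belief concentrates on 0.7, but 0.9 and 0.5 keep a meaningful share. That leftover spread is the "I'm not sure" part: the output is a distribution over models, not a single winner.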

Don't ignore the prior / base rate. A very accurate test for a very rare condition still yields mostly false alarms — the "90% test" trap above.
Naive Bayes' independence assumption is usually false; it's a useful approximation, not the literal truth. And an unseen word can give a probability of exactly 0 — fixed with a little smoothing.
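The zero-probability problem and its fix can be shown with a few made-up counts. This sketch uses additive (Laplace) smoothing; the counts, the four-word vocabulary, and the names `wordProbability` and `messageLikelihood` are all hypothetical.

```javascript
// Made-up word counts for the "spam" class. "lottery" is in the vocabulary
// but never appeared in a spam message during training.
const spamWordCounts = new Map([["free", 3], ["money", 2], ["offer", 1]]);
const vocabularySize = 4; // free, money, offer, lottery (assumed vocabulary)
const totalSpamWords = 6; // 3 + 2 + 1

// P(word | spam) with additive smoothing; alpha = 0 means no smoothing.
function wordProbability(word, alpha) {
  const count = spamWordCounts.get(word) ?? 0;
  return (count + alpha) / (totalSpamWords + alpha * vocabularySize);
}

// Naive Bayes likelihood of a message: the product of its word probabilities.
function messageLikelihood(words, alpha) {
  return words.reduce((product, w) => product * wordProbability(w, alpha), 1);
}

const message = ["free", "money", "lottery"];
const unsmoothed = messageLikelihood(message, 0);
const smoothed = messageLikelihood(message, 1);

console.log("without smoothing:", unsmoothed);
console.log("with add-one smoothing:", smoothed);
```

Without smoothing, the single unseen word multiplies the whole likelihood down to zero, no matter how spammy the other words are. Adding a pseudo-count of 1 to every word keeps each probability positive while the probabilities over the vocabulary still sum to 1.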