The p-Value

A hypothesis test leaves us with a vague feeling that the data are "surprising" or "not surprising" under H_0. The p-value turns that feeling into a number. It is the probability — computed as if H_0 were true — of getting data at least as extreme as what we actually saw:

p = \mathbb{P}\bigl(\text{data this extreme, or more} \mid H_0\bigr).

Everything hangs on the conditioning bar: we live entirely inside the world where H_0 holds, and ask how often that world would throw up a result as unusual as ours.
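One way to make "living inside the H_0 world" concrete is to simulate that world many times and count how often it produces a result at least as extreme as ours. The sketch below assumes, purely for illustration, that the test statistic is standard normal under H_0 and that the observed value is 1.6; the function name `simulated_p_value` and its parameters are hypothetical, not taken from any library.

```python
import random


def simulated_p_value(observed, n_sims=100_000, seed=0):
    """Estimate P(statistic >= observed | H0) by simulating the null world."""
    rng = random.Random(seed)
    # Assumption: under H0 the test statistic follows a standard normal.
    hits = sum(rng.gauss(0.0, 1.0) >= observed for _ in range(n_sims))
    # The fraction of simulated null results at least as extreme as ours.
    return hits / n_sims


p_hat = simulated_p_value(1.6)
print(f"simulated one-sided p is roughly {p_hat:.3f}")
```

Every simulated draw comes from the null distribution, so the returned fraction is conditioned on H_0 by construction. It says how often the null world yields such a result, and nothing else.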

Small p, big surprise

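Read the size of p directly as surprise: a small p says that data like ours would be rare if H_0 were true, so the result is surprising; a large p says such data are routine under H_0, so there is little to explain.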

Geometrically, p is an area in the tail of the null distribution: the total probability of outcomes at least as far from the centre as the one we observed.

The p-value is the shaded tail

Here is the null sampling distribution with an observed statistic marked at z = 1.6. The shaded region beyond it is the p-value: the chance, under H_0, of landing at least that far out. If the null distribution is standard normal, as the z notation suggests, that area is about 0.055. The smaller that sliver of area, the more the data strain against H_0.

(This picture shades only the upper tail, a one-sided p-value. For a two-sided H_1: \mu \ne \mu_0 we would shade both tails, since "more extreme" runs in either direction.)
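The one-sided and two-sided areas can be computed directly. This is a minimal sketch, assuming the null distribution is standard normal; the helper names `upper_tail` and `two_sided` are illustrative, and the only library call is `math.erfc` from the Python standard library.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal Z: the one-sided p-value."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided(z):
    """P(|Z| >= |z|): both tails, which by symmetry is twice one tail."""
    return 2.0 * upper_tail(abs(z))


z_obs = 1.6  # the observed statistic from the text
p_one = upper_tail(z_obs)
p_two = two_sided(z_obs)
print(f"one-sided p = {p_one:.4f}")
print(f"two-sided p = {p_two:.4f}")
```

Because the normal curve is symmetric, the two-sided value is exactly double the one-sided one: roughly 0.055 against roughly 0.110 here. The same observed statistic therefore looks less surprising once "more extreme" is allowed to run in both directions.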

The misreading that will not die

The p-value is not the probability that H_0 is true. It is computed assuming H_0 — so it can say nothing about H_0's own probability. The conditioning runs \mathbb{P}(\text{data}\mid H_0), never \mathbb{P}(H_0\mid\text{data}); swapping the two is exactly the error Bayes' theorem warns about. A p of 0.03 does not mean "a 3% chance H_0 is true".
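A small calculation shows why the two conditionals cannot be swapped. The sketch below treats "a result at least this extreme" as the observed event and applies Bayes' theorem to two competing hypotheses. Every number other than p = 0.03 is a made-up assumption: the chance of such a result under H_1 (0.30) and the prior probabilities of H_0 (0.5 and 0.9) are chosen only for illustration, and `posterior_h0` is a hypothetical helper.

```python
def posterior_h0(p_data_given_h0, p_data_given_h1, prior_h0):
    """Bayes' theorem for two competing hypotheses H0 and H1."""
    numerator = p_data_given_h0 * prior_h0
    denominator = numerator + p_data_given_h1 * (1.0 - prior_h0)
    return numerator / denominator


p_value = 0.03             # P(result this extreme | H0), from the text
assumed_p_under_h1 = 0.30  # assumption: P(result this extreme | H1)

for prior in (0.5, 0.9):   # assumption: two possible priors for H0
    post = posterior_h0(p_value, assumed_p_under_h1, prior)
    print(f"prior P(H0) = {prior:.1f} -> P(H0 | data) = {post:.3f}")
```

Under these assumed inputs the posterior probability of H_0 comes out near 0.09 with the first prior and near 0.47 with the second, and neither equals 0.03. The answer moves with quantities the p-value never sees, which is why p alone cannot be read as the probability that H_0 is true.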