Sampling and Bias

A sample is only useful if it looks like the population it came from. We call such a sample representative: its mix of values mirrors the whole. The danger is a sample that systematically over- or under-counts some part of the population — then the statistic we compute points in the wrong direction, no matter how carefully we measure.

Randomness is the safeguard

The reliable way to get a representative sample is to choose it at random: give every member of the population an equal chance of being picked. Randomness doesn't guarantee a perfect sample, but it removes any hidden tilt: on average the sample mean \bar{x} sits at the population mean \mu, and the only error left is ordinary chance, which shrinks roughly like 1/\sqrt{n} as the sample size n grows.
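To make this concrete, here is a minimal simulation sketch. The population (10,000 normally distributed values), the seed, and the helper name `simple_random_sample_means` are all assumptions made for illustration. It draws many simple random samples at several sizes and reports where the sample means centre and how widely they scatter.

```python
import random
import statistics

def simple_random_sample_means(population, n, trials, rng):
    """Draw `trials` simple random samples of size n; return each sample's mean."""
    # rng.sample picks n distinct members, each equally likely to be chosen.
    return [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values centred near 50 (illustrative only).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

results = {}
for n in (10, 100, 1000):
    means = simple_random_sample_means(population, n, trials=500, rng=rng)
    centre = statistics.fmean(means)   # where the sample means sit on average
    spread = statistics.stdev(means)   # how much chance error remains
    results[n] = (centre, spread)
    print(f"n={n:5d}  centre of sample means={centre:.2f}  spread={spread:.2f}  (mu={mu:.2f})")
```

In a run like this, the centre should stay close to \mu at every sample size, while the spread should fall as n grows. Individual samples still miss, but they miss in both directions.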

Bias: a tilt no sample size can fix

Bias is a systematic error — a built-in tilt in how the sample is collected, so it consistently misses the target in the same direction. Two common culprits:

The crucial point: bias is not cured by collecting more data. A bigger biased sample just pins down the wrong answer more precisely. Only fixing the method — making the selection fair — removes it.
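The sketch below illustrates the point under one assumed selection rule: a member's chance of being picked is proportional to its value, so large values are over-represented. The population, the seed, and the helper name `biased_sample` are hypothetical. It tracks the sample mean and its standard error as the biased sample grows.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of positive values (illustrative only).
population = [rng.uniform(10, 90) for _ in range(10_000)]
mu = statistics.fmean(population)

def biased_sample(population, n, rng):
    """Assumed biased method: selection chance is proportional to the value itself."""
    return rng.choices(population, weights=population, k=n)

estimates = {}
for n in (100, 10_000, 100_000):
    sample = biased_sample(population, n, rng)
    x_bar = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5  # chance error of the sample mean
    estimates[n] = (x_bar, std_err)
    print(f"n={n:7d}  x_bar={x_bar:.2f} +/- {std_err:.2f}  (mu={mu:.2f})")
```

The standard error shrinks with n, but the estimate settles on a value well above \mu rather than on \mu itself. Replacing `biased_sample` with a fair draw such as `rng.sample(population, n)` is the kind of change to the method that removes the offset.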

Random vs biased, side by side

Compare a random sample, whose highlighted points spread evenly across the population, with a biased one that draws only from the high end. Watch the sample mean \bar{x}: the random sample keeps it near the true mean \mu; the biased sample drags it well off-target, and gathering more of the same biased points would never bring it back.
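The same comparison can be reproduced as a small sketch. The population, the sample size of 50, and the choice of "high end" as the top 20% of values are assumptions for illustration, not details from the original comparison.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population, sorted so the "high end" is easy to slice off.
population = sorted(rng.gauss(100, 15) for _ in range(2_000))
mu = statistics.fmean(population)
n = 50

# Random sample: every member has the same chance of selection.
random_pick = rng.sample(population, n)

# Biased sample: only the top 20% of values can ever be selected (assumed cutoff).
high_end = population[int(0.8 * len(population)):]
biased_pick = rng.sample(high_end, n)

for label, pick in (("random", random_pick), ("biased", biased_pick)):
    print(f"{label:6s}  range=[{min(pick):.1f}, {max(pick):.1f}]  "
          f"x_bar={statistics.fmean(pick):.1f}  (mu={mu:.1f})")
```

The printed range plays the role of the highlighted points: the random sample should straddle \mu, while every biased point lies above it, so its mean cannot land near \mu however many such points are drawn.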