A sample is only useful if it looks like the population it came from. We call such a sample representative: its mix of values mirrors the whole. The danger is a sample that systematically over- or under-counts some part of the population — then the statistic we compute points in the wrong direction, no matter how carefully we measure.
The reliable way to get a representative sample is to choose it at random: give
every member of the population a fair, equal chance of being picked. Randomness doesn't
guarantee a perfect sample, but it removes any hidden tilt: on average the sample mean
matches the population mean.
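A quick simulation can make the "no hidden tilt" idea concrete. The sketch below is illustrative only: the population (the integers 1 to 1000), the sample size, the number of trials, and the helper name `mean_of_sample_means` are all assumptions chosen for the example. It draws many simple random samples and compares the average of their means with the population mean.

```python
import random
import statistics

def mean_of_sample_means(population, sample_size, trials, seed=0):
    """Draw many simple random samples (without replacement) and
    return the average, minimum, and maximum of their means."""
    rng = random.Random(seed)
    sample_means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.fmean(sample_means), min(sample_means), max(sample_means)

# Hypothetical population: the integers 1..1000, so the true mean is 500.5.
population = list(range(1, 1001))
true_mean = statistics.fmean(population)

average, lowest, highest = mean_of_sample_means(
    population, sample_size=25, trials=2000
)

print(f"population mean:         {true_mean:.1f}")
print(f"average of sample means: {average:.1f}")
print(f"single samples ranged:   {lowest:.1f} to {highest:.1f}")
```

Individual samples land above or below the target, sometimes by a wide margin, but the misses go in both directions, so their average should sit close to the population mean.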
Bias is a systematic error — a built-in tilt in how the sample is collected, so it consistently misses the target in the same direction. Two common culprits:
The crucial point: bias is not cured by collecting more data. A bigger biased sample just pins down the wrong answer more precisely. Only fixing the method — making the selection fair — removes it.
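The effect of sample size on a biased method can be sketched with a small simulation. Everything here is an assumption made for illustration: the population (the integers 1 to 10000), the biased method (drawing only from values above the median), and the helper names `high_end_pool` and `repeated_sample_means`.

```python
import random
import statistics

def high_end_pool(population):
    """A deliberately unfair sampling frame: only values above the median."""
    cutoff = statistics.median(population)
    return [x for x in population if x > cutoff]

def repeated_sample_means(pool, sample_size, trials, rng):
    """Means of `trials` samples of `sample_size` drawn from `pool`."""
    return [
        statistics.fmean(rng.sample(pool, sample_size))
        for _ in range(trials)
    ]

rng = random.Random(1)
population = list(range(1, 10_001))   # hypothetical population, true mean 5000.5
true_mean = statistics.fmean(population)
pool = high_end_pool(population)      # biased frame: 5001..10000

results = {}
for n in (10, 100, 1000):
    means = repeated_sample_means(pool, n, trials=200, rng=rng)
    center = statistics.fmean(means)   # where the biased estimate settles
    spread = statistics.stdev(means)   # how much it varies between samples
    results[n] = (center, spread)
    print(f"n={n:>4}  typical estimate={center:8.1f}  "
          f"spread={spread:7.1f}  error={center - true_mean:8.1f}")
```

As the sample grows, the spread of the estimates should shrink while the error stays roughly where it was: the larger sample is more precise about the wrong value. Swapping `pool` for the full `population` is the corresponding fix to the method.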
Switch between a random sample (highlighted points spread evenly across the
population) and a biased one that only draws from the high end. Watch the
sample mean: the random sample keeps it near the population mean, while the
biased sample pulls it upward.