Population and Sample

The population is the whole collection we want to know about: every voter, every light bulb on the line, every tree in the forest. Usually we cannot measure all of it. The population is too large, or the measurement is too expensive or too destructive (testing how long a bulb lasts burns it out). So we measure a sample: the part we actually get our hands on.

Almost all of statistics lives in this gap. We can see only the sample, yet the questions we care about are about the population. The whole craft is using the part to reason about the whole.

Parameters vs statistics

A number that describes the population is a parameter. It is fixed but unknown — a single true value sitting out there that we never get to read directly. The population mean and standard deviation get Greek letters:

\mu = \text{population mean}, \qquad \sigma = \text{population standard deviation}.

A number we compute from the sample is a statistic. It is knowable — we have the data — but it varies: take a different sample and you get a different value. The sample mean and standard deviation get Roman letters:

\bar{x} = \text{sample mean}, \qquad s = \text{sample standard deviation}.

The link between them is the entire point: we use the statistic \bar{x} as our best estimate of the parameter \mu. The parameter is the target; the statistic is the arrow.
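To make the distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than real data: the population is 10,000 simulated bulb lifetimes, and the distribution, the seed, and the sample size of 25 are chosen arbitrarily. Because the population is simulated, we can compute \mu and \sigma directly, which is exactly what we cannot do in practice.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated bulb lifetimes in hours.
# The normal distribution and its settings are illustrative assumptions.
rng = random.Random(0)
population = [rng.gauss(1000, 50) for _ in range(10_000)]

# Parameters: computed from the entire population (normally impossible).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population formula, divides by N

# Statistics: computed from a sample of 25 drawn without replacement.
sample = rng.sample(population, 25)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)           # sample formula, divides by n - 1

print(f"parameters: mu    = {mu:.1f}, sigma = {sigma:.1f}")
print(f"statistics: x_bar = {x_bar:.1f}, s     = {s:.1f}")
```

The two printed rows should be close but not identical. Note that Python's `statistics` module keeps the same split the notation does: `pstdev` is the population formula and `stdev` is the sample formula.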

One population, many samples

Picture a fixed population of points with its true mean \mu marked, a value you would normally never know. Draw one sample from it, then another, then another. Each sample has its own mean \bar{x}, and those means jitter around \mu as the sample changes. The statistic chases the parameter, but it almost never lands exactly on it.
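The same experiment can be sketched in a few lines of Python. This is a rough simulation under assumed settings, not a reproduction of any particular figure: the population of 500 uniform values, the sample size of 20, the eight repetitions, and the seed are all arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)

# One fixed, hypothetical population of 500 points between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(500)]
mu = statistics.fmean(population)  # the parameter: fixed for this population

# Draw several samples from the same population and record each mean.
sample_means = []
for _ in range(8):
    sample = rng.sample(population, 20)  # 20 points, without replacement
    sample_means.append(statistics.fmean(sample))

print(f"mu = {mu:.2f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i}: x_bar = {x_bar:.2f}  (off by {x_bar - mu:+.2f})")
```

Each run of the loop plays the role of picking a new sample: \mu stays put while \bar{x} moves, sometimes above it and sometimes below.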

The population is everything; a sample is the part we measure.
A parameter (\mu, \sigma) describes the population — fixed but unknown.
A statistic (\bar{x}, s) is computed from the sample — known, but it varies sample to sample.
We use the statistic to estimate the parameter: \bar{x} estimates \mu.