What Is a Distribution?

A distribution describes how a variable's values are spread out — which values are common, which are rare, and roughly where the bulk of the data sits. It is the whole shape of the data at a glance, not just a single summary number.

You have already met one picture of a distribution: a histogram. Each bar's height tells you how many values fall in that bin. If we divide each count by the total, the bars show relative frequencies — the proportion of the data in each bin.
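The step from counts to relative frequencies can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `scores` data, the bin `edges`, and the helper name `relative_frequencies` are all made up for the example.

```python
def relative_frequencies(values, edges):
    """Count the values in each bin, then divide by the total counted.

    Bins are [edges[i], edges[i+1]), except the last bin, which also
    includes its right edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return counts, [c / total for c in counts]


# Hypothetical data: eight test scores, binned in steps of 10.
scores = [52, 55, 61, 64, 67, 68, 73, 78]
edges = [50, 60, 70, 80]

counts, rel_freq = relative_frequencies(scores, edges)
print(counts)    # [2, 4, 2]
print(rel_freq)  # [0.25, 0.5, 0.25]
```

The relative frequencies always add up to 1, because every counted value lands in exactly one bin.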

From bars to a smooth curve

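Now imagine collecting more and more data and making the bins ever narrower. If each bar's height is rescaled to its relative frequency divided by the bin width, so that the bar's area (not its height) equals the proportion of data in the bin, the jagged staircase of bars settles toward a single smooth curve: the density curve of the distribution. It is the idealised shape the histogram is always reaching for.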
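One way to watch this settling happen is to simulate it. The sketch below assumes the data come from a standard normal (bell-shaped) population, which is an assumption made for illustration only. It draws each bar with height equal to proportion divided by bin width, so that bar area equals proportion, and measures the largest gap between the bar tops and the true curve. The helper names `density_histogram` and `max_gap` are hypothetical.

```python
from statistics import NormalDist


def density_histogram(values, lo, hi, n_bins):
    """Return (bin centres, bar heights) with height = proportion / bin width."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:  # values outside [lo, hi) are ignored
            i = min(int((v - lo) / width), n_bins - 1)
            counts[i] += 1
    total = sum(counts)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    heights = [c / total / width for c in counts]
    return centres, heights


def max_gap(n_samples, n_bins, seed=0):
    """Largest gap between the bar tops and the true density at the bin centres."""
    model = NormalDist(mu=0, sigma=1)  # assumed population, for illustration
    data = model.samples(n_samples, seed=seed)
    centres, heights = density_histogram(data, -4.0, 4.0, n_bins)
    return max(abs(h - model.pdf(c)) for c, h in zip(centres, heights))


coarse = max_gap(n_samples=200, n_bins=10)     # little data, wide bins
fine = max_gap(n_samples=500_000, n_bins=50)   # lots of data, narrow bins
print(coarse, fine)  # the second gap is typically much smaller
```

Both ingredients matter here. Narrow bins with too little data give a noisy histogram, and plenty of data with wide bins gives a blocky one.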

Below, a fine-binned relative-frequency histogram of a fixed dataset is drawn behind its smooth density curve. Notice how the tops of the bars trace out the bell-shaped curve.

Area means proportion

Once the bars become a density curve, we read the data through area, not height. The area under the curve between two values is the proportion of the data that falls there — equivalently, the probability that a randomly chosen value lands in that range.
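As a rough numerical check of this idea, the sketch below assumes a standard normal density (chosen only for illustration) and approximates the area under it between -1 and 1 with the trapezoid rule. It then compares that area with the exact probability from the cumulative distribution function and with the proportion of simulated values that land in the same range. The helper name `area_under` is hypothetical.

```python
from statistics import NormalDist


def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    h = (b - a) / steps
    inner = sum(pdf(a + i * h) for i in range(1, steps))
    return h * (0.5 * pdf(a) + inner + 0.5 * pdf(b))


model = NormalDist(mu=0, sigma=1)  # assumed density, for illustration

# Area under the curve between -1 and 1.
area = area_under(model.pdf, -1.0, 1.0)

# Exact probability of landing between -1 and 1.
exact = model.cdf(1.0) - model.cdf(-1.0)
print(round(area, 4), round(exact, 4))  # 0.6827 0.6827

# Proportion of simulated data in the same range; it should be close to the area.
data = model.samples(100_000, seed=1)
proportion = sum(1 for v in data if -1.0 <= v <= 1.0) / len(data)
print(proportion)
```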

Because every value lands somewhere, the total area under the curve is 1:

\[
\text{(area under the whole density curve)} = 1.
\]

This is why the curve's height is called a density and not a count: a tall region means values are densely packed there, but it is the area — height times width — that turns into a proportion.
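A small sketch can make the difference between height and area concrete. It assumes a narrow normal density with standard deviation 0.1, chosen only because its peak rises well above 1. The height cannot be a proportion, yet the total area is still 1, and a thin slice of height times width gives a sensible proportion. The helper name `area_under` is hypothetical.

```python
from statistics import NormalDist


def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    h = (b - a) / steps
    inner = sum(pdf(a + i * h) for i in range(1, steps))
    return h * (0.5 * pdf(a) + inner + 0.5 * pdf(b))


narrow = NormalDist(mu=0, sigma=0.1)  # assumed density, for illustration

peak = narrow.pdf(0.0)                       # height of the curve at its centre
total = area_under(narrow.pdf, -1.0, 1.0)    # covers ten standard deviations each side
print(round(peak, 3), round(total, 6))       # 3.989 1.0

# Height times width for a thin slice around the centre approximates
# the proportion of values that fall in that slice.
width = 0.05
slice_estimate = peak * width
slice_exact = narrow.cdf(width / 2) - narrow.cdf(-width / 2)
print(slice_estimate, slice_exact)
```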

Key points

A distribution describes how a variable's values are spread: which are common, which are rare.
As data grows and bins shrink, the histogram (with each bar scaled so that its area equals the proportion in its bin) settles toward a smooth density curve.
Area under the curve over a range = the proportion (probability) of values in that range.
The total area under any density curve is 1.