Grouped Data and the Estimated Mean

Sometimes data arrive already sorted into classes (ranges of values) rather than as a list of exact numbers. A survey might record that 7 people are 150–160 cm tall and 12 are 160–170 cm, without ever writing down a single individual height.

Grouped data like this hides the exact values, so we cannot find the mean precisely. Instead we estimate it: we treat every value in a class as if it sat at the class midpoint, the value halfway between the lower and upper boundaries. The midpoint is our best single stand-in for the whole class.
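Written as a formula, the midpoint is the average of the two class boundaries. For the 150–160 cm class, for instance:

\text{midpoint} = \frac{\text{lower boundary} + \text{upper boundary}}{2} = \frac{150 + 160}{2} = 155 \text{ cm}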

The estimated mean is then the familiar frequency-table mean, with the midpoints used as the values:

\text{estimated mean} = \frac{\sum (\text{midpoint} \times \text{frequency})}{\sum \text{frequency}}
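The same calculation can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `estimated_mean` and the choice to describe each class as a (lower boundary, upper boundary, frequency) tuple are assumptions made here for the example. It computes each class midpoint, accumulates the sums of midpoint × frequency and of frequency, and divides one by the other.

```python
from typing import Sequence, Tuple

# Hypothetical representation: each class is (lower boundary, upper boundary, frequency).
GroupedClass = Tuple[float, float, int]


def estimated_mean(classes: Sequence[GroupedClass]) -> float:
    """Estimate the mean of grouped data by treating every value
    in a class as if it sat at the class midpoint."""
    total_xf = 0.0  # running sum of midpoint * frequency
    total_f = 0     # running sum of frequencies
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2  # halfway between the class boundaries
        total_xf += midpoint * freq
        total_f += freq
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return total_xf / total_f


# The four height classes (cm) from the worked example.
heights = [(150, 160, 7), (160, 170, 12), (170, 180, 9), (180, 190, 2)]
print(estimated_mean(heights))  # 167.0
```

Run on the height classes, it returns 167.0, the same value as the hand calculation. Raising an error on a zero total frequency avoids a silent division by zero for empty data.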

Here is a worked example for heights, in centimetres, grouped into four classes:

Class (cm)   Midpoint x (cm)   Frequency f   x × f
150–160      155               7             1085
160–170      165               12            1980
170–180      175               9             1575
180–190      185               2             370
Total                          30            5010

The x × f column totals \sum (x \times f) = 5010 and the frequency column totals \sum f = 30, so the estimated mean is

\frac{5010}{30} = 167 \text{ cm.}
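Substituting every row of the table into the formula shows where the two totals come from:

\frac{155 \times 7 + 165 \times 12 + 175 \times 9 + 185 \times 2}{7 + 12 + 9 + 2} = \frac{1085 + 1980 + 1575 + 370}{30} = \frac{5010}{30} = 167 \text{ cm}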

It is an estimate because nobody in the 150–160 class is known to be exactly 155 cm; the midpoint is just our even-handed guess for them all.