Grouped Data and the Estimated Mean

Sometimes data arrive already sorted into classes — ranges of values — rather than as a list of exact numbers. A survey might record that 7 people are 150–160 cm tall and 12 are 160–170 cm, without ever writing down a single individual height.

Grouped data hide the exact values, so we cannot find the mean precisely. Instead we estimate it: we treat every value in a class as if it sat at the class midpoint, the value halfway between the lower and upper boundaries. The midpoint is our best single stand-in for the whole class.
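As a minimal sketch of the midpoint rule just described, the helper below averages the two class boundaries. The function name class_midpoint is an illustrative choice, not something defined in the text.

```python
def class_midpoint(lower, upper):
    """Return the value halfway between the two class boundaries."""
    return (lower + upper) / 2


# The 150-160 cm class from the survey example has midpoint 155.
print(class_midpoint(150, 160))  # 155.0
```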

Then the estimated mean is the familiar frequency-table mean, using midpoints as the values:

\text{estimated mean} = \frac{\sum (\text{midpoint} \times \text{frequency})}{\sum \text{frequency}}

Here is a worked example for heights, in centimetres, grouped into four classes:

Class (cm)	Midpoint x	Frequency f	x \times f
150–160	155	7	1085
160–170	165	12	1980
170–180	175	9	1575
180–190	185	2	370
Total		30	5010

The x \times f column totals \sum (x \times f) = 5010 and the frequency column totals \sum f = 30, so the estimated mean is

\frac{5010}{30} = 167 \text{ cm.}
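The same calculation can be checked in a few lines of Python. This is only a sketch: the list name classes and the function name estimated_mean are assumptions made for illustration, and each class is stored as (lower boundary, upper boundary, frequency).

```python
# Each tuple is (lower boundary, upper boundary, frequency), copied from the table.
classes = [(150, 160, 7), (160, 170, 12), (170, 180, 9), (180, 190, 2)]


def estimated_mean(classes):
    """Estimate the mean of grouped data using class midpoints."""
    # Sum of midpoint x frequency over all classes.
    total_xf = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    # Sum of the frequencies.
    total_f = sum(f for _, _, f in classes)
    return total_xf / total_f


print(estimated_mean(classes))  # 167.0
```

Storing the boundaries rather than the midpoints keeps the table data unchanged and lets the function work out each midpoint itself.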

It is an estimate because nobody in the 150–160 class is known to be exactly 155 cm — the midpoint is just our even-handed guess for them all.

When data are grouped into classes, take each class's midpoint as the value for every item in that class.
The estimated mean is \dfrac{\sum (f \times \text{midpoint})}{\sum f}.
It is only an estimate: the true values within a class are unknown, so the midpoint stands in for them.
The modal class is the class with the highest frequency, the grouped-data equivalent of the mode. In the heights example it is 160–170 cm, with a frequency of 12.
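To illustrate the modal class, the sketch below picks the class with the largest frequency from the heights table. The dictionary layout and the name modal_class are assumptions for illustration. If two classes tie for the highest frequency, this version simply returns the first one it meets.

```python
# Frequencies from the heights table, keyed by (lower, upper) boundaries in cm.
frequencies = {(150, 160): 7, (160, 170): 12, (170, 180): 9, (180, 190): 2}


def modal_class(frequencies):
    """Return the class whose frequency is largest (first one wins on a tie)."""
    return max(frequencies, key=frequencies.get)


print(modal_class(frequencies))  # (160, 170)
```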