Shape, Skew, and Outliers

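A histogram shows the shape of a distribution, and shape is what the summary numbers still leave out. The simplest distinction is symmetry: a distribution is symmetric when its left and right halves roughly mirror each other, and skewed when one tail stretches out much farther than the other.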

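The direction of skew is named after the tail, not the bulk of the data. A long tail to the right makes a distribution right-skewed (also called positively skewed), even though most of its values pile up on the left.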
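One common way to put a number on skew is the moment-based (Fisher-Pearson) skewness coefficient, which this section does not otherwise use. The minimal sketch below assumes that definition and uses made-up data; its sign follows the tail, which is exactly the naming rule: positive for a long right tail, negative for a long left tail.

```python
from statistics import fmean

def sample_skewness(data: list[float]) -> float:
    """Moment-based (Fisher-Pearson) skewness: third central moment / sd**3.

    Positive when the long tail is on the right, negative when it is on
    the left. Assumes the data are not all identical (sd > 0).
    """
    m = fmean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # population variance
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most values are small, one stretches far to the right.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data: the long tail now points left.
left_tailed = [-x for x in right_tailed]

print(sample_skewness(right_tailed) > 0)  # True: "right-skewed", named after the tail
print(sample_skewness(left_tailed) < 0)   # True: "left-skewed"
```

Note that the bulk of `right_tailed` sits at the low end, yet the coefficient is positive because the cubed deviation of the single far-right value dominates.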

Skew pulls the mean, but not the median

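This is the key behaviour to internalise. The mean is a balance point, so it follows the long tail: a few far-away values drag it toward them. The median is a position (the middle value), so it barely moves no matter how far the tail stretches. Hence, as a rule of thumb, the mean sits above the median in right-skewed data, below it in left-skewed data, and close to it when the distribution is symmetric.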
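To see the mean-versus-median pattern in actual numbers, here is a minimal Python sketch using the standard-library `statistics` module. The three datasets are hypothetical toy values chosen for illustration, not data from this section.

```python
from statistics import mean, median

# Hypothetical toy datasets (not from the text), one per shape.
datasets = {
    "symmetric":    [2, 3, 4, 5, 6, 7, 8],
    "right-skewed": [2, 3, 3, 4, 4, 5, 30],   # one long tail to the right
    "left-skewed":  [-30, 5, 6, 6, 7, 7, 8],  # one long tail to the left
}

for name, data in datasets.items():
    # The mean is pulled toward the tail; the median is just the middle value.
    print(f"{name:>12}: mean = {mean(data):6.2f}, median = {median(data)}")
```

In this run the symmetric set has mean and median both equal to 5, the right-skewed set has a mean of about 7.29 against a median of 4, and the left-skewed set has a mean of about 1.29 against a median of 6. In each skewed case, a single tail value is enough to separate the two summaries.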

See it: one outlier, two reactions

Six points sit in a tidy cluster; a seventh (the highlighted one) starts among them. Drag it far to the right to create an outlier — an extreme, isolated value. Watch the two markers: the mean chases the runaway point off to the right, while the median stays put at 5. By the time the point is way out, the data are clearly right-skewed and the mean sits well above the median.
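The drag experiment can be approximated offline with a short Python sketch. The text gives only the median (5), so the six cluster values below are an assumption, chosen to be symmetric around 5 so that mean and median start out equal; the loop stands in for dragging the seventh point rightward.

```python
from statistics import mean, median

# Hypothetical cluster: the text only says the median is 5, so these
# six values are an assumption chosen to be symmetric around 5.
cluster = [3, 4, 5, 5, 6, 7]

# Slide the seventh point from inside the cluster out to the far right.
for x in [5, 10, 25, 50, 100]:
    data = cluster + [x]
    print(f"point at {x:>3}: mean = {mean(data):5.2f}, median = {median(data)}")
```

With these values the mean climbs from 5.00 to about 18.57 while the median stays at 5 throughout. Once the moving point is at or above the middle, its exact position no longer changes which value sits in the middle, so only the mean responds.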

This is why a single outlier — a typo, a billionaire in an income survey — can make the mean deeply misleading, and why the median is the safer summary for skewed data.