Shape, Skew, and Outliers
A histogram shows the shape of a distribution, and shape is something the
summary numbers alone do not capture. The simplest distinction is symmetry.
- Symmetric: the two sides mirror each other; the mean and the median sit together in the middle.
- Right-skewed (positively skewed): a long tail stretches to the right.
- Left-skewed (negatively skewed): a long tail stretches to the left.
The direction of skew is named after the tail, not the bulk of the data.
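To make the naming concrete, here is a minimal sketch that prints a text histogram. The data set and the helper name `text_histogram` are made up for illustration. Rows run from small values at the top to large values at the bottom, so the thinning rows at the bottom correspond to the right-hand tail of an ordinary histogram.

```python
from collections import Counter

def text_histogram(values):
    """Return one line per integer value, with one '#' per occurrence."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        # Values that never occur get an empty bar.
        lines.append(f"{v:>2} | " + "#" * counts[v])
    return lines

# Hypothetical right-skewed data: most values are small, a few trail off.
right_skewed = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 6, 9]

for line in text_histogram(right_skewed):
    print(line)
```

The bulk of the bars sits at the small values, yet the distribution is called right-skewed because the long, thin tail points toward the large ones.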
Skew pulls the mean, but not the median
This is the key behaviour to internalise. The mean is a balance point, so it
follows the long tail — a few far-away values drag it toward them. The median is a
position (the middle value), so it barely moves no matter how far the tail stretches.
Hence, as a rule of thumb:
- Right-skew: $\text{mean} > \text{median}$.
- Left-skew: $\text{mean} < \text{median}$.
- Symmetric: $\text{mean} \approx \text{median}$.
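The three cases above can be checked on small data sets. The sketch below uses Python's standard `statistics` module; the three samples are hypothetical, chosen only so that each has the named shape (the left-skewed one is the mirror image of the right-skewed one).

```python
from statistics import mean, median

# Hypothetical data sets, chosen only to illustrate the three shapes.
samples = {
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 20],
    "left-skewed": [1, 17, 18, 18, 18, 19, 19, 20],
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
}

for name, data in samples.items():
    m, med = mean(data), median(data)
    print(f"{name:>12}: mean = {m:.2f}, median = {med:.2f}")
```

In the right-skewed sample the single value 20 lifts the mean to 4.75 while the median stays at 3; the mirrored sample gives a mean of 16.25 against a median of 18; the symmetric sample gives 4 for both.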
See it: one outlier, two reactions
Six points sit in a tidy cluster; a seventh (the highlighted one) starts among them. Drag it
far to the right to create an outlier (an extreme, isolated value). Watch the
two markers: the mean chases the runaway point off to the right, while the
median stays put at 5. By the time the point is far out, the data are clearly
right-skewed and the mean sits well above the median.
This is why a single outlier — a typo, a billionaire in an income survey — can make the mean
deeply misleading, and why the median is the safer summary for skewed data.
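The same experiment can be sketched in a few lines of Python. The cluster values below are an assumption, picked so that the median is 5 as in the description above; they are not taken from the figure, and `summarise` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical cluster of six points (illustrative values, not from the figure).
cluster = [3, 4, 5, 5, 6, 7]

def summarise(seventh):
    """Return (mean, median) of the cluster plus one movable seventh point."""
    data = cluster + [seventh]
    return mean(data), median(data)

# Move the seventh point further and further to the right.
for position in (5, 12, 40, 75):
    m, med = summarise(position)
    print(f"seventh point at {position:>2}: mean = {m:.2f}, median = {med}")
```

With seven points, every unit the seventh point moves shifts the mean by one seventh of a unit, so the mean climbs from 5 to 15 as the point travels from 5 to 75. The median is the fourth value in sorted order, and that value is still 5 wherever the outlier lands on the right.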
Key points
- Distributions are symmetric or skewed; skew is named after the long tail's direction.
- The mean is pulled toward the tail; the median holds. So right-skew gives $\text{mean} > \text{median}$, left-skew $\text{mean} < \text{median}$.
- An outlier shifts the mean a lot but the median barely at all.
- For skewed data the median is the more representative centre.