Correlation

A scatter plot shows a relationship; the correlation coefficient r puts a single number on it. It lives in

-1 \le r \le 1,

and measures the strength and direction of the linear relationship between two variables.

So r = 1 and r = -1 mean the points lie exactly on a straight line, rising and falling respectively; r = 0 means no linear trend at all.

Where the number comes from

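Standardise each variable into z-scores (subtract the mean and divide by the standard deviation) so both axes are measured on the same unitless scale. Here the standard deviations s_x and s_y use the divisor n; if they use n - 1 instead, the sum below is divided by n - 1 rather than n. Then r is simply the average product of the paired z-scores: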

r = \frac{1}{n}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}, \qquad z_{x_i} = \frac{x_i - \bar x}{s_x},\quad z_{y_i} = \frac{y_i - \bar y}{s_y}.

Read the sign off the products: a point that is above average in both x and y contributes (+)(+) > 0; below average in both gives (-)(-) > 0. Points that match this way push r up; points that disagree (high x with low y, or the reverse) contribute a negative product and pull it down. When agreements and disagreements cancel, r \approx 0.
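As a rough illustration of the formula above, here is a minimal Python sketch that standardises two short lists and averages the paired products. The data and the helper names z_scores and z_products are made up for this example; the standard deviation uses the divisor n, matching the 1/n in the formula.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Standardise values using the population SD (divisor n)."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]

def z_products(xs, ys):
    """Paired z-score products; their average is r."""
    return [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]

# Made-up illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 3, 2, 5]

products = z_products(xs, ys)
r = fmean(products)

for x, y, p in zip(xs, ys, products):
    print(f"x={x} y={y} product={p:+.2f}")
print(f"r = {r:.2f}")
```

The two end points agree (both low or both high) and give positive products, while (2, 4) and (4, 2) disagree and give negative ones; averaged together they leave r at about 0.6.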

Loosen the cloud

Each dot starts on the perfect line y = x and is then nudged off it by its own fixed offset, multiplied by the setting of the noise dial. With no noise the points are collinear and r = 1; as you add noise the cloud fattens and r slides toward 0. The live readout recomputes r from the points on screen.
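The demo described above might be imitated along these lines. This is only a sketch under assumptions: the number of dots, the Gaussian offsets, the seed, and the names cloud and pearson_r are all hypothetical choices, not the page's actual implementation.

```python
import random
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Average product of paired z-scores (population SD, divisor n)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

rng = random.Random(0)                       # seeded so the run is repeatable
xs = [i / 49 for i in range(50)]             # 50 dots spread over [0, 1]
offsets = [rng.gauss(0, 1) for _ in xs]      # one fixed nudge per dot (assumed Gaussian)

def cloud(noise):
    """y-values: start on y = x, then add each dot's offset times the dial."""
    return [x + noise * e for x, e in zip(xs, offsets)]

for noise in (0.0, 0.1, 0.3, 1.0):
    print(f"noise={noise:<4} r={pearson_r(xs, cloud(noise)):.3f}")
```

Because the offsets stay fixed and only the dial changes, r falls steadily as the dial goes up. It settles near, rather than exactly at, 0, since a finite set of offsets is never perfectly uncorrelated with x.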

Two warnings

First, r only sees straight-line structure. A relationship can be strong yet curved, such as a perfect parabola with the data centred on its vertex, and still give r \approx 0, because the rising and falling halves cancel. So r = 0 means "no linear link", not "no link".
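A quick check of the parabola claim, as a sketch with made-up points on y = x^2 (pearson_r is a helper defined here, using the divisor-n standard deviation). The second data set is an added contrast: it keeps the same curve but samples only the rising half, so nothing cancels.

```python
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Average product of paired z-scores (population SD, divisor n)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

# Points placed symmetrically about the vertex: falling half, then rising half.
xs_centred = [-3, -2, -1, 0, 1, 2, 3]
ys_centred = [x * x for x in xs_centred]
r_centred = pearson_r(xs_centred, ys_centred)

# Same curve, but only the rising half is sampled.
xs_one_sided = [0, 1, 2, 3, 4, 5, 6]
ys_one_sided = [x * x for x in xs_one_sided]
r_one_sided = pearson_r(xs_one_sided, ys_one_sided)

print(f"centred parabola:   r = {r_centred:.3f}")
print(f"rising half only:   r = {r_one_sided:.3f}")
```

In both cases y is determined exactly by x, yet the centred sample gives r of essentially 0 while the one-sided sample gives r of about 0.96. The cancellation depends on both halves being present in the data.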

Second, as with any measure of association, a high |r| does not establish causation. A tight correlation can be driven entirely by a lurking variable, or be pure coincidence.
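The lurking-variable case can be mimicked with a small simulation. This is a hypothetical setup, not data from any study: a hidden quantity feeds both x and y, each with its own small independent wiggle, and neither x nor y appears in the other's formula.

```python
import random
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Average product of paired z-scores (population SD, divisor n)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

rng = random.Random(1)                 # seeded so the run is repeatable
hidden = list(range(50))               # the lurking variable (hypothetical)

# x and y each depend on the hidden variable only, never on each other.
x = [h + rng.uniform(-1, 1) for h in hidden]
y = [h + rng.uniform(-1, 1) for h in hidden]

r_xy = pearson_r(x, y)
print(f"r between x and y: {r_xy:.3f}")
```

The correlation between x and y comes out very close to 1 even though, by construction, changing x would do nothing to y.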