Data and Variables

A variable is any feature that can differ from one observation to the next — a person's height, their eye colour, the number of pets they own. The recorded values are the data. Before you average, plot, or summarise anything, you must know what kind of variable you are holding: the type decides which calculations even make sense.

Two families, four kinds

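Variables split into two families. Categorical data records a label or a group; numerical data records a quantity you can do arithmetic with. Each family splits in two again, giving four kinds: a categorical variable is nominal (no natural order, such as eye colour) or ordinal (ordered, such as a satisfaction rating), and a numerical variable is discrete (separate countable values, such as number of pets) or continuous (any value in a range, such as height).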

Classify first, because some calculations are meaningless for some types. You can count how many people have brown eyes, but "the mean eye colour" is nonsense: a label is not a number, so there is nothing to add. Type comes before computation.
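The contrast between counting and averaging can be sketched in a few lines of Python. The sample values and variable names below are invented for illustration; the point is only that counting labels works, averaging numbers works, and averaging labels fails.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations, made up for this sketch.
eye_colours = ["brown", "blue", "brown", "green", "brown"]  # categorical
heights_cm = [160.0, 170.0, 180.0]                          # numerical

# Counting is meaningful for categorical data.
colour_counts = Counter(eye_colours)
print(colour_counts["brown"])  # 3

# Averaging is meaningful for numerical data.
mean_height = mean(heights_cm)
print(mean_height)  # 170.0

# Averaging labels fails: Python cannot add a string to a number.
try:
    sum(eye_colours) / len(eye_colours)
    mean_of_labels_failed = False
except TypeError:
    mean_of_labels_failed = True
print(mean_of_labels_failed)  # True
```

Here the language itself refuses the meaningless calculation. That will not always happen: if categories are stored as numeric codes, the arithmetic runs without complaint and the result is still nonsense, which is why the classification has to happen in your head first.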

The three shapes of data

A continuous variable lives on an unbroken line — every point is a possible value. A discrete variable lands only on separate, countable spots. A categorical variable is just a set of named markers with no numeric scale at all.
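As a rough illustration, the three shapes often map onto how values are stored: text for categorical, whole numbers for discrete, decimals for continuous. The function `rough_kind` below is a hypothetical helper written for this sketch, not a standard tool, and it is only a heuristic, since storage type and statistical type can disagree.

```python
def rough_kind(values):
    """Guess a variable's kind from the Python types of its values.

    Heuristic only: a postcode stored as an int is still categorical,
    and a height rounded to whole centimetres is still continuous.
    """
    if not values:
        return "unknown"
    if all(isinstance(v, str) for v in values):
        return "categorical"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "unknown"


# Hypothetical observations, made up for this sketch.
heights_cm = [162.5, 171.0, 180.3]
pets_owned = [0, 2, 1]
eye_colours = ["brown", "blue", "green"]

print(rough_kind(heights_cm))   # continuous
print(rough_kind(pets_owned))   # discrete
print(rough_kind(eye_colours))  # categorical
```

Treat the result as a first guess to check against what the variable actually measures, not as a substitute for that judgement.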