Features and Labels

Data for a supervised model comes in two parts. The features are the inputs — the measurable facts you know about each example. The label is the answer you want to predict.

To predict a house price, the features might be its floor area, number of bedrooms and age; the label is the price. To filter email, the features might be word counts and sender information; the label is "spam" or "not spam". The model's job is to learn the mapping features → label.
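As a rough sketch of how the house-price case might look as data, here is one way to lay it out in Python. The numbers are made up for illustration, and the names `features` and `labels` are just a convention assumed here, not something the lesson prescribes.

```python
# Hypothetical house data: each inner list is one example.
# Columns: floor area (square metres), number of bedrooms, age (years).
features = [
    [120.0, 3, 15],
    [85.0, 2, 40],
    [200.0, 5, 5],
]

# One label per example: the sale price (invented numbers).
labels = [310_000, 195_000, 540_000]

# Features and labels must line up: the i-th label belongs to the i-th row.
assert len(features) == len(labels)

for x, y in zip(features, labels):
    print(f"features={x} -> label={y}")
```

Keeping the inputs and the answers in two parallel collections mirrors the split described above: the model is shown `features` and asked to reproduce `labels`.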

Read off an example

Each dot is one example, placed by its two features (feature 1 across, feature 2 up) and coloured by its label. Step through them and the readout names the chosen example's features and its label. Everything a model sees about the world arrives in exactly this form.
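The same readout can be imitated in a few lines of Python. This is only a sketch: the `Example` type, the `readout` function, the coordinates and the colour names are all invented here to stand in for the dots on the plot.

```python
from typing import NamedTuple


class Example(NamedTuple):
    feature_1: float  # position across
    feature_2: float  # position up
    label: str        # colour of the dot


# Made-up examples standing in for the dots.
examples = [
    Example(1.0, 2.5, "blue"),
    Example(3.2, 0.8, "orange"),
    Example(2.1, 3.9, "blue"),
]


def readout(index: int) -> str:
    """Describe the chosen example's features and its label."""
    e = examples[index]
    return (
        f"example {index}: "
        f"features = ({e.feature_1}, {e.feature_2}), label = {e.label}"
    )


# Step through the examples one at a time.
for i in range(len(examples)):
    print(readout(i))
```

Each example carries exactly two things: a pair of feature values and one label.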

Good features are half the battle

A model is only as good as the features it's fed. Choosing and shaping them, known as feature engineering, is often where the real skill lies: the right feature can make a hard problem easy, while a missing one can make it impossible. Next we'll bundle an example's features into a single tidy object, a feature vector, so the maths can get to work.
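To illustrate how the right feature can make a hard problem easy, here is a small sketch with made-up points labelled by whether they fall inside a circle of radius 1. In this toy data, no single cut-off on feature 1 alone or feature 2 alone separates the two labels, but one engineered feature, the squared distance from the origin, does. The function names and the data are assumptions chosen for the example.

```python
# Made-up points (feature 1, feature 2) and their labels.
points = [
    (0.1, 0.2), (-0.5, 0.3), (0.4, -0.6),   # inside the unit circle
    (1.5, 0.2), (-1.2, -1.1), (0.3, 1.4),   # outside the unit circle
]
labels = ["inside", "inside", "inside", "outside", "outside", "outside"]


def squared_radius(x1: float, x2: float) -> float:
    """Engineered feature: squared distance from the origin."""
    return x1 * x1 + x2 * x2


def predict(x1: float, x2: float, threshold: float = 1.0) -> str:
    """A one-line rule that uses only the engineered feature."""
    return "inside" if squared_radius(x1, x2) < threshold else "outside"


predictions = [predict(x1, x2) for x1, x2 in points]
correct = sum(p == y for p, y in zip(predictions, labels))
print(f"{correct} of {len(labels)} correct")
```

With the raw coordinates the rule would need to describe a curved boundary; with the engineered feature it is a single comparison against a threshold.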