The Feature Vector

Bundle an example's features together, in a fixed order, and you have a vector — the feature vector. A house described by (\text{area}, \text{bedrooms}, \text{age}) becomes

\vec{x} = \begin{bmatrix} 90 \\ 3 \\ 12 \end{bmatrix}.

This is the quiet bridge between machine learning and linear algebra: every example is a point in feature space, and the whole dataset is a cloud of such points. Once the data are vectors, the machinery of linear algebra (dot products, distances, matrices) is ready to use.
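
As a minimal sketch of the "fixed order" idea, the snippet below turns house records into feature vectors. The names FEATURE_ORDER and to_vector are hypothetical, chosen only for illustration, and the second house is made up; only the first house comes from the example above.

```python
# Fixed feature order: position 0 is always area, 1 bedrooms, 2 age.
# (FEATURE_ORDER and to_vector are illustrative names, not a library API.)
FEATURE_ORDER = ("area", "bedrooms", "age")


def to_vector(example):
    """Return the example's feature values as a list, in FEATURE_ORDER."""
    return [float(example[name]) for name in FEATURE_ORDER]


# The first house is the one from the text; the second is invented.
# The key order inside each record does not matter: to_vector fixes it.
houses = [
    {"bedrooms": 3, "age": 12, "area": 90},
    {"area": 120, "age": 5, "bedrooms": 4},
]

# The dataset is a list of vectors, one row per example.
dataset = [to_vector(house) for house in houses]

print(dataset[0])  # [90.0, 3.0, 12.0]
```

Because every row uses the same order, the same position means the same feature in every example, which is what makes comparing two rows meaningful.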

An example is a point

With two features we can actually see it: each example is a dot in the plane, and equally an arrow from the origin to that dot. Change either feature value and the point moves through feature space. Real datasets often have hundreds of features, and so hundreds of dimensions, but the idea is identical; we simply lose the ability to draw it.

A third feature — into 3D

Add a third feature and the example becomes a point in three-dimensional feature space. Changing any of the three feature values moves the point, and the other examples in the dataset surround it as a cloud of points. Past three features we can no longer draw it, yet nothing in the maths changes: distances and dot products are computed by the same formulas in 3D, 100D, or 10,000D.
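
To make "nothing in the maths changes" concrete, here are the two formulas written for n features. This assumes the distance meant is the ordinary Euclidean distance; \vec{y} (a second example) and \vec{w} (a set of weights) are symbols introduced here for illustration.

d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad \vec{w} \cdot \vec{x} = \sum_{i=1}^{n} w_i x_i.

Only the upper limit n depends on the number of features; for n = 2 the first formula is the familiar Pythagorean distance in the plane.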

Why vectors, not just lists

Calling it a vector — not merely a list — is the point. It lets a model measure things: the distance between two examples (are they similar?), and the dot product of an example with a set of weights (its score). Those two operations — distance and dot product — are the seeds of nearly every algorithm ahead.
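
A small sketch of those two operations, using only the Python standard library. It assumes Euclidean distance; the second house and the weights are made-up numbers for illustration, not values from any trained model.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


x = [90.0, 3.0, 12.0]    # the house from the text: area, bedrooms, age
y = [100.0, 3.0, 12.0]   # invented second house, differing only in area
w = [2.0, 10.0, -1.0]    # invented weights, one per feature

print(distance(x, y))  # 10.0
print(dot(w, x))       # 198.0
```

Neither function mentions the number of features, so the same code handles 3 features or 10,000; only the length of the lists changes.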