Dimensionality Reduction

Real data can have hundreds or thousands of features: far too many to visualise, slow to compute with, and riddled with redundancy (a height recorded in both centimetres and inches carries the same information twice). Dimensionality reduction squeezes the data down to a few features while keeping as much of its real structure as possible.

The key idea is projection: cast the high-dimensional points down onto a lower-dimensional surface — like a 3-D object throwing a 2-D shadow. Choose the surface well and the shadow keeps the data's important shape; choose badly and you flatten away what mattered.
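One way to make projection concrete is to assume the lower-dimensional surface is a flat subspace through the origin (a linear projection, the kind PCA uses). Describe that subspace by orthonormal basis vectors stored as the columns of a matrix `W`; each point's shadow coordinates are then just dot products with those basis vectors. This is a minimal sketch: the function `project_onto_subspace` and the toy points are hypothetical, chosen only for illustration.

```python
import numpy as np

def project_onto_subspace(X, W):
    """Project rows of X onto the subspace spanned by the orthonormal columns of W.

    Hypothetical helper for illustration. Returns (Z, X_shadow):
    Z holds the low-dimensional coordinates, X_shadow the projected
    points expressed back in the original space.
    """
    Z = X @ W            # coordinates in the subspace: one column per basis vector
    X_shadow = Z @ W.T   # the "shadow" itself, still living in the original space
    return Z, X_shadow

# Three toy 3-D points (made up, not real data)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [-1.0, 0.5, 2.0]])

# Orthonormal basis for the x-y plane: light shining straight down the z-axis
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

Z, X_shadow = project_onto_subspace(X, W)
print(Z)         # 2-D coordinates: the z column has been dropped
print(X_shadow)  # the same points with z set to 0
```

Here `W` is picked by hand; choosing it well, so the shadow keeps the data's important shape, is the real problem.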

Casting a 2-D cloud onto a line

Here a 2-D cloud is projected down to a 1-D line, compressing two features into one. Picture rotating the line and watching where each point's shadow lands on it. At some angles the shadows stay well spread out; at others they pile on top of each other and the cloud's structure is lost. The quantity to maximise is how much of that spread survives the projection.
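The rotating-line experiment can be sketched numerically. For a line through the origin at angle θ, the unit vector along it is u = (cos θ, sin θ), and each point's shadow coordinate is the dot product x · u. The sketch below assumes a synthetic, centred cloud stretched along 30 degrees (the data, seed, and the helper `projected_variance` are all hypothetical) and simply tries every whole-degree angle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D cloud: long axis (std 3) and short axis (std 0.5),
# then rotated so the long axis points along 30 degrees
n = 500
raw = rng.normal(size=(n, 2)) * np.array([3.0, 0.5])
theta_true = np.deg2rad(30)
R = np.array([[np.cos(theta_true), -np.sin(theta_true)],
              [np.sin(theta_true),  np.cos(theta_true)]])
X = raw @ R.T              # rotate each point by 30 degrees
X = X - X.mean(axis=0)     # centre the cloud so the line passes through its middle

def projected_variance(X, angle):
    """Variance of the 1-D shadows when X is projected onto a line at `angle` radians."""
    u = np.array([np.cos(angle), np.sin(angle)])  # unit vector along the line
    shadows = X @ u                               # 1-D coordinate of each point on the line
    return shadows.var()

# Lines repeat after 180 degrees, so 0..179 covers every orientation
angles = np.deg2rad(np.arange(0, 180))
variances = np.array([projected_variance(X, a) for a in angles])

best = np.rad2deg(angles[variances.argmax()])
worst = np.rad2deg(angles[variances.argmin()])
print(f"most spread at ~{best:.0f} deg, least at ~{worst:.0f} deg")
```

The best angle should land near the cloud's long axis and the worst near the perpendicular one. A brute-force sweep like this only works for toy 2-D data; it is meant to show what is being maximised, not how to maximise it efficiently.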

Keep the spread

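A good projection keeps the points as spread out as possible, because that spread (the variance) is what carries the information. The line that retains the most variance is the best 1-D summary of the data. Principal component analysis (PCA) finds it for any number of dimensions using the eigenvectors you met in linear algebra: after centring the data, the direction of greatest variance is the eigenvector of the covariance matrix with the largest eigenvalue, that eigenvalue is the variance retained, and keeping k dimensions means projecting onto the top k eigenvectors.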
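A minimal PCA sketch in NumPy, assuming the covariance-eigenvector formulation (library implementations often compute the same components via a singular value decomposition instead). The function name `pca` and the synthetic cloud, again stretched along 30 degrees, are hypothetical.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch: project X (n_samples x n_features) onto its top-k principal components.

    Returns (Z, components, explained_variance).
    """
    Xc = X - X.mean(axis=0)                 # PCA works on centred data
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]       # re-sort so the largest variance comes first
    components = eigvecs[:, order[:k]]      # top-k eigenvectors as columns
    Z = Xc @ components                     # coordinates in the reduced space
    return Z, components, eigvals[order[:k]]

rng = np.random.default_rng(0)
# Hypothetical cloud with its long axis at 30 degrees
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
t = np.deg2rad(30)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
X = raw @ R.T

Z, components, var = pca(X, k=1)
u = components[:, 0]
# An eigenvector's sign is arbitrary, so fold its angle into [0, 180)
angle = np.rad2deg(np.arctan2(u[1], u[0])) % 180
print(f"first principal component at ~{angle:.0f} deg, variance {var[0]:.2f}")
```

No angle sweep is needed: the eigenvector points straight at the direction of maximum spread, and the variance of the projected coordinates equals its eigenvalue.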