Real data can have hundreds or thousands of features.
The key idea is projection: cast the high-dimensional points down onto a lower-dimensional surface — like a 3-D object throwing a 2-D shadow. Choose the surface well and the shadow keeps the data's important shape; choose badly and you flatten away what mattered.
Here a 2-D cloud is projected down to a 1-D line — two features compressed into one. Rotate the line and watch the points' shadows land on it. Some angles keep the points nicely spread out; others squash them on top of each other, destroying the structure. The spread retained is the readout to maximise.
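The rotating-line experiment can be sketched in a few lines of NumPy. This is a minimal illustration, not the page's own implementation: the synthetic cloud, the 30-degree tilt, and the helper name `projected_variance` are all assumptions made for the example. It projects the points onto a line at each angle and records how much spread survives.

```python
import numpy as np

def projected_variance(points, angle_rad):
    """Variance of 2-D points after projecting them onto a line at angle_rad.

    Hypothetical helper for illustration; the line passes through the data's mean.
    """
    direction = np.array([np.cos(angle_rad), np.sin(angle_rad)])  # unit vector along the line
    centered = points - points.mean(axis=0)
    shadows = centered @ direction  # 1-D position of each point's shadow on the line
    return shadows.var()

rng = np.random.default_rng(0)

# Assumed test data: a cloud that is wide along one axis and narrow along the other,
# then rotated so its long axis sits at 30 degrees.
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
cloud = raw @ rotation.T

# Sweep the line through 0..179 degrees and measure the spread retained at each angle.
angles = np.deg2rad(np.arange(0, 180, 1))
variances = np.array([projected_variance(cloud, a) for a in angles])
best_angle_deg = np.rad2deg(angles[variances.argmax()])

print(f"best angle: {best_angle_deg:.0f} degrees")
print(f"spread retained at best angle: {variances.max():.2f}")
print(f"spread retained at worst angle: {variances.min():.2f}")
```

The angle with the largest retained variance should land close to the cloud's long axis, while the angle perpendicular to it squashes the shadows together.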
A good projection keeps the points as spread out as possible: that spread, the variance, is what carries the information. The line that retains the most variance is the best 1-D summary of the data, and finding it for any number of dimensions is exactly what principal component analysis (PCA) does.
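The maximum-variance direction does not have to be found by trial rotation. It is the leading eigenvector of the data's covariance matrix, which is the calculation behind principal component analysis. The sketch below assumes a synthetic 5-D dataset and a hypothetical helper named `top_variance_direction`; it is an illustration under those assumptions rather than a full PCA implementation.

```python
import numpy as np

def top_variance_direction(data):
    """Return the unit direction of maximum variance and the fraction of variance it retains.

    Hypothetical helper for illustration; works for any number of feature columns.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)             # features are columns
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    direction = eigenvectors[:, -1]                  # eigenvector with the largest eigenvalue
    retained = eigenvalues[-1] / eigenvalues.sum()   # share of total variance kept
    return direction, retained

rng = np.random.default_rng(1)

# Assumed test data: five features, the first with much more spread than the rest.
data = rng.normal(size=(1000, 5)) * np.array([4.0, 1.0, 1.0, 0.5, 0.5])

direction, retained = top_variance_direction(data)

# Compress each 5-D point to a single number: its position along the best line.
summary = (data - data.mean(axis=0)) @ direction

print("direction:", np.round(direction, 2))
print(f"fraction of variance retained: {retained:.2f}")
```

Because the first feature dominates the spread in this made-up dataset, the returned direction should point almost entirely along it, and `summary` is the 1-D shadow of the data on that line.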