k-Means
k-means finds k clusters with a loop so simple it's
almost magic. Drop k centre points anywhere, then repeat two steps
until nothing moves:
- Assign. Colour each data point by its
nearest
centre.
- Update. Move each centre to the average position of the points assigned to
it.
Assign, update, assign, update — each pass tightens the clusters, and within a handful of rounds
the centres settle right into the heart of each clump.
Watch the centres settle
The two centres start in poor positions. Step the algorithm: points recolour to their nearest
centre, then each centre slides to the middle of its colour. Repeat, and the centres glide into
the two clusters and lock in place — the moment nothing changes, k-means has converged.
Simple, fast, imperfect
k-means is quick and scales to huge datasets, which is why it's the go-to clusterer. Its quirks:
you must choose k up front, the result depends on where the centres
start (so it's run several times and the best kept), and it assumes roughly round, similar-sized
blobs — it struggles with long or tangled shapes. Even so, it's the first clustering tool to
reach for, and a perfect example of how a trivial loop can uncover real structure.