Clustering

Step into the unsupervised world: no labels, just raw data. Clustering is the task of discovering the natural groups hiding in it — points that sit close together, separated from other clumps. The algorithm gets no answer key; it must find the structure on its own.

It's everywhere: grouping customers into segments, organising news into topics, spotting communities in a social network, compressing colours in an image. Whenever you want to ask "what natural categories are in this data?", clustering is the tool.

How many groups?

These points clearly fall into clumps, but how many? Choose the number of clusters k, and each point is coloured by the group it joins. Pick the k that matches the data's real structure: too small a k merges distinct groups, while too large a k splits a single group apart.
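One rough way to see the effect of k in numbers is to group the same points for several values of k and measure how tightly each grouping hugs its centres. The sketch below is only an illustration under stated assumptions: the toy data in `POINTS` is made up, the helper names are hypothetical, and it borrows a basic k-means-style loop with a deterministic farthest-point start as a stand-in grouping method. The score is the within-cluster sum of squared distances, where lower means tighter groups.

```python
# Hypothetical toy data: three tight clumps in 2D.
POINTS = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.2, 0.9), (8.8, 1.2), (9.1, 1.1),
]


def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def initial_centres(points, k):
    """Deterministic farthest-point start, so no random seed is needed."""
    centres = [points[0]]
    while len(centres) < k:
        # Add the point that is farthest from its nearest existing centre.
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    return centres


def assign(points, centres):
    """Label each point with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: dist2(p, centres[j])) for p in points]


def cluster(points, k, iterations=20):
    """Alternate between assigning points and moving centres to the group mean."""
    centres = initial_centres(points, k)
    for _ in range(iterations):
        labels = assign(points, centres)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centre if a group ends up empty
                centres[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assign(points, centres), centres


def within_cluster_ss(points, labels, centres):
    """Total squared distance from each point to its own group's centre."""
    return sum(dist2(p, centres[label]) for p, label in zip(points, labels))


wcss_by_k = {}
for k in range(1, 6):
    labels, centres = cluster(POINTS, k)
    wcss_by_k[k] = within_cluster_ss(POINTS, labels, centres)
    print(f"k={k}: within-cluster sum of squares = {wcss_by_k[k]:.2f}")
```

On this toy data the score falls sharply up to k = 3 and then has almost nothing left to lose, which is the numerical counterpart of "three clumps". Real data rarely shows such a clean bend, so the score is a hint to weigh rather than a verdict.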

The catch: there's no right answer

Without labels, "correct" is genuinely ambiguous — different notions of similarity give different clusterings, and choosing k is often a judgement call. That freedom is the challenge of unsupervised learning. The most popular clustering method makes the idea concrete with a beautifully simple loop: k-means.
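To make "different notions of similarity" concrete, here is a minimal sketch in which one point is matched to the nearer of two reference points under two different distance measures. The coordinates and the names `REFS`, `POINT`, and `nearest` are hypothetical, chosen only so that the two measures disagree.

```python
import math


def euclidean(a, b):
    """Straight-line distance: cares about where the points sit."""
    return math.dist(a, b)


def cosine_distance(a, b):
    """One minus cosine similarity: cares only about direction from the origin."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def nearest(point, refs, distance):
    """Name of the reference point closest to `point` under `distance`."""
    return min(refs, key=lambda name: distance(point, refs[name]))


# Hypothetical reference points and query point.
REFS = {"A": (1.0, 0.0), "B": (10.0, 10.0)}
POINT = (2.0, 2.0)

by_euclid = nearest(POINT, REFS, euclidean)
by_cosine = nearest(POINT, REFS, cosine_distance)
print(f"Euclidean groups the point with {by_euclid}; cosine groups it with {by_cosine}")
```

The point sits physically close to A but points in exactly the same direction as B, so the two measures place it in different groups. Neither answer is wrong; they answer different questions about what "similar" means.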