Random Forests

One deep tree overfits. But here's a beautiful fix: grow many trees, each on a slightly different slice of the data and features, and let them vote. That ensemble is a random forest, and it routinely outperforms any single tree.

Each tree overfits in its own random way, so their mistakes are scattered and largely independent. Average them and the noise cancels out while the shared, real signal survives — a wobbly committee that's collectively steady. The forest keeps the trees' flexibility but throws away most of their overfitting.

Jagged trees, smooth forest

Toggle between a single tree's boundary and the forest's. The lone tree's boundary is jagged, twitching around individual points. The forest's — the average of many such trees — is smooth and sensible, hugging the true trend instead of the noise.

The wisdom of crowds

This trick — combine many high-variance models to get one low-variance model — is called ensembling, and it's everywhere in machine learning. Random forests need little tuning, resist overfitting, and report which features matter, making them a favourite first choice on tabular data. (Their cousin gradient boosting builds the trees in sequence, each fixing the last's mistakes, and wins many competitions.) Ensembles close out the tree family — next we step back to study fitting and evaluation in general.