Overfitting a Tree

Left to grow without limit, a decision tree keeps adding cuts until every leaf is pure, often a tiny box around a single training point. At that point it scores 100% on the training data (as long as no two identical points carry different labels), but it has memorised the noise along with the signal. The boundary becomes a jagged tangle that wraps around individual points, the textbook face of overfitting.

The cure is to limit the tree's depth, or to prune it back after growing: stop splitting at a set depth and accept leaves that are good enough rather than perfectly pure. A shallow tree captures the real trend; a deep one memorises the accidents.
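
To make the contrast concrete, here is a minimal sketch of a tree grower with an optional depth limit, written in plain Python. Everything in it is an assumption made for illustration rather than the method behind this lesson: the synthetic data (the true class depends only on whether the first coordinate exceeds 0.5, with 15% of labels flipped as noise), the Gini split criterion, the choice to cut at observed values, and all function names.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for a 50/50 binary mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(points, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for t in values[:-1]:  # cutting at the largest value would leave one side empty
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(points, labels, max_depth=None, depth=0):
    """Leaves are plain labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    pure = len(set(labels)) == 1
    at_limit = max_depth is not None and depth >= max_depth
    split = None if pure or at_limit else best_split(points, labels)
    if split is None:
        return majority
    f, t = split

    def side(keep):
        rows = [(p, y) for p, y in zip(points, labels) if keep(p)]
        return [p for p, _ in rows], [y for _, y in rows]

    left, right = side(lambda p: p[f] <= t), side(lambda p: p[f] > t)
    return (f, t,
            build(*left, max_depth, depth + 1),
            build(*right, max_depth, depth + 1))

def predict(tree, p):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if p[f] <= t else right
    return tree

def accuracy(tree, points, labels):
    return sum(predict(tree, p) == y for p, y in zip(points, labels)) / len(labels)

def depth_of(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[2]), depth_of(tree[3]))

def make_data(n, rng, flip=0.15):
    """Hypothetical noisy data: class is 1 when x > 0.5, then some labels are flipped."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [int(x > 0.5) ^ int(rng.random() < flip) for x, _ in points]
    return points, labels

rng = random.Random(0)
points, labels = make_data(200, rng)
full = build(points, labels)                  # no limit: grows until every leaf is pure
shallow = build(points, labels, max_depth=2)  # at most two cuts on any path

print("unlimited: depth", depth_of(full), "training accuracy", accuracy(full, points, labels))
print("limited:   depth", depth_of(shallow), "training accuracy", accuracy(shallow, points, labels))
```

The unlimited tree fits every training point, flipped labels included, while the depth-limited tree has to leave some of them misclassified. Training accuracy alone cannot say which tree is better; that takes points the tree never saw.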

Deeper isn't better

Increase the depth and watch the boundary bend itself around the noisy points. The training accuracy climbs toward 100%, but the test accuracy — on points the tree never saw — peaks at a modest depth and then falls. That gap between training and test is overfitting, made visible.

The sweet spot

The best depth is the one where test accuracy peaks: complex enough to catch the pattern, simple enough to ignore the noise. You find it by checking performance on held-out data, not training data. This is one instance of a universal theme, the bias–variance tradeoff. One clever way to soften it is to grow many deep trees, each on a different random resample of the data, and average their votes, which is the idea behind random forests.
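
Choosing the depth on held-out data can be sketched in a few lines of plain Python. The sketch leans on one assumption worth stating: a greedy grower picks each split from the data in that node alone, so a depth-limited tree is just the top levels of the fully grown one. That lets us grow a single tree, store the majority label at every node, and stop early at prediction time. The synthetic data (15% flipped labels), the Gini criterion, and all names below are hypothetical choices for illustration.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows):
    """Grow a full tree from (point, label) rows; every node keeps its majority label."""
    labels = [y for _, y in rows]
    node = {"label": Counter(labels).most_common(1)[0][0]}
    if len(set(labels)) == 1:
        return node  # pure leaf
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({p[f] for p, _ in rows})
        for t in values[:-1]:  # both sides of the cut stay non-empty
            left = [y for p, y in rows if p[f] <= t]
            right = [y for p, y in rows if p[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # identical points with conflicting labels: cannot split
        return node
    _, f, t = best
    node.update(feature=f, threshold=t,
                left=grow([r for r in rows if r[0][f] <= t]),
                right=grow([r for r in rows if r[0][f] > t]))
    return node

def predict(node, p, max_depth):
    """Follow at most max_depth splits, then answer with that node's majority label."""
    depth = 0
    while "feature" in node and depth < max_depth:
        node = node["left"] if p[node["feature"]] <= node["threshold"] else node["right"]
        depth += 1
    return node["label"]

def accuracy(tree, rows, max_depth):
    return sum(predict(tree, p, max_depth) == y for p, y in rows) / len(rows)

def make_rows(n, rng, flip=0.15):
    """Hypothetical noisy data: class is 1 when x > 0.5, then some labels are flipped."""
    rows = []
    for _ in range(n):
        p = (rng.random(), rng.random())
        noisy = rng.random() < flip
        rows.append((p, int(p[0] > 0.5) ^ int(noisy)))
    return rows

rng = random.Random(0)
train, held_out = make_rows(200, rng), make_rows(200, rng)
tree = grow(train)

depths = range(1, 13)
scores = {d: (accuracy(tree, train, d), accuracy(tree, held_out, d)) for d in depths}
for d, (tr, ho) in scores.items():
    print(f"depth {d:2d}  train {tr:.2f}  held-out {ho:.2f}")

# Pick the depth with the best held-out accuracy; ties go to the shallowest tree.
best_depth = max(depths, key=lambda d: scores[d][1])
print("chosen depth:", best_depth)
```

Training accuracy in this sketch can only rise or stay flat as depth grows, so it would always vote for the deepest tree; the held-out column is the one that can turn down. In practice the points used to choose the depth should be kept separate from the points used to report the final score.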