Left to grow without limit, a decision tree keeps adding cuts until every training point
sits in its own tiny pure box. At that point it has 100% training accuracy — and has learned
nothing but the noise. The boundary becomes a jagged tangle that wraps around individual points,
the textbook face of overfitting.
The cure is to limit the tree's depth (or prune it back after growing): stop splitting once a node is good enough. A shallow tree captures the real trend; a deep one memorises the accidents.
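As a rough illustration of the two cures named above, the sketch below fits three trees to the same noisy data: one grown without limit, one capped by depth, and one grown fully and then pruned back. It assumes scikit-learn is available; the synthetic dataset from `make_moons`, the depth cap of 3, and the pruning strength `ccp_alpha=0.01` are arbitrary choices made for the example, not values taken from the text.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data with overlapping classes (an assumed stand-in dataset).
X, y = make_moons(n_samples=300, noise=0.35, random_state=0)

# Grown without limit: keeps splitting until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Depth cap: stop splitting once the tree is 3 levels deep (3 is an arbitrary choice).
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Grow fully, then prune back with cost-complexity pruning (alpha is an arbitrary choice).
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

for name, tree in [("full", full), ("shallow", shallow), ("pruned", pruned)]:
    print(f"{name:8s} depth={tree.get_depth():2d}  leaves={tree.get_n_leaves():3d}  "
          f"train accuracy={tree.score(X, y):.3f}")
```

The unlimited tree should report far more leaves than the other two, which is the "tiny pure box per point" behaviour in numbers. Which of the two restraints works better depends on the data, so both settings are worth tuning rather than fixing in advance.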
Increase the depth and watch the boundary bend itself around the noisy points. The training accuracy climbs toward 100%, but the test accuracy — on points the tree never saw — peaks at a modest depth and then falls. That gap between training and test is overfitting, made visible.
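The same experiment can be run in code. The following is a minimal sketch, assuming scikit-learn and a synthetic noisy dataset (`make_moons` with `noise=0.35`, a 50/50 split, and depths 1 to 15 are all illustrative choices). It records training and test accuracy at each depth so the two curves can be compared.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, split into points the tree sees and points it never sees.
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

depths = list(range(1, 16))
train_acc, test_acc = [], []
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc.append(tree.score(X_train, y_train))  # accuracy on seen points
    test_acc.append(tree.score(X_test, y_test))     # accuracy on unseen points
    print(f"depth={depth:2d}  train={train_acc[-1]:.3f}  test={test_acc[-1]:.3f}")
```

On data like this, the training column typically keeps rising while the test column levels off or drops. The exact depth at which that happens varies with the dataset and the random split.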
The best depth is the one where test accuracy peaks — complex enough to catch the
pattern, simple enough to ignore the noise. You find it by checking performance on held-out data,
not training data. This is one instance of a universal theme, the bias-variance tradeoff.
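One common way to check performance on held-out data is cross-validation. The sketch below assumes scikit-learn, a synthetic dataset, and 5 folds (all illustrative choices). It scores each candidate depth on validation folds carved out of the development data, picks the depth with the best average, and only then evaluates once on a test set that played no part in the choice.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.35, random_state=0)

# Set aside a final test set; depth is chosen using the development part only.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Mean 5-fold validation accuracy for each candidate depth.
cv_scores = {}
for depth in range(1, 16):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    cv_scores[depth] = cross_val_score(tree, X_dev, y_dev, cv=5).mean()

# Pick the depth with the highest validation accuracy, then refit on all dev data.
best_depth = max(cv_scores, key=cv_scores.get)
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_dev, y_dev)

print("chosen depth:", best_depth)
print("validation accuracy:", round(cv_scores[best_depth], 3))
print("test accuracy:", round(final.score(X_test, y_test), 3))
```

Keeping the test set out of the selection loop matters: if the depth were tuned directly on the test points, the reported test accuracy would be optimistically biased.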