Which question should a tree ask first? The one that best tidies up the classes. To
measure tidiness we use entropy — the amount of disorder in a group. A box that
is all one class is perfectly pure (entropy 0).
Entropy is highest at a perfect 50/50 split, where uncertainty is total, and falls to zero as the group becomes all one class. A good split is one that produces pure, low-entropy children.
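To make the curve concrete, here is a minimal sketch of the entropy calculation. It assumes Shannon entropy with base-2 logarithms (so a two-class 50/50 mix scores exactly 1 bit); the function name `entropy` and the toy labels are illustrative, not taken from any particular library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a collection of class labels.

    Assumption: base-2 logarithm, so two equally likely classes give 1.0.
    """
    n = len(labels)
    if n == 0:
        return 0.0  # convention for this sketch: an empty group has no disorder
    counts = Counter(labels)
    # Sum -p * log2(p) over the classes that actually occur in the group.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


pure = ["cat"] * 10                 # all one class
even = ["cat"] * 5 + ["dog"] * 5    # perfect 50/50 mix
skewed = ["cat"] * 9 + ["dog"]      # mostly one class

print(entropy(pure))    # 0.0
print(entropy(even))    # 1.0
print(entropy(skewed))  # somewhere between the two
```

The skewed group lands between the two extremes, which is the behaviour described above: the closer a group gets to a single class, the closer its entropy gets to zero.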
Information gain is how much entropy a split removes: the parent's entropy minus
the size-weighted average entropy of the children it creates. The tree greedily chooses, at each node, the cut
with the highest information gain — the question that most reduces disorder. Repeat, and a tree
grows itself. (Some trees use a near-identical measure called Gini impurity; the spirit
is the same.) Next we'll see how this greediness, left unchecked, leads straight to