Decision Trees

A decision tree classifies a data point by asking a sequence of simple yes/no questions, like a flowchart: "Is feature 1 greater than 2? If yes, is feature 2 below 0?" Each question splits the data, and you follow the branches down to a leaf that gives the prediction.
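To make the flowchart concrete, here is a minimal sketch of those two questions written as nested `if` statements. The thresholds 2 and 0 come from the example questions; the function name `classify` and the class labels "A", "B" and "C" are made up for illustration.

```python
def classify(feature_1, feature_2):
    """A hand-written two-question tree (labels are hypothetical)."""
    if feature_1 > 2:        # question 1: is feature 1 above 2?
        if feature_2 < 0:    # question 2: is feature 2 below 0?
            return "A"       # leaf: yes, then yes
        return "B"           # leaf: yes, then no
    return "C"               # leaf: no to the first question


prediction = classify(3.0, -1.0)  # follows yes, yes down to leaf "A"
```

Each `return` is a leaf, and each path from the top to a `return` is one rule you could read aloud.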

Every question is a single axis-aligned cut — a straight slice across one feature — so the tree carves feature space into rectangular boxes, each labelled with a class. No dot products, no gradients; just "which box are you in?"

Carve up the space

Slide the two cuts: a vertical split on feature 1 and a horizontal split on feature 2. Together they divide the plane into four boxes, each coloured by the majority class of the points inside it. Find cuts that put each class neatly in its own box: that is what training a tree does automatically, one cut at a time.
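The same experiment can be sketched in a few lines of code. This is only an illustration under stated assumptions: the six points, the "red" and "blue" labels, the cut positions and the helper names (`box_of`, `label_boxes`, `accuracy`) are all invented here, and trying two cut positions by hand stands in for the search a real training procedure would do.

```python
from collections import Counter


def box_of(point, cut_x, cut_y):
    """Identify which of the four boxes a point falls in."""
    x, y = point
    return (x > cut_x, y > cut_y)


def label_boxes(points, labels, cut_x, cut_y):
    """Colour each non-empty box by the majority class of the points inside it."""
    votes = {}
    for point, label in zip(points, labels):
        votes.setdefault(box_of(point, cut_x, cut_y), Counter())[label] += 1
    return {box: counts.most_common(1)[0][0] for box, counts in votes.items()}


def accuracy(points, labels, cut_x, cut_y):
    """Fraction of points whose label matches the colour of their box."""
    boxes = label_boxes(points, labels, cut_x, cut_y)
    hits = sum(boxes[box_of(p, cut_x, cut_y)] == lab
               for p, lab in zip(points, labels))
    return hits / len(points)


# Hypothetical toy data: red on the left, blue on the right.
points = [(1, 1), (1, 3), (2, 2), (4, 1), (5, 3), (4, 4)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

good = accuracy(points, labels, cut_x=3, cut_y=2)   # every box holds one class
bad = accuracy(points, labels, cut_x=0, cut_y=10)   # all points share one box
```

With the cuts at 3 and 2 every box is pure, so `good` is 1.0; with the cuts pushed out of the way all six points land in one box and `bad` drops to 0.5.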

Why trees are loved

Decision trees are interpretable: you can read the rules straight off the flowchart. They can mix numeric and categorical features (though some libraries expect categories to be encoded as numbers first), need no feature scaling, and capture non-linear, boxy boundaries that a single straight line never could. The catch: greedily adding cuts lets a tree memorise the training data, noise included, which is overfitting. To choose good cuts and know when to stop, we need a way to score a split: entropy and information gain.
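To see the memorising catch in action, here is a toy one-feature sketch. It is not a real tree-training algorithm: the hypothetical `fit_memoriser` simply places a cut wherever two neighbouring training points disagree, which is where unlimited cutting ends up. The data are invented, with the point at 3 deliberately mislabelled as noise, and the feature values are assumed to be distinct.

```python
def fit_memoriser(xs, ys):
    """Put a cut between every adjacent pair of points whose labels differ."""
    pairs = sorted(zip(xs, ys))
    cuts, leaf_labels = [], [pairs[0][1]]
    for (xa, ya), (xb, yb) in zip(pairs, pairs[1:]):
        if ya != yb:
            cuts.append((xa + xb) / 2)  # midpoint between the two neighbours
            leaf_labels.append(yb)      # label of the interval to the right
    return cuts, leaf_labels


def predict_1d(x, cuts, leaf_labels):
    """Count the cuts below x to find its interval, then return that label."""
    return leaf_labels[sum(x > c for c in cuts)]


def accuracy_1d(xs, ys, cuts, leaf_labels):
    hits = sum(predict_1d(x, cuts, leaf_labels) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Hypothetical data: "a" on the left, "b" on the right, one noisy "b" at 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "b", "a", "b", "b", "b", "b"]

cuts, leaf_labels = fit_memoriser(xs, ys)
memorised_acc = accuracy_1d(xs, ys, cuts, leaf_labels)   # fits every point
single_cut_acc = accuracy_1d(xs, ys, [4.5], ["a", "b"])  # misses only the noise
```

Three cuts score a perfect 1.0 on the training points, while a single cut at 4.5 scores 0.875. The perfect score is bought by carving out a tiny "b" interval around the noisy point, so a new point at 3.1 is called "b" by the memoriser and "a" by the single cut.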