Trees and Spanning Trees

Of all the shapes a graph can take, one is so clean, so useful, and so common that it earns its own name: the tree. A tree is a graph with no wasted edges and no way to go round in a loop — just enough connections to hold everything together, and not one more. Family trees, file systems, the branching of a river, the wiring plan that reaches every house with the least cable — all of these are trees.

Formally, a tree is a graph that is connected (you can get from any vertex to any other) and acyclic (it contains no cycle — no way to leave a vertex, wander around, and return without retracing an edge). Those two words, connected and acyclic, pull in opposite directions: connectivity wants more edges, acyclicity wants fewer. A tree is the exact point where they balance.

The one theorem to remember

Everything about trees flows from a single counting fact. Watch this tree grow: begin with one lone vertex, then add vertices one by one. To keep the graph connected, each new vertex must reach the rest — and the cheapest way is a single edge to something already there. Adding two edges would create a cycle; adding none would leave it stranded. So every new vertex costs exactly one edge.

A tree on n vertices has exactly n-1 edges.

Turn it around and it is just as handy: a tree with e edges must have e+1 vertices. This n-1 is the fingerprint of a tree — too few edges and the graph falls into disconnected pieces; too many and a cycle must appear.

Four ways to say "tree"

The definition above is only one of several equivalent portraits. For a graph on n vertices, any of the following statements implies all the others — they all pin down exactly the trees:

That last one is the soul of a tree: exactly one route between any two points. It is why a family tree never loops and why routing on a tree is unambiguous. A tree with at least two vertices also always has at least two leaves — vertices touched by a single edge — the dangling tips where a branch ends.

A tempting mistake: "a tree is so sparse, surely I can drop in any extra edge and still have a tree." Not so. A tree already has a unique path between every pair of vertices. Add an edge between two vertices u and v, and that new edge plus the path that already joined them forms a loop — you have created exactly one cycle, and the result is no longer a tree (it now has n edges, one too many).

The flip side is just as sharp: remove any edge from a tree and it splits into two disconnected pieces, because that edge was the only route across. A tree is perfectly poised — every edge is a bridge, and every non-edge would be a cycle waiting to happen.

Spanning trees: the skeleton of a graph

Now take a graph that has more than enough edges — a tangle of connections with plenty of cycles. Often we want to keep everything connected but throw away the redundancy: pick a subset of the edges that still reaches every vertex, yet contains no cycle. That subset is a spanning tree.

A spanning tree of a connected graph on n vertices is a tree that uses all n vertices and (being a tree) exactly n-1 of the graph's edges. It is the lean skeleton left after you strip out every edge that only ever gave you a second, redundant way round. To go from a connected graph with e edges down to a spanning tree, you must delete e-(n-1) edges — one for each extra loop.

The cheapest one: minimum spanning trees

Suppose the edges carry weights — the cost of laying cable between two towns, the length of a road, the price of a wire. A connected graph usually has many spanning trees, and we want the one whose edges add up to the smallest total. That is the minimum spanning tree (MST): the cheapest possible set of connections that still links everything together.

In the network above, five towns can be joined by any of seven cable routes. Step through the figure: the highlighted edges — of weights 1, 2, 3, 4 — connect all five towns for a total of

1 + 2 + 3 + 4 = 10,

and no cheaper connected choice exists. Two routes are left out: they would only add cost and close a cycle.

How the greedy algorithms think

Remarkably, you can find the MST by pure greed — always grabbing the cheapest safe edge — and still end up with the guaranteed optimum. Two classic algorithms do exactly this:

Both are greedy: they never look ahead and never backtrack, yet the special structure of trees makes local cheapness add up to global cheapness. This is the algorithm running inside every plan to wire a neighbourhood, lay a fibre network, or connect a set of computers with the least total cable.

Half true, and the half that is false will bite you. Kruskal's greed does consider edges shortest-first — but it must skip any edge that would form a cycle, even a very cheap one. So the MST is not simply "the n-1 shortest edges." Three cheap edges forming a little triangle can only contribute two of themselves to a tree; the third is refused no matter how tempting its weight.

And beware the opposite trap: the MST minimises total weight, not the distance between any two particular vertices. The unique tree-path between two towns in the MST can be far longer than the direct edge you chose to leave out. Minimum spanning tree ≠ shortest path.

Take n vertices with names on them and ask: how many distinct trees connect them? The answer, Cayley's formula, is astonishingly clean:

n^{\,n-2}.

For n=2 that is 2^0 = 1 (a single edge); for n=3 it is 3^1 = 3; for n=4 already 4^2 = 16; and by n=10 there are 10^8 = 100{,}000{,}000 labelled trees. A tiny handful of dots hides a combinatorial explosion — one of the most beautiful counting results in all of mathematics.