Of all the shapes a graph can take, one is so clean, so useful, and so common that it earns its own name: the tree. A tree is a graph with no wasted edges and no way to go round in a loop — just enough connections to hold everything together, and not one more. Family trees, file systems, the branching of a river, the wiring plan that reaches every house with the least cable — all of these are trees.
Formally, a tree is a graph that is
Everything about trees flows from a single counting fact. Watch this tree grow: begin with one lone vertex, then add vertices one by one. To keep the graph connected, each new vertex must reach the rest — and the cheapest way is a single edge to something already there. Adding two edges would create a cycle; adding none would leave it stranded. So every new vertex costs exactly one edge.
Turn it around and it is just as handy: a tree with
The definition above is only one of several equivalent portraits. For a graph on
That last one is the soul of a tree: exactly one route between any two points. It is why a family tree never loops and why routing on a tree is unambiguous. A tree with at least two vertices also always has at least two leaves — vertices touched by a single edge — the dangling tips where a branch ends.
A tempting mistake: "a tree is so sparse, surely I can drop in any extra edge and still have
a tree." Not so. A tree already has a unique path between every pair of vertices. Add an edge between
two vertices
The flip side is just as sharp: remove any edge from a tree and it splits into two disconnected pieces, because that edge was the only route across. A tree is perfectly poised — every edge is a bridge, and every non-edge would be a cycle waiting to happen.
Now take a graph that has more than enough edges — a tangle of connections with plenty of cycles. Often we want to keep everything connected but throw away the redundancy: pick a subset of the edges that still reaches every vertex, yet contains no cycle. That subset is a spanning tree.
A spanning tree of a connected graph on
Suppose the edges carry weights — the cost of laying cable between two towns, the length of a road, the price of a wire. A connected graph usually has many spanning trees, and we want the one whose edges add up to the smallest total. That is the minimum spanning tree (MST): the cheapest possible set of connections that still links everything together.
In the network above, five towns can be joined by any of seven cable routes. Step through the figure:
the highlighted edges — of weights
and no cheaper connected choice exists. Two routes are left out: they would only add cost and close a cycle.
Remarkably, you can find the MST by pure greed — always grabbing the cheapest safe edge — and still end up with the guaranteed optimum. Two classic algorithms do exactly this:
Both are greedy: they never look ahead and never backtrack, yet the special structure of trees makes local cheapness add up to global cheapness. This is the algorithm running inside every plan to wire a neighbourhood, lay a fibre network, or connect a set of computers with the least total cable.
Half true, and the half that is false will bite you. Kruskal's greed does consider edges
shortest-first — but it must skip any edge that would form a cycle, even a very cheap
one. So the MST is not simply "the
And beware the opposite trap: the MST minimises total weight, not the distance between any two particular vertices. The unique tree-path between two towns in the MST can be far longer than the direct edge you chose to leave out. Minimum spanning tree ≠ shortest path.
Take
For