Graphs

Think about the London Underground map, your friends on a social network, the roads between towns, or the web pages that link to one another. What do all of these have in common? Each is a collection of things joined by connections. Stations joined by tunnels; people joined by friendships; towns joined by roads. When a problem looks like that — stuff, and links between the stuff — the data structure you reach for is a graph.

A graph is made of just two kinds of thing:

a set of vertices (also called nodes) — the "things": the stations, the people, the towns;
a set of edges — the connections between pairs of vertices: the tunnels, the friendships, the roads.

That's the whole idea. It sounds almost too simple, yet this one structure models an astonishing range of real problems — from finding the shortest route across a city to working out who is "friends of friends" with whom. Here is a small graph of five people and who knows whom:

Alice, Bob, Cara, Dan and Eve are the vertices; a line between two of them is an edge meaning "these two are friends". Notice there is no first or last, no top or bottom — a graph has no fixed shape. All that matters is which vertices are joined to which.

Which way does an edge point? Directed vs undirected

Some connections go both ways and some go one way. Friendship on most networks is mutual — if Alice is friends with Bob, then Bob is friends with Alice — so the edge has no direction. We call such a graph undirected, and we draw its edges as plain lines.

But plenty of connections are one-directional. On a photo-sharing app, Alice can follow Bob without Bob following her back. A one-way street lets cars go from A to B but not B to A. A web page can link to another that never links back. When edges have a direction we call the graph directed (or a digraph), and we draw each edge as an arrow from one vertex to another.

In this "follows" network the arrow from Alice to Bob means "Alice follows Bob". The arrow only goes one way, so it says nothing about whether Bob follows Alice. In a directed graph the pair (A, B) is a different edge from (B, A) — order matters, exactly as it does for an ordered pair in maths.

How strong, how far? Weighted edges

So far an edge has only told us whether two vertices are connected. Often we also want to know how much: how far apart two towns are, how long a flight takes, how much a connection costs. We attach a number — a weight — to each edge. A graph with weighted edges is called a weighted graph.

Here the weight on each edge is the distance in miles between two towns. Now questions like "what is the shortest route from Ayton to Deeby?" become meaningful — you add up the weights along a route and look for the smallest total. (Algorithms such as Dijkstra's do exactly this, but that is a story for a later page.) An unweighted graph is really just a weighted one where every edge secretly has weight 1.

Storing a graph, way 1: the adjacency matrix

A picture is fine for humans, but a computer needs the graph as data. The first standard way is an adjacency matrix: a 2-D grid (an array of arrays) with one row and one column for every vertex. The cell in row i, column j answers a single question — is there an edge from vertex i to vertex j? We write a 1 if yes and a 0 if no (or, in a weighted graph, the edge's weight instead of 1).

For an undirected graph the matrix is symmetric: if there's an edge from A to B there's equally one from B to A, so cell (A,B) and cell (B,A) are both 1 — the grid mirrors across its top-left-to-bottom-right diagonal. For a directed graph it need not be symmetric: an arrow one way sets one cell without the other.

The matrix's charm is how simple and fast it is: to check "are i and j connected?" you just look at cell matrix[i][j] — an instant, single lookup. Its cost is space: a graph of n vertices always needs an n \times n grid — that is n^2 cells — whether or not those edges actually exist.

Storing a graph, way 2: the adjacency list

The second standard way turns the question around. Instead of a full grid, keep — for each vertex — a short list of just its neighbours (the vertices it has an edge to). This is an adjacency list. If a vertex has only a couple of connections, its list holds only a couple of entries; nothing is stored for the edges that don't exist.

Alice → [Bob, Cara] Bob → [Alice, Dan] Cara → [Alice] Dan → [Bob]

Let's actually build one. In TypeScript a neat way to hold an adjacency list is a Map from each vertex's name to an array of its neighbours' names. We add an edge by pushing each end onto the other's list (both ends, because this graph is undirected), then print out every vertex with its neighbours. Press Run:

// An adjacency list: each vertex maps to an array of its neighbours. const graph = new Map<string, string[]>(); function addVertex(v: string): void { if (!graph.has(v)) graph.set(v, []); } function addEdge(a: string, b: string): void { addVertex(a); addVertex(b); graph.get(a)!.push(b); // b is a neighbour of a graph.get(b)!.push(a); // and a is a neighbour of b (undirected) } addEdge("Alice", "Bob"); addEdge("Alice", "Cara"); addEdge("Bob", "Dan"); // Print each vertex and the neighbours it can reach directly. for (const [vertex, neighbours] of graph) { console.log(vertex + " → " + neighbours.join(", ")); }

Read the output: each line is one vertex followed by exactly the vertices joined to it by an edge — the graph, stored compactly. To list a vertex's neighbours you go straight to its list; nothing is wasted on the pairs that aren't connected. For a directed graph you would push only one way (graph.get(a)!.push(b) and not the reverse), and for a weighted graph each list entry would store a pair — the neighbour and the weight — instead of just a name.

So which representation should I use?

Both store the same graph, so the choice is about trade-offs. It hinges on one word: density. A graph is dense if it has close to the maximum possible edges (nearly everything connected to nearly everything), and sparse if it has relatively few (each vertex joined to just a handful of others). Most real-world graphs — road maps, social networks — are very sparse.

Adjacency matrix — always n^2 space for n vertices, but checking "is there an edge between i and j?" is a single instant lookup. Simplest to code; best when the graph is small or dense.
Adjacency list — space grows only with the number of edges actually present, so it is compact for a sparse graph; listing a vertex's neighbours is quick. Checking one specific edge means scanning that vertex's list. Best when the graph is large and sparse — which most real graphs are.

A tempting mistake is to reach for the adjacency matrix every time because it feels tidy. But consider a social network of a million users where each has only a few hundred friends. An adjacency matrix would demand a million-by-million grid — a trillion cells — almost all of them 0. That is hopeless. For a sparse graph (few edges relative to vertices) the adjacency list is almost always the right choice: it stores only the edges that exist, so its size grows with the connections you actually have, not with the square of the vertex count. Save the matrix for graphs that are small, or genuinely dense.

A second thing that catches people out: a graph can contain a cycle — a path that loops back to where it started (Alice → Bob → Cara → Alice). This is perfectly normal and allowed. It is exactly what makes a graph more general than a tree, which is a special restricted graph with no cycles. Because of cycles, an algorithm that walks a graph must usually keep a "visited" set, or it will happily go round and round forever.