Deadlock

Two people meet in the middle of a footbridge just wide enough for one. Each steps aside to let the other pass — but they both step the same way, and end up facing each other again. Then again. Neither will back up first, because "the other one should go." They can stand there forever. Nobody is broken; nobody is doing anything wrong; and yet nothing can happen.

That is a deadlock. In an operating system it looks like this: a set of processes are each holding a resource and waiting for a resource held by another process in the set — arranged in a circle, so every one of them is blocked on a neighbour that will never let go. When it happens, those processes are stuck permanently. They will not use any more CPU, they will not release what they hold, and left alone they will wait until the machine is rebooted.

Deadlock is the dark twin of the locks and semaphores we use to keep concurrent code correct. The very mechanism that stops two threads from trampling each other's data — making them wait for a lock — is exactly the mechanism that can freeze them solid. And unlike a starved process that the scheduler might eventually get to, a deadlocked process is not waiting for its turn. It is waiting for something that can never arrive.

The classic: two threads, two locks, opposite order

Here is the smallest deadlock you can write. Two threads each need both lock A and lock B. Thread 1 grabs A then reaches for B; Thread 2 grabs B then reaches for A. Read the timeline down the middle:

Thread 1 Thread 2 -------------------------- -------------------------- lock(A) ✓ got A lock(B) ✓ got B lock(B) … waiting for B lock(A) … waiting for A ⤷ Thread 1 holds A, wants B — Thread 2 holds B, wants A ⤷ each waits for the other → FROZEN FOREVER

Notice how little it takes. No bug, no crash, no bad data — just two correct threads that happened to acquire the same two locks in opposite orders, at the wrong instant. The interleaving is a matter of luck, which is what makes deadlocks so nasty: your program runs fine ten thousand times and wedges on the ten-thousand-and-first.

Impose a global lock ordering: decree that every thread must acquire A before B, never the other way round. Now Thread 2 can't grab B first — it must take A first, and if Thread 1 already holds A, Thread 2 simply waits at the very first step without holding anything. When it finally gets A, it gets B too, and finishes. The circular wait is impossible because everyone climbs the ladder in the same direction. (This is a real technique — it kills the fourth Coffman condition below.)

When exactly can it happen? The four Coffman conditions

In 1971 Edward Coffman and colleagues pinned down precisely what a deadlock needs. The beautiful part is that all four conditions must hold at the same time — knock out any single one, and deadlock becomes impossible. That is the lever every prevention strategy pulls.

A deadlock can occur only if all four of these hold simultaneously:

Mutual exclusion — at least one resource is held in a non-shareable way: only one process may use it at a time (a printer, an exclusive lock).
Hold and wait — a process is holding at least one resource while waiting to acquire additional resources currently held by others.
No preemption — a resource cannot be forcibly taken away; it is released only voluntarily, by the process holding it.
Circular wait — there is a cycle of waiting processes P_0 \to P_1 \to \cdots \to P_n \to P_0, where each P_i is waiting for a resource held by the next.

Read them back against the footbridge: the bridge holds one person at a time (mutual exclusion); each person stands their ground while wanting the other's spot (hold and wait); no one can be shoved backwards (no preemption); and each waits on the other, a cycle of length two (circular wait). Remove any one — widen the bridge, make someone give up their spot, allow a shove — and the jam clears.

Seeing the cycle: the resource-allocation graph

The cleanest way to spot a deadlock is to draw it. In a resource-allocation graph we use two kinds of node and two kinds of arrow:

a circle is a process; a square is a resource;
an assignment edge points resource → process ("this resource is held by that process");
a request edge points process → resource ("this process is waiting for that resource").

Step through the graph below. P_1 holds R_1 and wants R_2; P_2 holds R_2 and wants R_1. Watch the last arrow close the loop:

The four arrows form a directed cycle P_1 \to R_2 \to P_2 \to R_1 \to P_1. With one unit of each resource, that cycle is the deadlock — a perfect picture of the circular-wait condition. Detecting deadlock (for single-instance resources) reduces to a plain question you already know how to answer: does this directed graph contain a cycle?

It is tempting to memorise "a cycle in the resource-allocation graph means deadlock." That is only the whole truth when every resource has a single instance. If a resource type has several interchangeable units (say three identical printers), a cycle becomes necessary but not sufficient: a process outside the cycle might be holding a spare unit that it will soon release, breaking the wait and freeing everyone.

So: no cycle → definitely no deadlock (always safe). But cycle → maybe deadlock when resources have multiple instances — you have to look closer (that is exactly what detection algorithms and the banker's algorithm do). Don't over-claim from a cycle alone.

Three ways to deal with it

An operating system has three honest strategies (and one dishonest one we'll get to):

Prevention — design the system so that one of the four Coffman conditions can never hold. Kill any one pillar and deadlock is structurally impossible.
Avoidance — allow the conditions in principle, but before granting each request, check that doing so leaves the system in a safe state. Refuse (make the process wait) if it wouldn't. This is the banker's algorithm.
Detection & recovery — let deadlocks happen, but periodically search the resource-allocation graph for a cycle; when one is found, recover by aborting a process or forcibly preempting (rolling back) a resource.

Each targets a different point on the spectrum from "never risk it" to "clean up afterwards." How you choose depends on how often deadlock would occur and how expensive it is to detect versus to avoid.

Strategy	Attacks a Coffman condition?	Cost
Prevention	Yes — negates one of the four outright	Low utilisation; rigid design
Avoidance (banker's)	No — permits all four, but steers away from unsafe states	Needs each process's maximum claim up front; runtime checks
Detection & recovery	No — allows deadlock, then breaks it	Periodic graph search; recovery is disruptive

How prevention negates each condition

Prevention is delightfully concrete: pick one Coffman pillar and make it structurally impossible.

Negate mutual exclusion — make resources shareable where you can (read-only files, lock-free data structures). Not always possible: a printer really can only serve one job at a time.
Negate hold-and-wait — require a process to request all its resources at once, atomically, and hold none while waiting for the rest. (Downside: it must grab everything up front, hurting utilisation.)
Negate no-preemption — if a process holding some resources requests one it can't get, make it release everything and try again later. Works for state you can save and restore (like CPU registers); useless for a half-printed page.
Negate circular wait — impose a total ordering on resources and require every process to request them in increasing order (the lock-ordering trick from earlier). No cycle can form on a strictly increasing chain.

Avoidance: the banker's algorithm

Dijkstra's banker's algorithm imagines the OS as a cautious banker. Each customer (process) declares up front the maximum it might ever need. The banker grants a request only if, after granting it, there still exists some order in which every customer can be given the rest of what it needs, finish, and return everything. Such an order is a safe sequence, and a state that has one is a safe state.

The bookkeeping uses four tables, over n processes and resource types:

Available — units of each resource currently free;
Max — each process's declared maximum demand;
Allocation — what each process holds right now;
Need = Max - Allocation — what it might still ask for.

A worked safety check

Five processes P_0 \ldots P_4, three resource types A, B, C. Currently Available = (3, 3, 2). Here are Max and Allocation, and the Need we get by subtracting cell by cell:

Process	Max (A B C)	Allocation (A B C)	Need = Max − Alloc
P_0	7 5 3	0 1 0	7 4 3
P_1	3 2 2	2 0 0	1 2 2
P_2	9 0 2	3 0 2	6 0 0
P_3	2 2 2	2 1 1	0 1 1
P_4	4 3 3	0 0 2	4 3 1

Now the safety algorithm. Keep a running Work vector of what's free, starting at Available. Repeatedly find any unfinished process whose Need ≤ Work (component by component); pretend it runs, finishes, and hands back everything it held — so Work += Allocation. Watch Work grow:

Work = Available = (3, 3, 2) P1? Need (1,2,2) ≤ (3,3,2) ✓ Work += Alloc(2,0,0) → (5, 3, 2) P3? Need (0,1,1) ≤ (5,3,2) ✓ Work += Alloc(2,1,1) → (7, 4, 3) P4? Need (4,3,1) ≤ (7,4,3) ✓ Work += Alloc(0,0,2) → (7, 4, 5) P0? Need (7,4,3) ≤ (7,4,5) ✓ Work += Alloc(0,1,0) → (7, 5, 5) P2? Need (6,0,0) ≤ (7,5,5) ✓ Work += Alloc(3,0,2) → (10, 5, 7) All five finished → STATE IS SAFE.

Every process finished, so the state is safe and a safe sequence is \langle P_1, P_3, P_4, P_0, P_2 \rangle. (Others exist — the algorithm just needs to find one.) Had we reached a point where no unfinished process fit within Work, the state would be unsafe, and the banker would have refused the request that led there. Crucially, unsafe is not the same as deadlocked — it means deadlock has become possible, and a cautious banker refuses to go there.

The safety check, in code

Here is the whole safety algorithm on that exact example. It computes Need, grows the Work vector, and reports whether the state is safe together with a safe sequence. Press Run — then try editing available down to [1, 3, 2] and see the verdict flip:

type Vec = number[]; // The state to check (the worked example above). const available: Vec = [3, 3, 2]; const max: Vec[] = [ [7,5,3], [3,2,2], [9,0,2], [2,2,2], [4,3,3] ]; const alloc: Vec[] = [ [0,1,0], [2,0,0], [3,0,2], [2,1,1], [0,0,2] ]; const n = max.length; // processes const m = available.length; // resource types // Need[i][j] = Max[i][j] - Allocation[i][j] const need: Vec[] = max.map((row, i) => row.map((v, j) => v - alloc[i][j])); const work: Vec = [...available]; // free units, grows as processes "finish" const done: boolean[] = new Array(n).fill(false); const sequence: number[] = []; let madeProgress = true; while (madeProgress) { madeProgress = false; for (let i = 0; i < n; i++) { if (done[i]) continue; // can process i finish right now? Need[i] must fit within Work in every column. const fits = need[i].every((needJ, j) => needJ <= work[j]); if (fits) { for (let j = 0; j < m; j++) work[j] += alloc[i][j]; // it finishes, returns its resources done[i] = true; sequence.push(i); madeProgress = true; } } } if (done.every((d) => d)) { console.log("SAFE. A safe sequence is: " + sequence.map((i) => "P" + i).join(" -> ")); } else { const stuck = done.map((d, i) => d ? -1 : i).filter((i) => i >= 0); console.log("UNSAFE — deadlock is possible. Stuck processes: " + stuck.map((i) => "P" + i).join(", ")); }

Detection, and the lazy option

If you don't want the up-front cost of avoidance, you can let deadlocks form and hunt them down. Detection runs a cycle-search (for single-instance resources) or a banker-style reduction (for multiple instances) over the current allocation state, on a schedule. When it finds a deadlock, recovery breaks the cycle the only ways it can: abort one or more deadlocked processes, or preempt a resource — roll a process back to an earlier checkpoint and hand its resource to someone else. Both are disruptive, which is why detection is run only as often as deadlocks are expected.

There's a fourth strategy nobody puts on the syllabus with a straight face: the ostrich algorithm — stick your head in the sand and pretend deadlocks never happen. General-purpose operating systems like Linux and Windows largely do exactly this for user processes. Why? Because for everyday workloads deadlocks are rare, and the cost of prevention (crippled resource utilisation) or avoidance (every process must pre-declare its maximum needs — usually unknowable) is worse than the occasional hang cured by killing an app or rebooting.

The banker's algorithm and friends really shine where deadlock is unacceptable and demands are predictable — databases, real-time and safety-critical control systems. Engineering is choosing which price you'd rather pay.