Synchronization Primitives

Two ticket-booking threads look at the same seat. Both read "1 seat left." Both decide "great, I'll take it." Both write "0 seats left." Two customers, one seat, and a very awkward flight. That is a race condition: the answer depends on the exact order two threads happen to interleave — and the order is not something you control.

Here is the whole problem in one line of code. A shared counter starts at 0. Two threads each run count = count + 1 once. You would expect the counter to end at 2. Alarmingly often, it ends at 1.

let count = 0; // Thread A // Thread B count = count + 1; count = count + 1; // You expect count === 2. Sometimes count === 1. Why?

This page is about why that happens and the small set of tools — atomic operations, mutexes and semaphores — that operating systems give us to make it stop happening.

`count = count + 1` is not one action — it is three

The trouble starts with a lie the source code tells you. A single innocent-looking assignment count = count + 1 is not performed in one indivisible step by the hardware. The CPU actually does three separate things:

Read the current value of count from memory into a register;
Modify it — add one inside the register;
Write the new value back to memory.

This read–modify–write trio is three moments in time, and the operating system's scheduler is free to pause a thread between any of them and let another thread run. When both threads read the old value before either writes the new one, one of the two increments simply vanishes — the classic lost update.

A bad interleaving, step by step

Let's freeze the scheduler and watch it choose the worst possible order. Both threads want to add 1 to a counter that starts at 0. Read down the table — each row is one CPU step, and the "reg" columns are each thread's private register.

The disaster is in the timing. Thread A reads 0, and before it can write its result back, thread B also reads 0. Now both threads are computing 0 + 1 = 1. A writes 1; B writes 1 on top of it. Two increments happened, but count went up by only one. The update A made was silently clobbered.

Change the interleaving — let A fully finish (read, modify, write) before B starts — and you get the correct 2. The same code gives a different answer depending purely on scheduler timing. Code that misbehaves only for unlucky interleavings is a nightmare to debug, because it usually works.

Run the race yourself

The sandbox below runs on a single thread, so it can't produce a real race — which is actually perfect, because it means we can script the exact interleaving and make the bug happen every single time instead of one run in a thousand. The program models each thread's read/modify/write as explicit steps, then plays a bad schedule and a safe (locked) schedule so you can compare the results. Press Run ▶:

// Each thread's increment is THREE steps on a shared counter. // A "schedule" is just the order those steps actually execute in. // reg[t] is thread t's private register (what it has "read" so far). interface Machine { count: number; reg: number[]; } function step(m: Machine, action: string): void { const t = Number(action[1]); // "A0" => thread 0, "A1" => thread 1 const kind = action[0]; if (kind === "R") { // READ shared count into my register m.reg[t] = m.count; console.log(" T" + t + " read count(" + m.count + ") => reg=" + m.reg[t]); } else if (kind === "M") { // MODIFY my register m.reg[t] = m.reg[t] + 1; console.log(" T" + t + " modify reg => " + m.reg[t]); } else if (kind === "W") { // WRITE my register back to shared count m.count = m.reg[t]; console.log(" T" + t + " write count <- " + m.reg[t]); } } function run(schedule: string[]): number { const m: Machine = { count: 0, reg: [0, 0] }; for (const a of schedule) step(m, a); return m.count; } // BAD interleaving: both threads READ before either WRITES. console.log("Unsynchronised (interleaved) schedule:"); const bad = run(["R0", "R1", "M0", "M1", "W0", "W1"]); console.log("=> final count = " + bad + " (expected 2!)"); console.log(""); // SAFE: a lock forces thread 0 to finish its whole critical section // before thread 1 may begin. The steps never interleave. console.log("With a mutex (serialised) schedule:"); const good = run(["R0", "M0", "W0", "R1", "M1", "W1"]); console.log("=> final count = " + good + " (correct)");

Same three-step increment, same two threads — only the order differs. The lock's entire job is to forbid the first schedule and guarantee the second.

The critical-section problem

The stretch of code that touches shared data — our read–modify–write on count — is called a critical section. The fix has a name too: mutual exclusion, meaning at most one thread is inside the critical section at a time. If A is mid-increment, B must wait outside until A is done.

But "just block everyone else" is too blunt to be correct on its own. A good solution to the critical-section problem must satisfy three requirements at once:

Mutual exclusion — if one thread is executing in its critical section, no other thread may be in its critical section at the same time.
Progress — if no thread is in the critical section and some threads want to enter, the choice of who goes next cannot be postponed forever; only threads actually competing take part in the decision.
Bounded waiting — there is a limit on how many times other threads can jump ahead of a waiting thread after it has asked to enter. No one is starved out indefinitely.

Mutual exclusion alone gives correctness; progress and bounded waiting give fairness and liveness. A lock that guarantees exclusion but lets one unlucky thread wait forever has solved half the problem and created another.

Atomic operations: the hardware foundation

Here is the chicken-and-egg twist. To protect the counter we want a lock. But a lock is itself just a variable in shared memory — locked = true is its own read–modify–write, so it can race exactly like the counter did. You cannot build mutual exclusion out of ordinary reads and writes without help. That help comes from the CPU, in the form of atomic instructions — operations the hardware guarantees happen as one indivisible step, with no possible interleaving in the middle.

The two workhorses are test-and-set and compare-and-swap. Each does a read and a write as a single unbreakable unit:

// TEST-AND-SET: atomically read the old value AND write true, // returning what was there before. The whole function is ONE hardware step. function testAndSet(lock: { held: boolean }): boolean { const old = lock.held; // read ┐ lock.held = true; // write ┴ these two happen atomically — indivisible return old; } // A spin-lock built from it: keep trying until we were the one who // flipped it from false to true. function acquire(lock: { held: boolean }): void { while (testAndSet(lock)) { // it was already true -> someone else holds it -> spin and retry } } function release(lock: { held: boolean }): void { lock.held = false; } // COMPARE-AND-SWAP (CAS): atomically, "if the value equals `expected`, // replace it with `next`." Returns whether the swap happened. function compareAndSwap(cell: { v: number }, expected: number, next: number): boolean { if (cell.v === expected) { // compare ┐ cell.v = next; // swap ┴ atomic: no thread can sneak in between return true; } return false; }

Because the read-and-write is fused into one instruction, two threads calling testAndSet can never both see false — exactly one wins, the other sees true and spins. That single indivisible step is the bedrock every higher-level primitive is built on. CAS is the more general of the two and underlies most lock-free data structures and the atomic counters in modern languages.

On a single-core machine you could get mutual exclusion by telling the CPU "don't switch threads right now" (disabling interrupts) around the critical section. It works — but it's a sledgehammer: it stops all scheduling, it only protects one core (another core keeps running happily on a multi-core chip), and you'd never hand that power to ordinary user programs. Atomic instructions like CAS give you the exact, minimal guarantee you need — one memory location, one indivisible step — and they work across every core.

The primitives: mutex and semaphore

Spinning in a loop wastes the CPU. So the operating system wraps atomic instructions into two friendlier tools that also sleep a waiting thread instead of burning cycles.

A mutex (short for mutual exclusion) is a lock with exactly two operations, lock() and unlock(). At most one thread can hold it. Anyone else calling lock() is put to sleep until the owner calls unlock(). It is the natural guard for a critical section:

mutex.lock(); // enter — blocks if another thread holds the lock count = count + 1; // critical section: now safely exclusive mutex.unlock(); // leave — wakes one waiter

A semaphore is more general. It holds an integer count and has two operations, historically wait() (also called P or down) and signal() (also called V or up):

wait() decrements the count; if the count would go negative, the thread blocks until someone signals.
signal() increments the count, waking a blocked thread if one is waiting.

Think of the count as the number of available permits. A semaphore initialised to 3 lets up to three threads through at once — perfect for a resource pool of, say, three database connections. Initialise it to 1 and it behaves like a mutex (a "binary semaphore"), allowing just one thread in. That extra generality — counting several permits, and being able to signal from a different thread than the one that waited — is exactly what a plain mutex can't do.

Mutex — a lock owned by one thread; the same thread that locks must unlock. Use it to protect a critical section.
Semaphore — a counter of permits; any thread may wait or signal. Use it to count a resource pool or to signal between threads.

They feel similar because a semaphore initialised to 1 gives mutual exclusion, just like a mutex. But two differences matter. First, a semaphore counts — set it to N and it admits N threads, which a mutex can never do. Second, a mutex has an owner: whoever locks it must be the one to unlock it. A semaphore has no owner — one thread can wait and a completely different thread can signal, which is precisely how you use it to signal an event from producer to consumer. Reaching for a semaphore when you only wanted a lock is a common way to end up with subtle, ownerless bugs.

Locks introduce their own hazard: deadlock. Suppose thread A locks mutex 1 and then tries to lock mutex 2, while thread B locks mutex 2 and then tries to lock mutex 1. Each holds what the other needs, and each waits forever. Nothing is racing now — the program has simply frozen solid. Deadlock needs four conditions to hold at once (mutual exclusion, hold-and-wait, no preemption, and a circular wait), and avoiding it — for instance by always acquiring locks in a fixed global order — is the next big chapter in concurrency. Synchronisation primitives cure the race; using them carelessly invites the deadlock.

count = count + 1 is not one action — it is three

A bad interleaving, step by step

Run the race yourself

The critical-section problem

Atomic operations: the hardware foundation

The primitives: mutex and semaphore

`count = count + 1` is not one action — it is three