Two ticket-booking threads look at the same seat. Both read "1 seat left." Both decide "great,
I'll take it." Both write "0 seats left." Two customers, one seat, and a very awkward flight. That
is a race condition: the answer depends on the exact order two
Here is the whole problem in one line of code. A shared counter starts at
count = count + 1 once. You would
expect the counter to end at
This page is about why that happens and the small set of tools — atomic operations, mutexes and semaphores — that operating systems give us to make it stop happening.
count = count + 1 is not one action — it is three
The trouble starts with a lie the source code tells you. A single innocent-looking assignment
count = count + 1 is not performed in one indivisible step by the
hardware. The CPU actually does three separate things:
count from memory into a register;This read–modify–write trio is three moments in time, and the operating system's scheduler is free to pause a thread between any of them and let another thread run. When both threads read the old value before either writes the new one, one of the two increments simply vanishes — the classic lost update.
Let's freeze the scheduler and watch it choose the worst possible order. Both threads want to add
The disaster is in the timing. Thread A reads count went up
by only one. The update A made was silently clobbered.
Change the interleaving — let A fully finish (read, modify, write) before B starts — and you get
the correct
The sandbox below runs on a single thread, so it can't produce a real race — which is actually perfect, because it means we can script the exact interleaving and make the bug happen every single time instead of one run in a thousand. The program models each thread's read/modify/write as explicit steps, then plays a bad schedule and a safe (locked) schedule so you can compare the results. Press Run ▶:
Same three-step increment, same two threads — only the order differs. The lock's entire job is to forbid the first schedule and guarantee the second.
The stretch of code that touches shared data — our read–modify–write on count — is
called a critical section. The fix has a name too: mutual
exclusion, meaning at most one thread is inside the critical section at a time.
If A is mid-increment, B must wait outside until A is done.
But "just block everyone else" is too blunt to be correct on its own. A good solution to the critical-section problem must satisfy three requirements at once:
Mutual exclusion alone gives correctness; progress and bounded waiting give fairness and liveness. A lock that guarantees exclusion but lets one unlucky thread wait forever has solved half the problem and created another.
Here is the chicken-and-egg twist. To protect the counter we want a lock. But a lock is itself just
a variable in shared memory — locked = true is its own read–modify–write, so it can
race exactly like the counter did. You cannot build mutual exclusion out of ordinary reads and
writes without help. That help comes from the CPU, in the form of
atomic instructions — operations the hardware guarantees happen as one
indivisible step, with no possible interleaving in the middle.
The two workhorses are test-and-set and compare-and-swap. Each does a read and a write as a single unbreakable unit:
Because the read-and-write is fused into one instruction, two threads calling
testAndSet can never both see false — exactly one wins, the other sees
true and spins. That single indivisible step is the bedrock every higher-level
primitive is built on. CAS is the more general of the two and underlies most
lock-free data structures and the atomic counters in modern languages.
On a single-core machine you could get mutual exclusion by telling the CPU "don't switch threads right now" (disabling interrupts) around the critical section. It works — but it's a sledgehammer: it stops all scheduling, it only protects one core (another core keeps running happily on a multi-core chip), and you'd never hand that power to ordinary user programs. Atomic instructions like CAS give you the exact, minimal guarantee you need — one memory location, one indivisible step — and they work across every core.
Spinning in a loop wastes the CPU. So the operating system wraps atomic instructions into two friendlier tools that also sleep a waiting thread instead of burning cycles.
A mutex (short for mutual exclusion) is a lock with exactly two
operations, lock() and unlock(). At
most one thread can hold it. Anyone else calling lock() is put to sleep until the owner
calls unlock(). It is the natural guard for a critical section:
A semaphore is more general. It holds an integer count and has two operations,
historically wait() (also called P or down) and
signal() (also called V or up):
wait() decrements the count; if the count would go negative, the thread
blocks until someone signals.
signal() increments the count, waking a blocked thread if one is waiting.
Think of the count as the number of available permits. A semaphore initialised to
wait or
signal. Use it to count a resource pool or to signal between threads.
They feel similar because a semaphore initialised to wait and a completely
different thread can signal, which is precisely how you use it to signal an
event from producer to consumer. Reaching for a semaphore when you only wanted a lock is a
common way to end up with subtle, ownerless bugs.
Locks introduce their own hazard: deadlock. Suppose thread A locks mutex 1 and then tries to lock mutex 2, while thread B locks mutex 2 and then tries to lock mutex 1. Each holds what the other needs, and each waits forever. Nothing is racing now — the program has simply frozen solid. Deadlock needs four conditions to hold at once (mutual exclusion, hold-and-wait, no preemption, and a circular wait), and avoiding it — for instance by always acquiring locks in a fixed global order — is the next big chapter in concurrency. Synchronisation primitives cure the race; using them carelessly invites the deadlock.