Thinking Concurrently

Imagine you're cooking a Sunday roast. The potatoes need 40 minutes in the oven, the gravy needs 10 minutes on the hob, and the table needs laying. You don't stand and stare at the potatoes for 40 minutes, then start the gravy, then lay the table. You put the potatoes in, and while they cook you make the gravy and lay the table. Same jobs, far less total time — because you stopped doing them strictly one after another.

That is concurrent thinking: structuring a problem so that several tasks are in progress at the same time rather than lined up in a single queue. It is one of the pillars of computational thinking, and it's how everything from a web server handling a thousand visitors to a phone playing music while you scroll actually works. On this page we'll pin down what concurrency really means, see how it differs from true parallelism, work out which problems split and which stubbornly don't, and meet the classic trap that catches everyone: the race condition.

Concurrency is not (quite) parallelism

These two words are used loosely in everyday speech, but at A-level the distinction is worth getting exactly right, because it's a favourite exam question.

Concurrency is about structure: a system deals with several tasks that are all on the go at once, making progress over the same period. It does not require more than one processor. A single core can be concurrent by interleaving — doing a slice of task A, switching to a slice of task B, back to A, and so on, so fast that all of them appear to advance together.
Parallelism is about execution: two or more tasks running literally at the same instant, which needs genuinely separate hardware — multiple cores (or multiple machines). At any single moment, more than one instruction is actually being carried out.

A neat one-liner from the programming folklore captures it:

Concurrency is dealing with many things at once (a structure for making progress on multiple tasks over a period).
Parallelism is doing many things at once (multiple tasks physically executing at the same instant).
All parallelism is concurrency, but concurrency does not require parallel hardware — one core can be concurrent by rapidly switching between tasks.

Think of one chess grandmaster playing 20 opponents on 20 boards. She is concurrent: she moves at board 1, walks to board 2, makes a move, walks on — all 20 games are in progress, but only one move is being made at any instant. Now give her 20 clones, one per board: that's parallel — 20 moves happening at the very same moment.

Seeing it on a timeline

The clearest way to feel the difference is to draw time running left to right and colour in when each task is actually working. Step through the three approaches below for the same three tasks (A, B, C):

Notice two things. First, concurrent (one core) and sequential take a similar total time when every task is pure computation — interleaving on a single core doesn't create extra work capacity, it just shares the one core (its real payoff comes when tasks spend time waiting, e.g. for a disk or the network, so the core can get on with someone else's work meanwhile). Second, parallel is the one that genuinely finishes sooner, because three cores really are doing three things at the same instant.

Because most tasks spend a surprising amount of time blocked — waiting for a key press, a file to load, or a web reply. While one task waits, the scheduler hands the core to another that's ready to run. So even a single core stays busy and everything appears to advance together. This is why concurrency helps enormously for I/O-bound work (lots of waiting) even without extra cores, but adds little to CPU-bound work (flat-out computation) unless you have real parallel hardware.

Which problems split — and which don't

The whole payoff of parallelism depends on one question: can the work be broken into pieces that don't need each other? Independent sub-tasks are the dream; chains of dependencies are the enemy.

Splits beautifully (embarrassingly parallel). Suppose you have 1,000 holiday photos to shrink to thumbnails. Photo 37's thumbnail does not depend on photo 36's in any way — each is its own little island of work. Hand 250 photos to each of four cores and you finish in roughly a quarter of the time. Other examples: rendering different frames of an animation, searching different chunks of a huge file, adding up different sections of a list.

Refuses to split (inherently sequential). Now suppose each step needs the previous step's result. Computing a running bank balance: to know the balance after transaction 5 you must already have the balance after transaction 4. There is no way to start step 5 before step 4 finishes — throwing more cores at it changes nothing. The same is true of many recurrences, like the Fibonacci sequence where each term is the sum of the two before it.

// SPLITS WELL: each thumbnail is independent — hand them out to many cores. for (const photo of photos) { makeThumbnail(photo); // does not depend on any other photo } // DOES NOT SPLIT: each balance needs the one before it — strictly step by step. let balance = 0; for (const t of transactions) { balance = balance + t; // needs the PREVIOUS value of balance }

A useful mental test: ask "to start this piece, do I need the answer from a different piece?" If every piece answers "no", the problem is ripe for parallelism. If the pieces form a chain of "yes", it's sequential by nature and no amount of hardware will speed the chain itself up.

The catch: sharing data

Independent tasks are easy. The trouble starts the moment two concurrent tasks touch the same piece of data. Because you can no longer predict exactly when each task runs (the scheduler interleaves them, or separate cores race along together), the order in which they read and write shared data becomes unpredictable — and different orders can give different answers.

Consider two tasks both trying to add 1 to a shared counter that starts at 10. Adding 1 looks like a single action, but underneath it is really three steps: read the value, add one, write it back. If those steps from the two tasks get interleaved unluckily, one update can silently overwrite the other:

Both tasks ran "increment", so the counter should read 12. But because both read 10 before either wrote, they both write 11 — and one increment is lost. This is a race condition: a bug whose appearance depends on timing.

The fix is to coordinate access so the read–add–write happens without interruption. The commonest tool is a lock (also called a mutex, for "mutual exclusion"): a task must acquire the lock before touching the shared data and release it after, so only one task can be in that critical section at a time. The other task simply waits its turn.

// Without coordination — a race: the two updates can clobber each other. counter = counter + 1; // With a lock — only one task is inside the critical section at a time. lock.acquire(); // wait here until it's my turn counter = counter + 1; // read–add–write, safe from interruption lock.release(); // let the next task in

Locks make shared data safe, but they cost something: tasks now sometimes wait for each other, which eats into the speed-up parallelism promised. Used carelessly they cause their own headaches — most famously deadlock, where task A holds lock 1 and waits for lock 2 while task B holds lock 2 and waits for lock 1, so both wait forever. Designing concurrent programs is largely the art of sharing as little as possible, and coordinating what you must share carefully.

Two traps trip up almost everyone.

1. Race conditions. When two tasks update shared data without coordination, the result depends on the exact timing of their steps — and can simply be wrong. Worse, it's often wrong only occasionally: the code passes every test on your machine, then fails once in a thousand runs in production, because the unlucky interleaving is rare. "It works on my computer" is no proof a concurrent program is correct. Whenever tasks share writable data, you must coordinate access (e.g. with a lock).

2. Not everything speeds up. Adding cores only helps the part of a job that can run in parallel. If a task is strictly step-by-step — each step needing the previous step's result — it cannot be parallelised at all, and extra cores sit idle. Even a mostly parallel job is limited by its sequential leftover: this is the intuition behind Amdahl's law. If one tenth of a job must run sequentially, then even with infinite cores you can never make the whole job more than about 10× faster — that stubborn sequential tenth sets a hard ceiling on the speed-up.

Amdahl's law: the ceiling on speed-up

Let's make that ceiling concrete. Suppose a fraction p of a job can be perfectly parallelised and the remaining 1 - p is stuck being sequential. With N cores, the best possible speed-up is:

S(N) = \dfrac{1}{(1 - p) + \dfrac{p}{N}}

The parallel part shrinks as you add cores (the p/N term), but the sequential part (1 - p) never budges. Push N to infinity and p/N vanishes, leaving a hard limit of 1 / (1 - p). So if 90% is parallel (p = 0.9), the ceiling is 1/0.1 = 10\times — no matter how many cores you buy. The lesson: the sequential bottleneck, not the number of cores, is usually what limits real-world speed-up.

Absolutely — if your workload is genuinely parallel. A render farm chewing through thousands of independent frames, or a server juggling thousands of independent requests, keeps every core busy and scales almost linearly. Amdahl's law isn't saying "cores are pointless"; it's saying "measure the sequential fraction first." A job that's 99% parallel can reach nearly 100\times; a job that's 50% parallel tops out at 2\times however much hardware you throw at it.