You already know how a CPU runs a program: it takes each instruction and pushes it through the
That works — but look at what the CPU's hardware is doing while it does it. The three stages are handled by different parts of the processor. So while the execute hardware is busy running instruction 1, the fetch hardware and the decode hardware are just sitting there idle. Two-thirds of the machine is doing nothing at any given moment. That is a waste — and getting rid of that waste is the whole idea of pipelining.
Pipelining means overlapping the stages of consecutive instructions: while instruction 1 is being executed, instruction 2 is being decoded, and instruction 3 is being fetched — all at the same moment. Nobody stands idle. The instructions flow through the CPU in a steady stream, so more of them finish per second.
Imagine you have three loads of washing to do, and each load goes through three machines in order: the washer, then the dryer, then the folding table. Say each stage takes one hour.
The slow, one-at-a-time way: wash load 1, dry load 1, fold load 1 — then start load 2. Each
load takes 3 hours and you never touch a load until the one before it is completely finished. Three
loads take
That's exactly what a smart person does. The moment load 1 moves from the washer to the dryer, load 2
goes into the washer. An hour later, load 1 is being folded, load 2 is drying, and load 3 is
washing — all three machines running at once. Now three loads take only
A pipelined CPU is doing exactly this. Fetch = washer, decode = dryer, execute = folding. Each stage has its own hardware, so each one can be working on a different instruction at the same time.
Below, time runs left to right in equal clock cycles, and each row is one instruction. Watch how, once the pipeline fills up, every column has all three stages busy at once — a fetch, a decode and an execute happening together, each on a different instruction. Step through it and compare the pipelined schedule with the slow one-at-a-time schedule underneath.
Notice two things. First, in the pipelined version each instruction still takes the same three cycles to pass all the way through — pipelining did not make one instruction quicker. Second, after the pipeline fills, a brand-new instruction completes every single cycle instead of every three cycles. That is where the speed-up comes from: they finish three times more often.
Let's put numbers on it. Suppose each stage takes one cycle, so the pipeline has
And the more instructions you run, the better it gets. The three "fill the pipeline" cycles are a
one-off cost paid at the very start; spread over a program of millions of instructions, it becomes a
rounding error, and a
The lovely steady stream only works if the CPU always knows which instruction comes next so it can fetch it early. Sometimes it can't, and the pipeline hits a hazard — a situation that stops the overlap working cleanly.
The most important one at A-level is the branch (a jump, such as the if-decision or a loop). At a branch the program can carry on in one of two directions, and the CPU won't know which until that branch instruction has actually executed. But by then it has already fetched and started decoding the instructions sitting right after the branch — on the assumption the program just runs straight on. If the branch jumps somewhere else instead, those half-processed instructions are wrong. The CPU must flush them (throw them away) and stall (wait, doing nothing useful) while it re-fills the pipeline from the correct place. Those wasted cycles are called a pipeline bubble.
Real CPUs fight back with clever tricks such as branch prediction — guessing which way a branch will go and speculatively fetching that way — but a wrong guess still costs a flush. Hazards are the reason a 5-stage pipeline never quite gives you a full 5× speed-up in practice.
Pipelining improves throughput — how many instructions finish per second — not latency, the time for one single instruction to get through. A pipelined instruction still takes just as long (or slightly longer) to travel end-to-end as it would on its own; you've simply arranged for lots of them to be in flight at once. It's like a motorway: adding lanes doesn't make your car faster, but far more cars get past per minute. Never say "pipelining makes each instruction quicker" — say "pipelining makes instructions finish more often."
And remember the catch: hazards, above all a branch (jump), break
the flow. When the CPU doesn't yet know which instruction is next, it may fetch the wrong ones and
then have to flush the pipeline and stall — losing some of the
benefit. The theoretical
A CPU has a 3-stage pipeline (fetch, decode, execute), each stage taking one clock
cycle. It runs a straight run of
Now suppose instruction 3 is a branch that jumps elsewhere, and the two instructions the CPU had
already fetched behind it must be flushed, costing
If a