I/O and Interrupts

Right now your CPU is executing something like a few billion instructions every second. Now press a key on your keyboard. From the CPU's point of view, the gap between two keystrokes, even a fast typist's, is an eternity: at a few billion instructions per second, a gap of about a tenth of a second leaves room for hundreds of millions of instructions before your finger comes back down. A spinning disk, a network card, a printer: to the processor they are all impossibly, painfully slow.

This is the central problem of input/output. The CPU is a sprinter; the devices it talks to move like glaciers. If the fast thing has to wait around for the slow thing, you have thrown away billions of instructions' worth of work. The whole art of I/O is: how does a fast CPU deal with slow devices without wasting all its time waiting? That one question is what this page is about, and the answer — interrupts — is one of the most important ideas in how a computer actually works.

How the CPU touches a device

The CPU never wires straight into a keyboard or a disk platter. Between them sits a device controller: a little piece of dedicated hardware that speaks the device's language on one side and the CPU's language on the other. The controller exposes a handful of device registers, tiny storage slots that the CPU can read and write to give commands and check on progress. Typically there are three kinds: a status register (is the device busy, ready, or in error?), a command register (what the CPU wants done), and a data register (the bytes being moved in or out).

How does the CPU reach those registers? On almost all modern machines the trick is memory-mapped I/O: each device register is assigned an ordinary memory address. To send a command, the CPU just does a normal store to that address; to read the status, a normal load. The device controller watches the address bus and answers when its addresses appear. So talking to a printer looks, in the instruction stream, exactly like writing to memory — no special I/O instructions needed.
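Here is a minimal sketch of what that looks like in C. On real hardware the register block sits at whatever fixed address the platform assigns; so that the sketch can run on an ordinary machine, the caller here passes in a pointer to a plain struct instead. The register layout and the names `printer_regs`, `CMD_PRINT`, `printer_send`, and `printer_status` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical register layout for an imaginary printer controller.
   volatile tells the compiler that every access must really happen,
   because the device may change or observe these values on its own. */
typedef struct {
    volatile uint32_t status;   /* device -> CPU: busy or ready?  */
    volatile uint32_t command;  /* CPU -> device: what to do      */
    volatile uint32_t data;     /* the byte being transferred     */
} printer_regs;

enum { CMD_PRINT = 1 };         /* assumed command code */

/* Send one character: two ordinary stores, nothing I/O-specific. */
void printer_send(printer_regs *regs, char c) {
    regs->data = (uint32_t)(unsigned char)c;
    regs->command = CMD_PRINT;
}

/* Check on the device: one ordinary load. */
uint32_t printer_status(const printer_regs *regs) {
    return regs->status;
}
```

Nothing in `printer_send` says "this is I/O". The compiler emits the same store instructions it would for any other struct, which is the whole appeal of memory-mapped I/O.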

Two ways to wait: polling vs interrupts

Suppose the CPU has asked the disk for a block of data. The data will arrive — but not for a while. There are two fundamentally different strategies for finding out when it is ready.

Polling (busy-waiting). The CPU sits in a tight loop, reading the status register over and over, asking "ready yet? ready yet? ready yet?" until the flag flips:

while (status_register.busy) {
    // do nothing, just ask again
}
// finally! now read the data

It works, and it is simple, but look at the cost: while the device is idle the CPU burns every single cycle asking a question whose answer is "no". Millions of instructions spent achieving nothing. It is like standing at the toaster refusing to do anything else until the toast pops.
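To put a number on that waste, here is a small simulation, not real driver code. The device is a hypothetical `sim_device` that stays busy for a fixed number of status reads, and `poll_until_ready` counts how many times the CPU asked and was told "no".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated device: busy for a fixed number of status reads. */
typedef struct {
    unsigned long reads_until_ready;
} sim_device;

/* One read of the status register; each read advances simulated time. */
bool sim_busy(sim_device *dev) {
    assert(dev != NULL);
    if (dev->reads_until_ready == 0) {
        return false;               /* the flag has flipped: ready */
    }
    dev->reads_until_ready--;
    return true;                    /* still busy */
}

/* Busy-wait until ready; return how many checks were answered "busy". */
unsigned long poll_until_ready(sim_device *dev) {
    unsigned long wasted = 0;
    while (sim_busy(dev)) {
        wasted++;                   /* a check that achieved nothing */
    }
    return wasted;
}
```

A device that needs a million reads' worth of time makes `poll_until_ready` return one million: the wasted work grows in step with the wait.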

Interrupt-driven I/O. Instead, the CPU says "start the read, and tap me on the shoulder when you're done" — then goes off and runs other programs. When the data is ready, the device raises an interrupt: an electrical signal (an IRQ, interrupt request) that yanks the CPU's attention away, just long enough to deal with the device, after which the CPU carries on exactly where it left off. The toast now shouts "I'm done!" and you spend the waiting time doing the dishes.

The efficiency argument is decisive. If a device takes N units of time to become ready, polling wastes work proportional to N — all of it. Interrupts cost a small, fixed amount of overhead once, and the CPU does useful work for the rest of the wait. On a multitasking operating system, that reclaimed time isn't idle — it runs other processes.

A very natural mistake: "interrupts must be slower than polling — there's all that saving and restoring and jumping around." Per event, an interrupt genuinely does cost more than a single status check. But that misses the point entirely. Polling doesn't do one check — it does millions, back to back, the whole time the device is idle, and every one of them is wasted. The interrupt pays its overhead once and leaves the CPU free the rest of the time.

The honest exception: if a device is almost always ready instantly (very fast, very high throughput), the interrupt overhead can dominate, and a short spin of polling wins. Real systems sometimes do both — spin briefly, then fall back to interrupts. But for the slow devices that dominate everyday computing, interrupt-driven I/O is the clear winner.
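The trade-off between the three strategies can be sketched as a deliberately simple cost model. Everything here is an assumption made for illustration: time is counted in abstract cycles, an interrupt costs a fixed `overhead`, and the function names are made up.

```c
/* Cycles the CPU loses to one I/O wait, under a deliberately simple model.
 *   wait     : cycles until the device becomes ready
 *   overhead : fixed cost of taking one interrupt (save, dispatch, restore)
 */
unsigned long lost_polling(unsigned long wait) {
    return wait;                    /* every cycle of the wait is spent asking */
}

unsigned long lost_interrupt(unsigned long overhead) {
    return overhead;                /* paid once, however long the wait */
}

/* Hybrid: spin for at most spin_limit cycles, then fall back to an interrupt. */
unsigned long lost_hybrid(unsigned long wait, unsigned long spin_limit,
                          unsigned long overhead) {
    if (wait <= spin_limit) {
        return wait;                /* device was quick: the spin caught it */
    }
    return spin_limit + overhead;   /* device was slow: spin, then interrupt */
}
```

With an illustrative overhead of 500 cycles and a spin limit of 100, a device that is ready after 50 cycles costs 50 under the hybrid scheme instead of 500 under pure interrupts. A device that takes a million cycles costs 600 instead of the full million that polling would burn.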

The interrupt cycle, step by step

When an interrupt fires, the CPU performs a precise little dance. The key insight is that the interrupt arrives at an unpredictable moment, in the middle of some other program, so the hardware must be able to duck out, handle the device, and return so seamlessly that the interrupted program never even notices.

The heart of it is save context → run the handler → restore context. The context is the CPU's registers and program counter — the exact "where was I". Because it is saved and restored perfectly, the interrupted program resumes as if nothing happened, blissfully unaware it was ever paused. The code that actually deals with the device is the Interrupt Service Routine (ISR), also called the interrupt handler.
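The save, run, restore pattern can be modelled in a few lines. This is a toy: the `cpu_context` struct, with a program counter and four registers, is invented for the example. On real machines the work is typically split between the hardware and the first and last instructions of the handler, in ways that vary by architecture.

```c
#include <stdint.h>

/* Hypothetical toy CPU state: a program counter and four registers. */
typedef struct {
    uint32_t pc;
    uint32_t regs[4];
} cpu_context;

typedef void (*handler_fn)(cpu_context *cpu);

/* Take one interrupt: save context, run the handler, restore context. */
void take_interrupt(cpu_context *cpu, handler_fn handler) {
    cpu_context saved = *cpu;   /* 1. save: copy PC and registers aside    */
    handler(cpu);               /* 2. run the ISR, which may clobber them  */
    *cpu = saved;               /* 3. restore: put everything back exactly */
}

/* An example handler that freely overwrites the CPU state while it works. */
void noisy_handler(cpu_context *cpu) {
    cpu->pc = 0x1000;
    for (int i = 0; i < 4; i++) {
        cpu->regs[i] = 0xDEADBEEF;
    }
}
```

However badly `noisy_handler` scribbles over the registers, the state after `take_interrupt` returns is identical to the state before it was called. That is what lets the interrupted program carry on unaware.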

How does the CPU know which handler to run? Each interrupt type has a number, and the CPU looks that number up in the interrupt vector table — an array in memory holding the address of each device's ISR. Keyboard interrupt? Look up entry, say, 1; find the address of the keyboard handler; jump there. It is exactly a lookup table from "which device shouted" to "which code handles it".
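In C, such a table is naturally an array of function pointers. The sketch below is a software model only: the table size, the function names, and the choice of entry 1 for the keyboard are assumptions that follow the example in the text, not any particular machine's layout.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 16              /* hypothetical table size */

typedef void (*isr_fn)(void);

/* Entry i holds the address of the handler for interrupt number i. */
static isr_fn vector_table[NUM_VECTORS];

int keyboard_events = 0;

void keyboard_isr(void) {
    keyboard_events++;              /* stand-in for reading the key */
}

/* Install a handler for one interrupt number. */
void register_isr(unsigned irq, isr_fn handler) {
    assert(irq < NUM_VECTORS);
    vector_table[irq] = handler;
}

/* Model of the lookup: index the table and jump.
   Returns 1 if a handler ran, 0 if none is installed. */
int dispatch(unsigned irq) {
    if (irq >= NUM_VECTORS || vector_table[irq] == NULL) {
        return 0;
    }
    vector_table[irq]();
    return 1;
}
```

After `register_isr(1, keyboard_isr)`, a call to `dispatch(1)` runs the keyboard handler, while `dispatch(2)` finds an empty slot and does nothing.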

The trouble with big transfers — and how DMA fixes it

Interrupts solve the waiting problem, but there's a second problem hiding underneath. So far, whenever a byte is ready, the CPU itself copies it from the device register into memory. That's called programmed I/O, and for a single keystroke it's fine. But imagine reading a 4 KB disk block, one word at a time: the device signals, the CPU copies a word, the device signals, the CPU copies a word… thousands of times. Even with interrupts, the CPU is now a glorified bucket brigade, doing nothing but shovelling bytes between two places. That's a colossal waste of a fast processor.

Direct Memory Access (DMA) is the fix. There is a dedicated DMA controller whose entire job is to move blocks of data between a device and memory on its own. The CPU sets up the transfer once — "read 4 KB from the disk into memory starting at address X" — and then walks away to do other work. The DMA controller does all the byte-by-byte copying itself, directly to memory, without bothering the CPU. Only when the whole block is done does it raise one interrupt to say "finished".

Count the interrupts. Copying n words with programmed I/O costs on the order of n interrupts (or n busy CPU copies); with DMA it costs exactly one, no matter how big the block. That is why virtually every modern disk controller, network card, and GPU uses DMA for bulk transfers: the CPU is freed to compute while data streams into memory beside it.
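The counting argument can be checked with a small simulation. Both functions below are hypothetical models: `memcpy` merely stands in for the DMA controller (in this simulation it still runs on the CPU), and a 4 KB block is assumed to be 1024 four-byte words.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Programmed I/O: the device signals once per word and the CPU copies
   each word itself. Returns the number of interrupts taken. */
size_t pio_transfer(const uint32_t *device, uint32_t *memory, size_t n_words) {
    size_t interrupts = 0;
    for (size_t i = 0; i < n_words; i++) {
        interrupts++;               /* device signals: one word is ready */
        memory[i] = device[i];      /* CPU copies that word by hand      */
    }
    return interrupts;
}

/* DMA: the controller moves the whole block, then signals once.
   Returns the number of interrupts taken. */
size_t dma_transfer(const uint32_t *device, uint32_t *memory, size_t n_words) {
    memcpy(memory, device, n_words * sizeof *memory);  /* the controller's job */
    return 1;                       /* single completion interrupt */
}
```

For a 1024-word block, `pio_transfer` reports 1024 interrupts and `dma_transfer` reports 1, and both leave the same data in memory.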

The name "Direct Memory Access" trips people up. It does not mean the CPU reaches into memory more directly, and it does not mean the CPU does the copy. It's the opposite: the DMA controller — a separate device — accesses memory directly, instead of the CPU. The whole point is to take the copying job off the CPU. During a DMA transfer the CPU can run ordinary program code; it is notified just once, at the end.

The CPU and the DMA controller share the same memory bus, so they can't both use it at the exact same instant. The DMA controller "steals" bus cycles when it needs them, a technique literally called cycle stealing. Each steal delays the CPU by a whisker, but the CPU keeps running between steals. The net effect is still an enormous win: a tiny, occasional slowdown instead of the CPU being fully occupied copying every single word by hand.
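A rough way to see how small that slowdown is: model the bus one cycle at a time and let the DMA controller take a cycle only when the device has a word ready. The model and its parameters are assumptions for illustration (one stolen cycle per word, a word arriving every `word_period` cycles); real bus arbitration is more involved.

```c
#include <assert.h>

typedef struct {
    unsigned long cpu_cycles;   /* bus cycles the CPU kept            */
    unsigned long dma_cycles;   /* bus cycles stolen by the DMA unit  */
} bus_usage;

/* Simulate total_cycles bus cycles. The device hands the DMA controller
   one word every word_period cycles, and each word costs one stolen cycle. */
bus_usage simulate_bus(unsigned long total_cycles, unsigned long word_period) {
    assert(word_period > 0);
    bus_usage usage = {0, 0};
    for (unsigned long t = 1; t <= total_cycles; t++) {
        if (t % word_period == 0) {
            usage.dma_cycles++;     /* DMA steals this cycle */
        } else {
            usage.cpu_cycles++;     /* CPU keeps the bus     */
        }
    }
    return usage;
}
```

With a word arriving every 100 cycles, `simulate_bus(1000, 100)` gives the CPU 990 of the 1000 cycles and the DMA controller 10. The slower the device relative to the bus, the fewer cycles it needs to steal.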

Putting it together

Here is the whole story in one breath. A device is slow, so the CPU doesn't wait for it (interrupts instead of polling). A transfer is big, so the CPU doesn't copy it (DMA instead of programmed I/O). In both cases the design principle is identical: don't make the fast, expensive CPU wait on or babysit the slow, cheap device. Set the work in motion, walk away, and get one tap on the shoulder when it's done.

A modern disk read is all three ideas at once: the CPU issues the command through memory-mapped registers, a DMA controller streams the block straight into RAM, and a single interrupt signals completion — all while the CPU happily runs other programs.