Intermediate Representation

A compiler has a hard job: it must accept many source languages and produce code for many target machines. C, Rust, Swift and Fortran on the front; x86, ARM, RISC-V and WebAssembly on the back. If every front end talked directly to every back end, you would need one translator for every pairing. Instead, real compilers pass everything through a single shared middle language — the intermediate representation, or IR.

The IR is the compiler's lingua franca: the "waist of the hourglass" where many front ends pour in and many back ends flow out. Once semantic analysis has checked the abstract syntax tree and confirmed the program is meaningful, the compiler lowers that tree into a flat, simple, machine-independent form. This page is about the most classic IR of all: three-address code.

Why an IR at all? The m×n → m+n argument

Suppose you support m source languages and n target machines. Wiring each front end straight to each back end costs m × n translators — and adding one new CPU means writing m brand-new back ends, one per language. That does not scale.

Route everything through a shared IR and the arithmetic changes completely. Each front end lowers to the IR (that is m pieces) and each back end lowers from the IR (that is n pieces): m + n in total. Add a new CPU and you write one back end — every existing language retargets to it for free. This is exactly why the IR exists, and it is called retargetability.

Decouple the ends. The front end need not know the target CPU; the back end need not know the source language. They meet only at the IR.
Retargetability (m + n, not m × n). m front ends plus n back ends share one middle language, so each new language or machine is one new piece, not a whole row or column.
A good altitude for optimisation. The IR sits below the source (no nested expressions, no syntactic sugar) but above the machine (no registers, no stack layout). At that height, optimisations like constant folding and dead-code elimination are written once and benefit every language and every target.

You can — tiny compilers do exactly that. But you pay for it. Every optimisation you want (removing a redundant computation, folding 2 + 3 into 5, deleting unreachable code) has to be re-implemented for every source language and re-tuned for every instruction set. An IR gives you one place to write those passes. The famous modern example is LLVM IR: Clang (C/C++), Rust, Swift and dozens of other languages all lower to it, the same optimiser runs on all of them, and back ends emit x86, ARM, RISC-V and WebAssembly from that single representation. That is the m + n dream made real.

Three-address code

The defining feature of three-address code (TAC) is right there in the name. Each instruction has at most one operator and refers to at most three addresses: typically two source operands and one destination. The canonical shape is:

x = y \;\;\text{op}\;\; z

where x, y and z are addresses — a program variable, a constant, or a compiler-generated temporary. Temporaries (written t1, t2, t3, …) are fresh names the compiler invents to hold the result of one small step. They are the glue that lets a big expression be chopped into single-operator pieces.

Because every instruction does only one thing, a rich, deeply nested source expression must be flattened (linearised) into a straight-line sequence of these tiny instructions. That flattening is the heart of IR generation.

Worked example: flattening `a = b * c + d`

Take the assignment a = b * c + d. In the source it is a single expression, but its AST has two internal operator nodes: a multiply feeding an add. Following the usual precedence, b * c happens first, then the sum. We walk the tree, and every internal node gets its own temporary:

Source: a = b * c + d Step 1 — compute the multiply into a temp: t1 = b * c Step 2 — add d, into a second temp: t2 = t1 + d Step 3 — store the result into a: a = t2

Notice how each line has at most one operator, and the temporaries carry results forward: t1 holds b * c, then t2 holds t1 + d. Read top to bottom, it is a recipe a machine can follow one step at a time. A bigger expression such as x = (a + b) * (c - d) flattens the same way — one temp for a + b, one for c - d, one for the product:

Source: x = (a + b) * (c - d) t1 = a + b t2 = c - d t3 = t1 * t2 x = t3

The rule is simple and mechanical: one temporary per internal (operator) node of the tree. Leaves — plain variables and constants — need no temp; they are already addresses.

Building an IR generator

Flattening is easy to automate. Model the expression as a tiny AST — a number, a variable, or a binary operation with a left and right child — and write emit, a recursive function that lowers a node to TAC. For a leaf it just returns the address (the number or variable name). For an operation it lowers both children first (post-order, just like an evaluator), allocates a fresh temporary, appends one instruction, and returns that temp as the address holding its result. Press Run:

// A tiny expression AST: a number, a variable, or a binary operation. type Expr = | { kind: "num"; value: number } // a constant leaf | { kind: "var"; name: string } // a variable leaf | { kind: "op"; op: "+" | "-" | "*" | "/"; left: Expr; right: Expr }; // an operation const code: string[] = []; // the emitted three-address instructions, in order let tempCount = 0; // counter for fresh temporaries t1, t2, t3, ... function newTemp(): string { tempCount += 1; return "t" + tempCount; } // emit: lower an Expr to TAC and RETURN the address holding its value. function emit(node: Expr): string { switch (node.kind) { case "num": // a leaf: the constant IS its own address return String(node.value); case "var": // a leaf: the variable name IS its own address return node.name; case "op": { // lower BOTH children first, then this operator const left = emit(node.left); const right = emit(node.right); const t = newTemp(); // one fresh temp per operator node code.push(t + " = " + left + " " + node.op + " " + right); return t; // the temp now holds the result } } } // Build the AST for (a + b) * (c - d) const expr: Expr = { kind: "op", op: "*", left: { kind: "op", op: "+", left: { kind: "var", name: "a" }, right: { kind: "var", name: "b" } }, right: { kind: "op", op: "-", left: { kind: "var", name: "c" }, right: { kind: "var", name: "d" } }, }; const result = emit(expr); // flatten the tree into `code` code.push("x = " + result); // store into the destination variable x console.log("Three-address code for x = (a + b) * (c - d):"); for (const line of code) console.log(" " + line); // t1 = a + b // t2 = c - d // t3 = t1 * t2 // x = t3

That short function is a real IR generator. The recursion mirrors the tree, and because children are lowered before their parent, the temporaries are produced in exactly the order a machine must execute them. Change the AST and re-run: any expression, however nested, comes out as flat three-address code.

Control flow: labels and jumps

Straight-line TAC handles arithmetic, but real programs branch and loop. The IR expresses control flow the way the machine does — with labels (named positions in the code) and jumps. There are two kinds of jump: an unconditional one (goto L, always taken) and a conditional one (if t goto L, taken only when the temporary t is true).

So a source-level if lowers into a test, a conditional jump, and some labels. Here is if (a < b) max = b; else max = a; as three-address code:

Source: if (a < b) max = b; else max = a; t1 = a < b // compare into a temporary (1 = true, 0 = false) if t1 goto L1 // conditional jump to the "then" part max = a // the else branch falls through to here goto L2 // skip over the then branch L1: max = b // the then branch L2: // both branches rejoin here

Every high-level construct — while, for, switch, short-circuit && — reduces to this same vocabulary of comparisons, labels and (conditional or unconditional) jumps. The nested, indented shape of the source melts into a flat list, which is exactly what a CPU wants: it has no if statement, only "compare, then maybe jump".

Other flavours of IR

Three-address code is the classic, but it is not the only IR. Compilers pick a form that suits the work at hand, and often use several in sequence:

SSA (static single assignment). A refined TAC where every variable is assigned exactly once. Reusing a name means making a new version (x1, x2), which makes it far easier to track where each value came from. LLVM IR is in SSA form, and modern optimisers love it.
Stack / bytecode IR. Instead of naming temporaries, operands live on a stack: push a; push b; mul. The JVM and Python's bytecode work this way — compact and easy to interpret, at the cost of being a little further from register machines.
Tree / high-level IR. Some compilers keep an intermediate tree that is lower than the AST but still hierarchical, useful for pattern-matched instruction selection before the final flattening.

They all serve the same purpose: a stable, machine-independent middle ground where the front end can put the program down and the back end (and the optimiser) can pick it up.