Compiler Phases: Front, Middle, and Back End

You write x = a + b * 2; and, moments later, a processor is toggling billions of transistors to make it happen. Between those two worlds sits an enormous gap: your source is a string of characters pitched at humans, while the machine wants a stream of numbered opcodes pitched at silicon. A compiler crosses that gap — and, crucially, it does not do it in one heroic leap. It works like a factory assembly line, handing the program down a sequence of phases, each transforming one well-defined representation into the next.

This page is about that pipeline as a whole — the shape of the whole journey, not the internals of any one station. Traditionally the phases are grouped into three parts: the front end (which understands the source language), the middle end (which improves a neutral intermediate form), and the back end (which emits code for a particular machine). Getting this map in your head first makes every later topic — lexing, parsing, optimisation, register allocation — click into its proper slot.

One long conveyor belt of representations

The single most useful idea here is that each phase consumes one representation of the program and produces another. The program is never destroyed; it is repeatedly re-encoded into a form that makes the next job easy. Follow the goods down the belt:

Lexer (scanner). Consumes the raw character stream; produces a flat list of tokens: the words and punctuation of the language (ID(x), =, ID(a), +, ID(b), *, NUM(2), ;). Whitespace and comments vanish. The lexer is essentially a bank of finite-state machines, one per token kind, run over the text.
Parser. Consumes the token stream; produces a parse tree, which is then distilled into an Abstract Syntax Tree (AST) that captures nesting and precedence. a + b * 2 becomes a tree with + at the root and * below it — the structure the flat tokens only implied.
Semantic analysis. Consumes the AST; produces an annotated (decorated) AST. It walks the tree building a symbol table, checks types, resolves names to declarations, and flags errors like "undeclared variable" or "can't add a string to an int". Nothing about the tree's shape changes — it gets labelled with meaning.
IR generation. Consumes the annotated AST; produces intermediate representation (IR) — a low-level, machine-independent code, often three-address code like t1 = b * 2; t2 = a + t1; x = t2;. This is the neutral lingua franca that sits between all source languages and all target machines.
Optimiser. Consumes IR; produces better IR — same meaning, less work. Constant folding, dead-code elimination, common-subexpression elimination and friends run here. Everything about the program's behaviour is preserved; only its cost changes.
Code generator. Consumes optimised IR; produces target code — assembly or machine instructions for a specific CPU, including the gritty business of register allocation and instruction selection.

Read what each phase produces, top to bottom, and you have the whole story: characters → tokens → parse tree / AST → annotated AST → IR → optimised IR → target code. Six transformations, each simple because the previous one already did its part.
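To make the belt concrete, here is a minimal Python sketch of three of those hops (characters → tokens → AST → IR) for the one statement shape used on this page. It is an illustration under assumptions, not how any production compiler is written: the token names (ASSIGN, PLUS, STAR, SEMI), the tuple-shaped AST and the function names lex, parse and lower are all invented here, and semantic analysis is skipped entirely.

```python
import re

# Hypothetical token kinds; the names are assumptions made for this sketch.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("STAR", r"\*"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def lex(text):
    """characters -> tokens: a flat list of (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":  # whitespace vanishes here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """tokens -> AST for: ID '=' expr ';' where '*' binds tighter than '+'."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        pos += 1
        return tokens[pos - 1][1]

    def atom():
        if peek() == "NUM":
            return ("num", int(take("NUM")))
        return ("var", take("ID"))

    def term():  # term := atom ('*' atom)*
        node = atom()
        while peek() == "STAR":
            take("STAR")
            node = ("*", node, atom())
        return node

    def expr():  # expr := term ('+' term)*
        node = term()
        while peek() == "PLUS":
            take("PLUS")
            node = ("+", node, term())
        return node

    target = take("ID")
    take("ASSIGN")
    value = expr()
    take("SEMI")
    return ("=", target, value)


def lower(ast):
    """AST -> three-address code, one fresh temporary per operator."""
    code, counter = [], 0

    def walk(node):
        nonlocal counter
        if node[0] == "num":
            return str(node[1])
        if node[0] == "var":
            return node[1]
        op, left, right = node
        left_name, right_name = walk(left), walk(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = ast
    code.append(f"{target} = {walk(value)}")
    return code
```

Calling lower(parse(lex("x = a + b * 2;"))) gives the three instructions t1 = b * 2, t2 = a + t1 and x = t2. Each function only ever sees the output of the one before it, which is the whole point of the belt.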

The three ends

Group those six phases and the classic three-part split appears:

Front end — lexer, parser, semantic analysis. Everything that depends on the source language. Its output is language-flavoured meaning captured as an annotated AST (or fresh IR). It answers: "is this a legal C / Rust / Swift program, and what does it say?"
Middle end — IR generation and optimisation. Language- and machine-neutral. It works purely on the shared IR, so the same optimiser benefits every language that lowers to it. It answers: "how do I make this cheaper without changing what it does?"
Back end — code generation, register allocation, instruction scheduling. Everything that depends on the target machine. It answers: "what x86-64 / ARM / RISC-V instructions realise this IR?"

Why split the compiler this way? Because it turns a multiplication into an addition. Suppose you want to support m source languages on n target machines. Build a separate, monolithic compiler for each pairing and you need m × n of them: 5 languages × 4 chips is 20 whole compilers to write and maintain. Route everything through one shared IR instead, and you need only m front ends (one per language, each lowering to the IR) plus n back ends (one per machine, each starting from the IR): just m + n = 9 components. Add a brand-new language and you write one front end and instantly target every existing machine; add a new chip and every existing language can target it for free.
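The counting argument can be played out in a few lines of Python. This is only a toy model: the language and machine lists are picked to match the 5 × 4 example, and make_front_end, make_back_end and compile_with are hypothetical stand-ins that pass tagged strings around instead of doing any real compilation.

```python
from itertools import product

# Assumed example sets, chosen only to give 5 languages and 4 machines.
LANGUAGES = ["C", "C++", "Rust", "Swift", "Fortran"]
MACHINES = ["x86-64", "ARM", "RISC-V", "WebAssembly"]


def make_front_end(language):
    # Stand-in for a real front end: source text -> shared IR.
    return lambda source: f"IR[{source}]"


def make_back_end(machine):
    # Stand-in for a real back end: shared IR -> target code.
    return lambda ir: f"{machine} code for {ir}"


front_ends = {language: make_front_end(language) for language in LANGUAGES}
back_ends = {machine: make_back_end(machine) for machine in MACHINES}


def compile_with(language, machine, source):
    """Any front end composes with any back end because both speak the IR."""
    return back_ends[machine](front_ends[language](source))


def components():
    return len(front_ends) + len(back_ends)


def pairings():
    return sum(1 for _ in product(front_ends, back_ends))


components_before, pairings_before = components(), pairings()  # 9 and 20

# Adding one language costs one component but unlocks every existing machine.
front_ends["Go"] = make_front_end("Go")
components_after, pairings_after = components(), pairings()  # 10 and 24
```

Nine components cover twenty language/machine pairings, and the tenth component (one new front end) raises that to twenty-four without touching a single back end.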

This is exactly why real toolchains are built this way. GCC has front ends for C, C++, Fortran, Go and more, all meeting at its GIMPLE/RTL internal forms before fanning out to dozens of back ends. LLVM makes the IR the star of the show: Clang (C/C++), Rust, Swift and Julia all emit LLVM IR, and a single set of back ends compiles that IR to x86, ARM, RISC-V, WebAssembly and beyond. The IR is the pinch point that makes the whole ecosystem retargetable.

Why bother with the middle at all?

You could imagine a compiler that skips the IR — parse straight to machine code. Some tiny compilers do. But the IR earns its keep three times over. First, it is where optimisation lives: rewriting three-address code is far easier than rewriting either a syntax tree or raw assembly. Second, it is the decoupling layer that gives you the m + n win above. Third, it is portable reasoning — analyses like liveness and constant propagation are written once against the IR and apply to every language and machine.
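As a rough illustration of why three-address code is pleasant to rewrite, here is a sketch of constant folding with simple constant propagation. It assumes straight-line code only (no branches or loops), and the instruction shape (dest, op, lhs, rhs) and the name fold_constants are inventions for this example rather than any real compiler's IR.

```python
import operator

# Operators this toy IR understands; an assumption of the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Fold and propagate constants through straight-line three-address code.

    Each instruction is (dest, op, lhs, rhs). Operands are ints (constants)
    or strs (names). An op of None means a plain copy: dest = lhs.
    """
    known = {}  # name -> constant value discovered so far
    out = []
    for dest, op, lhs, rhs in code:
        # Replace names whose constant value is already known.
        if isinstance(lhs, str):
            lhs = known.get(lhs, lhs)
        if isinstance(rhs, str):
            rhs = known.get(rhs, rhs)

        if op is None:
            value = lhs
        elif isinstance(lhs, int) and isinstance(rhs, int):
            value = OPS[op](lhs, rhs)  # both operands constant: compute now
        else:
            value = None

        if isinstance(value, int):
            known[dest] = value
            out.append((dest, None, value, None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, lhs, rhs))
    return out


example = [
    ("t1", "*", 3, 4),
    ("t2", "+", "a", "t1"),
    ("x", None, "t2", None),
]
folded = fold_constants(example)
```

Here t1 = 3 * 4 becomes t1 = 12 and the 12 flows into t2 = a + 12. Nothing in the pass knows which source language produced the code or which machine will run it.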

The front end depends only on the source language; the back end depends only on the target machine; the middle end depends on neither.
They communicate exclusively through a shared intermediate representation.
Supporting m languages and n machines therefore costs m + n components (m front ends + n back ends), not m × n whole compilers.
Adding a language means writing one front end; adding a machine means writing one back end.

A worked pass down the belt

Watch a single statement descend through the phases. Each arrow is one transformation; notice how the representation gets steadily lower-level while the meaning is preserved end to end.

Source (characters):
    x = a + b * 2;
Lexer → tokens:
    ID(x) ASSIGN ID(a) PLUS ID(b) STAR NUM(2) SEMI
Parser → AST:
        (=)
       /   \
      x    (+)
          /   \
         a    (*)
             /   \
            b     2
Semantic → annotated AST:
    every name resolved via the symbol table;
    types checked (a, b, x : int); 2 : int literal
IR (three-address code):
    t1 = b * 2
    t2 = a + t1
    x  = t2
Optimised IR:
    t1 = b * 2      ; (nothing constant-foldable here,
    x  = a + t1     ;  but t2 was a needless copy → removed)
Back end → target (x86-64):
    mov  eax, [b]
    imul eax, 2     ; or: lea eax, [eax*2]
    add  eax, [a]
    mov  [x], eax

Same statement, seven costumes. The front end changed text into structured meaning; the middle end tidied the IR; the back end chose real instructions and registers. No single phase is doing anything heroic — that is the point.
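The one optimisation in that trace, dropping the needless copy through t2, can be sketched as a tiny pass. This is a deliberately conservative toy, not a real copy-propagation algorithm: it only merges a copy that immediately follows the definition of a temporary used nowhere else, and it assumes temporaries are named t1, t2 and so on. The instruction shape and the function names are made up for the example.

```python
import re


def is_temp(name):
    # Assumed naming convention for compiler-made temporaries: t1, t2, ...
    return isinstance(name, str) and re.fullmatch(r"t\d+", name) is not None


def count_uses(code):
    """How many times each name is read as an operand."""
    uses = {}
    for _dest, _op, lhs, rhs in code:
        for operand in (lhs, rhs):
            if isinstance(operand, str):
                uses[operand] = uses.get(operand, 0) + 1
    return uses


def remove_needless_copies(code):
    """Merge 'tN = ...' followed directly by 'x = tN' into 'x = ...'.

    Instructions are (dest, op, lhs, rhs); an op of None is a plain copy.
    The merge is applied only when tN is read exactly once, by that copy.
    """
    uses = count_uses(code)
    out = []
    for dest, op, lhs, rhs in code:
        prev = out[-1] if out else None
        if (op is None and prev is not None and is_temp(lhs)
                and prev[0] == lhs and uses[lhs] == 1):
            # Retarget the previous instruction and drop the copy.
            out[-1] = (dest, prev[1], prev[2], prev[3])
        else:
            out.append((dest, op, lhs, rhs))
    return out


def show(code):
    """Render instructions in the textual form used in the trace."""
    return [f"{d} = {l}" if op is None else f"{d} = {l} {op} {r}"
            for d, op, l, r in code]


ir = [("t1", "*", "b", 2), ("t2", "+", "a", "t1"), ("x", None, "t2", None)]
optimised = show(remove_needless_copies(ir))
```

On the IR from the trace this leaves t1 = b * 2 followed by x = a + t1, the same two instructions shown under "Optimised IR".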

A "phase" is not a "pass". A phase is a logical stage (lexing, parsing, optimising). A pass is one physical traversal over the whole program. These are independent: a single-pass compiler can run several phases interleaved as it sweeps through once, while an optimiser might make many passes over the IR all within the one "optimisation" phase. Don't equate the two — the pipeline diagram shows phases, not passes.
The lexer does not parse. The lexer only chops characters into a flat list of tokens; it has no idea about nesting, precedence, or whether the brackets balance. Recognising structure, such as b * 2 binding tighter than a + …, is the parser's job. The lexer recognises a regular language; the parser recognises a context-free one, which is strictly more expressive.
Semantic analysis does not restructure the tree. It decorates the AST with types and resolved names and rejects illegal programs; it doesn't rebuild the tree the way the parser does. "Legal syntax" (parser) and "legal meaning" (semantic analysis) are different gates — x = "hi" + 3; can parse perfectly and still fail semantic analysis.
The IR is not assembly. IR is deliberately machine-independent: it has unlimited virtual registers (t1, t2, …) and no notion of a specific CPU. Turning those virtuals into a machine's finite real registers is the back end's register-allocation job.
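The "different gates" point can be seen in a small Python sketch of a type checker that walks an already-parsed tree without changing its shape. Everything here is an assumption made for illustration: the tuple-shaped AST, the dictionary symbol table, the names check and SemanticError, and the toy rule that + and * accept only ints.

```python
class SemanticError(Exception):
    """Raised when a syntactically valid program has no legal meaning."""


def check(node, symbols):
    """Return the type of a node, or raise SemanticError.

    The tree is only read, never rebuilt: this is the 'decorate, don't
    restructure' behaviour described above, reduced to returning types.
    """
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        name = node[1]
        if name not in symbols:
            raise SemanticError(f"undeclared variable {name!r}")
        return symbols[name]
    if kind in ("+", "*"):
        left = check(node[1], symbols)
        right = check(node[2], symbols)
        if left != "int" or right != "int":
            raise SemanticError(f"can't apply {kind!r} to {left} and {right}")
        return "int"
    if kind == "=":
        target, value_type = node[1], check(node[2], symbols)
        if target not in symbols:
            raise SemanticError(f"undeclared variable {target!r}")
        if symbols[target] != value_type:
            raise SemanticError(f"can't assign {value_type} to {symbols[target]} {target!r}")
        return value_type
    raise ValueError(f"unknown node kind {kind!r}")


symbols = {"a": "int", "b": "int", "x": "int"}

# x = a + b * 2;  parses and passes the semantic gate.
good = ("=", "x", ("+", ("var", "a"), ("*", ("var", "b"), ("num", 2))))
good_type = check(good, symbols)

# x = "hi" + 3;  parses into a perfectly well-formed tree, then fails here.
bad = ("=", "x", ("+", ("str", "hi"), ("num", 3)))
try:
    check(bad, symbols)
    error = None
except SemanticError as exc:
    error = str(exc)
```

Both trees have the same kind of shape, so a parser would accept either; only the walk over types and names tells them apart.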