The Surface Code
Every real quantum chip is a graveyard of good intentions: qubits dephase, gates misfire, and a
pristine state rots in microseconds. To compute for real we must build a fault-tolerant
memory out of these leaky parts — and the design the whole industry has converged on looks,
of all things, like a chessboard. The surface code tiles a flat 2D
grid with qubits and checks them with the simplest possible measurements: little local questions asked
of a handful of touching neighbours. From nothing but that local chatter it protects a single
logical qubit hidden in the global shape of the grid. It is not the most efficient
code known, but it is the most forgiving: it tolerates the highest error rate of any
practical scheme, which is exactly why Google and IBM build their hardware around it. This page shows
how a code made of purely local rules can defend a fundamentally non-local secret.
Qubits on a grid, checks that only look next door
The surface code is a
stabilizer code,
so it is defined entirely by a set of commuting Pauli operators — the stabilizers, or
checks — that we measure over and over. What makes it special is where those checks
live: on a two-dimensional lattice.
- Lay down a square grid. Put one physical (data) qubit on every edge.
- Each vertex of the grid carries a star check — a product of Pauli
X over the qubits touching that vertex.
- Each face (the little square "plaquette") carries a plaquette check
— a product of Pauli Z over the qubits bordering that face.
Because a code is called topological when its stabilizers are local — each one
touches only a few neighbouring qubits and nothing far away — the surface code is the archetypal
topological stabilizer code. That locality is the whole game: to measure a check you
only ever entangle a handful of adjacent qubits, which is precisely what a flat chip of nearest-neighbour
couplers can actually do.
The two check types: stars and plaquettes
On the interior of the lattice every check has weight four — it involves exactly four
data qubits. For a vertex v and a face p,
A_v \;=\; \prod_{e \,\ni\, v} X_e \qquad\text{(star / vertex, all $X$)}, \qquad\qquad B_p \;=\; \prod_{e \,\in\, \partial p} Z_e \qquad\text{(plaquette / face, all $Z$)}.
These all have to commute for them to be simultaneous stabilizers, and they do — for a
beautiful geometric reason. A star and a plaquette either share no edge, or they share
exactly two. Where they overlap, an X and a
Z anticommute; but two such overlaps flip the sign twice, so the operators
commute overall. Geometry does the bookkeeping that would otherwise be a page of algebra.
Measuring every A_v and B_p yields the
syndrome: a list of \pm 1 outcomes. On a clean state every
check reads +1. An error somewhere will flip some checks to
-1 — and, crucially, it does so without measuring the data itself, so
the encoded information is never disturbed.
Picture the lattice
Here is a small patch. Data qubits are the dots on the edges. Step forward to light up one star check
(four Xs round a vertex), then one plaquette check (four
Zs round a face), then a single X error and the two
syndrome defects it creates.
Worked example 1: one X error makes a pair of defects
Suppose a single data qubit suffers a bit-flip X error. Which checks notice?
A star check is all X, and X commutes with
X, so no star check is disturbed. But a plaquette check is
all Z, and our flipped qubit sits on two plaquettes — the faces to
either side of its edge. On each of those, the error's X meets the check's
Z once and anticommutes, flipping that plaquette's outcome:
B_p \, X_e \;=\; -\, X_e \, B_p \quad\text{for each of the two faces } p \text{ touching edge } e.
So the syndrome shows exactly two plaquette checks reading -1
— a pair of defects — while everything else stays +1. The
error is invisible except at its two endpoints. (A Z error is the mirror
image: it flips the two star checks at the ends of its edge. A Y = iXZ
error trips both a star pair and a plaquette pair.) That endpoint-only signature is the key to reading
errors off the grid.
Errors are chains; the decoder pairs up the defects
Now flip a whole row of qubits — a chain of X
errors. Every plaquette in the middle of the chain gets flipped twice (once by each of
its two flipped edges), so it reads +1 again. Only the two ends
of the chain are flipped an odd number of times. The upshot is a rule with enormous consequences:
the syndrome only ever reveals the endpoints of an error chain, never the chain itself.
The job of the classical decoder is to look at the lit-up defects and guess a chain that
could have produced them. Since defects come in pairs, it pairs them up — and the standard tool is
minimum-weight perfect matching (MWPM): connect the defects with the shortest total set
of chains, on the bet that fewer errors is more likely than more. Apply the matched correction and, if
the guess is right, the data is restored. If it is wrong, it is usually only wrong by a small closed loop
— and a loop of Xs (or Zs) around trivial faces is a
product of stabilizers, which does nothing to the encoded state. Being wrong "up to a stabilizer"
is being right.
The logical qubit hides in the global shape
So where is the protected information? Not in any single qubit — in the lattice's
topology. The encoded logical operators are strings of Paulis
that run all the way across the patch, from one boundary to the opposite one (or around
a hole):
\bar X \;=\; \prod_{e \,\in\, \gamma} X_e, \qquad \bar Z \;=\; \prod_{e \,\in\, \gamma'} Z_e,
where \gamma and \gamma' are non-contractible paths
spanning the lattice. Because these operators are non-local — they cross the whole grid —
no local error can accidentally apply one. To corrupt the logical qubit, noise would have to conspire into
a chain that stretches clear across the patch and evades the decoder. Local noise is powerless
against a globally-defined secret; that is the entire point of a topological code.
Worked example 2: distance grows with the lattice
The code distance d is the length of the shortest
logical operator — the fewest single-qubit errors that can flip the encoded qubit while fooling the
decoder. On a surface-code patch of side d, that shortest error chain is one
that reaches from a boundary all the way to the far boundary: it must be about d
qubits long.
A distance-d code corrects any error touching up to
\lfloor (d-1)/2 \rfloor qubits. So take a
d = 3 patch: a single error (weight 1) always produces a defect pair the
decoder can shorten away, so one error is fully corrected. It takes a coordinated
chain of about d = 3 errors, spanning the patch, to sneak an
undetectable logical flip past the matching. Now enlarge the grid to
d = 5, then 7, then 11:
the shortest fatal chain gets longer every time, so the logical error rate is driven down
exponentially in d (as long as the physical error rate sits
below threshold). You buy more protection simply by making the chessboard bigger. The
number of physical qubits grows as
n \;\sim\; \mathcal{O}(d^2),
— roughly one qubit per tile — so a distance-11 patch is already a couple of hundred physical qubits for
a single logical one.
Step back and admire the trick. Nature hands us qubits that only talk to their immediate neighbours on a
flat chip, and errors that strike locally and at random. The surface code answers in kind: it asks only
local questions — "do the four qubits round this square agree?" — never a global one.
Yet the answers to all those parochial little questions, taken together, pin down the endpoints
of every error, and the thing they collectively protect is a pattern that spans the entire board. It is
the same reason you can tell a chessboard has been tampered with by checking squares two at a time,
without ever needing to see the whole board at once. Local checks, global secret: a memory that is robust
precisely because the information it holds is written in the geometry, not in any one square.
That is why, when engineers sketch the most promising quantum memory we know how to build, it comes out
looking like a chessboard.
The headline number for the surface code is its threshold: as long as each physical
component fails less often than roughly 1% of the time, making the lattice bigger makes
the logical qubit better. That ~1% is the most generous threshold of any practical code, and it
is the reason the surface code won the hardware race — a 1-in-100 error budget is something real
superconducting and trapped-ion chips can plausibly meet. But do not mistake "forgiving" for "efficient".
The virtue here is locality and a high threshold, not qubit economy. Because
n \sim \mathcal{O}(d^2), a genuinely useful logical qubit needs
hundreds to thousands of physical qubits — which is why fault-tolerant machines are
quoted in the millions of qubits.
Second trap: a logical operation is not a single-qubit gate. A logical
\bar X or \bar Z is a whole string spanning
the patch, and logical gates are performed by non-local gymnastics — braiding defects, lattice surgery,
deforming boundaries — not by poking one qubit. The locality that makes the surface code so easy to
check is exactly what makes it fiddly to compute with. Making all of this actually
reduce the error rate — rather than adding more moving parts than it fixes — is the promise of the
threshold theorem.
- What it is. A topological stabilizer code on a 2D lattice: one
physical qubit per edge, protecting a logical qubit encoded in the lattice's global topology.
- Two check types. Star checks
A_v = \prod_{e \ni v} X_e on the vertices (all X)
and plaquette checks B_p = \prod_{e \in \partial p} Z_e on
the faces (all Z). Weight-four in the bulk, and all commuting.
- Why hardware loves it. Every check is local and low-weight,
touching only nearest-neighbour qubits — a perfect fit for a flat 2D chip.
- Errors are chains. The syndrome reveals only a chain's endpoints
(its defects); a classical decoder (e.g. minimum-weight perfect matching) pairs the
defects to infer the correction. An X error flips two plaquette defects; a
Z error flips two star defects.
- Logical = non-local. Logical \bar X, \bar Z are Pauli
strings spanning the lattice, so local noise cannot apply them by accident.
- Bigger is safer. The distance d grows with the lattice;
the logical error rate falls exponentially in d below threshold, at a cost
of n \sim \mathcal{O}(d^2) physical qubits.
- Headline. A fault-tolerance threshold of about 1% — the most
forgiving practical code known — which is why Google and IBM target it. Its virtue is locality and
threshold, not efficiency.