Quantum Channels and Kraus Operators
In an ideal, perfectly isolated quantum computer, a state evolves by a
unitary:
|\psi\rangle \mapsto U|\psi\rangle, or in
density-matrix
language \rho \mapsto U\rho\,U^\dagger. Unitaries are reversible, they preserve
purity, and nothing leaks out. That is the physics of a closed system.
Real qubits are not closed. As we saw with
decoherence and noise,
a qubit is forever whispering to its environment — stray photons, phonons, fluctuating fields. The qubit
plus its environment evolve unitarily together, but the qubit alone does not. When we
trace the environment away, what is left is a more general kind of evolution — one that can turn a pure
state mixed, shrink coherences, and inject randomness. That evolution is a quantum channel,
and this page gives you the one compact formula that captures every kind of noise at once.
What a quantum channel is
A quantum channel is a map \varepsilon that takes a density
matrix in and returns a density matrix out, \rho \mapsto \varepsilon(\rho). To
be a legal physical process it must satisfy three conditions — together abbreviated
CPTP:
- Linear — \varepsilon(a\rho_1 + b\rho_2) = a\,\varepsilon(\rho_1) + b\,\varepsilon(\rho_2),
so mixtures map to mixtures.
- Completely positive (CP) — it keeps every density matrix positive semidefinite (no
negative probabilities), and crucially it does so even when the qubit is entangled with a
bystander. "Completely" is the extra insurance that applying the channel to one half of an
entangled pair still yields a valid joint state.
- Trace-preserving (TP) — \operatorname{Tr}\big(\varepsilon(\rho)\big) = 1,
so probabilities still sum to one; nothing is lost.
A unitary \rho \mapsto U\rho\,U^\dagger is the simplest special case of a
channel. But channels are strictly more general: they describe irreversible, noisy evolution
that no single unitary ever could.
The Kraus (operator-sum) representation
The three CPTP words sound abstract — but there is a beautiful theorem that turns them into concrete
arithmetic. Every quantum channel can be written as a sum of "sandwich" terms:
\varepsilon(\rho) = \sum_k K_k\,\rho\,K_k^\dagger, \qquad \text{with}\quad \sum_k K_k^\dagger K_k = I.
The matrices K_k are the Kraus operators (or
operation elements), and this is the operator-sum representation. The
constraint \sum_k K_k^\dagger K_k = I is the completeness
condition — it is exactly what makes the map trace-preserving. A single unitary is the
one-operator case, K_0 = U (and indeed U^\dagger U = I).
Here is the physical picture that makes Kraus operators unforgettable. Read each
K_k as a possible error branch: with probability
p_k = \operatorname{Tr}\!\big(K_k\,\rho\,K_k^\dagger\big)
the environment "chooses" branch k, and conditioned on that, the qubit is
transformed to the normalised state K_k\rho K_k^\dagger / p_k. Because we do
not know which branch actually happened, the output is the probability-weighted mixture of all of them —
which is precisely \sum_k K_k\rho K_k^\dagger. The completeness condition
\sum_k K_k^\dagger K_k = I guarantees \sum_k p_k = 1:
some branch always happens.
A channel is a tree of error branches
The clearest way to hold the operator sum in your head is a branching tree. The input
\rho splits into one branch per Kraus operator; each branch carries a weight
p_k = \operatorname{Tr}(K_k\rho K_k^\dagger); and the channel's output is the
recombined mixture. Below is the two-branch tree for the bit-flip channel — step through
its branches:
Either nothing happens (the identity branch, weight 1-p) or an
X error strikes (weight p). Every named noise
process below is just a different tree — different operators hanging off \rho.
Worked example 1: the bit-flip channel on |0\rangle
The bit-flip channel applies an X with probability
p and does nothing with probability 1-p. Its Kraus
operators are
K_0 = \sqrt{1-p}\;I, \qquad K_1 = \sqrt{p}\;X.
First check completeness: K_0^\dagger K_0 + K_1^\dagger K_1 = (1-p)I + p\,X^\dagger X = (1-p)I + pI = I,
since X^\dagger X = I. Good. Now feed in \rho = |0\rangle\langle0| = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}:
\varepsilon(\rho) = (1-p)\,I\,\rho\,I + p\,X\,\rho\,X = (1-p)\,|0\rangle\langle0| + p\,X|0\rangle\langle0|X.
Because X|0\rangle = |1\rangle, the second term is
p\,|1\rangle\langle1|, so
\varepsilon\big(|0\rangle\langle0|\big) = (1-p)\,|0\rangle\langle0| + p\,|1\rangle\langle1| = \begin{bmatrix} 1-p & 0 \\ 0 & p \end{bmatrix}.
A pure |0\rangle has become a classical mixture: it reads
0 with probability 1-p and 1
with probability p — exactly a noisy bit.
The standard single-qubit channels
Four channels appear again and again. Each is defined by a short list of Kraus operators — commit these
to memory and you can model almost any single-qubit noise:
| Channel | Kraus operators | Models |
| Bit-flip |
K_0 = \sqrt{1-p}\,I,\quad K_1 = \sqrt{p}\,X |
random X errors |
| Phase-flip |
K_0 = \sqrt{1-p}\,I,\quad K_1 = \sqrt{p}\,Z |
random Z errors (dephasing) |
| Depolarizing |
\sqrt{1-\tfrac{3p}{4}}\,I,\;\; \sqrt{\tfrac{p}{4}}\,X,\;\; \sqrt{\tfrac{p}{4}}\,Y,\;\; \sqrt{\tfrac{p}{4}}\,Z |
uniform noise, \rho \to (1-p)\rho + p\,\tfrac{I}{2} |
| Amplitude damping |
K_0 = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{bmatrix},\quad K_1 = \begin{bmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{bmatrix} |
T_1 energy decay |1\rangle \to |0\rangle |
The depolarizing channel replaces the state with the maximally mixed
\tfrac{I}{2} with probability p; using the identity
\tfrac{I}{2} = \tfrac14\big(\rho + X\rho X + Y\rho Y + Z\rho Z\big) gives the
symmetric Pauli Kraus set above. Amplitude damping is different in character: it is
not a random Pauli at all. Its K_1 irreversibly drops
|1\rangle down to |0\rangle — the qubit emitting a
photon and relaxing to its ground state — which is why it, and not the flips, models
T_1 decay. (Check its completeness:
K_0^\dagger K_0 + K_1^\dagger K_1 = \operatorname{diag}(1, 1-\gamma) + \operatorname{diag}(0, \gamma) = I.)
Worked example 2: the phase-flip channel eats coherence
The phase-flip channel has Kraus operators
K_0 = \sqrt{1-p}\,I and K_1 = \sqrt{p}\,Z, so
\varepsilon(\rho) = (1-p)\rho + p\,Z\rho Z. Apply it to
|{+}\rangle = \tfrac{1}{\sqrt2}(|0\rangle + |1\rangle), whose density matrix is
\rho_{+} = \begin{bmatrix} \tfrac12 & \tfrac12 \\[2pt] \tfrac12 & \tfrac12 \end{bmatrix}.
Since Z|{+}\rangle = |{-}\rangle, we have
Z\rho_{+}Z = |{-}\rangle\langle{-}| = \begin{bmatrix} \tfrac12 & -\tfrac12 \\[2pt] -\tfrac12 & \tfrac12 \end{bmatrix}, and so
\varepsilon(\rho_{+}) = (1-p)\begin{bmatrix} \tfrac12 & \tfrac12 \\[2pt] \tfrac12 & \tfrac12 \end{bmatrix} + p\begin{bmatrix} \tfrac12 & -\tfrac12 \\[2pt] -\tfrac12 & \tfrac12 \end{bmatrix} = \begin{bmatrix} \tfrac12 & \tfrac12(1-2p) \\[2pt] \tfrac12(1-2p) & \tfrac12 \end{bmatrix}.
The diagonal is untouched — measuring in the computational basis still gives 50/50 — but the
off-diagonal coherences have shrunk by a factor of (1-2p).
At p = \tfrac12 they vanish entirely and |{+}\rangle
has decayed to the useless maximally mixed \tfrac{I}{2}. This is
dephasing written as one channel — the quiet death of a superposition.
Step back and admire what the operator sum buys you. Bit-flips, phase-flips, depolarization,
T_1 decay, and every exotic noise process in between are all the
same object — a channel \varepsilon(\rho) = \sum_k K_k\rho K_k^\dagger
— differing only in the short list of matrices K_k. You never need a bespoke
formalism per error; you just write down the Kraus operators and turn the crank. That uniformity is what
makes quantum error correction tractable: a
code only has to defend against the Kraus operators of the channel, and since the standard channels'
operators are built from Paulis, correcting bit-flips and phase-flips turns out to correct
everything — a result known as the discretization of errors.
Summary
- an open system evolves by a quantum channel
\varepsilon: a CPTP map (completely positive,
trace-preserving) on density matrices — strictly more general than a unitary;
- every channel has a Kraus / operator-sum form
\varepsilon(\rho) = \sum_k K_k\,\rho\,K_k^\dagger with the
completeness condition \sum_k K_k^\dagger K_k = I;
- each K_k is an error branch occurring with probability
p_k = \operatorname{Tr}(K_k\rho K_k^\dagger), and completeness makes the
p_k sum to one;
- standard channels: bit-flip \{\sqrt{1-p}\,I, \sqrt{p}\,X\},
phase-flip \{\sqrt{1-p}\,I, \sqrt{p}\,Z\},
depolarizing (Paulis), and amplitude damping for
T_1 decay;
- noise turns pure states mixed and shrinks coherences — the phase-flip scales them by
(1-2p).
Two traps. First, Kraus operators are not unique: many different operator sets give the
exact same channel. Any unitary mixing K'_j = \sum_k u_{jk} K_k (with
u unitary) produces an identical map, and you can even change the
number of operators. So never speak of "the" Kraus operators of a channel as if they were
physical — the channel \varepsilon is what is real; a Kraus set is just one
way to write it down.
Second, do not confuse a channel with "apply a random Pauli." The completeness condition
\sum_k K_k^\dagger K_k = I is what keeps probabilities normalised — drop it
and you no longer have a valid channel. And while bit-flip, phase-flip, and depolarizing are
random-Pauli channels, a general channel is more powerful: amplitude damping has a
non-unitary K_1 = \begin{bmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{bmatrix}
that no probabilistic mixture of Paulis can reproduce. "Random Pauli" is a special case, not the whole
story.