Quantum Channels and Kraus Operators

In an ideal, perfectly isolated quantum computer, a state evolves by a unitary: |\psi\rangle \mapsto U|\psi\rangle, or in density-matrix language \rho \mapsto U\rho\,U^\dagger. Unitaries are reversible, they preserve purity, and nothing leaks out. That is the physics of a closed system.

Real qubits are not closed. As we saw with decoherence and noise, a qubit is forever whispering to its environment — stray photons, phonons, fluctuating fields. The qubit plus its environment evolve unitarily together, but the qubit alone does not. When we trace the environment away, what is left is a more general kind of evolution — one that can turn a pure state mixed, shrink coherences, and inject randomness. That evolution is a quantum channel, and this page gives you the one compact formula that captures every kind of noise at once.
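The "trace the environment away" step can be made concrete with a small numerical sketch. Everything specific here is an illustrative assumption rather than something fixed by the text: a one-qubit environment, its initial state, the coupling strength p = 0.2, and the controlled-flip interaction. The point is only that the pair stays pure under a joint unitary while the qubit alone ends up mixed.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
P1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|

def trace_out_environment(rho_joint):
    """Partial trace over the first qubit of a two-qubit density matrix."""
    # Index order after reshape: (env, sys, env', sys'); sum over env = env'.
    return np.einsum("ijik->jk", rho_joint.reshape(2, 2, 2, 2))

p = 0.2  # illustrative coupling strength (assumption)
env = np.array([np.sqrt(1 - p), np.sqrt(p)], dtype=complex)
rho_env = np.outer(env, env.conj())
rho_sys = P0.copy()  # the qubit starts in the pure state |0><0|

# Joint unitary on env (x) sys: flip the qubit only if the environment is in |1>.
U = np.kron(P0, I2) + np.kron(P1, X)

rho_joint = U @ np.kron(rho_env, rho_sys) @ U.conj().T
rho_out = trace_out_environment(rho_joint)

joint_purity = np.trace(rho_joint @ rho_joint).real  # the closed pair stays pure
qubit_purity = np.trace(rho_out @ rho_out).real      # the open qubit does not
```

Comparing joint_purity with qubit_purity shows the effect described above: the unitary on the pair is reversible, but the reduced state of the qubit has lost purity.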

What a quantum channel is

A quantum channel is a map \varepsilon that takes a density matrix in and returns a density matrix out, \rho \mapsto \varepsilon(\rho). To be a legal physical process it must satisfy three conditions. The last two give the abbreviation CPTP; linearity is taken for granted:

Linear: \varepsilon(a\rho_1 + b\rho_2) = a\,\varepsilon(\rho_1) + b\,\varepsilon(\rho_2), so mixtures map to mixtures.
Completely positive (CP): it keeps every density matrix positive semidefinite (no negative probabilities), and crucially it does so even when the qubit is entangled with a bystander. "Completely" is the extra insurance that applying the channel to one half of an entangled pair still yields a valid joint state.
Trace-preserving (TP): \operatorname{Tr}\big(\varepsilon(\rho)\big) = \operatorname{Tr}(\rho) = 1, so probabilities still sum to one; nothing is lost.

A unitary \rho \mapsto U\rho\,U^\dagger is the simplest special case of a channel. But channels are strictly more general: they describe irreversible, noisy evolution that no single unitary ever could.

The Kraus (operator-sum) representation

The three CPTP words sound abstract — but there is a beautiful theorem that turns them into concrete arithmetic. Every quantum channel can be written as a sum of "sandwich" terms:

\varepsilon(\rho) = \sum_k K_k\,\rho\,K_k^\dagger, \qquad \text{with}\quad \sum_k K_k^\dagger K_k = I.

The matrices K_k are the Kraus operators (or operation elements), and this is the operator-sum representation. The constraint \sum_k K_k^\dagger K_k = I is the completeness condition — it is exactly what makes the map trace-preserving. A single unitary is the one-operator case, K_0 = U (and indeed U^\dagger U = I).
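The operator-sum formula and the completeness condition translate almost line for line into NumPy. The sketch below is a minimal illustration, not a library API: the helper names apply_channel and is_complete are made up for this page, and the Hadamard gate and the input state are arbitrary choices for the one-operator (unitary) case.

```python
import numpy as np

def apply_channel(kraus_ops, rho):
    """Return sum_k K_k rho K_k^dagger for a list of Kraus matrices."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def is_complete(kraus_ops, tol=1e-12):
    """Check the completeness condition sum_k K_k^dagger K_k = I."""
    dim = kraus_ops[0].shape[1]
    total = sum(K.conj().T @ K for K in kraus_ops)
    return np.allclose(total, np.eye(dim), atol=tol)

# One-operator case: a single unitary (here the Hadamard gate) is a Kraus set.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)  # an arbitrary valid state

out = apply_channel([H], rho)
trace_out = np.trace(out).real              # preserved, because [H] is complete

# Rescaling the operator breaks completeness, and the trace is no longer kept.
broken = [0.5 * H]
trace_broken = np.trace(apply_channel(broken, rho)).real
```

The second half is there to show why the constraint matters: with the rescaled operator the output trace falls to 0.25, so the result is not a density matrix.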

Here is the physical picture that makes Kraus operators unforgettable. Read each K_k as a possible error branch: with probability

p_k = \operatorname{Tr}\!\big(K_k\,\rho\,K_k^\dagger\big)

the environment "chooses" branch k, and conditioned on that, the qubit is transformed to the normalised state K_k\rho K_k^\dagger / p_k. Because we do not know which branch actually happened, the output is the probability-weighted mixture of all of them — which is precisely \sum_k K_k\rho K_k^\dagger. The completeness condition \sum_k K_k^\dagger K_k = I guarantees \sum_k p_k = 1: some branch always happens.
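The branch picture can be checked numerically. The sketch below uses the amplitude-damping operators from the table of standard channels later on this page, with an illustrative \gamma = 0.3 and input |{+}\rangle, because for that channel the branch weights visibly depend on the input state. The helper name branches is hypothetical.

```python
import numpy as np

def branches(kraus_ops, rho):
    """Return one (p_k, conditional state) pair per Kraus operator."""
    result = []
    for K in kraus_ops:
        unnormalised = K @ rho @ K.conj().T
        p_k = np.trace(unnormalised).real
        # A branch of zero weight never occurs, so its state is left as None.
        state = unnormalised / p_k if p_k > 1e-15 else None
        result.append((p_k, state))
    return result

gamma = 0.3  # illustrative damping strength (assumption)
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
rho_plus = np.full((2, 2), 0.5, dtype=complex)  # |+><+|

result = branches([K0, K1], rho_plus)
probs = [p for p, _ in result]

# Recombining the branches with their weights gives back the channel output.
mixture = sum(p * state for p, state in result if state is not None)
direct = K0 @ rho_plus @ K0.conj().T + K1 @ rho_plus @ K1.conj().T
```

Here the decay branch has weight \gamma/2 and leaves the qubit in |0\rangle\langle0|, and the two weights sum to one as completeness requires.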

A channel is a tree of error branches

The clearest way to hold the operator sum in your head is a branching tree. The input \rho splits into one branch per Kraus operator; each branch carries a weight p_k = \operatorname{Tr}(K_k\rho K_k^\dagger); and the channel's output is the recombined mixture. The bit-flip channel gives the simplest such tree, with just two branches:

Either nothing happens (the identity branch, weight 1-p) or an X error strikes (weight p). Every named noise process below is just a different tree — different operators hanging off \rho.

Worked example 1: the bit-flip channel on |0\rangle

The bit-flip channel applies an X with probability p and does nothing with probability 1-p. Its Kraus operators are

K_0 = \sqrt{1-p}\;I, \qquad K_1 = \sqrt{p}\;X.

First check completeness: K_0^\dagger K_0 + K_1^\dagger K_1 = (1-p)I + p\,X^\dagger X = (1-p)I + pI = I, since X^\dagger X = I. Good. Now feed in \rho = |0\rangle\langle0| = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}:

\varepsilon(\rho) = (1-p)\,I\,\rho\,I + p\,X\,\rho\,X = (1-p)\,|0\rangle\langle0| + p\,X|0\rangle\langle0|X.

Because X|0\rangle = |1\rangle, the second term is p\,|1\rangle\langle1|, so

\varepsilon\big(|0\rangle\langle0|\big) = (1-p)\,|0\rangle\langle0| + p\,|1\rangle\langle1| = \begin{bmatrix} 1-p & 0 \\ 0 & p \end{bmatrix}.

A pure |0\rangle has become a classical mixture: it reads 0 with probability 1-p and 1 with probability p — exactly a noisy bit.
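The same calculation can be reproduced numerically as a sanity check. This is a minimal sketch with an illustrative p = 0.1; the function names are made up for this example.

```python
import numpy as np

def bit_flip_kraus(p):
    """Kraus operators sqrt(1-p) I and sqrt(p) X of the bit-flip channel."""
    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * X]

def apply_channel(kraus_ops, rho):
    """Return sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

p = 0.1  # illustrative flip probability (assumption)
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|

out = apply_channel(bit_flip_kraus(p), rho0)
populations = np.diag(out).real           # probabilities of reading 0 and 1
purity = np.trace(out @ out).real         # equals (1-p)^2 + p^2, below 1 for 0 < p < 1
```

The populations come out as (1-p, p), matching the matrix above, and the purity below one confirms that the pure input has become a mixture.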

The standard single-qubit channels

Four channels appear again and again. Each is defined by a short list of Kraus operators — commit these to memory and you can model almost any single-qubit noise:

Channel	Kraus operators	Models
Bit-flip	K_0 = \sqrt{1-p}\,I,\quad K_1 = \sqrt{p}\,X	random X errors
Phase-flip	K_0 = \sqrt{1-p}\,I,\quad K_1 = \sqrt{p}\,Z	random Z errors (dephasing)
Depolarizing	\sqrt{1-\tfrac{3p}{4}}\,I,\;\; \sqrt{\tfrac{p}{4}}\,X,\;\; \sqrt{\tfrac{p}{4}}\,Y,\;\; \sqrt{\tfrac{p}{4}}\,Z	uniform noise, \rho \to (1-p)\rho + p\,\tfrac{I}{2}
Amplitude damping	K_0 = \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{bmatrix},\quad K_1 = \begin{bmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{bmatrix}	T_1 energy decay |1\rangle \to |0\rangle

The depolarizing channel replaces the state with the maximally mixed \tfrac{I}{2} with probability p; using the identity \tfrac{I}{2} = \tfrac14\big(\rho + X\rho X + Y\rho Y + Z\rho Z\big) gives the symmetric Pauli Kraus set above. Amplitude damping is different in character: it is not a random Pauli at all. Its K_1 irreversibly drops |1\rangle down to |0\rangle — the qubit emitting a photon and relaxing to its ground state — which is why it, and not the flips, models T_1 decay. (Check its completeness: K_0^\dagger K_0 + K_1^\dagger K_1 = \operatorname{diag}(1, 1-\gamma) + \operatorname{diag}(0, \gamma) = I.)
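The table can be written down directly as four small functions and checked. The sketch below verifies completeness for each Kraus set, the closed form (1-p)\rho + p\,\tfrac{I}{2} for the depolarizing channel, and the decay of |1\rangle under amplitude damping. The parameter values and the test state are illustrative assumptions, and the helper names are not from any library.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bit_flip(p):
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * X]

def phase_flip(p):
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]

def depolarizing(p):
    return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def amplitude_damping(gamma):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

def apply_channel(kraus_ops, rho):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def completeness_residual(kraus_ops):
    """Largest entry of |sum_k K_k^dagger K_k - I|; zero for a valid channel."""
    total = sum(K.conj().T @ K for K in kraus_ops)
    return np.abs(total - I2).max()

p, gamma = 0.3, 0.4  # illustrative noise strengths (assumption)
channels = {
    "bit-flip": bit_flip(p),
    "phase-flip": phase_flip(p),
    "depolarizing": depolarizing(p),
    "amplitude damping": amplitude_damping(gamma),
}
residuals = {name: completeness_residual(ops) for name, ops in channels.items()}

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
depolarized = apply_channel(channels["depolarizing"], rho)
closed_form = (1 - p) * rho + p * I2 / 2

rho1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|
damped = apply_channel(channels["amplitude damping"], rho1)  # diag(gamma, 1-gamma)
```

Keeping each channel as a plain list of matrices means one apply_channel function serves all four.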

Worked example 2: the phase-flip channel eats coherence

The phase-flip channel has Kraus operators K_0 = \sqrt{1-p}\,I and K_1 = \sqrt{p}\,Z, so \varepsilon(\rho) = (1-p)\rho + p\,Z\rho Z. Apply it to |{+}\rangle = \tfrac{1}{\sqrt2}(|0\rangle + |1\rangle), whose density matrix is \rho_{+} = \begin{bmatrix} \tfrac12 & \tfrac12 \\[2pt] \tfrac12 & \tfrac12 \end{bmatrix}. Since Z|{+}\rangle = |{-}\rangle, we have Z\rho_{+}Z = |{-}\rangle\langle{-}| = \begin{bmatrix} \tfrac12 & -\tfrac12 \\[2pt] -\tfrac12 & \tfrac12 \end{bmatrix}, and so

\varepsilon(\rho_{+}) = (1-p)\begin{bmatrix} \tfrac12 & \tfrac12 \\[2pt] \tfrac12 & \tfrac12 \end{bmatrix} + p\begin{bmatrix} \tfrac12 & -\tfrac12 \\[2pt] -\tfrac12 & \tfrac12 \end{bmatrix} = \begin{bmatrix} \tfrac12 & \tfrac12(1-2p) \\[2pt] \tfrac12(1-2p) & \tfrac12 \end{bmatrix}.

The diagonal is untouched — measuring in the computational basis still gives 50/50 — but the off-diagonal coherences have shrunk by a factor of (1-2p). At p = \tfrac12 they vanish entirely and |{+}\rangle has decayed to the useless maximally mixed \tfrac{I}{2}. This is dephasing written as one channel — the quiet death of a superposition.
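A short numerical sweep makes the (1-2p) scaling visible. This sketch applies the phase-flip channel to |{+}\rangle\langle{+}| for a few illustrative values of p and records the off-diagonal entry; the variable names are made up for the example.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
rho_plus = np.full((2, 2), 0.5, dtype=complex)  # |+><+|

def phase_flip(rho, p):
    """Phase-flip channel in its closed form (1-p) rho + p Z rho Z."""
    return (1 - p) * rho + p * (Z @ rho @ Z)

ps = [0.0, 0.1, 0.25, 0.5]  # illustrative flip probabilities (assumption)
outputs = [phase_flip(rho_plus, p) for p in ps]

coherences = [out[0, 1].real for out in outputs]   # off-diagonal entry
populations = [out[0, 0].real for out in outputs]  # top-left diagonal entry
predicted = [0.5 * (1 - 2 * p) for p in ps]
```

The populations stay at 1/2 for every p while the coherences track the predicted values, reaching zero at p = 1/2.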

Step back and admire what the operator sum buys you. Bit-flips, phase-flips, depolarization, T_1 decay, and every exotic noise process in between are all the same object, a channel \varepsilon(\rho) = \sum_k K_k\rho K_k^\dagger, differing only in the short list of matrices K_k. You never need a bespoke formalism per error; you just write down the Kraus operators and turn the crank. That uniformity is what makes quantum error correction tractable: a code only has to defend against the Kraus operators of the channel. And because every single-qubit operator, including a non-unitary one such as the amplitude-damping K_1, is a linear combination of I, X, Y, and Z, a code that corrects bit-flips, phase-flips, and their product Y on a qubit corrects an arbitrary error on that qubit. This result is known as the discretization of errors.

Summary

an open system evolves by a quantum channel \varepsilon: a CPTP map (completely positive, trace-preserving) on density matrices — strictly more general than a unitary;
every channel has a Kraus / operator-sum form \varepsilon(\rho) = \sum_k K_k\,\rho\,K_k^\dagger with the completeness condition \sum_k K_k^\dagger K_k = I;
each K_k is an error branch occurring with probability p_k = \operatorname{Tr}(K_k\rho K_k^\dagger), and completeness makes the p_k sum to one;
standard channels: bit-flip \{\sqrt{1-p}\,I, \sqrt{p}\,X\}, phase-flip \{\sqrt{1-p}\,I, \sqrt{p}\,Z\}, depolarizing (Paulis), and amplitude damping for T_1 decay;
noise turns pure states mixed and shrinks coherences — the phase-flip scales them by (1-2p).

Two traps. First, Kraus operators are not unique: many different operator sets give the exact same channel. Any unitary mixing K'_j = \sum_k u_{jk} K_k (with u unitary) produces an identical map, and you can even change the number of operators by padding the shorter set with zero operators before mixing. So never speak of "the" Kraus operators of a channel as if they were physical. The channel \varepsilon is what is real; a Kraus set is just one way to write it down.
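The non-uniqueness is easy to see in a small example. In the sketch below the phase-flip set at p = 1/2, \{I/\sqrt2,\ Z/\sqrt2\}, is mixed with a 2x2 Hadamard-type unitary u and turns into the two projectors |0\rangle\langle0| and |1\rangle\langle1|. The choice of channel, of u, and of the test state are all illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def apply_channel(kraus_ops, rho):
    """Return sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def mix_kraus(kraus_ops, u):
    """Build K'_j = sum_k u[j, k] K_k for a unitary matrix u."""
    return [sum(u[j, k] * K for k, K in enumerate(kraus_ops))
            for j in range(u.shape[0])]

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

original = [I2 / np.sqrt(2), Z / np.sqrt(2)]             # phase-flip at p = 1/2
u = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
mixed = mix_kraus(original, u)                           # |0><0| and |1><1|

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]], dtype=complex)
out_original = apply_channel(original, rho)
out_mixed = apply_channel(mixed, rho)
```

The two operator lists look nothing alike, one being "flip the phase half the time" and the other "measure in the computational basis and forget the result", yet out_original and out_mixed agree.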

Second, do not confuse a channel with "apply a random Pauli." The completeness condition \sum_k K_k^\dagger K_k = I is what keeps probabilities normalised — drop it and you no longer have a valid channel. And while bit-flip, phase-flip, and depolarizing are random-Pauli channels, a general channel is more powerful: amplitude damping has a non-unitary K_1 = \begin{bmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{bmatrix} that no probabilistic mixture of Paulis can reproduce. "Random Pauli" is a special case, not the whole story.
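One numerical way to see the difference, offered as a supplementary check rather than part of the argument above: every Pauli P satisfies P\,\tfrac{I}{2}\,P^\dagger = \tfrac{I}{2}, so any probabilistic mixture of Paulis leaves the maximally mixed state fixed. Amplitude damping does not, so it cannot be such a mixture. The values of p and \gamma below are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def apply_channel(kraus_ops, rho):
    """Return sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def depolarizing(p):
    return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def amplitude_damping(gamma):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

maximally_mixed = I2 / 2
p, gamma = 0.3, 0.3  # illustrative noise strengths (assumption)

pauli_out = apply_channel(depolarizing(p), maximally_mixed)
damping_out = apply_channel(amplitude_damping(gamma), maximally_mixed)

pauli_fixes_it = np.allclose(pauli_out, maximally_mixed)      # True
damping_fixes_it = np.allclose(damping_out, maximally_mixed)  # False
```

Amplitude damping pushes population toward |0\rangle, giving \operatorname{diag}\big(\tfrac{1+\gamma}{2}, \tfrac{1-\gamma}{2}\big), which is the relaxation toward the ground state described earlier.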