Time-Independent Perturbation Theory

Here is an awkward secret of quantum mechanics: almost no real Hamiltonian can be solved exactly. The Schrödinger equation yields clean answers for a handful of idealised systems — a particle in a box, a harmonic spring, a lone hydrogen atom — and then reality intrudes. Put that hydrogen atom in a weak electric field, add a magnetic field, let the spring stiffen slightly at large stretch, and the tidy eigenvalue equation stops having a closed-form solution. Nature is almost never one of the solvable problems.

But it is very often a solvable problem plus a small nudge. The atom in the field is just the free atom with a little extra energy; the stiffening spring is the perfect oscillator with a small correction. Perturbation theory is the art of trading a problem you cannot solve for one you can, plus a power series of corrections you can compute by hand. It turns "unsolvable" into "solvable to as many decimal places as you have patience for" — and it is the single most-used approximation method in all of quantum physics.

The setup: a solvable Hamiltonian plus a small term

Split the Hamiltonian into a part you have already diagonalised and a small extra piece:

\hat{H} = \hat{H}_0 + \lambda \hat{H}',

where \hat{H}_0 is the unperturbed Hamiltonian whose problem is completely solved,

\hat{H}_0\,|n^{(0)}\rangle = E_n^{(0)}\,|n^{(0)}\rangle,

and \lambda\hat{H}' is the small perturbation. The bookkeeping parameter \lambda (between 0 and 1) tracks the "size" of the nudge: at \lambda = 0 we have the solved problem back, and at \lambda = 1 we have the full one. The whole method rests on one hopeful but usually reasonable assumption: that the true energies and states are smooth power series in \lambda, so we can expand

E_n = E_n^{(0)} + \lambda E_n^{(1)} + \lambda^2 E_n^{(2)} + \cdots, \qquad |n\rangle = |n^{(0)}\rangle + \lambda\,|n^{(1)}\rangle + \lambda^2\,|n^{(2)}\rangle + \cdots.

The superscripts count the order: E_n^{(1)} is the first-order energy correction, E_n^{(2)} the second-order, and so on. Substitute these series into \hat{H}|n\rangle = E_n|n\rangle and collect terms with equal powers of \lambda — each power gives one equation, and out drop the corrections one order at a time. The result is a small toolkit of formulas, which we now assemble. This is the Rayleigh–Schrödinger scheme, and for now we assume the unperturbed levels are non-degenerate (each energy has one state); the degenerate case gets a warning of its own below.

First order: the energy correction is just an average

The very first correction is the one you will use most, and it is astonishingly cheap to compute. Collecting the terms linear in \lambda and projecting onto \langle n^{(0)}| gives the headline result of the entire subject:

E_n^{(1)} = \langle n^{(0)}|\,\hat{H}'\,|n^{(0)}\rangle.

Read that in words: the first-order shift of a level is simply the expectation value of the perturbation in the unperturbed state. You do not have to solve any new eigenvalue problem, or find any new wavefunction — you already have |n^{(0)}\rangle, so you just sandwich \hat{H}' between it and its conjugate and do an integral. For the cost of one expectation value you get the leading correction to the energy. This is why perturbation theory is a workhorse: the hardest-looking corrections often reduce to an average you can do on the back of an envelope.

The state also shifts. Projecting the same first-order equation onto a different unperturbed state \langle m^{(0)}| (with m \neq n) and solving gives the first-order correction to the ket as a weighted sum over all the other states:

|n^{(1)}\rangle = \sum_{m \neq n} \frac{\langle m^{(0)}|\,\hat{H}'\,|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}}\;|m^{(0)}\rangle.

Each other state |m^{(0)}\rangle gets mixed in by an amount set by two things: how strongly the perturbation connects it to our state (the matrix element \langle m^{(0)}|\hat{H}'|n^{(0)}\rangle in the numerator), and how close it is in energy (the gap E_n^{(0)} - E_m^{(0)} in the denominator). Nearby, strongly-coupled states mix in a lot; distant, weakly-coupled ones barely register. Notice the sum pointedly excludes m = n — the term that would divide by zero — and that exclusion is the source of a classic mistake we flag later.

Second order: states repel

First order is often enough, but sometimes it vanishes (if the perturbation has no diagonal part, E_n^{(1)} = 0) and you must go further. Continuing to the terms in \lambda^2 gives the second-order energy correction:

E_n^{(2)} = \sum_{m \neq n} \frac{\bigl|\langle m^{(0)}|\,\hat{H}'\,|n^{(0)}\rangle\bigr|^2}{E_n^{(0)} - E_m^{(0)}}.

Every numerator here is a squared magnitude, so it is never negative. That single fact has a beautiful consequence for the ground state. For the lowest level, every other state lies above it, so every denominator E_0^{(0)} - E_m^{(0)} is negative — which makes every term negative, and therefore E_0^{(2)} \leq 0 always. The second-order shift of the ground state is guaranteed to push it down.

There is a memorable way to say this: levels repel. When two states are coupled by a perturbation, second order pushes the lower one down and the upper one up, as if they were shoving each other apart. The closer they start (smaller gap) and the more strongly they couple (bigger matrix element), the harder they shove. You will see exactly this repulsion in the interactive figure below, where the true lower level bends away from a coupled partner.

When is a perturbation actually "small"?

Every formula above has an energy gap in a denominator, and that is where the method lives or dies. For the series to converge — for each correction to be smaller than the last — the perturbation must be weak compared with the spacing of the levels:

\bigl|\langle m^{(0)}|\hat{H}'|n^{(0)}\rangle\bigr| \;\ll\; \bigl|E_n^{(0)} - E_m^{(0)}\bigr| \qquad\text{for every } m \neq n.

"Small" is not an absolute statement about the perturbation — it is a comparison. A perturbation of a given strength can be perfectly negligible for well-separated levels and catastrophically large for levels that nearly touch. What matters is the dimensionless ratio of coupling to gap. When it is tiny, two or three orders of perturbation theory give you results good to many decimal places; when it approaches 1, the series limps or diverges and you need a different tool.

The most dramatic failure is degeneracy. If two unperturbed states share the same energy, E_n^{(0)} = E_m^{(0)}, the gap is zero and the formulas divide by zero — a nonsense infinity. The cure (developed on its own page) is degenerate perturbation theory: within the degenerate subspace you must first diagonalise \hat{H}' itself, choosing the "good" combinations of states that the perturbation does not mix, and only then apply the non-degenerate formulas to what remains. Keep that signpost in mind; the machinery on this page assumes every level stands alone.

Watch the approximation hug — then peel away

There is one problem simple enough to solve both exactly and by perturbation theory, so you can lay them side by side: a two-level system. Take

\hat{H} = \begin{pmatrix} E_1 & \lambda V \\ \lambda V & E_2 \end{pmatrix} = \underbrace{\begin{pmatrix} E_1 & 0 \\ 0 & E_2 \end{pmatrix}}_{\hat{H}_0} + \lambda\underbrace{\begin{pmatrix} 0 & V \\ V & 0 \end{pmatrix}}_{\hat{H}'}.

The unperturbed levels are just E_1 and E_2, and the perturbation is a pure off-diagonal coupling V. Because \hat{H}' has no diagonal part, the first-order shift vanishes, E_1^{(1)} = \langle 1|\hat{H}'|1\rangle = 0 — so the perturbative estimate of the lower level is flat to first order. The interesting physics is all at second order:

E_1^{(2)} = \frac{|\langle 2|\hat{H}'|1\rangle|^2}{E_1 - E_2} = \frac{V^2}{E_1 - E_2} \;\;\Rightarrow\;\; E_1 \approx E_1^{(0)} - \frac{(\lambda V)^2}{E_2 - E_1}.

Meanwhile this 2\times 2 can be diagonalised exactly; the lower eigenvalue is

E_- = \frac{E_1 + E_2}{2} - \sqrt{\left(\frac{E_1 - E_2}{2}\right)^{2} + (\lambda V)^2}.

Now compare the three curves below as you turn up the coupling \lambda along the horizontal axis. The flat line is the first-order estimate (no shift). The downward parabola is the second-order estimate. And the smooth curve is the exact truth. For small \lambda the parabola hugs the exact curve beautifully — that is perturbation theory working. As \lambda grows, the exact curve levels off (a square root flattens) while the parabola keeps diving, and the two peel apart. That divergence is the series breaking down, and you can trigger it sooner by shrinking the gap \Delta or raising the coupling V with the sliders — exactly the validity condition made visible.

This one picture is the whole method in miniature: a cheap approximation that is excellent while the perturbation is small and honest about failing when it is not.

Where this shows up: Stark, Zeeman, and a stubborn spring

Perturbation theory is not a toy — it is how much of atomic physics is actually computed. Three canonical examples:

In each case the recipe is identical: identify the solvable \hat{H}_0, write the nudge as \hat{H}', and turn the crank. The physics — polarisation, line-splitting, anharmonicity — falls out as an expectation value.

Worked examples

Example 1 — a constant push in a box. Take the infinite square well and add a small constant potential \hat{H}' = V_0 everywhere inside (a uniform energy offset). The first-order shift of any level is

E_n^{(1)} = \langle n^{(0)}|V_0|n^{(0)}\rangle = V_0\,\langle n^{(0)}|n^{(0)}\rangle = V_0,

using normalisation \langle n^{(0)}|n^{(0)}\rangle = 1. Every level simply rises by V_0 — which is exactly right, since adding a constant to the Hamiltonian shifts all energies by that constant and changes no wavefunction. Perturbation theory reproduces the obvious answer, a good sanity check.

Example 2 — second order in the two-level system. For the coupled pair \hat{H}_0 = \mathrm{diag}(E_1, E_2) with off-diagonal coupling V, the only "other" state for level 1 is level 2, so the second-order sum has a single term:

E_1^{(2)} = \frac{|\langle 2|\hat{H}'|1\rangle|^2}{E_1 - E_2} = \frac{V^2}{E_1 - E_2} = -\,\frac{V^2}{E_2 - E_1}.

With E_2 > E_1 this is negative — the lower level is pushed down, the repulsion promised above. Expand the exact E_- for small V using \sqrt{a^2 + \epsilon} \approx a + \epsilon/2a and you recover precisely this term — perturbation theory is the Taylor expansion of the exact answer.

Example 3 — reading a validity ratio. Suppose two levels sit at E_1 = 2 and E_2 = 10 (gap \Delta = 8) and are coupled by a matrix element of size |V| = 1. The smallness ratio is |V|/\Delta = 1/8 = 0.125 — comfortably below 1, so two orders of perturbation theory will be excellent. Now push the upper level down to E_2 = 3: the gap collapses to 1, the ratio becomes 1, and the series is in serious trouble even though the coupling V never changed. Smallness is always relative to the gap.

Suppose two unperturbed states |a\rangle and |b\rangle share the same energy, E_a^{(0)} = E_b^{(0)}. Try the state-correction formula and you hit a term with denominator E_a^{(0)} - E_b^{(0)} = 0 — division by zero, an infinite "correction." The formula has not just become inaccurate; it has become meaningless.

The deeper reason is that when levels are degenerate, the unperturbed problem doesn't tell you which basis to use — any combination of |a\rangle and |b\rangle is an equally good eigenstate of \hat{H}_0. The perturbation breaks the tie. Degenerate perturbation theory handles this by first building the little matrix of \hat{H}' within the degenerate subspace and diagonalising it; its eigenvectors are the "good" states the perturbation actually picks out, and its eigenvalues are the first-order shifts. Only after choosing that basis do the ordinary formulas make sense again. A zero in a denominator is nature's way of telling you to stop and diagonalise first.

The classic trap. Students write the first-order state correction |n^{(1)}\rangle = \sum_{m \neq n} \frac{\langle m^{(0)}|\hat{H}'|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}}\,|m^{(0)}\rangle and quietly let the sum include m = n — after all, isn't |n^{(0)}\rangle part of the answer? It is not in this sum. The m = n term would put the vanishing gap E_n^{(0)} - E_n^{(0)} = 0 in the denominator — an immediate blow-up. The sum runs strictly over m \neq n.

Where did the missing |n^{(0)}\rangle component go? It is fixed by normalisation, not by this formula. The standard convention (intermediate normalisation) sets \langle n^{(0)}|n^{(1)}\rangle = 0, so the correction has no component along the original state at first order — the perturbation tilts the state toward its neighbours, and how much of the original survives is settled separately by keeping the total length equal to one. Never divide by a zero gap; the excluded term is excluded for a reason.