The Partition Function
You have already met the Boltzmann
distribution: a system in contact with a heat bath at temperature
T visits a microstate of energy E_i with a weight
proportional to the Boltzmann factor
e^{-E_i/kT}. That single exponential is enough to compare states:
a state lying \varepsilon above the ground state is e^{-\varepsilon/kT} times as likely as the
ground state, and so on. But notice what it does not give you: an absolute
probability. "Proportional to" is a promise with a missing constant. To turn the weights into genuine
probabilities that add up to one, you must divide by their sum.
That sum has a name — the partition function, written
Z (from the German Zustandssumme, "sum over states"):
Z = \sum_{i}\, e^{-E_i/kT} = \sum_i e^{-\beta E_i}, \qquad \beta \equiv \frac{1}{kT}.
At first glance Z looks like a mere bookkeeping device, the boring denominator
that makes probabilities normalise. In fact it is the single most important object in statistical mechanics.
The astonishing fact this page is built around is that Z knows
everything: once you can write down Z(T,V,N) for a system, every
thermodynamic quantity you could want — its energy, its entropy, its pressure, its heat capacity, its
free energy, even the size of its fluctuations — falls out by differentiation. No new
physics, no extra measurements: just calculus applied to one function. The whole thermodynamics of a
system is folded up inside Z, and the derivatives unfold it.
The definition, carefully
The sum runs over every microstate i the system can occupy —
not over energy values, but over states. That distinction matters the moment two or more states share the
same energy. If an energy level E_j has degeneracy
g_j (that many distinct states at that one energy), you may group the sum by
level instead of by state:
Z = \sum_{\text{states }i} e^{-\beta E_i} = \sum_{\text{levels }j} g_j\, e^{-\beta E_j}.
Each level simply contributes its Boltzmann factor g_j times over. Forgetting
those g_j is the single most common slip in a partition-function calculation.
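The equivalence of the two sums is easy to check numerically. The sketch below is only an illustration: the spectrum, the type `EnergyLevel` and the helpers `sumOverStates` and `sumOverLevels` are made-up names, not part of any library. It expands each level into g_j separate microstates and confirms that both routes give the same Z.

```typescript
// Hypothetical spectrum (arbitrary energy units): degeneracies 1, 2 and 3.
interface EnergyLevel { E: number; g: number; }
const levels: EnergyLevel[] = [
  { E: 0.0, g: 1 },
  { E: 1.0, g: 2 },
  { E: 2.0, g: 3 },
];

// Expand each level into g separate microstates that share one energy.
const stateEnergies: number[] = [];
for (const { E, g } of levels) {
  for (let n = 0; n < g; n++) stateEnergies.push(E);
}

// Z as a sum over every microstate.
function sumOverStates(energies: number[], beta: number): number {
  let total = 0;
  for (const E of energies) total += Math.exp(-beta * E);
  return total;
}

// Z as a sum over levels, each weighted by its degeneracy.
function sumOverLevels(lv: EnergyLevel[], beta: number): number {
  let total = 0;
  for (const { E, g } of lv) total += g * Math.exp(-beta * E);
  return total;
}

const betaDemo = 1.0; // beta = 1/kT, assumed value for the demo
console.log("sum over states:", sumOverStates(stateEnergies, betaDemo).toFixed(6));
console.log("sum over levels:", sumOverLevels(levels, betaDemo).toFixed(6));
```

Dropping the `g *` factor in `sumOverLevels` makes the two numbers disagree, which is exactly the slip described above.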
Once you have Z, the absolute probability of finding the system in a particular
microstate i is the Boltzmann weight divided by the total:
- Probability of a microstate.
  P_i = \dfrac{e^{-\beta E_i}}{Z}.
  The denominator Z is exactly what guarantees
  \sum_i P_i = 1.
- Probability of a level. With degeneracy,
  P_j = \dfrac{g_j\, e^{-\beta E_j}}{Z}.
A quick sanity check: at very high temperature (\beta \to 0) every Boltzmann
factor tends to 1, so Z counts the accessible states
and each becomes equally likely. At very low temperature (\beta \to \infty) only
the ground-state term survives, and the system freezes into its lowest level. Z
interpolates smoothly between "count the states" and "occupy the ground state".
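To see the two limits in numbers, here is a minimal sketch. The four-state spectrum and the function name `boltzmannProbabilities` are assumptions chosen for illustration. It normalises the Boltzmann weights and prints the probabilities at a hot, an intermediate and a cold value of \beta.

```typescript
// Boltzmann probabilities P_i = exp(-beta*E_i)/Z for a list of microstate energies.
function boltzmannProbabilities(energies: number[], beta: number): number[] {
  // Measuring energies from the lowest one leaves every P_i unchanged
  // (the common factor cancels against Z) and avoids underflow at large beta.
  const lowest = Math.min(...energies);
  const weights = energies.map((E) => Math.exp(-beta * (E - lowest)));
  const Z = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => w / Z);
}

// Hypothetical four-state system, arbitrary energy units.
const demoEnergies = [0, 1, 1, 3];
for (const beta of [0.001, 1, 50]) {
  const P = boltzmannProbabilities(demoEnergies, beta);
  console.log(`beta = ${beta}:`, P.map((p) => p.toFixed(4)).join(", "));
}
```

At the smallest \beta the four probabilities should be nearly equal; at the largest, essentially all the weight should sit on the first (lowest) state.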
Z knows everything: extracting the thermodynamics
Here is why Z is the master key. Start with the mean energy — the average of
E_i weighted by P_i:
\langle E\rangle = \sum_i E_i\, P_i = \frac{1}{Z}\sum_i E_i\, e^{-\beta E_i}.
Now look at the derivative of Z with respect to \beta.
Differentiating e^{-\beta E_i} pulls down a factor -E_i,
so
\frac{\partial Z}{\partial \beta} = -\sum_i E_i\, e^{-\beta E_i} = -Z\,\langle E\rangle.
Divide by Z and recognise the left-hand side as a logarithmic derivative. The
mean energy is just the \beta-slope of \ln Z:
- Mean energy.
  \langle E\rangle = -\dfrac{\partial \ln Z}{\partial \beta} = kT^2\,\dfrac{\partial \ln Z}{\partial T}.
- Energy fluctuations & heat capacity.
  \langle E^2\rangle - \langle E\rangle^2 = \dfrac{\partial^2 \ln Z}{\partial \beta^2},
  and C_V = \dfrac{\partial \langle E\rangle}{\partial T} = \dfrac{1}{kT^2}\big(\langle E^2\rangle - \langle E\rangle^2\big).
- Helmholtz free energy, the bridge to thermodynamics:
  F = -kT\ln Z.
- Entropy & pressure, straight from F:
  S = -\dfrac{\partial F}{\partial T} = k\ln Z + \dfrac{\langle E\rangle}{T},
  and p = -\dfrac{\partial F}{\partial V} = kT\,\dfrac{\partial \ln Z}{\partial V}.
Read that list again and appreciate the economy: one function, Z, and a handful
of derivatives regenerate the entire thermodynamic apparatus. The free energy
F = -kT\ln Z is the linchpin — it is the quantity that connects the microscopic
sum to the macroscopic world, because in thermodynamics dF = -S\,dT - p\,dV, and
those two partial derivatives are exactly the entropy and pressure written above.
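A numerical check makes the "derivatives unfold it" claim concrete. The sketch below assumes units with k = 1 and a made-up four-state spectrum; the function names are illustrative. It compares the mean energy computed as a weighted average with the same quantity obtained from a finite-difference derivative of \ln Z, then confirms that F = -kT\ln Z and S = k\ln Z + \langle E\rangle/T satisfy F = \langle E\rangle - TS.

```typescript
// Units with k = 1, so beta = 1/T. The spectrum is a made-up example.
const exampleEnergies = [0, 0.5, 0.5, 2.0];

// ln Z as a function of beta.
function lnZ(beta: number): number {
  return Math.log(exampleEnergies.reduce((s, E) => s + Math.exp(-beta * E), 0));
}

// Direct definition: <E> = sum_i E_i P_i.
function meanEnergyDirect(beta: number): number {
  const Z = Math.exp(lnZ(beta));
  return exampleEnergies.reduce((s, E) => s + E * Math.exp(-beta * E), 0) / Z;
}

// Master formula <E> = -d(ln Z)/d(beta), via a central finite difference of step h.
function meanEnergyFromLnZ(beta: number, h = 1e-5): number {
  return -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h);
}

const temperature = 0.8; // assumed value for the demo
const betaC = 1 / temperature;
const meanE = meanEnergyDirect(betaC);
const freeEnergy = -temperature * lnZ(betaC);      // F = -kT ln Z
const entropy = lnZ(betaC) + meanE / temperature;  // S = k ln Z + <E>/T

console.log("<E> direct :", meanE.toFixed(6));
console.log("<E> via lnZ:", meanEnergyFromLnZ(betaC).toFixed(6));
console.log("F          :", freeEnergy.toFixed(6));
console.log("<E> - T*S  :", (meanE - temperature * entropy).toFixed(6));
```

The finite-difference step `h` is a tuning choice: too large and the truncation error shows, too small and round-off takes over.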
The fluctuation formula deserves a second glance. It says the spread of the energy — how much the
system's energy jitters about its mean — is the second \beta-derivative of
\ln Z, and that same spread controls the heat capacity. Response (how much energy
you must add to warm the system) and fluctuation (how much the energy wobbles on its own) are two faces of
one derivative. That link is your first taste of the fluctuation–dissipation theorem.
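The response-equals-fluctuation statement can be tested directly. In this sketch (units with k = 1, a hypothetical three-state spectrum, illustrative function names) the heat capacity is computed twice: once from the energy variance and once as a numerical temperature derivative of \langle E\rangle.

```typescript
// Units with k = 1. Hypothetical three-state spectrum for illustration.
const fluctuationSpectrum = [0, 1, 2.5];

// Thermal average of E^power at temperature T.
function energyMoment(T: number, power: number): number {
  let Z = 0;
  let acc = 0;
  for (const E of fluctuationSpectrum) {
    const w = Math.exp(-E / T);
    Z += w;
    acc += Math.pow(E, power) * w;
  }
  return acc / Z;
}

// Fluctuation route: C_V = (<E^2> - <E>^2) / (k T^2).
function heatCapacityFromFluctuations(T: number): number {
  const mean = energyMoment(T, 1);
  const variance = energyMoment(T, 2) - mean * mean;
  return variance / (T * T);
}

// Response route: C_V = d<E>/dT, estimated by a central finite difference.
function heatCapacityFromResponse(T: number, dT = 1e-5): number {
  return (energyMoment(T + dT, 1) - energyMoment(T - dT, 1)) / (2 * dT);
}

for (const T of [0.3, 0.7, 2.0]) {
  console.log(
    `T = ${T}: fluctuation ${heatCapacityFromFluctuations(T).toFixed(6)}, ` +
    `response ${heatCapacityFromResponse(T).toFixed(6)}`
  );
}
```

The two columns should agree to within the finite-difference error at every temperature.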
Factorisation: why Z scales so kindly
Real systems have enormous numbers of particles, so summing over all microstates sounds hopeless.
It usually isn't, thanks to one beautiful property. Suppose a system splits into independent
parts, so its total energy is a sum E = E^{(1)} + E^{(2)} + \dots with no
interaction terms. Then the exponential of a sum is a product of exponentials, and the sum over the whole
factorises into a product of separate sums:
Z = \sum_{\text{all states}} e^{-\beta(E^{(1)} + E^{(2)} + \dots)} = \left(\sum_a e^{-\beta E^{(1)}_a}\right)\left(\sum_b e^{-\beta E^{(2)}_b}\right)\cdots = z_1\, z_2\cdots
Independent subsystems multiply their partition functions. In particular, for
N identical, independent, distinguishable particles each with the same
single-particle partition function z,
Z = z^N.
This is why \ln Z (and hence F, S,
\langle E\rangle) is extensive — proportional to
N — because \ln Z = N\ln z. You solve the physics for
one particle and multiply.
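Factorisation is easy to verify by brute force for a small system. The sketch below is an illustration under stated assumptions: N distinguishable two-level particles with hypothetical gaps stored in `gaps`, and made-up helper names. It enumerates all 2^N microstates and compares the result with the product of single-particle partition functions.

```typescript
// Hypothetical gaps for N = 4 independent, distinguishable two-level particles.
const gaps = [1.0, 1.0, 1.0, 1.0];
const betaE = 0.7; // assumed value of beta for the demo

// Brute force: visit all 2^N microstates. Bit n of `config` records whether
// particle n is excited, so the total energy is the sum of the excited gaps.
function bruteForceZ(eps: number[], beta: number): number {
  let Z = 0;
  for (let config = 0; config < (1 << eps.length); config++) {
    let E = 0;
    eps.forEach((e, n) => {
      if ((config >> n) & 1) E += e;
    });
    Z += Math.exp(-beta * E);
  }
  return Z;
}

// Factorised form: product of z_n = 1 + exp(-beta * eps_n).
function factorisedZ(eps: number[], beta: number): number {
  return eps.reduce((prod, e) => prod * (1 + Math.exp(-beta * e)), 1);
}

console.log("brute force:", bruteForceZ(gaps, betaE).toFixed(6));
console.log("factorised :", factorisedZ(gaps, betaE).toFixed(6));
console.log("ln Z / N   :", (Math.log(factorisedZ(gaps, betaE)) / gaps.length).toFixed(6));
console.log("ln z       :", Math.log(1 + Math.exp(-betaE * 1.0)).toFixed(6));
```

The brute-force loop costs 2^N terms while the product costs N, which is the practical payoff of factorisation. The last two lines illustrate extensivity: \ln Z per particle equals \ln z when all gaps are the same.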
One caveat, which the next course makes precise: if the particles are genuinely
indistinguishable (identical atoms in a gas, say), counting each arrangement
N! times overcounts the states, and you must divide it back out:
Z = z^N/N!. That factor cures the classical "Gibbs paradox" of entropy — but that
is a story for the ideal gas. Hold it as a forward pointer for now.
Worked example 1 — the two-level system
The cleanest possible example: a single object with just two states, a ground state at energy
0 and an excited state at energy \varepsilon. (Think of a
spin in a magnetic field, or a two-state impurity in a crystal.) The single-particle partition function is a
sum of just two terms:
z = e^{-\beta\cdot 0} + e^{-\beta\varepsilon} = 1 + e^{-\beta\varepsilon}.
Apply the master formula for the mean energy. First \ln z = \ln\!\big(1 + e^{-\beta\varepsilon}\big),
then differentiate with respect to \beta:
\langle E\rangle = -\frac{\partial \ln z}{\partial \beta} = -\frac{-\varepsilon\, e^{-\beta\varepsilon}}{1 + e^{-\beta\varepsilon}} = \frac{\varepsilon\, e^{-\beta\varepsilon}}{1 + e^{-\beta\varepsilon}} = \frac{\varepsilon}{e^{\beta\varepsilon} + 1}.
Look at the limits. At low temperature (\beta\varepsilon \gg 1) the denominator is
huge and \langle E\rangle \to 0: everything sits in the ground state. At high
temperature (\beta\varepsilon \to 0) the denominator tends to
2 and \langle E\rangle \to \varepsilon/2: the two states
are equally populated, so the average energy saturates at the midpoint. It cannot climb past
\varepsilon/2 — a two-level system has a ceiling on the energy it can soak up.
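The closed form and its limits can be checked in a few lines. This is a minimal sketch with illustrative function names; the gap is set to 1 in arbitrary units. It evaluates \langle E\rangle both from the formula above and from the weighted average over the two states.

```typescript
// Closed form: <E> = eps / (exp(eps/kT) + 1).
function twoLevelMeanEnergy(eps: number, kT: number): number {
  return eps / (Math.exp(eps / kT) + 1);
}

// Same quantity from the definition: weighted average over the two states.
function twoLevelMeanEnergyDirect(eps: number, kT: number): number {
  const w = Math.exp(-eps / kT); // Boltzmann factor of the excited state
  return (eps * w) / (1 + w);    // the ground state contributes zero energy
}

const gap = 1.0; // arbitrary energy unit, assumed for the demo
for (const kT of [0.05, 0.5, 1, 5, 100]) {
  const ratio = twoLevelMeanEnergy(gap, kT) / gap;
  console.log(`kT = ${kT}: <E>/eps = ${ratio.toFixed(4)}`);
}
```

The printed ratio should climb from essentially zero toward one half as kT grows, and never exceed it.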
Differentiate once more (with respect to T) to get the heat capacity, and you find
the famous Schottky anomaly: a heat capacity that is zero at both low and high
temperature and rises to a hump in between, peaking when kT is comparable to the gap
(around kT \approx 0.42\,\varepsilon):
C_V = \frac{\partial \langle E\rangle}{\partial T} = k\,(\beta\varepsilon)^2\,\frac{e^{\beta\varepsilon}}{\big(e^{\beta\varepsilon} + 1\big)^2}.
Why the peak? At low T there is not enough thermal energy to promote anything, so
adding heat changes nothing. At high T the level is already half full and cannot
take more, so again adding heat barely raises the mean energy. Only in the crossover, when
kT is comparable to \varepsilon, does the population shift rapidly
with temperature, and that is where the system drinks up heat.
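The position of the Schottky peak can be located with a crude scan. The sketch below assumes nothing beyond the heat-capacity formula above; the function name, the scan range and the step size are arbitrary choices. It works in the dimensionless variable kT/\varepsilon, so the gap only sets the temperature scale of the hump.

```typescript
// Two-level heat capacity in units of k, as a function of x = beta*eps = eps/kT.
function schottkyHeatCapacity(x: number): number {
  const ex = Math.exp(x);
  return (x * x * ex) / ((ex + 1) * (ex + 1));
}

// Scan t = kT/eps on a fine grid and keep the largest value found.
let bestT = 0;
let bestC = 0;
for (let t = 0.05; t <= 3.0; t += 0.0005) {
  const c = schottkyHeatCapacity(1 / t);
  if (c > bestC) {
    bestC = c;
    bestT = t;
  }
}
console.log("peak at kT/eps ~", bestT.toFixed(3), " with C_V/k ~", bestC.toFixed(3));
```

Because C_V depends on temperature only through kT/\varepsilon, doubling the gap doubles the temperature of the peak without changing its height.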
Worked example 2 — the harmonic oscillator (a geometric-series trick)
A quantum harmonic oscillator has an infinite ladder of equally spaced levels,
E_n = \left(n + \tfrac12\right)\hbar\omega for n = 0, 1, 2, \dots.
An infinite sum sounds daunting — until you spot that it is a geometric series. Writing
x = e^{-\beta\hbar\omega} and pulling out the zero-point piece:
z = \sum_{n=0}^{\infty} e^{-\beta(n+\frac12)\hbar\omega} = e^{-\beta\hbar\omega/2}\sum_{n=0}^{\infty} x^{n} = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}.
(If you measure energies from the ground state and drop the zero-point term, this is simply
z = \dfrac{1}{1 - e^{-\beta\hbar\omega}}.) Feed it through
\langle E\rangle = -\partial \ln z/\partial \beta and, after a short differentiation,
the whole thing collapses to the celebrated Planck result:
\langle E\rangle = \frac{\hbar\omega}{2} + \frac{\hbar\omega}{e^{\beta\hbar\omega} - 1}.
The second term, \hbar\omega/(e^{\beta\hbar\omega}-1), is \hbar\omega times the Bose–Einstein
occupation number that Planck was forced to introduce to explain blackbody radiation; here it drops out of one
geometric series and one derivative. At high temperature \langle E\rangle approaches kT (the
classical equipartition value); at low temperature the thermal term vanishes exponentially, leaving only the zero-point
energy. A single partition function reproduces both the classical and the quantum regimes.
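Both the geometric-series step and the limits can be checked numerically. This sketch assumes units with \hbar\omega = 1 and k = 1, an arbitrary cutoff `nMax` for the truncated sum, and illustrative function names.

```typescript
// Units with hbar*omega = 1 and k = 1, so beta*hbar*omega = beta = 1/T.

// Truncated sum over the ladder E_n = n + 1/2 (nMax is an arbitrary cutoff).
function oscillatorZSum(beta: number, nMax = 2000): number {
  let z = 0;
  for (let n = 0; n <= nMax; n++) z += Math.exp(-beta * (n + 0.5));
  return z;
}

// Closed form from the geometric series.
function oscillatorZClosed(beta: number): number {
  return Math.exp(-beta / 2) / (1 - Math.exp(-beta));
}

// Planck mean energy: zero-point term plus thermal term.
function oscillatorMeanEnergy(beta: number): number {
  return 0.5 + 1 / (Math.exp(beta) - 1);
}

console.log("z (sum)   :", oscillatorZSum(0.5).toFixed(8));
console.log("z (closed):", oscillatorZClosed(0.5).toFixed(8));
for (const T of [0.05, 1, 100]) {
  console.log(`T = ${T}: <E> = ${oscillatorMeanEnergy(1 / T).toFixed(5)}`);
}
```

At the lowest temperature the mean energy should sit at the zero-point value 1/2; at the highest it should be close to T, the equipartition value in these units.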
A concrete numerical check
Numbers make it stick. Take a small system with a three-level spectrum (in units where
k = 1 and T = 1): a non-degenerate ground state at
E = 0, a triply degenerate level at E = 1, and a doubly
degenerate level at E = 2.5. The partition function is
Z = 1\cdot e^{0} + 3\cdot e^{-1} + 2\cdot e^{-2.5} \approx 1 + 1.104 + 0.164 = 2.268,
and the mean energy is the weighted sum of energies divided by Z:
\langle E\rangle = \frac{0 + 1\cdot 1.104 + 2.5\cdot 0.164}{2.268} \approx \frac{1.514}{2.268} \approx 0.668.
Notice how the degeneracies g_j = 1, 3, 2 pull weight toward the middle level even
though it is not the lowest in energy (its probability is about 0.487, against 0.441 for the ground state):
three doorways into a room make it easier to enter. Here is the same computation in code:
```typescript
// One energy level: energy E (same units as kT) and degeneracy g.
interface Level { E: number; g: number; }

const kT = 1.0;
const spectrum: Level[] = [
  { E: 0.0, g: 1 }, // ground state, non-degenerate
  { E: 1.0, g: 3 }, // triply degenerate
  { E: 2.5, g: 2 }, // doubly degenerate
];

let Z = 0;         // partition function
let energySum = 0; // sum of E * weight, the numerator of <E>
for (const { E, g } of spectrum) {
  const weight = g * Math.exp(-E / kT); // g * Boltzmann factor
  Z += weight;
  energySum += E * weight;
}

console.log("Z =", Z.toFixed(4));
console.log("<E> =", (energySum / Z).toFixed(4));
// The ground level has g = 1 and E = 0, so its weight is exactly 1.
console.log("P(ground) =", (1 / Z).toFixed(4));
```
Three slips catch almost everyone the first time. First, the minus sign and the variable:
the mean energy is \langle E\rangle = -\partial \ln Z/\partial \beta, a derivative
with respect to \beta = 1/kT, not with respect to
T. If you differentiate with respect to T instead, the chain rule
brings in d\beta/dT = -1/(kT^2), which flips the sign: the correct temperature form is
\langle E\rangle = +kT^2\,\partial \ln Z/\partial T. Keep the whole calculation in
\beta and the signs look after themselves.
Second, states versus levels: the raw sum is over microstates,
Z = \sum_i e^{-\beta E_i}. The moment you regroup by energy level you
must insert the degeneracy, Z = \sum_j g_j e^{-\beta E_j}. Dropping
g_j silently undercounts states and gives wrong probabilities every time.
Third, Z alone is not physical. Its numerical value depends on
where you set the zero of energy (shift every E_i by a constant E_0 and
Z is multiplied by e^{-\beta E_0}), and it can even carry awkward dimensions in
the classical continuum version. What carries physics is how \ln Z changes: its
derivatives give energy differences, entropy, heat capacity. Never quote "the value of
Z" as if it meant something on its own.
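The third slip is worth seeing in numbers. The sketch below is an illustration with a made-up spectrum, an arbitrary shift `energyShift` and hypothetical helper names, in units with k = 1. It moves the zero of energy and shows that Z changes by the factor e^{-\beta E_0} while the probabilities do not change at all.

```typescript
// Units with k = 1. Hypothetical spectrum and shift, for illustration only.
const baseEnergies = [0, 1, 1, 2.5];
const energyShift = 3.0; // constant E_0 added to every energy
const betaShift = 1.2;   // assumed value of beta for the demo

function partitionFunction(energies: number[], beta: number): number {
  return energies.reduce((s, E) => s + Math.exp(-beta * E), 0);
}

function stateProbabilities(energies: number[], beta: number): number[] {
  const Z = partitionFunction(energies, beta);
  return energies.map((E) => Math.exp(-beta * E) / Z);
}

const shiftedEnergies = baseEnergies.map((E) => E + energyShift);
const zRatio =
  partitionFunction(shiftedEnergies, betaShift) / partitionFunction(baseEnergies, betaShift);

console.log("Z ratio        :", zRatio.toFixed(6));
console.log("exp(-beta*E_0) :", Math.exp(-betaShift * energyShift).toFixed(6));
console.log("P unshifted    :", stateProbabilities(baseEnergies, betaShift).map((p) => p.toFixed(4)).join(", "));
console.log("P shifted      :", stateProbabilities(shiftedEnergies, betaShift).map((p) => p.toFixed(4)).join(", "));
```

The first two lines should match and the last two should be identical, which is why no measurable probability can depend on the bare value of Z.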
The symbol Z is a fossil of its German origin: Zustandssumme, literally
"sum over states". That name is far more descriptive than the English "partition function", which came from
the way Z partitions, or apportions, the total probability among the available
states. The German is worth remembering because it tells you exactly what the object is — you are
summing a weight over every state the system can be in.
There is a lovely deeper fact hiding here. Because Z = \sum_i e^{-\beta E_i} is
essentially a Laplace transform of the density of states, differentiating with respect to
\beta brings down powers of the energy. Derivatives of Z itself give the raw moments
\langle E^n\rangle (times Z, up to sign); derivatives of \ln Z give the cumulants: one derivative gives
\langle E\rangle (with a minus sign), two give the variance
\langle E^2\rangle - \langle E\rangle^2, three give (minus) the third central moment, which sets the skewness, and so on.
\ln Z is what a mathematician would call a cumulant-generating
function: it secretly encodes not just the average energy but the full statistics of every
fluctuation. In the classical continuum limit the same object becomes an integral over phase space,
Z = \dfrac{1}{h^{3N} N!}\displaystyle\int e^{-\beta H(\mathbf{p},\mathbf{q})}\, d^{3N}p\; d^{3N}q,
with Planck's constant setting the size of a "cell" of states and the N! fixing
the indistinguishability overcount. Same idea, sum turned into integral.
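As a small worked instance of the phase-space integral, consider a single classical harmonic oscillator in one dimension. This is a sketch under stated assumptions: one degree of freedom, so the prefactor is 1/h and there is no N!; the mass m and the label z_cl are introduced here for illustration. Both Gaussian integrals are elementary:

```latex
z_{\text{cl}}
  = \frac{1}{h}\int_{-\infty}^{\infty}\! dp \int_{-\infty}^{\infty}\! dq\;
    e^{-\beta\left(\frac{p^2}{2m} + \frac{1}{2} m\omega^2 q^2\right)}
  = \frac{1}{h}\,\sqrt{\frac{2\pi m}{\beta}}\;\sqrt{\frac{2\pi}{\beta m\omega^2}}
  = \frac{2\pi}{h\,\beta\omega}
  = \frac{1}{\beta\hbar\omega},
\qquad
\langle E\rangle = -\frac{\partial \ln z_{\text{cl}}}{\partial \beta} = \frac{1}{\beta} = kT.
```

This agrees with the high-temperature limit of the quantum result z = 1/(1 - e^{-\beta\hbar\omega}), which tends to 1/(\beta\hbar\omega) when \beta\hbar\omega \ll 1, and it reproduces the equipartition energy kT.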