The Partition Function

You have already met the Boltzmann distribution: a system in contact with a heat bath at temperature T visits a microstate of energy E_i with a weight proportional to the Boltzmann factor e^{-E_i/kT}. That single exponential is enough to compare states: a state lying \varepsilon above the ground state is e^{-\varepsilon/kT} times as likely as the ground state, one lying 2\varepsilon above is e^{-2\varepsilon/kT} times as likely, and so on. But notice what it does not give you: an absolute probability. "Proportional to" is a promise with a missing constant. To turn the weights into genuine probabilities that add up to one, you must divide by their sum.

That sum has a name — the partition function, written Z (from the German Zustandssumme, "sum over states"):

Z = \sum_{i}\, e^{-E_i/kT} = \sum_i e^{-\beta E_i}, \qquad \beta \equiv \frac{1}{kT}.

At first glance Z looks like a mere bookkeeping device, the boring denominator that makes probabilities normalise. It is in fact the single most important object in statistical mechanics. The astonishing fact this page is built around is that Z knows everything: once you can write down Z(T,V,N) for a system, every thermodynamic quantity you could want (its energy, its entropy, its pressure, its heat capacity, its free energy, even the size of its fluctuations) follows from \ln Z and its derivatives. No new physics, no extra measurements: just calculus applied to one function. The whole thermodynamics of a system is folded up inside Z, and the derivatives unfold it.

The definition, carefully

The sum runs over every microstate i the system can occupy — not over energy values, but over states. That distinction matters the moment two or more states share the same energy. If an energy level E_j has degeneracy g_j (that many distinct states at that one energy), you may group the sum by level instead of by state:

Z = \sum_{\text{states }i} e^{-\beta E_i} = \sum_{\text{levels }j} g_j\, e^{-\beta E_j}.

Each level simply contributes its Boltzmann factor g_j times over. Forgetting those g_j is the single most common slip in a partition-function calculation.

Once you have Z, the absolute probability of finding the system in a particular microstate i is the Boltzmann weight divided by the total:

Probability of a microstate. P_i = \dfrac{e^{-\beta E_i}}{Z}. The denominator Z is exactly what guarantees \sum_i P_i = 1.
Probability of a level. With degeneracy, P_j = \dfrac{g_j\, e^{-\beta E_j}}{Z}.

A quick sanity check: at very high temperature (\beta \to 0) every Boltzmann factor tends to 1, so Z counts the accessible states and each becomes equally likely. At very low temperature (\beta \to \infty) only the ground-state term survives, and the system freezes into its lowest level. Z interpolates smoothly between "count the states" and "occupy the ground state".
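The two limits are easy to see numerically. The sketch below is only an illustration: the helper names (partitionFunction, levelProbabilities) and the six-state spectrum are made up for this example, and units are chosen so that k = 1.

```typescript
// Hypothetical helpers for illustration; units with k = 1.
interface Level { E: number; g: number; }

// Z = sum over levels of g_j * exp(-beta * E_j)
function partitionFunction(levels: Level[], beta: number): number {
  return levels.reduce((sum, { E, g }) => sum + g * Math.exp(-beta * E), 0);
}

// Level probabilities P_j = g_j * exp(-beta * E_j) / Z
function levelProbabilities(levels: Level[], beta: number): number[] {
  const Z = partitionFunction(levels, beta);
  return levels.map(({ E, g }) => (g * Math.exp(-beta * E)) / Z);
}

// Made-up spectrum: 1 + 2 + 3 = 6 microstates in total.
const demoLevels: Level[] = [
  { E: 0, g: 1 },
  { E: 1, g: 2 },
  { E: 3, g: 3 },
];

// Small beta = high temperature, large beta = low temperature.
for (const beta of [1e-6, 1, 50]) {
  const Z = partitionFunction(demoLevels, beta);
  const P = levelProbabilities(demoLevels, beta);
  const total = P.reduce((a, b) => a + b, 0);
  console.log(
    `beta=${beta}  Z=${Z.toFixed(4)}  P=[${P.map(p => p.toFixed(4)).join(", ")}]  sum=${total.toFixed(4)}`
  );
}
```

At the smallest beta, Z should sit very close to 6 (the number of states) and the level probabilities should approach g_j/6; at the largest beta essentially all the probability should land on the ground level.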

Z knows everything: extracting the thermodynamics

Here is why Z is the master key. Start with the mean energy — the average of E_i weighted by P_i:

\langle E\rangle = \sum_i E_i\, P_i = \frac{1}{Z}\sum_i E_i\, e^{-\beta E_i}.

Now look at the derivative of Z with respect to \beta. Differentiating e^{-\beta E_i} pulls down a factor -E_i, so

\frac{\partial Z}{\partial \beta} = -\sum_i E_i\, e^{-\beta E_i} = -Z\,\langle E\rangle.

Divide by Z and recognise the left-hand side as a logarithmic derivative. The mean energy is just the \beta-slope of \ln Z:

Mean energy. \langle E\rangle = -\dfrac{\partial \ln Z}{\partial \beta} = kT^2\,\dfrac{\partial \ln Z}{\partial T}.
Energy fluctuations & heat capacity. \langle E^2\rangle - \langle E\rangle^2 = \dfrac{\partial^2 \ln Z}{\partial \beta^2}, and C_V = \dfrac{\partial \langle E\rangle}{\partial T} = \dfrac{1}{kT^2}\big(\langle E^2\rangle - \langle E\rangle^2\big).
Helmholtz free energy — the bridge to thermodynamics: F = -kT\ln Z.
Entropy & pressure, straight from F: S = -\dfrac{\partial F}{\partial T} = k\ln Z + \dfrac{\langle E\rangle}{T}, and p = -\dfrac{\partial F}{\partial V} = kT\,\dfrac{\partial \ln Z}{\partial V}.

Read that list again and appreciate the economy: one function, Z, and a handful of derivatives regenerate the entire thermodynamic apparatus. The free energy F = -kT\ln Z is the linchpin — it is the quantity that connects the microscopic sum to the macroscopic world, because in thermodynamics dF = -S\,dT - p\,dV, and those two partial derivatives are exactly the entropy and pressure written above.
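One way to convince yourself that F = -kT\ln Z really carries the entropy is to compute S twice: once from S = k\ln Z + \langle E\rangle/T, and once from the Gibbs form S = -k\sum_i P_i \ln P_i, which is a standard result not derived on this page. The sketch below assumes k = 1; the function name entropyTwoWays and the list of microstate energies are illustrative choices.

```typescript
// Units with k = 1. Function name and energies are illustrative assumptions.
const k = 1;

function entropyTwoWays(energies: number[], T: number): { fromZ: number; gibbs: number } {
  const beta = 1 / (k * T);

  // First pass: partition function and mean energy.
  let Z = 0;
  let sumE = 0;
  for (const E of energies) {
    const w = Math.exp(-beta * E);
    Z += w;
    sumE += E * w;
  }
  const meanE = sumE / Z;

  // Route 1: S = k ln Z + <E>/T, i.e. -dF/dT with F = -kT ln Z.
  const fromZ = k * Math.log(Z) + meanE / T;

  // Route 2: Gibbs entropy -k * sum_i P_i ln P_i, straight from the probabilities.
  let gibbs = 0;
  for (const E of energies) {
    const p = Math.exp(-beta * E) / Z;
    gibbs -= k * p * Math.log(p);
  }
  return { fromZ, gibbs };
}

// One entry per microstate (degenerate energies are simply repeated).
const microstateEnergies = [0, 1, 1, 1, 2.5, 2.5];
const entropyResult = entropyTwoWays(microstateEnergies, 1.0);
console.log("S from ln Z  =", entropyResult.fromZ.toFixed(6));
console.log("S from Gibbs =", entropyResult.gibbs.toFixed(6));
```

The two numbers should agree to rounding error, because \ln P_i = -\beta E_i - \ln Z turns the Gibbs sum into \beta\langle E\rangle + \ln Z term by term.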

The fluctuation formula deserves a second glance. It says the spread of the energy — how much the system's energy jitters about its mean — is the second \beta-derivative of \ln Z, and that same spread controls the heat capacity. Response (how much energy you must add to warm the system) and fluctuation (how much the energy wobbles on its own) are two faces of one derivative. That link is your first taste of the fluctuation–dissipation theorem.
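The link between derivatives of \ln Z and energy statistics can be checked with finite differences. This is a rough numerical sketch, not a recommended way to compute thermodynamics: the step size h, the function names, and the spectrum are arbitrary choices, and k = 1.

```typescript
// ln Z for a list of microstate energies at inverse temperature beta (k = 1).
function lnZ(energies: number[], beta: number): number {
  let Z = 0;
  for (const E of energies) Z += Math.exp(-beta * E);
  return Math.log(Z);
}

// Direct ensemble averages: <E> and the variance <E^2> - <E>^2.
function directMoments(energies: number[], beta: number): { mean: number; variance: number } {
  let Z = 0;
  let sumE = 0;
  let sumE2 = 0;
  for (const E of energies) {
    const w = Math.exp(-beta * E);
    Z += w;
    sumE += E * w;
    sumE2 += E * E * w;
  }
  const mean = sumE / Z;
  return { mean, variance: sumE2 / Z - mean * mean };
}

const stateEnergies = [0, 1, 1, 1, 2.5, 2.5]; // illustrative spectrum
const beta0 = 1.0;
const h = 1e-4; // finite-difference step, an arbitrary small value

// Central differences for the first and second beta-derivatives of ln Z.
const d1 = (lnZ(stateEnergies, beta0 + h) - lnZ(stateEnergies, beta0 - h)) / (2 * h);
const d2 =
  (lnZ(stateEnergies, beta0 + h) - 2 * lnZ(stateEnergies, beta0) + lnZ(stateEnergies, beta0 - h)) /
  (h * h);

const moments = directMoments(stateEnergies, beta0);
console.log("<E> direct:", moments.mean.toFixed(6), " from -d lnZ/d beta:", (-d1).toFixed(6));
console.log("var direct:", moments.variance.toFixed(6), " from d2 lnZ/d beta2:", d2.toFixed(6));
// With k = 1, C_V = variance / T^2 = beta^2 * variance.
console.log("C_V =", (beta0 * beta0 * moments.variance).toFixed(6));
```

The derivative-based numbers should match the direct averages to within the finite-difference error, which is the fluctuation formula in action.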

Factorisation: why Z scales so kindly

Real systems have enormous numbers of particles, so summing over all microstates sounds hopeless. It usually isn't, thanks to one beautiful property. Suppose a system splits into independent parts, so its total energy is a sum E = E^{(1)} + E^{(2)} + \dots with no interaction terms. Then the exponential of a sum is a product of exponentials, and the sum over the whole factorises into a product of separate sums:

Z = \sum_{\text{all states}} e^{-\beta(E^{(1)} + E^{(2)} + \dots)} = \left(\sum_a e^{-\beta E^{(1)}_a}\right)\left(\sum_b e^{-\beta E^{(2)}_b}\right)\cdots = z_1\, z_2\cdots

Independent subsystems multiply their partition functions. In particular, for N identical, independent, distinguishable particles each with the same single-particle partition function z,

Z = z^N.

This is why \ln Z (and hence F, S, \langle E\rangle) is extensive — proportional to N — because \ln Z = N\ln z. You solve the physics for one particle and multiply.
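Factorisation is easy to test by brute force for a small system. The sketch below assumes N distinguishable two-level particles with energies 0 and eps (an illustrative choice), enumerates all 2^N microstates explicitly, and compares the result with z^N. The function names and parameter values are hypothetical.

```typescript
// Brute force: enumerate all 2^N microstates of N independent, distinguishable
// two-level particles (energies 0 and eps) and sum the Boltzmann factors.
function bruteForceZ(N: number, eps: number, beta: number): number {
  let Z = 0;
  const nConfigs = Math.pow(2, N);
  for (let config = 0; config < nConfigs; config++) {
    // Bit i of `config` records whether particle i is excited.
    let excited = 0;
    for (let i = 0; i < N; i++) {
      if ((config >> i) & 1) excited++;
    }
    Z += Math.exp(-beta * excited * eps);
  }
  return Z;
}

// Factorised form: the single-particle z raised to the N-th power.
function factorisedZ(N: number, eps: number, beta: number): number {
  const z = 1 + Math.exp(-beta * eps);
  return Math.pow(z, N);
}

// Illustrative values, units with k = 1.
const nParticles = 10;
const gap = 1.0;
const betaDemo = 0.7;
console.log("brute force:", bruteForceZ(nParticles, gap, betaDemo).toFixed(6));
console.log("z^N        :", factorisedZ(nParticles, gap, betaDemo).toFixed(6));
```

The brute-force loop costs 2^N terms while the factorised form costs one exponential, which is the practical payoff of independence.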

One caveat, which the next course makes precise: if the particles are genuinely indistinguishable (identical atoms in a gas, say), counting each arrangement N! times overcounts the states, and you must divide it back out: Z = z^N/N!. That factor cures the classical "Gibbs paradox" of entropy — but that is a story for the ideal gas. Hold it as a forward pointer for now.

Worked example 1 — the two-level system

The cleanest possible example: a single object with just two states, a ground state at energy 0 and an excited state at energy \varepsilon. (Think of a spin in a magnetic field, or a two-state impurity in a crystal.) The single-particle partition function is a sum of just two terms:

z = e^{-\beta\cdot 0} + e^{-\beta\varepsilon} = 1 + e^{-\beta\varepsilon}.

Apply the master formula for the mean energy. First \ln z = \ln\!\big(1 + e^{-\beta\varepsilon}\big), then differentiate with respect to \beta:

\langle E\rangle = -\frac{\partial \ln z}{\partial \beta} = -\frac{-\varepsilon\, e^{-\beta\varepsilon}}{1 + e^{-\beta\varepsilon}} = \frac{\varepsilon\, e^{-\beta\varepsilon}}{1 + e^{-\beta\varepsilon}} = \frac{\varepsilon}{e^{\beta\varepsilon} + 1}.

Look at the limits. At low temperature (\beta\varepsilon \gg 1) the denominator is huge and \langle E\rangle \to 0: everything sits in the ground state. At high temperature (\beta\varepsilon \to 0) the denominator tends to 2 and \langle E\rangle \to \varepsilon/2: the two states are equally populated, so the average energy saturates at the midpoint. It cannot climb past \varepsilon/2 — a two-level system has a ceiling on the energy it can soak up.

Differentiate once more (with respect to T) to get the heat capacity, and you find the famous Schottky anomaly: a heat capacity that is zero at both low and high temperature and rises to a hump in between, peaking when kT is comparable to the gap (around kT \approx 0.42\,\varepsilon):

C_V = \frac{\partial \langle E\rangle}{\partial T} = k\,(\beta\varepsilon)^2\,\frac{e^{\beta\varepsilon}}{\big(e^{\beta\varepsilon} + 1\big)^2}.

Why the peak? At low T there is not enough thermal energy to promote anything, so adding heat changes nothing. At high T the level is already half full and cannot take more, so again adding heat barely raises the mean energy. Only in the crossover, when kT is comparable to \varepsilon, does the population shift rapidly with temperature, and that is where the system drinks up heat.
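You can locate the hump numerically with a simple scan. The sketch below evaluates C_V/k from the formula above as a function of t = kT/\varepsilon; the function name, grid range and grid spacing are arbitrary choices made for this illustration.

```typescript
// Heat capacity of a two-level system in units of k, as a function of t = kT / eps.
// Uses C_V / k = x^2 e^x / (e^x + 1)^2 with x = beta * eps = 1 / t.
function schottkyHeatCapacity(t: number): number {
  const x = 1 / t;
  const ex = Math.exp(x);
  return (x * x * ex) / ((ex + 1) * (ex + 1));
}

// Coarse grid scan for the maximum (range and spacing are arbitrary choices).
let peakT = 0;
let peakC = 0;
for (let t = 0.05; t <= 3.0; t += 0.001) {
  const c = schottkyHeatCapacity(t);
  if (c > peakC) {
    peakC = c;
    peakT = t;
  }
}
console.log(`peak near kT/eps = ${peakT.toFixed(3)}, where C_V/k = ${peakC.toFixed(3)}`);
console.log("C_V/k at t = 0.05:", schottkyHeatCapacity(0.05).toExponential(2));
console.log("C_V/k at t = 3.00:", schottkyHeatCapacity(3.0).toFixed(4));
```

The scan should place the maximum close to the kT \approx 0.42\,\varepsilon quoted above, with the heat capacity falling away on both sides.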

Worked example 2 — the harmonic oscillator (a geometric-series trick)

A quantum harmonic oscillator has an infinite ladder of equally spaced levels, E_n = \left(n + \tfrac12\right)\hbar\omega for n = 0, 1, 2, \dots. An infinite sum sounds daunting — until you spot that it is a geometric series. Writing x = e^{-\beta\hbar\omega} and pulling out the zero-point piece:

z = \sum_{n=0}^{\infty} e^{-\beta(n+\frac12)\hbar\omega} = e^{-\beta\hbar\omega/2}\sum_{n=0}^{\infty} x^{n} = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}}.

(If you measure energies from the ground state and drop the zero-point term, this is simply z = \dfrac{1}{1 - e^{-\beta\hbar\omega}}.) Feed it through \langle E\rangle = -\partial \ln z/\partial \beta and, after a short differentiation, the whole thing collapses to the celebrated Planck result:

\langle E\rangle = \frac{\hbar\omega}{2} + \frac{\hbar\omega}{e^{\beta\hbar\omega} - 1}.

The second term, \hbar\omega/(e^{\beta\hbar\omega}-1), is exactly the mean oscillator energy that Planck was forced to introduce to explain blackbody radiation; it is \hbar\omega times the Bose–Einstein occupation number 1/(e^{\beta\hbar\omega}-1), and here it drops out of one geometric series and one derivative. At high temperature (kT \gg \hbar\omega) the mean energy approaches kT (the classical equipartition value); at low temperature the second term vanishes exponentially, leaving only the zero-point energy. A single partition function reproduces both the classical and the quantum regimes.
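Both the geometric-series step and the Planck result can be checked numerically. The sketch below works in units where \hbar\omega = 1 and k = 1; the cutoff nMax, the finite-difference step, and all function names are assumptions made for this illustration.

```typescript
// Units: hbar * omega = 1 and k = 1, so beta is dimensionless here.

// Truncated sum over the ladder E_n = n + 1/2 (nMax is an arbitrary cutoff).
function zTruncated(beta: number, nMax: number): number {
  let z = 0;
  for (let n = 0; n <= nMax; n++) z += Math.exp(-beta * (n + 0.5));
  return z;
}

// Closed form from the geometric series.
function zClosed(beta: number): number {
  return Math.exp(-beta / 2) / (1 - Math.exp(-beta));
}

// Planck form of the mean energy, zero-point term included.
function meanEnergyPlanck(beta: number): number {
  return 0.5 + 1 / (Math.exp(beta) - 1);
}

// Mean energy from -d ln z / d beta, estimated by a central finite difference.
function meanEnergyNumeric(beta: number, step = 1e-5): number {
  return -(Math.log(zClosed(beta + step)) - Math.log(zClosed(beta - step))) / (2 * step);
}

for (const beta of [0.01, 1, 10]) {
  console.log(
    `beta=${beta}`,
    `z(sum)=${zTruncated(beta, 5000).toFixed(6)}`,
    `z(closed)=${zClosed(beta).toFixed(6)}`,
    `<E>(Planck)=${meanEnergyPlanck(beta).toFixed(6)}`,
    `<E>(numeric)=${meanEnergyNumeric(beta).toFixed(6)}`,
    `kT=${(1 / beta).toFixed(2)}`
  );
}
```

At the smallest beta the mean energy should sit very close to kT, and at the largest beta it should sit just above the zero-point value 1/2.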

A concrete numerical check

Numbers make it stick. Take a small system with a three-level spectrum (in units where k = 1 and T = 1): a non-degenerate ground state at E = 0, a triply degenerate level at E = 1, and a doubly degenerate level at E = 2.5. The partition function is

Z = 1\cdot e^{0} + 3\cdot e^{-1} + 2\cdot e^{-2.5} \approx 1 + 1.104 + 0.164 = 2.268,

and the mean energy is the weighted sum of energies divided by Z:

\langle E\rangle = \frac{0 + 1\cdot 1.104 + 2.5\cdot 0.164}{2.268} \approx \frac{1.514}{2.268} \approx 0.668.

Notice how the degeneracies g_j = 1, 3, 2 pull weight toward the middle level: with a weight of 1.104 against 1 for the ground state, it is the most probable level even though it is not the lowest in energy. Three doorways into a room make it easier to enter. Here is the same computation in code; change the spectrum to build intuition:

```typescript
// One energy level: its energy E and its degeneracy g (number of states at that energy).
interface Level { E: number; g: number; }

const kT = 1.0; // temperature in energy units (k = 1)

const spectrum: Level[] = [
  { E: 0.0, g: 1 }, // ground state, non-degenerate
  { E: 1.0, g: 3 }, // triply degenerate
  { E: 2.5, g: 2 }, // doubly degenerate
];

let Z = 0;         // partition function
let energySum = 0; // sum of E * weight, the numerator of <E>

for (const { E, g } of spectrum) {
  const weight = g * Math.exp(-E / kT); // g * Boltzmann factor
  Z += weight;
  energySum += E * weight;
}

console.log("Z =", Z.toFixed(4));
console.log("<E> =", (energySum / Z).toFixed(4));
console.log("P(ground) =", (1 / Z).toFixed(4)); // ground level has g = 1 and E = 0
```

Three slips catch almost everyone the first time. First, the minus sign and the variable: the mean energy is \langle E\rangle = -\partial \ln Z/\partial \beta, a derivative with respect to \beta = 1/kT, not with respect to T. If you differentiate with respect to T instead, you pick up an extra chain-rule factor and the sign flips — the correct temperature form is \langle E\rangle = +kT^2\,\partial \ln Z/\partial T. Keep the whole calculation in \beta and the signs look after themselves.

Second, states versus levels: the raw sum is over microstates, Z = \sum_i e^{-\beta E_i}. The moment you regroup by energy level you must insert the degeneracy, Z = \sum_j g_j e^{-\beta E_j}. Dropping g_j silently undercounts states and gives wrong probabilities every time.

Third, Z alone is not physical. Its numerical value depends on where you set the zero of energy (shift every E_i by a constant and Z just rescales by an overall factor), and it can even carry awkward dimensions in the classical continuum version. What carries physics is \ln Z and its derivatives — energy differences, entropy, heat capacity. Never quote "the value of Z" as if it meant something on its own.
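The dependence of Z on the zero of energy is easy to demonstrate. The sketch below shifts every energy of an illustrative spectrum by a constant and compares what changes and what does not; the helper name summarize, the spectrum, and the size of the shift are all assumptions for this example, with k = 1.

```typescript
// Units with k = 1. Helper name, energies and shift are illustrative values.
function summarize(energies: number[], beta: number) {
  let Z = 0;
  let sumE = 0;
  for (const E of energies) {
    const w = Math.exp(-beta * E);
    Z += w;
    sumE += E * w;
  }
  // Probability of one single lowest-energy microstate.
  const groundWeight = Math.exp(-beta * Math.min(...energies));
  return { Z, meanE: sumE / Z, pGround: groundWeight / Z };
}

const originalEnergies = [0, 1, 1, 1, 2.5, 2.5];
const energyShift = 5.0; // move the zero of energy by this constant
const shiftedEnergies = originalEnergies.map(E => E + energyShift);
const betaValue = 1.0;

const before = summarize(originalEnergies, betaValue);
const after = summarize(shiftedEnergies, betaValue);

console.log("Z before:", before.Z.toFixed(6), " Z after:", after.Z.toExponential(6));
console.log("Z ratio:", (after.Z / before.Z).toExponential(6),
            " e^{-beta*shift}:", Math.exp(-betaValue * energyShift).toExponential(6));
console.log("<E> after - <E> before:", (after.meanE - before.meanE).toFixed(6));
console.log("P(ground) before:", before.pGround.toFixed(6), " after:", after.pGround.toFixed(6));
```

Z itself should change by the factor e^{-\beta \cdot \text{shift}}, the mean energy should move by exactly the shift, and the probabilities should not move at all.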

The symbol Z is a fossil of its German origin: Zustandssumme, literally "sum over states". That name is far more descriptive than the English "partition function", which came from the way Z partitions, or apportions, the total probability among the available states. The German is worth remembering because it tells you exactly what the object is — you are summing a weight over every state the system can be in.

There is a lovely deeper fact hiding here. Because Z = \sum_i e^{-\beta E_i} is essentially a Laplace transform of the density of states, each derivative with respect to \beta brings down another power of the energy, so the derivatives of Z generate the moments \langle E^n\rangle. Taking the logarithm turns moments into cumulants: the first derivative of \ln Z gives -\langle E\rangle, the second gives the variance \langle E^2\rangle - \langle E\rangle^2, the third gives (up to sign) the third central moment that sets the skewness, and so on. \ln Z is what a mathematician would call a cumulant-generating function: it secretly encodes not just the average energy but the full statistics of every fluctuation. In the classical continuum limit the same object becomes an integral over phase space, Z = \dfrac{1}{h^{3N} N!}\displaystyle\int e^{-\beta H(\mathbf{p},\mathbf{q})}\, d^{3N}p\; d^{3N}q, with Planck's constant setting the size of a "cell" of states and the N! fixing the indistinguishability overcount. Same idea, sum turned into integral.